Python: how to iterate over rows to find constant values of columns in a table
I have a time-series DataFrame, and I want to find the rows whose values stay constant, i.e. match the values in the neighboring rows. Suppose this is the DataFrame:
import pandas as pd
temp = [27.18, 27.18, 27.18, 27.18, 20.82, 20.82, 20.82, 20.82, 15.18,
15.18, 15.18, 15.18, 15.24, 15.24, 15.24, 15.24, 20.4 , 20.4 ,
20.4 , 20.4 , 21.48, 21.48, 21.48, 21.48, 27.66, 27.66, 27.66,
27.66, 27.9 , 27.9 , 27.9 , 27.9 , 27.9 , 27.9 , 27.9 , 27.9 ,
27.84, 27.84, 27.84, 27.84, 27.84, 27.84, 27.84, 27.84, 21.72,
21.72, 21.72, 21.72]
heat = [11.94, 12. , 10.56, 6. , 6. , 6. , 6. , 6. , 6. ,
6. , 6. , 6. , 6. , 6.78, 9. , 9. , 9. , 9. ,
9. , 9. , 9. , 11.58, 12. , 11.94, 11.94, 12. , 12. ,
11.94, 11.94, 12. , 11.94, 12. , 11.94, 12. , 12. , 11.94,
12. , 11.94, 11.94, 12. , 11.94, 9.48, 9. , 9. , 9. ,
9. , 8.94, 9. ]
date = ['2016-01-29 12:00:00', '2016-01-29 12:15:00',
'2016-01-29 12:30:00', '2016-01-29 12:45:00',
'2016-01-29 13:00:00', '2016-01-29 13:15:00',
'2016-01-29 13:30:00', '2016-01-29 13:45:00',
'2016-01-29 14:00:00', '2016-01-29 14:15:00',
'2016-01-29 14:30:00', '2016-01-29 14:45:00',
'2016-01-29 15:00:00', '2016-01-29 15:15:00',
'2016-01-29 15:30:00', '2016-01-29 15:45:00',
'2016-01-29 16:00:00', '2016-01-29 16:15:00',
'2016-01-29 16:30:00', '2016-01-29 16:45:00',
'2016-01-29 17:00:00', '2016-01-29 17:15:00',
'2016-01-29 17:30:00', '2016-01-29 17:45:00',
'2016-01-29 18:00:00', '2016-01-29 18:15:00',
'2016-01-29 18:30:00', '2016-01-29 18:45:00',
'2016-01-29 19:00:00', '2016-01-29 19:15:00',
'2016-01-29 19:30:00', '2016-01-29 19:45:00',
'2016-01-29 20:00:00', '2016-01-29 20:15:00',
'2016-01-29 20:30:00', '2016-01-29 20:45:00',
'2016-01-29 21:00:00', '2016-01-29 21:15:00',
'2016-01-29 21:30:00', '2016-01-29 21:45:00',
'2016-01-29 22:00:00', '2016-01-29 22:15:00',
'2016-01-29 22:30:00', '2016-01-29 22:45:00',
'2016-01-29 23:00:00', '2016-01-29 23:15:00',
'2016-01-29 23:30:00', '2016-01-29 23:45:00']
df = pd.DataFrame(date, columns=['date'])
df.insert(1 ,'temp', temp, True)
df.insert(2, 'heat', heat, True )
df.index = df.date
del df['date']
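The plot looks like this:
I need to find the regions marked between the pairs of yellow lines, where the values are nearly constant, excluding the ramp regions. I have been using a shift-based approach, but it is not ideal. Any idea how to achieve this? Thanks in advance.
The shift approach I am trying: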
(df.heat != df.heat.shift(1)).cumsum()
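For reference, a minimal sketch of what the shift-and-cumsum idiom does, using a small made-up series (the values are illustrative, not the question's data). Each time the value changes, the comparison is True and the cumulative sum increases by one, so every run of equal consecutive values gets its own integer label. Note that it only groups exactly equal values, so small fluctuations such as 11.94 vs 12.0 split a run.

```python
import pandas as pd

# Toy series (made up for illustration): three runs of equal values.
s = pd.Series([6.0, 6.0, 6.78, 9.0, 9.0, 9.0])

# True wherever the value differs from the previous row; the first row
# is compared against NaN and therefore also counts as a change.
changed = s != s.shift(1)

# The running count of changes gives one integer label per run.
run_id = changed.cumsum()

# Number of rows in each run, keyed by run label.
run_len = s.groupby(run_id).size()

print(run_id.tolist())    # [1, 1, 2, 3, 3, 3]
print(run_len.to_dict())  # {1: 2, 2: 1, 3: 3}
```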
Expected output:
Is this plot mask what you are looking for:
df[df.temp.duplicated() & df.heat.duplicated()].plot()
Second attempt:
import numpy as np
import pandas as pd
df= pd.DataFrame({"temp":temp,"heat":heat}, index= pd.to_datetime(date) )
thtemp=0.5 # threshold
thheat=0.5
crit= df.temp.diff().abs().lt(thtemp) & df.heat.diff().abs().lt(thheat)
rng=np.arange(1,len(df)+1)
df["const"]= np.where(crit.eq(False),rng,np.nan)
df["const"]= df.const.ffill()
temp heat const
2016-01-29 12:00:00 27.18 11.94 1.0
2016-01-29 12:15:00 27.18 12.00 1.0
2016-01-29 12:30:00 27.18 10.56 3.0
2016-01-29 12:45:00 27.18 6.00 4.0
2016-01-29 13:00:00 20.82 6.00 5.0
2016-01-29 13:15:00 20.82 6.00 5.0
2016-01-29 13:30:00 20.82 6.00 5.0
2016-01-29 13:45:00 20.82 6.00 5.0
2016-01-29 14:00:00 15.18 6.00 9.0
2016-01-29 14:15:00 15.18 6.00 9.0
2016-01-29 14:30:00 15.18 6.00 9.0
2016-01-29 14:45:00 15.18 6.00 9.0
2016-01-29 15:00:00 15.24 6.00 9.0
...
G= df.groupby(df.const)
for key,grp in G:
    if len(grp)>1:
        print(f"\t{grp.index[0]}\n\t{grp.index[-1]}\n")
	2016-01-29 12:00:00
	2016-01-29 12:15:00

	2016-01-29 13:00:00
	2016-01-29 13:45:00

	2016-01-29 14:00:00
	2016-01-29 15:00:00

	2016-01-29 15:30:00
	2016-01-29 15:45:00

	2016-01-29 16:00:00
	2016-01-29 16:45:00

	2016-01-29 17:15:00
	2016-01-29 17:45:00

	2016-01-29 18:00:00
	2016-01-29 22:00:00

	2016-01-29 22:15:00
	2016-01-29 22:45:00

	2016-01-29 23:00:00
	2016-01-29 23:45:00
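As a sketch of an alternative to the explicit loop, the segment boundaries can also be collected into a table with one aggregation. The sample below is a small made-up frame, and the names `seg` and `spans` are illustrative. The labels here come from counting the rows where the criterion fails; they are numbered differently from the `const` column but should group the rows the same way.

```python
import pandas as pd

# Small made-up sample at 15-minute spacing (not the question's full data).
idx = pd.date_range("2016-01-29 12:00", periods=6, freq="15min")
df = pd.DataFrame(
    {"temp": [27.18, 27.18, 27.18, 20.82, 20.82, 20.82],
     "heat": [11.94, 12.0, 10.56, 6.0, 6.0, 6.0]},
    index=idx,
)

th = 0.5  # same threshold for both columns, as in the answer
crit = df.temp.diff().abs().lt(th) & df.heat.diff().abs().lt(th)

# Each row that fails the criterion starts a new segment.
seg = (~crit).cumsum()

# First timestamp, last timestamp and row count per segment.
spans = df.index.to_series().groupby(seg).agg(["first", "last", "count"])

# Keep only segments with more than one row, like len(grp) > 1 in the loop.
spans = spans[spans["count"] > 1]
print(spans)
```

Having the boundaries in a DataFrame makes it straightforward to filter by duration or to join them back onto other data.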
Plot:
import matplotlib.pyplot as plt
vrep=13   # y-level at which the constant segments are drawn
#vrep= (df.temp.mean()+df.heat.mean())/2
for key,grp in G:
    if len(grp)>1:
        # replace the segment label with vrep; rows outside the segment become NaN
        ser= grp.const.replace(key,vrep).reindex(df.index)
        plt.plot(ser.index,ser,color="orange", linewidth=2)
plt.plot(df.index,df.temp,color="darkgreen",label="temp")
plt.plot(df.index,df.heat,color="darkblue",label="heat")
plt.legend(loc="best")
plt.grid()
plt.show()
Edit: this was the first solution, but it does not capture the constant segments completely:
thtemp=0.5 # threshold
thheat=0.5
crit= df.temp.diff().abs().lt(thtemp) & df.heat.diff().abs().lt(thheat)
df["const"]= crit.astype(int).replace(0,np.nan)
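To see why this first version misses part of each segment: `diff()` compares each row with the previous one, so the first row of a constant run is flagged False (it differs from the row before it), and the run is only marked from its second row onward. A minimal sketch on made-up values, with one possible way to pull the starting row back in (the `full` name is illustrative):

```python
import pandas as pd

# Made-up values: one jump, a three-row constant run, another jump.
s = pd.Series([27.18, 20.82, 20.82, 20.82, 15.18])

# True where the step from the previous row is below the threshold.
crit = s.diff().abs().lt(0.5)
print(crit.tolist())  # [False, False, True, True, False]

# Also flag the row just before each flagged row, which is the
# first row of the run.
full = crit | crit.shift(-1, fill_value=False)
print(full.tolist())  # [False, True, True, True, False]
```

A plain boolean mask like `full` cannot tell two directly adjacent segments apart, which is presumably why the revised answer assigns a label per segment instead.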
Expanding on the currently accepted answer. Create the DataFrame:
import pandas as pd
temp = [27.18, 27.18, 27.18, 27.18, 20.82, 20.82, 20.82, 20.82, 15.18,
15.18, 15.18, 15.18, 15.24, 15.24, 15.24, 15.24, 20.4 , 20.4 ,
20.4 , 20.4 , 21.48, 21.48, 21.48, 21.48, 27.66, 27.66, 27.66,
27.66, 27.9 , 27.9 , 27.9 , 27.9 , 27.9 , 27.9 , 27.9 , 27.9 ,
27.84, 27.84, 27.84, 27.84, 27.84, 27.84, 27.84, 27.84, 21.72,
21.72, 21.72, 21.72]
heat = [11.94, 12. , 10.56, 6. , 6. , 6. , 6. , 6. , 6. ,
6. , 6. , 6. , 6. , 6.78, 9. , 9. , 9. , 9. ,
9. , 9. , 9. , 11.58, 12. , 11.94, 11.94, 12. , 12. ,
11.94, 11.94, 12. , 11.94, 12. , 11.94, 12. , 12. , 11.94,
12. , 11.94, 11.94, 12. , 11.94, 9.48, 9. , 9. , 9. ,
9. , 8.94, 9. ]
date = ['2016-01-29 12:00:00', '2016-01-29 12:15:00',
'2016-01-29 12:30:00', '2016-01-29 12:45:00',
'2016-01-29 13:00:00', '2016-01-29 13:15:00',
'2016-01-29 13:30:00', '2016-01-29 13:45:00',
'2016-01-29 14:00:00', '2016-01-29 14:15:00',
'2016-01-29 14:30:00', '2016-01-29 14:45:00',
'2016-01-29 15:00:00', '2016-01-29 15:15:00',
'2016-01-29 15:30:00', '2016-01-29 15:45:00',
'2016-01-29 16:00:00', '2016-01-29 16:15:00',
'2016-01-29 16:30:00', '2016-01-29 16:45:00',
'2016-01-29 17:00:00', '2016-01-29 17:15:00',
'2016-01-29 17:30:00', '2016-01-29 17:45:00',
'2016-01-29 18:00:00', '2016-01-29 18:15:00',
'2016-01-29 18:30:00', '2016-01-29 18:45:00',
'2016-01-29 19:00:00', '2016-01-29 19:15:00',
'2016-01-29 19:30:00', '2016-01-29 19:45:00',
'2016-01-29 20:00:00', '2016-01-29 20:15:00',
'2016-01-29 20:30:00', '2016-01-29 20:45:00',
'2016-01-29 21:00:00', '2016-01-29 21:15:00',
'2016-01-29 21:30:00', '2016-01-29 21:45:00',
'2016-01-29 22:00:00', '2016-01-29 22:15:00',
'2016-01-29 22:30:00', '2016-01-29 22:45:00',
'2016-01-29 23:00:00', '2016-01-29 23:15:00',
'2016-01-29 23:30:00', '2016-01-29 23:45:00']
df = pd.DataFrame({'date': date, 'temp': temp, 'heat': heat})
df.index = pd.to_datetime(df['date'])  # infer_datetime_format is deprecated since pandas 2.0 and not needed here
del df['date']
Create a boolean column that is True where the values are constant:
thtemp=0.5 # threshold
thheat=0.5
df["const"] = df.temp.diff().abs().lt(thtemp) & df.heat.diff().abs().lt(thheat)
df.head()
temp heat const
date
2016-01-29 12:00:00 27.18 11.94 False
2016-01-29 12:15:00 27.18 12.00 True
2016-01-29 12:30:00 27.18 10.56 False
2016-01-29 12:45:00 27.18 6.00 False
2016-01-29 13:00:00 20.82 6.00 False
Plot the data and shade the regions where const == True:
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
ax.plot(df.index, df['temp'])
ax.plot(df.index, df['heat'])
ax.fill_between(df.index, 0, 1, where=df['const'], alpha=0.1, transform=ax.get_xaxis_transform())
plt.gcf().autofmt_xdate()
plt.show()
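Why make the DataFrame so complicated?
This is a sample of the raw data, with some variation.
I can see four regions where df has constant rows, and the rightmost pair of yellow lines is not one of them.
I need to find the regions where heat and temp are relatively constant.
"Relatively constant" != "constant". You need to be very clear about what you want.
With this function I get a better plot, but there are still ramp regions in it. The plot should show straight lines with no fluctuation, meaning that both heat and temp have become constant.
Thanks for your answer, it works very well. Just one thing:
2016-01-29 13:15:00 2016-01-29 13:45:00
The constant segment starts at 13:00:00. Could I get that row too, if possible? Also, how did you draw the red constant line?
@Arpit The missing-segment issue is fixed, see above.
Thank you very much! This is exactly what I was looking for!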
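On the comment about ramp regions remaining: a step-by-step threshold lets a slow ramp through, because each individual step is small. One possible tightening, sketched here with a technique that is not part of either answer, is a rolling-range check: a row counts as flat only if it belongs to a window of `window` consecutive rows whose max-min spread stays within `tol` in every column. The data, `window`, `tol` and the variable names are all assumptions for illustration.

```python
import pandas as pd

# Made-up data: flat, then a slow ramp (steps of 0.4), then a jump.
idx = pd.date_range("2016-01-29 12:00", periods=8, freq="15min")
df = pd.DataFrame(
    {"temp": [20.0] * 8,
     "heat": [6.0, 6.0, 6.0, 6.4, 6.8, 7.2, 9.0, 9.0]},
    index=idx,
)

window = 3   # assumed minimum length of a flat stretch, in rows
tol = 0.5    # assumed maximum max-min spread inside a window

# Spread of each column over the trailing window; NaN for the first rows.
spread = df.rolling(window).max() - df.rolling(window).min()

# True at the LAST row of every window that is flat in all columns.
flat_end = (spread <= tol).all(axis=1)

# Propagate the flag back to the other rows of each flat window.
flat = flat_end.copy()
for k in range(1, window):
    flat |= flat_end.shift(-k, fill_value=False)

print(flat.tolist())
```

In this toy example the step-wise criterion would treat the first six rows as one segment, while the rolling check keeps only the first four. The trailing pair of 9.0 values is dropped because it is shorter than `window`.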