Pandas time series lookback
I have a DataFrame with a time series:
import pandas as pd
import numpy as np
df = pd.DataFrame({'times': np.array(['1994-07-25 15:00:00.000',
                                      '1994-07-25 16:00:00.000',
                                      '1994-07-26 18:45:00.000',
                                      '1994-07-27 15:15:00.000',
                                      '1994-07-27 16:00:00.000',
                                      '1994-07-28 18:45:00.000',
                                      '1994-07-28 19:15:00.000',
                                      ], dtype='datetime64'),
                   'diff': [0.0, 0.03, 0.04, 0.05, 0, 0.01, 0.0]})
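The diff (between two signals) returns to zero, and I would like to find out how long it deviated from zero, i.e. the time from the first non-zero value until the next zero. The desired result is below: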
df['deviation_time_delta'] = pd.to_timedelta(['nan',
                                              'nan',
                                              'nan',
                                              'nan',
                                              '2 days 00:00:00.000',
                                              'nan',
                                              '0 days 00:30:00.000',
                                              ])
I have tried this, but it is not pretty and does not work for deviations of arbitrary length:
df['diff_1'] = df['diff'].shift(1)
df['diff_2'] = df['diff'].shift(2)
df['diff_3'] = df['diff'].shift(3)
df['diff_4'] = df['diff'].shift(4)
df['times_1'] = df['times'].shift(1)
df['times_2'] = df['times'].shift(2)
df['times_3'] = df['times'].shift(3)
df['times_4'] = df['times'].shift(4)
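def calc_dev_time_delta(cur_diff, diff_1, diff_2, diff_3, diff_4, cur_time, time_1, time_2, time_3, time_4):
    # Only rows where the diff is back at zero can end a deviation
    if cur_diff != 0.0: return np.nan
    # The previous row was already zero, so no deviation ends here
    if diff_1 == 0.0: return np.nan
    # Look back for the last zero; the deviation started one row after it
    if diff_2 == 0.0: return cur_time - time_1
    if diff_3 == 0.0: return cur_time - time_2
    if diff_4 == 0.0: return cur_time - time_3
    # Deviations longer than three rows are not handled by this approach
    return np.nan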
df['dev_time_delta'] = df.apply(lambda row: calc_dev_time_delta(row['diff'], row['diff_1'], row['diff_2'],row['diff_3'],row['diff_4'], row['times'], row['times_1'], row['times_2'], row['times_3'], row['times_4']), axis=1)
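Do you know a better/cleaner way to achieve this result?
If I understand correctly, you want to compute the time difference to the previous row where diff is 0. Use groupby and diff: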
df
   diff               times
0  0.00 1994-07-25 15:00:00
1  0.03 1994-07-25 16:00:00
2  0.04 1994-07-26 18:45:00
3  0.05 1994-07-27 15:15:00
4  0.00 1994-07-27 16:00:00
5  0.01 1994-07-28 18:45:00
6  0.00 1994-07-28 19:15:00
df['deviation_time_delta'] = df.groupby('diff')['times'].diff()
# Assign through a single .loc call (not chained indexing) and use a zero Timedelta to keep the column dtype
df.loc[df['diff'] != 0, 'deviation_time_delta'] = pd.Timedelta(0)
df
   diff               times deviation_time_delta
0  0.00 1994-07-25 15:00:00                  NaT
1  0.03 1994-07-25 16:00:00      0 days 00:00:00
2  0.04 1994-07-26 18:45:00      0 days 00:00:00
3  0.05 1994-07-27 15:15:00      0 days 00:00:00
4  0.00 1994-07-27 16:00:00      2 days 01:00:00
5  0.01 1994-07-28 18:45:00      0 days 00:00:00
6  0.00 1994-07-28 19:15:00      1 days 03:15:00
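To see why this approach measures the time between consecutive zeros, it may help to look at what groupby followed by diff does on a tiny frame. The sketch below is only an illustration; the frame `demo` and its columns `key` and `gap` are made-up names, not part of the question.

```python
import pandas as pd

# Hypothetical toy data: two interleaved groups, 'a' and 'b'
demo = pd.DataFrame({
    'key': ['a', 'b', 'a', 'b', 'a'],
    'times': pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-04',
                             '2000-01-05', '2000-01-09']),
})

# diff() is taken within each group: every row is compared with the
# previous row that has the same key, not with the row directly above it
demo['gap'] = demo.groupby('key')['times'].diff()
print(demo)
```

The first row of each group has no predecessor and stays NaT. Applied to the question's data, all rows with diff == 0 form one group, so each zero row is compared with the previous zero row.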
I'm not sure I understand what you want, but doesn't this do the job?
# Take an explicit copy so that adding a column does not trigger SettingWithCopyWarning
dfZero = df[df['diff'] == 0].copy()
dfZero['deltaT'] = dfZero.times.diff()
df = df.merge(dfZero, how='left')
print(df)
Output:
                times  diff          deltaT
0 1994-07-25 15:00:00  0.00             NaT
1 1994-07-25 16:00:00  0.03             NaT
2 1994-07-26 18:45:00  0.04             NaT
3 1994-07-27 15:15:00  0.05             NaT
4 1994-07-27 16:00:00  0.00 2 days 01:00:00
5 1994-07-28 18:45:00  0.01             NaT
6 1994-07-28 19:15:00  0.00 1 days 03:15:00
Inspired by godot's answer and the comments, here is the solution I finally arrived at:
df['diff_1'] = df['diff'].shift(1)
def keep_row(cur_diff, prev_diff):
    # Keep zero rows and the first non-zero row after a zero
    return cur_diff == 0.0 or prev_diff == 0.0
df['keep'] = df.apply(lambda row: keep_row(row['diff'], row['diff_1']), axis=1)
df_short = df[df['keep']]
df_short = df_short.drop(['diff_1'], axis=1)
df_short['diff_1'] = df_short['diff'].shift(1)
df_short['times_1'] = df_short['times'].shift(1)
def calc_deviation_time(cur_diff, prev_time, cur_time):
    # Only zero rows mark the end of a deviation
    if cur_diff != 0.0: return np.nan
    return cur_time - prev_time
df_short['deviation_time'] = df_short.apply( lambda row: calc_deviation_time(row['diff'], row['times_1'], row['times']), axis=1)
df_short = df_short.drop(['keep', 'diff_1', 'times_1'], axis=1)
df_short
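Thanks for taking a look. There is a subtle difference: this measures the time between zeros, whereas I am looking for the time between the first deviating value after a zero and the subsequent zero. I will try a variant of what you suggested: create a second column from times, shifted by 1, and then take the difference between the new column and the original 'times' column...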
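For comparison, here is a minimal vectorized sketch of the measurement described in this comment (time from the first non-zero value after a zero to the following zero). It is not taken from any of the answers. It assumes the rows are sorted by time, and it treats the row before the first one as zero; the helper names `is_zero`, `prev_zero`, `starts`, `ends` and `start_time` are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'times': pd.to_datetime([
        '1994-07-25 15:00:00', '1994-07-25 16:00:00', '1994-07-26 18:45:00',
        '1994-07-27 15:15:00', '1994-07-27 16:00:00', '1994-07-28 18:45:00',
        '1994-07-28 19:15:00',
    ]),
    'diff': [0.0, 0.03, 0.04, 0.05, 0.0, 0.01, 0.0],
})

is_zero = df['diff'].eq(0)
# Assumption: the (non-existent) row before the first one counts as zero
prev_zero = is_zero.shift(1, fill_value=True)

# A deviation starts on a non-zero row whose predecessor is zero
starts = ~is_zero & prev_zero
# A deviation ends on a zero row whose predecessor is non-zero
ends = is_zero & ~prev_zero

# Carry the start time of the most recent deviation forward to later rows
start_time = df['times'].where(starts).ffill()

# Only rows that end a deviation get a value; all other rows stay NaT
df['deviation_time_delta'] = (df['times'] - start_time).where(ends)
print(df)
```

Because the start time is forward-filled instead of being looked up through a fixed number of shifted columns, the length of a deviation does not matter.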