Python: insert rows into a DataFrame based on a condition
I am using pandas to process a huge time-series dataset. If the difference between two consecutive index values is greater than 5, I want to add a row between those two rows in the DataFrame.
Actual:
a result
Date
1497544649 1 1.0
1497544652 9 1.0
1497544661 9 NaN
Expected:
a result
Date
1497544649 1 1.0
1497544652 9 1.0
1497544657 9 0
1497544661 9 NaN
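I am using diff() on the index to get the difference between two consecutive index values, but I am not sure how to insert a record when that difference is greater than 5.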
import pandas as pd
df = pd.DataFrame([{"Date": 1497544649,"a":1, "result": 1},
{"Date": 1497544652,"a": 9, "result": 1},
{"Date": 1497544661,"a": 9, "result": 1}])
df.set_index("Date", inplace=True)
df.index.to_series().diff().fillna(0).to_frame("diff")
Any suggestions on how to achieve this would be greatly appreciated.
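Thanks
You were one step away. Add a diff column to make filtering easy, then get the index labels of the rows that match the rule and insert a row before each of them.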
df['diff'] = df.index.to_series().diff().fillna(0)  # gap to the previous index value
matches = df[df['diff'] > 5].index.tolist()
for i in matches:
    diff = df.loc[i, 'diff']
    interval = int(diff // 2)  # offset to a point near the middle of the gap
    # insert a row before the matched index; 'a' is carried over from the
    # matched row and 'result' is set to 0, as in the expected output
    df.loc[i - interval] = [df.loc[i, 'a'], 0, diff - interval]
    df.loc[i, 'diff'] = interval  # may not need to update the interval
df.sort_index(inplace=True)  # new rows are appended at the end, so sort by index
del df['diff']  # drop the helper column
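For a large dataset, growing the frame one row at a time inside a loop can be slow. The same idea can be sketched without a loop: build all the new rows at once and concatenate them. This is only a sketch under assumptions, not a definitive implementation: the function name insert_gap_rows and its parameters max_gap and fill_result are made up for illustration, the index is assumed to be sorted, integer and unique, and each new row copies "a" from the row that follows the gap, as in the expected output.

```python
import pandas as pd


def insert_gap_rows(df, max_gap=5, fill_result=0):
    """Return a copy of df with one extra row inside every index gap > max_gap.

    Hypothetical helper: assumes a sorted, unique, integer index and a
    column named "result", as in the question.
    """
    gaps = df.index.to_series().diff()   # NaN for the first row
    wide = gaps[gaps > max_gap]          # only the gaps that are too large
    if wide.empty:
        return df.copy()

    # New label = label after the gap minus half the gap (integer division).
    offsets = (wide // 2).astype("int64").to_numpy()
    new_labels = wide.index.to_numpy() - offsets

    # Start from the rows that follow each gap so other columns carry over.
    new_rows = df.loc[wide.index].copy()
    new_rows.index = pd.Index(new_labels, name=df.index.name)
    new_rows["result"] = fill_result

    # Concatenate once, then restore index order.
    return pd.concat([df, new_rows]).sort_index()


df = pd.DataFrame([{"Date": 1497544649, "a": 1, "result": 1},
                   {"Date": 1497544652, "a": 9, "result": 1},
                   {"Date": 1497544661, "a": 9, "result": 1}]).set_index("Date")
out = insert_gap_rows(df)
print(out)
```

Like the loop version, this inserts only one row per gap, so a very wide gap still leaves differences greater than 5 on either side of the new row.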