Python: How to create missing candles in OHLCV data using pandas?
I have a DataFrame built from a list, and I am trying to identify candles that may be missing. When a missing candle is found, I would like to insert a new row into the pandas DataFrame, with the OHLC values taken from the previous day (row) and the volume set to 0.
import pandas as pd

list = [[1528992000000,
9.462e-05,
0.00010814,
9.202e-05,
0.00010544,
4600204.415809431],
[1529164800000,
0.00010309,
0.00010529,
0.0001016,
0.00010162,
1987989.1357407586],
[1529251200000,
0.00010165,
0.00010173,
9.402e-05,
9.508e-05,
1724979.853516945]]
df = pd.DataFrame(list)
df.columns = ['timestamp', 'open', 'high', 'low', 'close', 'volume']
df.set_index('timestamp', inplace = True)
df.index = pd.to_datetime( df.index, utc = True, unit = 'ms')
In [627]: df
Out[627]:
open high low close \
timestamp
2018-06-14 16:00:00+00:00 0.000095 0.000108 0.000092 0.000105
2018-06-16 16:00:00+00:00 0.000103 0.000105 0.000102 0.000102
2018-06-17 16:00:00+00:00 0.000102 0.000102 0.000094 0.000095
volume
timestamp
2018-06-14 16:00:00+00:00 4.600204e+06
2018-06-16 16:00:00+00:00 1.987989e+06
2018-06-17 16:00:00+00:00 1.724980e+06
In this example the candle for 2018-06-15 16:00:00+00:00 is missing, and I would like to recreate the DataFrame like this. How can I achieve that?
open high low close \
timestamp
2018-06-14 16:00:00+00:00 0.000095 0.000108 0.000092 0.000105
2018-06-15 16:00:00+00:00 0.000095 0.000108 0.000092 0.000105
2018-06-16 16:00:00+00:00 0.000103 0.000105 0.000102 0.000102
2018-06-17 16:00:00+00:00 0.000102 0.000102 0.000094 0.000095
volume
timestamp
2018-06-14 16:00:00+00:00 4.600204e+06
2018-06-15 16:00:00+00:00 0
2018-06-16 16:00:00+00:00 1.987989e+06
2018-06-17 16:00:00+00:00 1.724980e+06
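So basically, I am able to identify the missing index values by comparing the index with a datetime sequence covering the period. I then select the row preceding each missing candle and build a list, new, containing the desired data.
My problem is that I cannot figure out the best way to insert that list into the DataFrame. How should I do it?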
import datetime

# Create sequence
start = pd.to_datetime(list[0][0], utc=True, unit='ms')
end = pd.to_datetime(list[-1][0], utc=True, unit='ms')
sequence = pd.date_range(start, end)

# Compare sequence
diff = sequence.difference(df.index)

if len(diff) != 0:
    for i in diff:
        prev = i + datetime.timedelta(days=-1)
        row = df.loc[pd.Timestamp(prev)]  # select previous row
        new = [row.iloc[0], row.iloc[1], row.iloc[2], row.iloc[3], 0]  # create desired data

        # Doesn't return an error but failed to insert the new row
        df.loc[i] = new
        #df.loc[pd.Timestamp(i)] = new
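For reference, the loop approach described above can be sketched as a self-contained script. This is only a sketch under a few assumptions: the names `candles`, `expected` and `missing` are illustrative and not from the original code, and the previous row is looked up as "the latest row before the gap" rather than "exactly one day earlier", so that several consecutive missing candles do not raise a `KeyError`. Rows assigned with `df.loc[ts] = ...` are appended at the end, so the index is re-sorted after each insertion.

```python
import pandas as pd

# Sample candles from the question: [timestamp_ms, open, high, low, close, volume]
candles = [
    [1528992000000, 9.462e-05, 0.00010814, 9.202e-05, 0.00010544, 4600204.415809431],
    [1529164800000, 0.00010309, 0.00010529, 0.0001016, 0.00010162, 1987989.1357407586],
    [1529251200000, 0.00010165, 0.00010173, 9.402e-05, 9.508e-05, 1724979.853516945],
]

df = pd.DataFrame(candles, columns=["timestamp", "open", "high", "low", "close", "volume"])
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms", utc=True)
df = df.set_index("timestamp").sort_index()

# Every daily timestamp the index should contain, minus the ones it already has
expected = pd.date_range(df.index[0], df.index[-1], freq="D")
missing = expected.difference(df.index)

for ts in missing:
    # Latest candle strictly before the gap (may itself be a filled-in row)
    prev = df[df.index < ts].iloc[-1]
    # Carry the previous OHLC forward and set the volume to 0
    df.loc[ts] = [prev["open"], prev["high"], prev["low"], prev["close"], 0.0]
    # The new row was appended at the end, so restore chronological order
    df = df.sort_index()

print(df)
```

Sorting inside the loop keeps the "latest row before the gap" lookup correct on every iteration; for long series a vectorised approach is likely to be faster.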
You can add the missing dates directly to the DataFrame with:
df = df.asfreq('D')
To fill in the values from the previous day, you can use:
df = df.ffill()
If you need to leave the volume out of the forward fill, restrict it to the price columns:
cols = ['open', 'high', 'low', 'close']  # list of columns to update
df[cols] = df[cols].ffill()
The volume will then be NaN for the previously missing dates. If you need 0 instead, you can also use:
df.update(df['volume'].fillna(0))

new was inserted fine; the only thing missing was sorting the DataFrame index afterwards:
df = df.sort_index()
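Putting the `asfreq` steps together, a minimal end-to-end sketch on the sample data from the question might look like this. The helper name `fill_missing_candles` and the variable `candles` are illustrative, not part of the original answer, and the sketch assumes the index is a unique, sorted `DatetimeIndex` with daily spacing.

```python
import pandas as pd

# Sample candles from the question: [timestamp_ms, open, high, low, close, volume]
candles = [
    [1528992000000, 9.462e-05, 0.00010814, 9.202e-05, 0.00010544, 4600204.415809431],
    [1529164800000, 0.00010309, 0.00010529, 0.0001016, 0.00010162, 1987989.1357407586],
    [1529251200000, 0.00010165, 0.00010173, 9.402e-05, 9.508e-05, 1724979.853516945],
]


def fill_missing_candles(df, freq="D"):
    """Return a new frame with one row per period; gaps get the previous OHLC and volume 0."""
    out = df.asfreq(freq)  # inserts NaN rows for missing periods
    price_cols = ["open", "high", "low", "close"]
    out[price_cols] = out[price_cols].ffill()  # carry the last known prices forward
    out["volume"] = out["volume"].fillna(0)  # no trades in a synthetic candle
    return out


df = pd.DataFrame(candles, columns=["timestamp", "open", "high", "low", "close", "volume"])
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms", utc=True)
df = df.set_index("timestamp")

filled = fill_missing_candles(df)
print(filled)
```

Because `asfreq` builds the full date range from the first to the last index value, this also covers runs of several consecutive missing candles without any explicit loop.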