Python 填写烛台OHLCV数据_Python_Pandas

Python 填写烛台OHLCV数据

python pandas

Python 填写烛台OHLCV数据,python,pandas,Python,Pandas,我有一个这样的数据帧 OPEN HIGH LOW CLOSE VOL 2012-01-01 19:00:00 449000 449000 449000 449000 1336303000 2012-01-01 20:00:00 NaN NaN NaN NaN NaN 2012-01-01 21:00:00 NaN NaN NaN

我有一个这样的数据帧

                       OPEN    HIGH     LOW   CLOSE         VOL
2012-01-01 19:00:00  449000  449000  449000  449000  1336303000
2012-01-01 20:00:00     NaN     NaN     NaN     NaN         NaN
2012-01-01 21:00:00     NaN     NaN     NaN     NaN         NaN
2012-01-01 22:00:00     NaN     NaN     NaN     NaN         NaN
2012-01-01 23:00:00     NaN     NaN     NaN     NaN         NaN
...
                         OPEN      HIGH       LOW     CLOSE          VOL
2013-04-24 14:00:00  11700000  12000000  11600000  12000000  20647095439
2013-04-24 15:00:00  12000000  12399000  11979000  12399000  23997107870
2013-04-24 16:00:00  12399000  12400000  11865000  12100000   9379191474
2013-04-24 17:00:00  12300000  12397995  11850000  11850000   4281521826
2013-04-24 18:00:00  11850000  11850000  10903000  11800000  15546034128

我需要按照这个规则填写

NaN

当开、高、低、关为NaN时

将音量设置为0
将打开、高、低、接近上一个关闭蜡烛值

else keep NaN

说明了丢失的数据行为。您要查找的咒语是fillna方法，它采用以下值：

In [1381]: df2
Out[1381]: 
        one       two     three four   five           timestamp
a       NaN  1.138469 -2.400634  bar   True                 NaT
c       NaN  0.025653 -1.386071  bar  False                 NaT
e  0.863937  0.252462  1.500571  bar   True 2012-01-01 00:00:00
f  1.053202 -2.338595 -0.374279  bar   True 2012-01-01 00:00:00
h       NaN -1.157886 -0.551865  bar  False                 NaT

In [1382]: df2.fillna(0)
Out[1382]: 
        one       two     three four   five           timestamp
a  0.000000  1.138469 -2.400634  bar   True 1970-01-01 00:00:00
c  0.000000  0.025653 -1.386071  bar  False 1970-01-01 00:00:00
e  0.863937  0.252462  1.500571  bar   True 2012-01-01 00:00:00
f  1.053202 -2.338595 -0.374279  bar   True 2012-01-01 00:00:00
h  0.000000 -1.157886 -0.551865  bar  False 1970-01-01 00:00:00

您甚至可以向前和向后传播它们：

In [1384]: df
Out[1384]: 
        one       two     three
a       NaN  1.138469 -2.400634
c       NaN  0.025653 -1.386071
e  0.863937  0.252462  1.500571
f  1.053202 -2.338595 -0.374279
h       NaN -1.157886 -0.551865

In [1385]: df.fillna(method='pad')
Out[1385]: 
        one       two     three
a       NaN  1.138469 -2.400634
c       NaN  0.025653 -1.386071
e  0.863937  0.252462  1.500571
f  1.053202 -2.338595 -0.374279
h  1.053202 -1.157886 -0.551865

对于您的具体情况，我认为您需要：

df['VOL'].fillna(0)
df.fillna(df['CLOSE'])

下面是如何通过掩蔽实现的

模拟带有一些孔的框架（a是“闭合”字段）

我们都是南人

In [24]: mask_0 = pd.isnull(df).all(axis=1)

In [25]: mask_0
Out[25]: 
2013-01-01 00:00:00    False
2013-01-01 00:01:00     True
2013-01-01 00:02:00     True
2013-01-01 00:03:00    False
2013-01-01 00:04:00    False
2013-01-01 00:05:00    False
2013-01-01 00:06:00    False
2013-01-01 00:07:00    False
2013-01-01 00:08:00    False
2013-01-01 00:09:00    False
Freq: T, dtype: bool

我们想提出一个

In [26]: mask_fill = pd.isnull(df['B']) & pd.isnull(df['C'])

In [27]: mask_fill
Out[27]: 
2013-01-01 00:00:00    False
2013-01-01 00:01:00     True
2013-01-01 00:02:00     True
2013-01-01 00:03:00    False
2013-01-01 00:04:00    False
2013-01-01 00:05:00     True
2013-01-01 00:06:00     True
2013-01-01 00:07:00     True
2013-01-01 00:08:00    False
2013-01-01 00:09:00    False
Freq: T, dtype: bool

先发制人

In [28]: df.loc[mask_fill,'C'] = df['A']

In [29]: df.loc[mask_fill,'B'] = df['A']

填补0的空白

In [30]: df.loc[mask_0] = 0

完成

由于其他两个答案都不起作用，这里有一个完整的答案

我在这里测试两种方法。第一个是基于working4coin对hd1答案的评论，第二个是较慢的纯python实现。很明显，python实现应该较慢，但我决定对这两种方法计时，以确保并量化结果

def nans_to_prev_close_method1(data_frame):
    data_frame['volume'] = data_frame['volume'].fillna(0.0)  # volume should always be 0 (if there were no trades in this interval)
    data_frame['close'] = data_frame.fillna(method='pad')  # ie pull the last close into this close
    # now copy the close that was pulled down from the last timestep into this row, across into o/h/l
    data_frame['open'] = data_frame['open'].fillna(data_frame['close']) 
    data_frame['low'] = data_frame['low'].fillna(data_frame['close'])
    data_frame['high'] = data_frame['high'].fillna(data_frame['close'])

方法1在c中完成了大部分繁重的工作（在pandas代码中），因此应该非常快

缓慢的python方法（方法2）如下所示

def nans_to_prev_close_method2(data_frame):
    prev_row = None
    for index, row in data_frame.iterrows():
        if np.isnan(row['open']):  # row.isnull().any():
            pclose = prev_row['close']
            # assumes first row has no nulls!!
            row['open'] = pclose
            row['high'] = pclose
            row['low'] = pclose
            row['close'] = pclose
            row['volume'] = 0.0
        prev_row = row

在它们两个上测试定时：

df = trades_to_ohlcv(PATH_TO_RAW_TRADES_CSV, '1s') # splits raw trades into secondly candles
df2 = df.copy()

wrapped1 = wrapper(nans_to_prev_close_method1, df)
wrapped2 = wrapper(nans_to_prev_close_method2, df2)

print("method 1: %.2f sec" % timeit.timeit(wrapped1, number=1))
print("method 2: %.2f sec" % timeit.timeit(wrapped2, number=1))

结果是：

method 1:   0.46 sec
method 2: 151.82 sec

显然，方法1要快得多（大约快330倍）

对于卷，它是

df['VOL']=df['VOL'].fillna（0）

但是

df=df.fillna（df['CLOSE']）

不起作用我这样做了

df['VOL']=df['VOL'].fillna（0）df['CLOSE']=df['CLOSE'].fillna（）df['OPEN']=df['OPEN'].fillna（df['CLOSE']）

self.dataframe不回答作者的评论

df = trades_to_ohlcv(PATH_TO_RAW_TRADES_CSV, '1s') # splits raw trades into secondly candles
df2 = df.copy()

wrapped1 = wrapper(nans_to_prev_close_method1, df)
wrapped2 = wrapper(nans_to_prev_close_method2, df2)

print("method 1: %.2f sec" % timeit.timeit(wrapped1, number=1))
print("method 2: %.2f sec" % timeit.timeit(wrapped2, number=1))

method 1:   0.46 sec
method 2: 151.82 sec