Complex conditions in Python pandas vectorization


I am fairly new to Python and pandas, so I am still learning. I have a bunch of OHLCV data loaded into a DataFrame:

                                          Timestamp       Open       High     Low   Close      Volume  Trades
Timestamp                                                                                                    
2015-08-07 14:03:00+00:00 2015-08-07 14:03:00+00:00    3.00000    3.00000    3.00    3.00   81.857278       2
2015-08-07 17:19:00+00:00 2015-08-07 17:19:00+00:00    3.00001    3.00001    3.00    3.00   42.073291       2
2015-08-08 06:43:00+00:00 2015-08-08 06:43:00+00:00    3.00000    3.00000    3.00    3.00    0.400000       1
2015-08-08 09:31:00+00:00 2015-08-08 09:31:00+00:00    2.00000    2.00000    2.00    2.00  125.000000       2
2015-08-08 16:30:00+00:00 2015-08-08 16:30:00+00:00    1.20000    1.20000    1.20    1.20   54.759700       1
...                                             ...        ...        ...     ...     ...         ...     ...
2020-12-31 23:55:00+00:00 2020-12-31 23:55:00+00:00  738.49000  738.49000  738.49  738.49    0.748789       3
2020-12-31 23:56:00+00:00 2020-12-31 23:56:00+00:00  738.07000  738.07000  737.72  737.72    2.491733       8
2020-12-31 23:57:00+00:00 2020-12-31 23:57:00+00:00  738.15000  738.15000  737.94  738.05   56.043875       9
2020-12-31 23:58:00+00:00 2020-12-31 23:58:00+00:00  738.14000  738.15000  737.55  737.75   80.826279      16
2020-12-31 23:59:00+00:00 2020-12-31 23:59:00+00:00  737.01000  737.60000  737.01  737.45    3.129885       8
As you can see, minutes are missing between some of the entries, because no volume actually occurred during those minutes. What I want to do is create a new column that contains the previous minute's volume; if there is no row for the previous minute, the value should be 0.

I tried using the following statement:

df.loc[
    (df['Timestamp'].shift().notna() == True) &
    (
        ((df['Timestamp'].shift().dt.minute.astype(int) == (df['Timestamp'].dt.minute.astype(int) - 1)) & (df['Timestamp'].dt.minute.astype(int) >= 2)) |
        ((df['Timestamp'].shift().dt.minute.astype(int) == 59) & (df['Timestamp'].dt.minute.astype(int) >= 2))
    ), 'previousVolume'] = df['Volume'].shift().astype(float)
To add the 0s, I would check for the negation of the above condition.

The error I get when I run that statement is the following:

Traceback (most recent call last):
  File "engulfing_test.py", line 145, in <module>
    ), 'previousVolume'] = df['Volume'].shift().astype(float)
  File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/generic.py", line 5877, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 631, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 427, in apply
    applied = getattr(b, f)(**kwargs)
  File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 673, in astype
    values = astype_nansafe(vals1d, dtype, copy=True)
  File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/dtypes/cast.py", line 1068, in astype_nansafe
    raise ValueError("Cannot convert non-finite values (NA or inf) to integer")
ValueError: Cannot convert non-finite values (NA or inf) to integer
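
For what it's worth, that ValueError presumably comes from the astype(int) calls on the shifted column rather than from the condition logic itself: shift() puts a NaT in the first row, its .dt.minute is then NaN, and NaN cannot be cast to int. A minimal sketch (with made-up sample timestamps) that reproduces the message:

import pandas as pd

ts = pd.Series(pd.to_datetime(['2015-08-07 14:03', '2015-08-07 14:04'], utc=True))

# shift() introduces a NaT in the first position ...
shifted = ts.shift()

# ... so .dt.minute contains NaN, and casting to int raises
# "ValueError: Cannot convert non-finite values (NA or inf) to integer".
shifted.dt.minute.astype(int)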
asfreq

First, get rid of the duplicate Timestamp column, which is identical to the index, but leave the index where it is. asfreq will then change the frequency of the index:
df.drop('Timestamp', axis=1).asfreq('min', fill_value=0)

                             Open    High     Low   Close     Volume  Trades
Timestamp                                                                   
2015-08-07 14:03:00+00:00    3.00    3.00    3.00    3.00  81.857278       2
2015-08-07 14:04:00+00:00    0.00    0.00    0.00    0.00   0.000000       0
2015-08-07 14:05:00+00:00    0.00    0.00    0.00    0.00   0.000000       0
2015-08-07 14:06:00+00:00    0.00    0.00    0.00    0.00   0.000000       0
2015-08-07 14:07:00+00:00    0.00    0.00    0.00    0.00   0.000000       0
...                           ...     ...     ...     ...        ...     ...
2020-12-31 23:55:00+00:00  738.49  738.49  738.49  738.49   0.748789       3
2020-12-31 23:56:00+00:00  738.07  738.07  737.72  737.72   2.491733       8
2020-12-31 23:57:00+00:00  738.15  738.15  737.94  738.05  56.043875       9
2020-12-31 23:58:00+00:00  738.14  738.15  737.55  737.75  80.826279      16
2020-12-31 23:59:00+00:00  737.01  737.60  737.01  737.45   3.129885       8

[2841717 rows x 6 columns]
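
From there, one possible way to build the previousVolume column the question asks for (this step is not shown in the answer itself, and it assumes all original timestamps fall on exact minutes) is to shift the regularised Volume series and align it back to the original rows:

regular = df.drop('Timestamp', axis=1).asfreq('min', fill_value=0)

# Previous minute's volume on the regular 1-minute grid; the very first row,
# which has no predecessor, gets 0.
prev_vol = regular['Volume'].shift(fill_value=0)

# Align back to the original (sparse) index to create the new column.
df['previousVolume'] = prev_vol.reindex(df.index)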

Will I still be able to access the index the same way? — Yes. Also, I have updated my answer.
That doesn't seem to work for me when I add that statement. Shouldn't I just use drop directly, i.e. df = df.drop(['Timestamp'], axis=1)? — Yes, that should work.
I get the following error when executing asfreq: Traceback (most recent call last): File "engulfing_test.py", line 122, in df.asfreq('min', fill_value=0) .... TypeError: value should be a 'Timestamp' or 'NaT'. Got 'int' instead.
{'Timestamp': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): Timestamp('2015-08-07 14:03:00+0000', tz='UTC'), Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): Timestamp('2015-08-07 17:19:00+0000', tz='UTC'), Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): Timestamp('2015-08-08 06:43:00+0000', tz='UTC'), Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): Timestamp('2015-08-08 09:31:00+0000', tz='UTC'), Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): Timestamp('2015-08-08 16:30:00+0000', tz='UTC')}, 'Open': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 3.00001, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1.2}, 'High': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 3.00001, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1.2}, 'Low': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1.2}, 'Close': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1.2}, 'Volume': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 81.85727776, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 42.07329055, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 0.4, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 125.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 54.7597}, 'Trades': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 2, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 2, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 1, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1}}
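That TypeError is presumably caused by calling asfreq('min', fill_value=0) while the Timestamp column is still present in the frame: the newly created rows would need a 0 written into a datetime column, which pandas rejects. Dropping the column first, as in the answer, avoids it: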
df.drop('Timestamp', axis=1).asfreq('min', fill_value=0)
