Python: complex conditions in pandas vectorization
I'm fairly new to Python and pandas, so I'm still learning. I have a bunch of OHLCV data loaded into a DataFrame:
Timestamp Open High Low Close Volume Trades
Timestamp
2015-08-07 14:03:00+00:00 2015-08-07 14:03:00+00:00 3.00000 3.00000 3.00 3.00 81.857278 2
2015-08-07 17:19:00+00:00 2015-08-07 17:19:00+00:00 3.00001 3.00001 3.00 3.00 42.073291 2
2015-08-08 06:43:00+00:00 2015-08-08 06:43:00+00:00 3.00000 3.00000 3.00 3.00 0.400000 1
2015-08-08 09:31:00+00:00 2015-08-08 09:31:00+00:00 2.00000 2.00000 2.00 2.00 125.000000 2
2015-08-08 16:30:00+00:00 2015-08-08 16:30:00+00:00 1.20000 1.20000 1.20 1.20 54.759700 1
... ... ... ... ... ... ... ...
2020-12-31 23:55:00+00:00 2020-12-31 23:55:00+00:00 738.49000 738.49000 738.49 738.49 0.748789 3
2020-12-31 23:56:00+00:00 2020-12-31 23:56:00+00:00 738.07000 738.07000 737.72 737.72 2.491733 8
2020-12-31 23:57:00+00:00 2020-12-31 23:57:00+00:00 738.15000 738.15000 737.94 738.05 56.043875 9
2020-12-31 23:58:00+00:00 2020-12-31 23:58:00+00:00 738.14000 738.15000 737.55 737.75 80.826279 16
2020-12-31 23:59:00+00:00 2020-12-31 23:59:00+00:00 737.01000 737.60000 737.01 737.45 3.129885 8
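As you can see, some minutes are missing between entries because no volume was traded during those minutes. What I want to do is create a new column that contains the volume of the previous minute; if there is no row for the previous minute, the value should be 0.
I tried to use the following statement: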
df.loc[
    (df['Timestamp'].shift().notna() == True) &
    (
        ((df['Timestamp'].shift().dt.minute.astype(int) == (df['Timestamp'].dt.minute.astype(int) - 1)) & (df['Timestamp'].dt.minute.astype(int) >= 2)) |
        ((df['Timestamp'].shift().dt.minute.astype(int) == 59) & (df['Timestamp'].dt.minute.astype(int) >= 2))
    ), 'previousVolume'] = df['Volume'].shift().astype(float)
To add the 0s, I would then check the negation of the condition above.
The error I get is the following:
Traceback (most recent call last):
File "engulfing_test.py", line 145, in <module>
), 'previousVolume'] = df['Volume'].shift().astype(float)
File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/generic.py", line 5877, in astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 631, in astype
return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 427, in apply
applied = getattr(b, f)(**kwargs)
File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 673, in astype
values = astype_nansafe(vals1d, dtype, copy=True)
File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/dtypes/cast.py", line 1068, in astype_nansafe
raise ValueError("Cannot convert non-finite values (NA or inf) to integer")
ValueError: Cannot convert non-finite values (NA or inf) to integer
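A likely cause of this error: `shift()` leaves `NaT` in the first row, so `.dt.minute` returns a float column containing `NaN`, and `astype(int)` cannot convert it. Comparing minute numbers also ignores the hour and the date. A minimal sketch of an alternative condition is shown below; it compares whole timestamps with `diff()` instead. The sample frame is hypothetical: it mirrors the first rows of the data above, with one consecutive minute (14:04) added so that both outcomes appear.

```python
import pandas as pd

# Hypothetical sample: index and 'Timestamp' column hold the same values,
# as in the question. The 14:04 row is invented for illustration.
ts = pd.to_datetime(
    [
        "2015-08-07 14:03:00",
        "2015-08-07 14:04:00",
        "2015-08-07 17:19:00",
        "2015-08-08 06:43:00",
    ],
    utc=True,
)
df = pd.DataFrame(
    {"Timestamp": ts, "Volume": [81.857278, 10.0, 42.073291, 0.4]},
    index=ts,
)

# True only where the previous row is exactly one minute earlier.
# The first row compares NaT with the Timedelta, which evaluates to False.
is_consecutive = df["Timestamp"].diff() == pd.Timedelta(minutes=1)

# Keep the shifted volume where the rows are consecutive, otherwise use 0.
df["previousVolume"] = df["Volume"].shift().where(is_consecutive, 0.0)
print(df[["Volume", "previousVolume"]])
```

Because the comparison is on the full timestamp difference, no integer cast is needed and the hour rollover (minute 59 to minute 0) needs no special case.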
asfreq
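First, get rid of the redundant Timestamp column, which duplicates the index.
Leave the index in place: asfreq will change the frequency of the index.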
df.drop('Timestamp', axis=1).asfreq('min', fill_value=0)
Open High Low Close Volume Trades
Timestamp
2015-08-07 14:03:00+00:00 3.00 3.00 3.00 3.00 81.857278 2
2015-08-07 14:04:00+00:00 0.00 0.00 0.00 0.00 0.000000 0
2015-08-07 14:05:00+00:00 0.00 0.00 0.00 0.00 0.000000 0
2015-08-07 14:06:00+00:00 0.00 0.00 0.00 0.00 0.000000 0
2015-08-07 14:07:00+00:00 0.00 0.00 0.00 0.00 0.000000 0
... ... ... ... ... ... ...
2020-12-31 23:55:00+00:00 738.49 738.49 738.49 738.49 0.748789 3
2020-12-31 23:56:00+00:00 738.07 738.07 737.72 737.72 2.491733 8
2020-12-31 23:57:00+00:00 738.15 738.15 737.94 738.05 56.043875 9
2020-12-31 23:58:00+00:00 738.14 738.15 737.55 737.75 80.826279 16
2020-12-31 23:59:00+00:00 737.01 737.60 737.01 737.45 3.129885 8
[2841717 rows x 6 columns]
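Once every minute has a row, the previous minute's volume is a plain `shift()`. The sketch below assumes a small hypothetical frame with a gap between 14:04 and 14:07; the column name `previousVolume` comes from the question, and the values other than the first row are invented.

```python
import pandas as pd

# Hypothetical sample with a three-minute gap after 14:04.
idx = pd.to_datetime(
    ["2015-08-07 14:03:00", "2015-08-07 14:04:00", "2015-08-07 14:07:00"],
    utc=True,
).rename("Timestamp")
df = pd.DataFrame({"Volume": [81.857278, 10.0, 3.5], "Trades": [2, 1, 4]}, index=idx)

# Insert the missing minutes; new rows are filled with 0.
full = df.asfreq("min", fill_value=0)

# The previous minute's volume; the very first row has no predecessor, so use 0.
full["previousVolume"] = full["Volume"].shift(fill_value=0)

# Optional: keep only the minutes that existed in the original data.
result = full.loc[df.index]
print(result)
```

If the filled-in rows are not wanted afterwards, selecting with the original index (as in `result`) keeps the new column while discarding the padding.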
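Will I still be able to access the index in the same way?
Yes. Also, I updated my answer.
When I add that statement, it doesn't seem to work for me. Shouldn't I just use drop directly?
df = df.drop(['Timestamp'], axis=1)
Yes, that should work.
I get the following error when running asfreq:
Traceback (most recent call last):
  File "engulfing_test.py", line 122, in <module>
    df.asfreq('min', fill_value=0)
...
TypeError: value should be a 'Timestamp' or 'NaT'. Got 'int' instead.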
{'Timestamp': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): Timestamp('2015-08-07 14:03:00+0000', tz='UTC'), Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): Timestamp('2015-08-07 17:19:00+0000', tz='UTC'), Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): Timestamp('2015-08-08 06:43:00+0000', tz='UTC'), Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): Timestamp('2015-08-08 09:31:00+0000', tz='UTC'), Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): Timestamp('2015-08-08 16:30:00+0000', tz='UTC')}, 'Open': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 3.00001, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1.2}, 'High': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 3.00001, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1.2}, 'Low': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1.2}, 'Close': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1.2}, 'Volume': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 81.85727776, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 42.07329055, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 0.4, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 125.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 54.7597}, 'Trades': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 
2, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 2, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 1, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1}}
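The dump above still contains the `Timestamp` column, which is consistent with the TypeError: `fill_value=0` cannot be written into a datetime column. One possible cause, assumed here rather than confirmed by the thread, is that `drop` was called without assigning its result, since `drop` returns a new frame by default. A minimal sketch with a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical frame with the index duplicated as a 'Timestamp' column.
idx = pd.to_datetime(
    ["2015-08-07 14:03:00", "2015-08-07 14:05:00"], utc=True
).rename("Timestamp")
df = pd.DataFrame({"Timestamp": idx, "Volume": [81.857278, 42.073291]}, index=idx)

# drop() returns a new frame; without assignment, df is unchanged.
df.drop("Timestamp", axis=1)
still_there = "Timestamp" in df.columns

# Assign the result (or chain the calls) before resampling.
df = df.drop("Timestamp", axis=1)
filled = df.asfreq("min", fill_value=0)
print(filled)
```

Chaining the two calls, as in the answer, avoids the intermediate state entirely.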