Python pandas: sum of the next n rows
I am (currently) working with a DataFrame that has one index and one column. The index holds dates and the column holds numbers. I want to first find the rows whose value is below a certain level and, for each of them, sum the next 60 values:
list = []
for index, row in df.iterrows():
    if row < -0.1:
        list.append(#sum the next 60 values)
Edit: this is what the data looks like:
data
Date Ticker
2015-06-23 -0.010959
2015-06-24 -0.002770
2015-06-25 0.005556
2015-06-26 0.008287
2015-06-29 -0.013699
2015-06-30 0.005556
2015-07-01 0.006077
2015-07-02 -0.010983
You can compute a rolling sum on the reversed DataFrame to do the trick.
import pandas as pd
import numpy as np
# sample data
# =================================
np.random.seed(0)
df = pd.DataFrame(np.random.randn(25, 2)*0.1, columns=['A', 'B'], index=pd.date_range('2010-01-01', periods=25, freq='B'))
df
A B
2010-01-01 0.1764 0.0400
2010-01-04 0.0979 0.2241
2010-01-05 0.1868 -0.0977
2010-01-06 0.0950 -0.0151
2010-01-07 -0.0103 0.0411
2010-01-08 0.0144 0.1454
2010-01-11 0.0761 0.0122
2010-01-12 0.0444 0.0334
2010-01-13 0.1494 -0.0205
2010-01-14 0.0313 -0.0854
2010-01-15 -0.2553 0.0654
2010-01-18 0.0864 -0.0742
2010-01-19 0.2270 -0.1454
2010-01-20 0.0046 -0.0187
2010-01-21 0.1533 0.1469
2010-01-22 0.0155 0.0378
2010-01-25 -0.0888 -0.1981
2010-01-26 -0.0348 0.0156
2010-01-27 0.1230 0.1202
2010-01-28 -0.0387 -0.0302
2010-01-29 -0.1049 -0.1420
2010-02-01 -0.1706 0.1951
2010-02-02 -0.0510 -0.0438
2010-02-03 -0.1253 0.0777
2010-02-04 -0.1614 -0.0213
# processing
# =================================================
from functools import partial

def sum_over_next_n_days(col, n_days, threshold):
    # sum over the next several days: reverse first, then use a rolling sum
    # (pd.rolling_sum no longer exists in current pandas; Series.rolling(...).sum() replaces it)
    res = col[::-1].rolling(window=n_days).sum().shift(1)[::-1]
    # keep the sum only where the current value is below the threshold, NaN elsewhere
    return res[col < threshold].reindex(col.index)

# specify your parameters here: next 3 days, threshold -0.1
apply_func = partial(sum_over_next_n_days, n_days=3, threshold=-0.1)
df.apply(apply_func)
A B
2010-01-01 NaN NaN
2010-01-04 NaN NaN
2010-01-05 NaN NaN
2010-01-06 NaN NaN
2010-01-07 NaN NaN
2010-01-08 NaN NaN
2010-01-11 NaN NaN
2010-01-12 NaN NaN
2010-01-13 NaN NaN
2010-01-14 NaN NaN
2010-01-15 0.3180 NaN
2010-01-18 NaN NaN
2010-01-19 NaN 0.1660
2010-01-20 NaN NaN
2010-01-21 NaN NaN
2010-01-22 NaN NaN
2010-01-25 NaN 0.1056
2010-01-26 NaN NaN
2010-01-27 NaN NaN
2010-01-28 NaN NaN
2010-01-29 -0.3469 0.2290
2010-02-01 -0.3376 NaN
2010-02-02 NaN NaN
2010-02-03 NaN NaN
2010-02-04 NaN NaN
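The reverse-then-roll trick above can also be written with a forward-looking window, which avoids reversing the column. The following is a minimal sketch, assuming pandas 1.1 or later (where `pandas.api.indexers.FixedForwardWindowIndexer` is available). The function name `sum_next_n_rows` and the toy series are made up for illustration and are not part of the answer.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer


def sum_next_n_rows(col, n_rows, threshold):
    """Sum the n_rows values that follow each row whose value is below threshold."""
    # Forward-looking window: at row i it covers positions i .. i + n_rows - 1.
    indexer = FixedForwardWindowIndexer(window_size=n_rows)
    # shift(-1) moves every value up one row, so the window at row i covers
    # the original rows i + 1 .. i + n_rows (the current row is excluded).
    # min_periods=n_rows leaves NaN where fewer than n_rows rows remain.
    forward_sum = col.shift(-1).rolling(window=indexer, min_periods=n_rows).sum()
    # Keep the sum only on rows that meet the condition; NaN elsewhere.
    return forward_sum.where(col < threshold)


# Toy data, made up for illustration.
s = pd.Series(
    [-0.2, 0.1, 0.3, -0.5, 0.2, 0.4],
    index=pd.date_range("2015-06-23", periods=6, freq="B"),
)
result = sum_next_n_rows(s, n_rows=2, threshold=-0.1)
print(result)
```

For a whole DataFrame, the same function can be applied column by column with `df.apply(sum_next_n_rows, n_rows=3, threshold=-0.1)`, since `DataFrame.apply` forwards extra keyword arguments to the function.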
Try building a list of indices by finding the integer positions of all rows that satisfy the condition, as in:
import numpy as np
from pandas import DataFrame
df = DataFrame([["2015-06-23",-0.010959],
["2015-06-24", -0.102770],
["2015-06-25", 0.005556],
["2015-06-26", 0.008287],
["2015-06-29", -0.113699],
["2015-06-30", 0.005556],
["2015-07-01", 0.006077],
["2015-07-02", -0.010983]], columns=['Date','Ticker'])
df = df.set_index('Date')  # set_index returns a new DataFrame, so assign the result
index_list = list(np.where(df.Ticker < -0.1)[0])
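Do you want to find only the first row whose value is below the level, or all rows that satisfy the condition?
I want to find all rows. So if 2015-01-01 is below -0.10, I want to sum the next 60 days. Then, if 2015-02-01 is ...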
Then, using index_list from the snippet above, sum a window of rows starting at each of those positions:
n_rows = 30  # window length used in this answer; set it to the number of rows you need
sums = []
for i in index_list:
    # n_rows rows starting at the matching row itself (positions i .. i+n_rows-1);
    # iloc slices by position, so this works whatever the index is
    sums.append(df['Ticker'].iloc[i:i + n_rows].sum())
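The loop above can also be expressed without Python-level iteration by taking differences of a cumulative sum. This is a sketch under stated assumptions, not the answerer's method: the helper name `window_sums_from_matches` is hypothetical, the column is assumed to contain no NaN values (a NaN would propagate through `cumsum`), and, like the loop, each window starts at the matching row itself and is simply shorter near the end of the data. The sample values are the `Ticker` numbers from the snippet above, with a window of 3 rows to keep the example small.

```python
import numpy as np


def window_sums_from_matches(values, threshold, n_rows):
    """Return (positions, sums) for windows starting at rows below threshold."""
    values = np.asarray(values, dtype=float)
    # Integer positions of the rows that meet the condition.
    starts = np.flatnonzero(values < threshold)
    # prefix[k] is the sum of the first k values, so a window sum is a difference.
    prefix = np.concatenate(([0.0], np.cumsum(values)))
    # Clip the window end so windows near the end are simply shorter.
    stops = np.minimum(starts + n_rows, len(values))
    return starts, prefix[stops] - prefix[starts]


ticker = [-0.010959, -0.102770, 0.005556, 0.008287,
          -0.113699, 0.005556, 0.006077, -0.010983]
positions, sums = window_sums_from_matches(ticker, threshold=-0.1, n_rows=3)
print(positions.tolist())  # -> [1, 4]
print(sums)
```

To sum only the rows after each match, as the question asks, the differences would become `prefix[np.minimum(starts + 1 + n_rows, len(values))] - prefix[starts + 1]`.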