Python: take the last value of each previous group in a rolling sum function? (pandas)
I am trying to write a function that computes a sum/average over a rolling window based on a particular index level. My data looks like this:
Date (L0) Date - (L1) Value 4-Period-L0-Sum
12/31/2011 1/25/2012 1321
3/31/2012 4/25/2012 1457
6/30/2012 7/25/2012 2056
9/30/2012 10/26/2012 3461 8295
12/31/2012 1/24/2013 2317 9291
3/31/2013 4/24/2013 2008 9842
6/30/2013 7/24/2013 1885 9671
6/30/2013 7/27/2013 1600 9386
9/30/2013 10/29/2013 1955 7880
9/30/2013 11/1/2013 1400 7325
12/31/2013 1/28/2014 1985 6993
12/31/2013 1/30/2014 1985 6993
3/31/2014 4/24/2014 1382 6367
3/31/2014 4/25/2014 1200 6185
6/30/2014 7/23/2014 2378 6963
9/30/2014 10/21/2014 3826 9389
3/31/2015 4/28/2015 2369 9773
3/31/2015 4/30/2015 2369 9773
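I am trying to produce values like those from pd.rolling_sum(dataframe, window=4), except that the window runs over the level=0 index (Date (L0)) and uses only the last value from each of the preceding level=0 index entries. For example, the rolling sum for the period below is calculated as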
[3/31/2014 4/24/2014] = 1382 + 1985 + 1400 + 1600
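To make the rule concrete, here is a small sketch in plain Python that reproduces this one number. The values are copied from the table above; the variable names are only illustrative.

```python
# Values grouped by Date (L0), in row order, copied from the table above.
groups = {
    "6/30/2013": [1885, 1600],
    "9/30/2013": [1955, 1400],
    "12/31/2013": [1985, 1985],
    "3/31/2014": [1382, 1200],
}

# Current row: the first entry for 3/31/2014.
current_value = groups["3/31/2014"][0]

# Each of the three preceding L0 dates contributes only its last value.
previous_dates = ("6/30/2013", "9/30/2013", "12/31/2013")
previous_last_values = [groups[d][-1] for d in previous_dates]

rolling_sum = current_value + sum(previous_last_values)
print(rolling_sum)  # 6367, the 4-Period-L0-Sum for this row
```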
My solution uses an expanding window: group by level 0, take the tail of each group, and sum:
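import numpy as np
import pandas as pd

def custom_sum(datadf, period):
    # n was not defined in the original; len(datadf) + 1 is assumed here so
    # that the result has as many rows as datadf.index[period-1:]
    n = len(datadf) + 1
    idx_range = np.arange(n)
    tmpdf = pd.concat([
        # Rows before i: keep the last row of each level-0 group,
        # then sum the most recent `period` of those rows
        pd.DataFrame(
            datadf.iloc[:i]
            .groupby(level=0).tail(1).tail(period)
            .sum(skipna=False)
        ).T
        for i in idx_range[period:]
    ])
    tmpdf.index = datadf.index[period-1:]
    return tmpdf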
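It runs very slowly, though. I am sure there must be a better way.
One approach might be pd.expanding_apply(), but it does not pass the DataFrame itself to the applied function, so the correct GroupBy index is not available.
Thanks.

You can use groupby as follows: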
import pandas as pd
text = """DateL1 DateL2 Value Sum
12/31/2011 1/25/2012 1321
3/31/2012 4/25/2012 1457
6/30/2012 7/25/2012 2056
9/30/2012 10/26/2012 3461 8295
12/31/2012 1/24/2013 2317 9291
3/31/2013 4/24/2013 2008 9842
6/30/2013 7/24/2013 1885 9671
6/30/2013 7/27/2013 1600 9386
9/30/2013 10/29/2013 1955 7880
9/30/2013 11/1/2013 1400 7325
12/31/2013 1/28/2014 1985 6993
12/31/2013 1/30/2014 1985 6993
3/31/2014 4/24/2014 1382 6367
3/31/2014 4/25/2014 1200 6185
6/30/2014 7/23/2014 2378 6963
9/30/2014 10/21/2014 3826 9389
3/31/2015 4/28/2015 2369 9773
3/31/2015 4/30/2015 2369 9773"""
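from io import StringIO
# Rows without a Sum value are read with NaN in that column
df = pd.read_csv(StringIO(text), sep=r"\s+", parse_dates=[0], index_col=0)
# Rolling 4-period sum over the last Value of each DateL1 group
# (Series.rolling supersedes the older pd.rolling_sum)
s1 = df.groupby(df.index, sort=False).Value.last().rolling(4).sum()
def f(s):
    # Difference between each row and the last row of its group
    return s - s.iat[-1]
# Per-row correction; zero for the last row of every group
s2 = df.groupby(df.index, sort=False).Value.transform(f).fillna(0)
print(s1 + s2)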
Here is the output:
DateL1
2011-12-31 NaN
2012-03-31 NaN
2012-06-30 NaN
2012-09-30 8295
2012-12-31 9291
2013-03-31 9842
2013-06-30 9671
2013-06-30 9386
2013-09-30 7880
2013-09-30 7325
2013-12-31 6993
2013-12-31 6993
2014-03-31 6367
2014-03-31 6185
2014-06-30 6963
2014-09-30 9389
2015-03-31 9773
2015-03-31 9773
dtype: float64
Checking whether this works... Originally I had a MultiIndex where level=0 is DateL1 and level=1 is DateL2.
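For the MultiIndex layout mentioned in this comment (level 0 = DateL1, level 1 = DateL2), the same idea might be sketched as below. This is only a sketch under assumptions: the function name `rolling_last_sum` and its parameters are hypothetical and not part of pandas, and the sample rows are copied from the tail of the question's table with the dates kept as plain strings for brevity.

```python
import pandas as pd

def rolling_last_sum(values, window, level=0):
    """Rolling sum over groups of `level`: earlier groups contribute their
    last value, the current row contributes its own value."""
    keys = values.index.get_level_values(level)
    grouped = values.groupby(level=level, sort=False)
    # One value per group (its last row), then an ordinary rolling sum.
    per_group = grouped.last().rolling(window).sum()
    # Broadcast the group-level result back onto every row of the group.
    base = pd.Series(per_group.reindex(keys).to_numpy(), index=values.index)
    # Swap the group's last value for the row's own value.
    return base + (values - grouped.transform("last"))

index = pd.MultiIndex.from_tuples(
    [
        ("2013-03-31", "2013-04-24"),
        ("2013-06-30", "2013-07-24"),
        ("2013-06-30", "2013-07-27"),
        ("2013-09-30", "2013-10-29"),
        ("2013-09-30", "2013-11-01"),
        ("2013-12-31", "2014-01-28"),
        ("2013-12-31", "2014-01-30"),
        ("2014-03-31", "2014-04-24"),
        ("2014-03-31", "2014-04-25"),
    ],
    names=["DateL1", "DateL2"],
)
values = pd.Series(
    [2008, 1885, 1600, 1955, 1400, 1985, 1985, 1382, 1200],
    index=index,
    dtype=float,
)

result = rolling_last_sum(values, window=4)
print(result)
```

Because this sample starts at 2013-03-31, the first three DateL1 groups have no full window and come out as NaN; the remaining rows should agree with the 4-Period-L0-Sum column of the question. Grouping with `level=0` keeps both index levels on the result, so no reset of the index is needed.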