Pandas: time-weighted average with groupby on panel data
Hi, I have a panel dataset that looks like this:
stock date time spread1 weight spread2
VOD 01-01 9:05 0.01 0.03 ...
VOD 01-01 9.12 0.03 0.05 ...
VOD 01-01 10.04 0.02 0.30 ...
VOD 01-02 11.04 0.02 0.05
... ... ... .... ...
BAT 01-01 0.05 0.04 0.03
BAT 01-01 0.07 0.05 0.03
BAT 01-01 0.10 0.06 0.04
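I want to compute the weighted average of spread1 for each stock on each day. I can break the solution into several steps: apply the groupby and agg functions to get the sum of spread1*weight per stock per day in dataframe1, then compute the sum of the weights per stock per day in dataframe2. After that, merge the two datasets to get the weighted average of spread1.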
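
For reference, the multi-step approach described above could be sketched roughly as follows. The sample rows are copied from the table in the question, with two assumptions: the time column is left out, and the BAT rows are read as spread1 = 0.04, 0.05, 0.06 with weight = 0.03, 0.03, 0.04. The names weighted_sum, weight_sum and spread1_wavg are only illustrative.

```python
import pandas as pd

# Sample rows from the question (time column omitted; BAT values are an assumed reading).
df = pd.DataFrame({
    "stock":   ["VOD", "VOD", "VOD", "VOD", "BAT", "BAT", "BAT"],
    "date":    ["01-01", "01-01", "01-01", "01-02", "01-01", "01-01", "01-01"],
    "spread1": [0.01, 0.03, 0.02, 0.02, 0.04, 0.05, 0.06],
    "weight":  [0.03, 0.05, 0.30, 0.05, 0.03, 0.03, 0.04],
})

# Step 1: sum of spread1 * weight per stock per day.
weighted_sum = (df["spread1"] * df["weight"]).groupby([df["stock"], df["date"]]).sum()

# Step 2: sum of the weights per stock per day.
weight_sum = df.groupby(["stock", "date"])["weight"].sum()

# Step 3: both results are indexed by (stock, date), so plain division
# aligns them and no explicit merge is needed.
spread1_wavg = weighted_sum / weight_sum
print(spread1_wavg)
```

Because both intermediate results share the same (stock, date) index, the merge step reduces to a division.
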
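My question is: is there a simpler way to compute the weighted average of spread1? I also have spread2, spread3 and spread4, so I would like to write as little code as possible. Thanks.

IIUC, you want the result transformed back onto the original rows, but using .transform with an output that depends on two columns is tricky. We write our own function, to which we pass the spread Series s and the original DataFrame df, so that we can also use the weights: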
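import numpy as np

def weighted_avg(s, df):
    # s is one group's spread Series; select the weights of the same rows from df
    return np.average(s, weights=df.loc[df.index.isin(s.index), 'weight'])

# extra positional arguments to transform are passed through to weighted_avg
df['spread1_avg'] = df.groupby(['stock', 'date']).spread1.transform(weighted_avg, df)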
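This adds a spread1_avg column that repeats each group's weighted average on every row of that group.

If you need multiple columns: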
gp = df.groupby(['stock', 'date'])
for col in [f'spread{i}' for i in range(1,5)]:
    df[f'{col}_avg'] = gp[col].transform(weighted_avg, df)
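
A possible alternative to the custom function, shown here only as a sketch: the same per-row result can be obtained with built-in transform("sum") calls, dividing the group sum of spread * weight by the group sum of weight. This is a swapped-in technique, not the function above. The sample rows come from the question under the same assumed reading of the BAT rows, spread2 is dummy data, and keys and weight_total are illustrative names.

```python
import pandas as pd

# Sample rows from the question (time column omitted; BAT values are an assumed reading).
df = pd.DataFrame({
    "stock":   ["VOD", "VOD", "VOD", "VOD", "BAT", "BAT", "BAT"],
    "date":    ["01-01", "01-01", "01-01", "01-02", "01-01", "01-01", "01-01"],
    "spread1": [0.01, 0.03, 0.02, 0.02, 0.04, 0.05, 0.06],
    "weight":  [0.03, 0.05, 0.30, 0.05, 0.03, 0.03, 0.04],
})
df["spread2"] = df["spread1"] + 1  # dummy second column for illustration

keys = [df["stock"], df["date"]]

# Sum of weights of each row's (stock, date) group, repeated on every row.
weight_total = df["weight"].groupby(keys).transform("sum")

for col in ["spread1", "spread2"]:
    weighted = df[col] * df["weight"]
    # Group sum of spread * weight divided by group sum of weight.
    df[f"{col}_avg"] = weighted.groupby(keys).transform("sum") / weight_total

print(df)
```

This keeps one row per original observation, just like the transform call with weighted_avg, while avoiding a Python-level function call per group.
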
Or, if you don't need to transform back and want a single value per stock-date:
import pandas as pd

def my_avg2(gp):
    # weighted average of every spread column in this group, using the group's weights
    avg = np.average(gp.filter(like='spread'), weights=gp.weight, axis=0)
    return pd.Series(avg, index=[col for col in gp.columns if col.startswith('spread')])

### Create some dummy data
df['spread2'] = df.spread1+1
df['spread3'] = df.spread1+12.1
df['spread4'] = df.spread1+1.13

df.groupby(['stock', 'date'])[['weight'] + [f'spread{i}' for i in range(1,5)]].apply(my_avg2)
#              spread1   spread2    spread3   spread4
#stock date
#BAT   01-01  0.051000  1.051000  12.151000  1.181000
#VOD   01-01  0.020526  1.020526  12.120526  1.150526
#      01-02  0.020000  1.020000  12.120000  1.150000
Thank you @ALollz