For loop and adding extra columns to a groupby DataFrame in Python
The code below is my original approach:
import pandas as pd

data = {'id':[1001,1001,1001,1001,1001,1001,1001,1001,1002,1002,1002,1002,1002,1002,1002,1002],
        'name':['Tom', 'Tom', 'Tom', 'Tom','Tom', 'Tom', 'Tom', 'Tom','Jack','Jack','Jack','Jack','Jack','Jack','Jack','Jack'],
        'team':['A','A', 'B', 'B', 'C','C', 'D', 'D','A','A', 'B', 'B', 'C','C', 'D', 'D'],
        'year':[2011,2011,2012,2012,2013,2013,2014,2014,2011,2011,2012,2012,2013,2013,2014,2014],
        'avg':[0.500,0.400,0.300,0.200,0.100,0.200,0.300,0.400,0.500,0.400,0.300,0.200,0.100,0.200,0.300,0.400]}
df = pd.DataFrame(data)
print(df)

# One entry per distinct team
team_names = [c for c in df['team'].value_counts().index]
print(team_names)

for i in team_names:
    # Mean of 'avg' per (id, name) for this team in each year; rows outside the filter get NaN
    df[i+'_vs_avg_2011'] = df.loc[(df['team']==i)&(df['year']==2011)].groupby(['id','name'])['avg'].transform('mean')
    df[i+'_vs_avg_2012'] = df.loc[(df['team']==i)&(df['year']==2012)].groupby(['id','name'])['avg'].transform('mean')
    df[i+'_vs_avg_2013'] = df.loc[(df['team']==i)&(df['year']==2013)].groupby(['id','name'])['avg'].transform('mean')
    df[i+'_vs_avg_2014'] = df.loc[(df['team']==i)&(df['year']==2014)].groupby(['id','name'])['avg'].transform('mean')
    print(i)
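A minimal sketch of the same loop without one hard-coded line per year, assuming the goal is one column per (team, year) pair that actually occurs in the data. Unlike the original, which creates all 16 team/year columns (12 of them entirely NaN), this only creates the 4 pairs that have matching rows; `team` and `year` are just illustrative loop variable names.

```python
import pandas as pd

# Same sample data as in the question, written more compactly
data = {
    'id': [1001] * 8 + [1002] * 8,
    'name': ['Tom'] * 8 + ['Jack'] * 8,
    'team': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'] * 2,
    'year': [2011, 2011, 2012, 2012, 2013, 2013, 2014, 2014] * 2,
    'avg': [0.5, 0.4, 0.3, 0.2, 0.1, 0.2, 0.3, 0.4] * 2,
}
df = pd.DataFrame(data)

# Iterate only over (team, year) combinations present in the data
for team, year in df[['team', 'year']].drop_duplicates().itertuples(index=False):
    mask = (df['team'] == team) & (df['year'] == year)
    # transform('mean') returns a Series indexed like the filtered rows;
    # assigning it back aligns on the index, so all other rows get NaN
    df[f'{team}_vs_avg_{year}'] = (
        df.loc[mask].groupby(['id', 'name'])['avg'].transform('mean')
    )

print(df)
```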
For the loop part, I tried it and got:
ValueError: too many values to unpack (expected 2)
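Is there a way to simplify or fix this code?

I think that instead of the loop you can use DataFrame.pivot_table, flatten the resulting MultiIndex columns, and then DataFrame.join the result back to the original DataFrame: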
df1 = df.pivot_table(index=['id','name'],columns=['team','year'],values='avg', aggfunc='mean')
df1.columns = [f'{a}_vs_avg_{b}' for a, b in df1.columns]
print (df1)
A_vs_avg_2011 B_vs_avg_2012 C_vs_avg_2013 D_vs_avg_2014
id name
1001 Tom 0.45 0.25 0.15 0.35
1002 Jack 0.45 0.25 0.15 0.35
df = df.join(df1, on=['id','name'])
print (df)
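For comparison, here is a sketch that builds the same wide table with `groupby(...).mean()` followed by `unstack`, instead of `pivot_table`. This is a swapped-in equivalent, not the answer's code, and `wide` and `result` are illustrative names. Note that both this and the `pivot_table` version copy each player's per-team values onto all of that player's rows, whereas the loop in the question leaves NaN outside the matching team/year rows.

```python
import pandas as pd

# Same sample data as in the question, written more compactly
data = {
    'id': [1001] * 8 + [1002] * 8,
    'name': ['Tom'] * 8 + ['Jack'] * 8,
    'team': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'] * 2,
    'year': [2011, 2011, 2012, 2012, 2013, 2013, 2014, 2014] * 2,
    'avg': [0.5, 0.4, 0.3, 0.2, 0.1, 0.2, 0.3, 0.4] * 2,
}
df = pd.DataFrame(data)

# Mean 'avg' per (id, name, team, year), then move team/year into the columns
wide = (
    df.groupby(['id', 'name', 'team', 'year'])['avg']
      .mean()
      .unstack(['team', 'year'])
)

# Flatten the two-level (team, year) column index into single strings
wide.columns = [f'{team}_vs_avg_{year}' for team, year in wide.columns]

# Join on the (id, name) index so every row of a player receives that player's values
result = df.join(wide, on=['id', 'name'])
print(result)
```

Unstacking both levels in one call should only create columns for (team, year) pairs that occur, matching the four columns the `pivot_table` version prints.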
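What are the years, from _ to _?

Sorry, I just added it.

What is your expected output?

For the join, I got an error: columns overlap but no suffix specified: Index(['A_vs_avg_2011', 'B_vs_avg_2012', 'D_vs_avg_2014', 'C_vs_avg_2013'], dtype='object')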
@Yusufsn At first I also got the error, but now I get the correct result.
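One plausible cause of the `columns overlap` error, assuming the loop from the question (or an earlier join) had already been run on `df` in the same session: `df` then already contains columns with the same names as `df1`, and `DataFrame.join` refuses to create duplicate names unless `lsuffix`/`rsuffix` is given. The sketch below reproduces that situation and drops the stale columns before joining again; `stale` and `fixed` are hypothetical names.

```python
import pandas as pd

# Same sample data as in the question, written more compactly
data = {
    'id': [1001] * 8 + [1002] * 8,
    'name': ['Tom'] * 8 + ['Jack'] * 8,
    'team': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'] * 2,
    'year': [2011, 2011, 2012, 2012, 2013, 2013, 2014, 2014] * 2,
    'avg': [0.5, 0.4, 0.3, 0.2, 0.1, 0.2, 0.3, 0.4] * 2,
}
df = pd.DataFrame(data)

df1 = df.pivot_table(index=['id', 'name'], columns=['team', 'year'], values='avg', aggfunc='mean')
df1.columns = [f'{a}_vs_avg_{b}' for a, b in df1.columns]

# Simulate a frame that already holds these columns (e.g. the join was run once before)
stale = df.join(df1, on=['id', 'name'])

try:
    stale.join(df1, on=['id', 'name'])
except ValueError as exc:
    # pandas reports: columns overlap but no suffix specified: Index([...])
    print('join failed:', exc)

# Fix: drop any columns df1 is about to add, then join again
fixed = stale.drop(columns=df1.columns, errors='ignore').join(df1, on=['id', 'name'])
print(fixed.columns.tolist())
```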