Python 3.x: New column with a groupby condition not working in a dataframe

Tags: python-3.x, pandas, pandas-groupby

I have a dataframe that looks like this:

df=
['user_id','session_id','purchase']
[1,34,'yes']
[1,35,'no']
[2,36,'no']
Now I want to create two new columns that total up all purchases for each user. Note that rows belonging to the same user should carry the same values in these new columns, like this:

df=
['user_id','session_id','purchase','purchase_yes','purchase_no']
[1,34,'yes',1,1]
[1,35,'no' ,1,1]
[2,36,'no' ,0,1]
I tried this, but it doesn't work:

df['purchase_yes'] = df[df.purchase == 'yes'].groupby("user_id").purchase.sum()

It gives me NaN values.
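
A likely reason for the NaN values: the groupby result is indexed by user_id, while the dataframe is indexed by row label, and pandas aligns on index labels when a Series is assigned to a column. Rows whose label has no match in the groupby result come out as NaN. The sketch below, which rebuilds the sample data so it runs on its own, counts the 'yes' rows per user and then maps the counts back through user_id. The name `yes_per_user` is only illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "session_id": [34, 35, 36],
    "purchase": ["yes", "no", "no"],
})

# Count the 'yes' rows per user. The result is indexed by user_id,
# not by the row labels of df, so it cannot be assigned directly.
yes_per_user = df[df.purchase == "yes"].groupby("user_id").purchase.count()

# Look up each row's user_id in the per-user counts.
# Users with no 'yes' rows are missing from yes_per_user, hence fillna(0).
df["purchase_yes"] = df["user_id"].map(yes_per_user).fillna(0).astype(int)

print(df)
```

Mapping through user_id gives every row of a user the same count, which is the behaviour asked for above.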

Try the following:

new_df = df.groupby('user_id').purchase.value_counts().unstack(fill_value=0)

# you can also use either of these
# new_df = pd.crosstab(df.user_id, df.purchase)
# new_df = df.pivot_table(index='user_id', columns='purchase', values='session_id', aggfunc='count', fill_value=0)

# rename the columns of new data
new_df.columns = 'purchase_'+new_df.columns

# merge the new data with the old on user_id
df.merge(new_df, left_on='user_id', right_index=True)
Output:

   user_id  session_id purchase  purchase_no  purchase_yes
0        1          34      yes            1             1
1        1          35       no            1             1
2        2          36       no            1             0
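
As an alternative to building a separate table and merging it back, the per-user counts can also be written directly onto the rows with a grouped transform. This is not the method used in the answer above, only an equivalent route. It is a minimal sketch that assumes the sample data from the question, rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "session_id": [34, 35, 36],
    "purchase": ["yes", "no", "no"],
})

for value in ("yes", "no"):
    # Boolean mask of rows matching this value; summing it per user counts them.
    # transform returns a result aligned with df's rows, so it assigns cleanly.
    matches = df["purchase"].eq(value)
    df[f"purchase_{value}"] = (
        matches.groupby(df["user_id"]).transform("sum").astype(int)
    )

print(df)
```

Because transform keeps the original row index, no merge or join is needed, and users with no matching rows get 0 rather than NaN.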

You can use groupby + value_counts to get the counts:

a=df.groupby(['user_id'])['purchase'].value_counts().unstack(fill_value=0)
print(a)

purchase    no  yes
user_id     
1            1    1
2            1    0
Then attach these counts to the original dataframe by user_id.

Output:

   user_id  session_id purchase  purchase_yes  purchase_no
0        1          34      yes             1           1
1        1          35       no             1           1
2        2          36       no             0           1
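
The call that produces this final table is not shown above. One way to arrive at the same output is sketched below, assuming `add_prefix` for the column names and `DataFrame.join` with `on='user_id'`. Both are standard pandas calls, but the choice is an assumption and not necessarily what the answer used.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "session_id": [34, 35, 36],
    "purchase": ["yes", "no", "no"],
})

# Per-user counts of each purchase value, one column per value.
a = df.groupby(["user_id"])["purchase"].value_counts().unstack(fill_value=0)

# Prefix the columns and put them in the order shown in the output above.
a = a.add_prefix("purchase_")[["purchase_yes", "purchase_no"]]

# Join on df's user_id column against a's user_id index.
result = df.join(a, on="user_id")
print(result)
```

`join` with `on=` matches a column of the left frame against the index of the right frame, so the counts table needs no reset_index first.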