Summarize rows by other column values - Countif in Python/Pandas
I have a dataset df:
customer action date
1049381 share 9/29/2017
1049381 level_up 10/6/2017
105460 share 9/22/2017
105460 share 9/23/2017
105668 level_up 9/8/2017
105668 share 9/8/2017
105668 level_up 9/18/2017
105668 share 9/18/2017
105668 share 9/20/2017
905669 share 9/25/2017
905669 level_up 9/25/2017
I want to count (summarize), per customer, the days on which the user did both "level_up" and "share". Like this:
customer share_wth_level_up
1049381 0
105460 0
105668 2
905669 1
I started with pandas, but I couldn't find a solution, because it doesn't give me a summarized df with a (unique) result for each row.
Using duplicated:
First filter df, then group by customer and date to check which groups have multiple unique values.
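# keep rows whose (customer, date) pair occurs more than once, then count distinct actions per pair
s=df[df.duplicated(['customer','date'],keep=False)].groupby(['customer','date']).action.nunique()
# each pair with 2 distinct actions counts once; groupby(level=0).sum() stands in for sum(level=0), which pandas 2.0 removed
(s[s==2]//2).groupby(level=0).sum().reindex(df.customer.unique(),fill_value=0)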
Out[166]:
customer
1049381 0
105460 0
105668 2
905669 1
Name: action, dtype: int64
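The filter-then-group idea above can be sketched step by step as follows. This is only an illustration under stated assumptions: the DataFrame is rebuilt by hand from the question's sample rows, the dates are kept as plain strings, and the intermediate names (repeated, distinct_actions, both, result) are hypothetical names introduced for readability, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; dates are kept as plain strings.
df = pd.DataFrame({
    "customer": [1049381, 1049381, 105460, 105460, 105668, 105668,
                 105668, 105668, 105668, 905669, 905669],
    "action": ["share", "level_up", "share", "share", "level_up", "share",
               "level_up", "share", "share", "share", "level_up"],
    "date": ["9/29/2017", "10/6/2017", "9/22/2017", "9/23/2017", "9/8/2017",
             "9/8/2017", "9/18/2017", "9/18/2017", "9/20/2017", "9/25/2017",
             "9/25/2017"],
})

# Step 1: rows whose (customer, date) pair appears more than once.
repeated = df[df.duplicated(["customer", "date"], keep=False)]

# Step 2: number of distinct actions on each repeated (customer, date) pair.
distinct_actions = repeated.groupby(["customer", "date"])["action"].nunique()

# Step 3: keep pairs with two distinct actions, count them per customer,
# and restore customers that were filtered out with a count of 0.
both = distinct_actions[distinct_actions == 2]
result = (
    both.groupby(level="customer").size()
    .reindex(df["customer"].unique(), fill_value=0)
)
print(result)
```

Counting with size() after the filter is equivalent to the integer-division trick in the one-liner, since every surviving pair contributes exactly one.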
One solution is to use GroupBy + nunique and test whether the number of unique actions equals 2. Then use GroupBy + sum to total those instances:
df_grp = df.groupby(['customer', 'date'])['action'].nunique() == 2
res = df_grp.groupby('customer').sum().astype(int)
print(res)
customer
105460 0
105668 2
905669 1
1049381 0
Name: action, dtype: int32
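For anyone who wants the result in the two-column shape the question asks for, the GroupBy + nunique approach might be extended along these lines. It is a minimal sketch, assuming the data contains only the two action types shown (so two distinct actions on a day means both "share" and "level_up" occurred); the DataFrame is rebuilt from the question's sample rows, and the names both_same_day and summary are hypothetical.

```python
import pandas as pd

# Sample data copied from the question; dates are kept as plain strings.
df = pd.DataFrame({
    "customer": [1049381, 1049381, 105460, 105460, 105668, 105668,
                 105668, 105668, 105668, 905669, 905669],
    "action": ["share", "level_up", "share", "share", "level_up", "share",
               "level_up", "share", "share", "share", "level_up"],
    "date": ["9/29/2017", "10/6/2017", "9/22/2017", "9/23/2017", "9/8/2017",
             "9/8/2017", "9/18/2017", "9/18/2017", "9/20/2017", "9/25/2017",
             "9/25/2017"],
})

# True for each (customer, date) pair on which two distinct actions occurred.
both_same_day = df.groupby(["customer", "date"])["action"].nunique().eq(2)

# Summing booleans counts the True values per customer; the result is then
# renamed to the column name used in the question and turned into a DataFrame.
summary = (
    both_same_day.groupby(level="customer").sum()
    .astype(int)
    .rename("share_wth_level_up")
    .reset_index()
)
print(summary)
```

If other action types could appear in the data, filtering df to the two actions of interest before grouping would be a safer starting point.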