Python Pandas groupby: group users and count subscription types

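I'm trying to use pandas to group members, count how many subscriptions of each type a member has purchased, and get the total amount each member has spent. After loading, the data looks like this: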

df = 

Member Nbr  Member Name-First   Member Name-Last        Date-Joined             Member Type         Amount  Addr-Formatted  Date-Birth              Gender      Status    
1           Aboud               Tordon                  2010-03-31 00:00:00     1 Year Membership   331.00  ADDRESS_1       1972-08-01 00:00:00     Male        Active  
1           Aboud               Tordon                  2011-04-16 00:00:00     1 Year Membership   334.70  ADDRESS_1       1972-08-01 00:00:00     Male        Active  
1           Aboud               Tordon                  2012-08-06 00:00:00     1 Year Membership   344.34  ADDRESS_1       1972-08-01 00:00:00     Male        Active  
1           Aboud               Tordon                  2013-08-21 00:00:00     1 Year Membership   362.53  ADDRESS_1       1972-08-01 00:00:00     Male        Active  
1           Aboud               Tordon                  2015-08-31 00:00:00     1 Year Membership   289.47  ADDRESS_1       1972-08-01 00:00:00     Male        Active  

2          Jean                 Manuel                  2012-12-10 00:00:00     4 Month Membership  148.79  ADDRESS_2       1984-08-01 00:00:00     Male        In-Active   
2          Jean                 Manuel                  2013-03-13 00:00:00     1 Year Membership   348.46  ADDRESS_2       1984-08-01 00:00:00     Male        In-Active
2          Jean                 Manuel                  2014-03-15 00:00:00     1 Year Membership   316.86  ADDRESS_2       1984-08-01 00:00:00     Male        In-Active   

3          Val                  Adams                   2010-02-09 00:00:00     1 Year Membership   333.25  ADDRESS_3       1934-10-26 00:00:00     Female      Active  
3          Val                  Adams                   2011-03-09 00:00:00     1 Year Membership   333.88  ADDRESS_3       1934-10-26 00:00:00     Female      Active
3          Val                  Adams                   2012-04-03 00:00:00     1 Year Membership   318.34  ADDRESS_3       1934-10-26 00:00:00     Female      Active
3          Val                  Adams                   2013-04-15 00:00:00     1 Year Membership   350.73  ADDRESS_3       1934-10-26 00:00:00     Female      Active  
3          Val                  Adams                   2014-04-19 00:00:00     1 Year Membership   291.63  ADDRESS_3       1934-10-26 00:00:00     Female      Active  
3          Val                  Adams                   2015-04-19 00:00:00     1 Year Membership   247.35  ADDRESS_3       1934-10-26 00:00:00     Female      Active

5          Michele              Younes                  2010-02-14 00:00:00     1 Year Membership   333.25  ADDRESS_4       1933-06-23 00:00:00     Female      In-Active   
5          Michele              Younes                  2011-05-23 00:00:00     1 Year Membership   317.77  ADDRESS_4       1933-06-23 00:00:00     Female      In-Active   
5          Michele              Younes                  2012-05-28 00:00:00     1 Year Membership   328.16  ADDRESS_4       1933-06-23 00:00:00     Female      In-Active   
5          Michele              Younes                  2013-05-31 00:00:00     1 Year Membership   360.02  ADDRESS_4       1933-06-23 00:00:00     Female      In-Active

7          Adam                 Herzburg                2010-07-11 00:00:00     1 Year Membership   335.48  ADDRESS_5       1987-08-30 00:00:00     Male        In-Active
...
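Since the most popular Member Type values are 1 Month, 3 Month, 4 Month, 6 Month, and 1 Year, I'd like a column for each of them that counts how many memberships of that type a given member has purchased.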

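There are also 2 Month, 5 Month, 7 Month, 8 Month, and Pool Only member types, which occur very rarely. If a member has a contract of one of those types, I'd like to count it as "Misc".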

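I'm also trying to get a "Total" column that sums the total amount a given member has spent.

Basically, I'd like to transform the DataFrame above into something like: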

df1=
Member Nbr  Member Name-First   Member Name-Last    1_Month  3_Month  4_Month  6_Month  1_Year  Misc    Total    Addr-Formatted Date-Birth           Gender     Status
1           Aboud               Tordon              0        0        0        0        5       0       1662.04  ADDRESS_1      1972-08-01 00:00:00  Male       Active
2           Jean                Manuel              0        0        1        0        2       0       814.11   ADDRESS_2      1984-08-01 00:00:00  Male       In-Active
3           Val                 Adams               0        0        0        0        6       0       1875.18  ADDRESS_3      1934-10-26 00:00:00  Female     Active
5           Michele             Younes              0        0        0        0        4       0       1339.20  ADDRESS_4      1933-06-23 00:00:00  Female     In-Active
7           Adam                Herzburg            0        0        0        0        1       0       335.48   ADDRESS_5      1987-08-30 00:00:00  Male       In-Active

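The problem I'm running into is that whenever I use groupby, I can only sum the amounts, or count one particular contract type on its own. I can't get a result that looks like df1.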

You can first map the values of the Member Type column through the dict d, then replace the unmatched values with Misc using fillna:

d = {'1 Year Membership':'1_Year','1 Month Membership':'1_Month', '3 Month Membership':'3_Month', '4 Month Membership':'4_Month', '6 Month Membership':'6_Month'}
df['Type'] = df['Member Type'].map(d).fillna('Misc')
#print (df)
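
As a minimal sketch of what this mapping step does, the snippet below applies the same dict to a small hand-made frame. The frame name sample, member number 9, and the Pool Only row are assumptions made for illustration; they are not rows from the question's data.

```python
import pandas as pd

# Hypothetical miniature of the question's data (illustration only).
sample = pd.DataFrame({
    "Member Nbr": [1, 2, 2, 9],
    "Member Type": ["1 Year Membership", "4 Month Membership",
                    "1 Year Membership", "Pool Only"],
})

d = {
    "1 Year Membership": "1_Year",
    "1 Month Membership": "1_Month",
    "3 Month Membership": "3_Month",
    "4 Month Membership": "4_Month",
    "6 Month Membership": "6_Month",
}

# map() yields NaN for every value that is not a key of d;
# fillna() then labels those rows "Misc".
sample["Type"] = sample["Member Type"].map(d).fillna("Misc")
type_labels = sample["Type"].tolist()
print(type_labels)
```

Any membership type that is not listed in d ends up in the Misc bucket, so the rare types do not have to be enumerated.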
Then groupby and aggregate with sum:

df0 = df.groupby(['Member Nbr','Member Name-First','Member Name-Last','Addr-Formatted','Date-Birth','Gender','Status'])['Amount'].sum()
#print (df0)
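
To make the aggregation concrete, here is a small sketch that sums Amount for two of the members shown in the question. It assumes a reduced key list (only two of the seven identifying columns) to keep the example short; the name sample is hypothetical.

```python
import pandas as pd

# Rows for members 2 and 7, taken from the question's table.
sample = pd.DataFrame({
    "Member Nbr": [2, 2, 2, 7],
    "Member Name-First": ["Jean", "Jean", "Jean", "Adam"],
    "Amount": [148.79, 348.46, 316.86, 335.48],
})

# Grouping by several columns produces a Series with a MultiIndex,
# one entry per unique combination of the key columns.
keys = ["Member Nbr", "Member Name-First"]
totals = sample.groupby(keys)["Amount"].sum()
print(totals.round(2))
```

Because every identifying column is constant per member, adding more of them to the key list does not change the sums; it only carries those columns through to the result.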
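Then add the Type column to the grouping columns, aggregate with size, and reshape the result with unstack to get the per-type counts (df1).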
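
The code for this step is not shown at this point in the page. Judging from the reindex variant given later in the answer, it is presumably the same chain without the trailing reindex call. The sketch below illustrates that chain on a reduced key list; the frame sample is an assumption made for illustration, and in the answer all seven identifying columns would be listed before Type.

```python
import pandas as pd

# Hypothetical miniature with the Type column already mapped.
sample = pd.DataFrame({
    "Member Nbr": [1, 1, 2, 2, 2],
    "Type": ["1_Year", "1_Year", "4_Month", "1_Year", "1_Year"],
})

# size() counts rows per (member, type) pair; unstack() moves the last
# index level (Type) into the columns, and fill_value=0 fills the
# combinations that never occur.
df1 = sample.groupby(["Member Nbr", "Type"]).size().unstack(fill_value=0)
print(df1)
```

Only the types that actually occur in the data become columns, which is why the output in the answer shows just 1_Year and 4_Month.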

Finally, concat the two DataFrames:
print (pd.concat([df0, df1], axis=1).reset_index())
   Member Nbr Member Name-First Member Name-Last Addr-Formatted  \
0           1             Aboud           Tordon      ADDRESS_1   
1           2              Jean           Manuel      ADDRESS_2   
2           3               Val            Adams      ADDRESS_3   
3           5           Michele           Younes      ADDRESS_4   
4           7              Adam         Herzburg      ADDRESS_5   

            Date-Birth  Gender     Status   Amount  1_Year  4_Month  
0  1972-08-01 00:00:00    Male     Active  1662.04       5        0  
1  1984-08-01 00:00:00    Male  In-Active   814.11       2        1  
2  1934-10-26 00:00:00  Female     Active  1875.18       6        0  
3  1933-06-23 00:00:00  Female  In-Active  1339.20       4        0  
4  1987-08-30 00:00:00    Male  In-Active   335.48       1        0  
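
The question asks for a column named Total, whereas the result above keeps the name Amount. One possible way to close that gap, sketched here on hand-built stand-ins for df0 and df1 (the values are copied from the output above, the index is reduced to Member Nbr for brevity), is a rename before reset_index:

```python
import pandas as pd

# Stand-ins for the two intermediate results, sharing the same index.
idx = pd.Index([1, 2], name="Member Nbr")
df0 = pd.Series([1662.04, 814.11], index=idx, name="Amount")
df1 = pd.DataFrame({"1_Year": [5, 2], "4_Month": [0, 1]}, index=idx)

# concat(axis=1) aligns the pieces on their shared index; the Series
# name becomes its column name, which is then renamed to Total.
result = (
    pd.concat([df0, df1], axis=1)
    .rename(columns={"Amount": "Total"})
    .reset_index()
)
print(result)
```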
Edit:

If some of the mapped types do not occur in the Member Type column, add reindex so that every expected type still gets a column:

df1 = df.groupby(['Member Nbr','Member Name-First','Member Name-Last','Addr-Formatted','Date-Birth','Gender','Status', 'Type']).size().unstack(fill_value=0).reindex(columns=d.values(), fill_value=0)
#print (df1)

print (pd.concat([df0, df1], axis=1).reset_index())
   Member Nbr Member Name-First Member Name-Last Addr-Formatted  \
0           1             Aboud           Tordon      ADDRESS_1   
1           2              Jean           Manuel      ADDRESS_2   
2           3               Val            Adams      ADDRESS_3   
3           5           Michele           Younes      ADDRESS_4   
4           7              Adam         Herzburg      ADDRESS_5   

            Date-Birth  Gender     Status   Amount  6_Month  3_Month  4_Month  \
0  1972-08-01 00:00:00    Male     Active  1662.04        0        0        0   
1  1984-08-01 00:00:00    Male  In-Active   814.11        0        0        1   
2  1934-10-26 00:00:00  Female     Active  1875.18        0        0        0   
3  1933-06-23 00:00:00  Female  In-Active  1339.20        0        0        0   
4  1987-08-30 00:00:00    Male  In-Active   335.48        0        0        0   

   1_Year  1_Month  
0       5        0  
1       2        0  
2       6        0  
3       4        0  
4       1        0  

Instead of the second groupby (which is the fastest option), it is also possible to use pivot_table:

df2 = df.pivot_table(index=['Member Nbr','Member Name-First','Member Name-Last','Addr-Formatted','Date-Birth','Gender','Status'], columns='Type', values='Amount', aggfunc=len, fill_value=0).reindex(columns=d.values(), fill_value=0)
print (pd.concat([df0, df2], axis=1).reset_index())
   Member Nbr Member Name-First Member Name-Last Addr-Formatted  \
0           1             Aboud           Tordon      ADDRESS_1   
1           2              Jean           Manuel      ADDRESS_2   
2           3               Val            Adams      ADDRESS_3   
3           5           Michele           Younes      ADDRESS_4   
4           7              Adam         Herzburg      ADDRESS_5   

            Date-Birth  Gender     Status   Amount  6_Month  3_Month  4_Month  \
0  1972-08-01 00:00:00    Male     Active  1662.04        0        0        0   
1  1984-08-01 00:00:00    Male  In-Active   814.11        0        0        1   
2  1934-10-26 00:00:00  Female     Active  1875.18        0        0        0   
3  1933-06-23 00:00:00  Female  In-Active  1339.20        0        0        0   
4  1987-08-30 00:00:00    Male  In-Active   335.48        0        0        0   

   1_Year  1_Month  
0       5        0  
1       2        0  
2       6        0  
3       4        0  
4       1        0
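
One thing to watch with the reindex calls above: d.values() does not contain Misc, so a Misc column would be dropped if any member had a rare membership type. The sketch below is one possible adjustment, not part of the original answer. It appends Misc to the column list and checks that the groupby and pivot_table routes give the same counts. The frame sample, member number 9, the Pool Only row, and its amount of 100.00 are made up for illustration.

```python
import pandas as pd

d = {
    "1 Year Membership": "1_Year",
    "1 Month Membership": "1_Month",
    "3 Month Membership": "3_Month",
    "4 Month Membership": "4_Month",
    "6 Month Membership": "6_Month",
}

# Hypothetical miniature; the last row stands in for a rare type.
sample = pd.DataFrame({
    "Member Nbr": [1, 1, 2, 2, 9],
    "Member Type": ["1 Year Membership", "1 Year Membership",
                    "4 Month Membership", "1 Year Membership", "Pool Only"],
    "Amount": [331.00, 334.70, 148.79, 348.46, 100.00],
})
sample["Type"] = sample["Member Type"].map(d).fillna("Misc")

# Expected output columns: the mapped names plus the Misc bucket.
cols = list(d.values()) + ["Misc"]

# Route 1: groupby + size + unstack.
by_size = (
    sample.groupby(["Member Nbr", "Type"])
    .size()
    .unstack(fill_value=0)
    .reindex(columns=cols, fill_value=0)
)

# Route 2: pivot_table with len as the aggregation function.
by_pivot = sample.pivot_table(
    index="Member Nbr", columns="Type", values="Amount",
    aggfunc=len, fill_value=0,
).reindex(columns=cols, fill_value=0)

print(by_size)
print(by_pivot)
```

Building the column list explicitly also fixes the column order, which otherwise depends on the ordering of the dict.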