Python Pandas: group by two columns and count the occurrences of all values in the second column
I want to group my DataFrame by two columns: one is yearmonth (format: 16-10) and the other is the number of customers. Then, if the number of customers is 6 or more, I want to replace all of those rows with a single row where the number of customers is shown as 6+ and the count is the sum of the counts of the rows it replaces.

This is what the data looks like:
index month num ofcust count
0 10 1.0 1
1 10 2.0 1
2 10 3.0 1
3 10 4.0 1
4 10 5.0 1
5 10 6.0 1
6 10 7.0 1
7 10 8.0 1
8 11 1.0 1
9 11 2.0 1
10 11 3.0 1
11 12 12.0 1
Expected output:
index month no of cust count
0 16-10 1.0 3
1 16-10 2.0 6
2 16-10 3.0 2
3 16-10 4.0 3
4 16-10 5.0 4
5 16-10 6+ 4
6 16-11 1.0 4
7 16-11 2.0 3
8 16-11 3.0 2
9 16-11 4.0 1
10 16-11 5.0 3
11 16-11 6+ 5
I think you need to replace all values >=6 with '6+' first, and then use groupby and aggregate with sum:
s = df['num ofcust'].mask(df['num ofcust'] >=6, '6+')
#alternatively
#s = df['num ofcust'].where(df['num ofcust'] <6, '6+')
df = df.groupby(['month', s])['count'].sum().reset_index()
print (df)
month num ofcust count
0 10 1 1
1 10 2 1
2 10 3 1
3 10 4 1
4 10 5 1
5 10 6+ 3
6 11 1 1
7 11 2 1
8 11 3 1
9 12 6+ 1
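For anyone who wants to run this end to end, here is a minimal, self-contained sketch that rebuilds the sample data from the question. It differs from the snippet above in one respect, which is my own assumption rather than part of the answer: the customer numbers are converted to string labels before masking, so the grouping key holds a single type instead of a mix of floats and strings. The names `labels`, `bucket` and `result` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "month": [10] * 8 + [11] * 3 + [12],
    "num ofcust": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 1.0, 2.0, 3.0, 12.0],
    "count": [1] * 12,
})

# Assumption: the values are whole numbers with no NaN, so int -> str is safe
labels = df["num ofcust"].astype(int).astype(str)

# Replace every label whose original value is >= 6 with the bucket '6+'
bucket = labels.mask(df["num ofcust"] >= 6, "6+")

# Group by the month column and the bucketed Series, then sum the counts
result = df.groupby(["month", bucket])["count"].sum().reset_index()
print(result)
# Month 10 ends up with a single '6+' row whose count is 3 (6, 7 and 8 combined)
```

Because the original DataFrame is left untouched, this form is handy when the numeric column is still needed later.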
Another very similar solution is to assign the replacement value to the column first:
df.loc[df['num ofcust'] >= 6, 'num ofcust'] = '6+'
df = df.groupby(['month', 'num ofcust'], as_index=False)['count'].sum()
print (df)
month num ofcust count
0 10 1 1
1 10 2 1
2 10 3 1
3 10 4 1
4 10 5 1
5 10 6+ 3
6 11 1 1
7 11 2 1
8 11 3 1
9 12 6+ 1
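One caveat with the in-place version: writing the string '6+' into a float column through `.loc` is an incompatible-dtype assignment, and recent pandas releases may warn about it or refuse it. A hedged sketch of a workaround, assuming whole-number values with no NaN, is to compute the condition first, cast the column to strings, and only then assign. The name `is_six_plus` is hypothetical.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "month": [10] * 8 + [11] * 3 + [12],
    "num ofcust": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 1.0, 2.0, 3.0, 12.0],
    "count": [1] * 12,
})

# Evaluate the condition while the column is still numeric
is_six_plus = df["num ofcust"] >= 6

# Cast to string labels so that assigning '6+' does not change the dtype
df["num ofcust"] = df["num ofcust"].astype(int).astype(str)
df.loc[is_six_plus, "num ofcust"] = "6+"

# Group on both columns and sum the counts
result = df.groupby(["month", "num ofcust"], as_index=False)["count"].sum()
print(result)
```

The grouping step is the same as in the answer; only the preparation of the column changes.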
Yes, this solution worked for me. Thank you very much for the quick reply.