Python Pandas: group by two columns and count the occurrences of all values in the second column


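I want to group my DataFrame by two columns: one is yearmonth (format: 16-10) and the other is the number of customers. Then, where the number of customers is greater than 6, I want to replace all of those rows with a single row that has number of cust = 6+ and whose count is the sum of the counts of the rows with number of cust > 6.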

This is what the data looks like:

index     month      num ofcust    count

0            10          1.0         1
1            10          2.0         1
2            10          3.0         1
3            10          4.0         1
4            10          5.0         1
5            10          6.0         1
6            10          7.0         1
7            10          8.0         1
8            11          1.0         1
9            11          2.0         1
10           11          3.0         1
11           12          12.0        1

Desired output:

index   month   no of cust  count

0       16-10   1.0         3
1       16-10   2.0         6
2       16-10   3.0         2
3       16-10   4.0         3
4       16-10   5.0         4
5       16-10   6+          4
6       16-11   1.0         4
7       16-11   2.0         3
8       16-11   3.0         2
9       16-11   4.0         1
10      16-11   5.0         3
11      16-11   6+          5

I think you need to replace all values >=6 first, and then use groupby + aggregate sum:

s = df['num ofcust'].mask(df['num ofcust'] >=6, '6+')
#alternatively
#s = df['num ofcust'].where(df['num ofcust'] <6, '6+')
df = df.groupby(['month', s])['count'].sum().reset_index()
print (df)
   month num ofcust  count
0     10          1      1
1     10          2      1
2     10          3      1
3     10          4      1
4     10          5      1
5     10         6+      3
6     11          1      1
7     11          2      1
8     11          3      1
9     12         6+      1
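
To run the first approach end to end, here is a minimal sketch that rebuilds the sample data from the question. The DataFrame construction and the names `binned` and `out` are my own assumptions for illustration; only the `mask` + `groupby` + `sum` pattern comes from the answer above. The `astype(object)` cast is also an addition, there only to make it explicit that the grouping key will hold both numbers and the string '6+'.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (assumed layout).
df = pd.DataFrame({
    'month': [10] * 8 + [11] * 3 + [12],
    'num ofcust': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 1.0, 2.0, 3.0, 12.0],
    'count': [1] * 12,
})

# Replace every value >= 6 with the label '6+'; the other values stay numeric.
binned = df['num ofcust'].astype(object).mask(df['num ofcust'] >= 6, '6+')

# Group by the month column and the relabelled Series, then sum the counts.
out = df.groupby(['month', binned])['count'].sum().reset_index()
print(out)
```

Because `binned` is a separate Series, the original `df` is left unchanged, which helps if the numeric values are still needed later.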

Another very similar solution is to write the replacement value into the column first:

df.loc[df['num ofcust'] >= 6, 'num ofcust'] = '6+'
df = df.groupby(['month', 'num ofcust'], as_index=False)['count'].sum()
print (df)
   month num ofcust  count
0     10          1      1
1     10          2      1
2     10          3      1
3     10          4      1
4     10          5      1
5     10         6+      3
6     11          1      1
7     11          2      1
8     11          3      1
9     12         6+      1
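
One caveat with the in-place variant: recent pandas releases warn about (and are planned to reject) writing a string into a float64 column through `.loc`. A hedged sketch of one way around that, assuming the customer numbers are whole numbers with no missing values, is to convert the column to string labels before assigning '6+'. The sample data and the names `binned`, `over` and `out` are hypothetical and not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (assumed layout).
df = pd.DataFrame({
    'month': [10] * 8 + [11] * 3 + [12],
    'num ofcust': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 1.0, 2.0, 3.0, 12.0],
    'count': [1] * 12,
})

binned = df.copy()                      # work on a copy, keep df untouched
over = binned['num ofcust'] >= 6        # rows that belong in the '6+' bucket

# Turn every value into a string label so the column has one consistent type.
binned['num ofcust'] = binned['num ofcust'].astype(int).astype(str)
binned.loc[over, 'num ofcust'] = '6+'

out = binned.groupby(['month', 'num ofcust'], as_index=False)['count'].sum()
print(out)
```

The string labels sort lexically, which is fine here because only '1' to '5' and '6+' remain; with a higher threshold (labels such as '10') an ordered categorical would be a safer key.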

Yes, this solution worked for me. Thank you very much for the quick reply.
