Python: sum of value occurrences grouped by another column


python pandas dataframe


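I need to count the occurrences of each value in column name, grouped by column industry. The goal is to get the total for each name per industry. My data looks like this: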

industry  name
Home      Mike
Home      Mike,Angela,Elliot
Fashion   Angela,Elliot
Fashion   Angela,Elliot

The desired output is:

Home Mike:2 Angela:1 Elliot:1
Fashion Angela:2 Elliot:2

Moving this out of the comments; it has been debugged and verified to work:

# count() in the next line won't work without an extra column
df['name_list'] = df['name'].str.split(',')
df.explode('name_list').groupby(['industry', 'name_list']).count()

Result:

                    name
industry name_list      
Fashion  Angela        2
         Elliot        2
Home     Angela        1
         Elliot        1
         Mike          2
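A self-contained sketch of the same counting step, assuming pandas 0.25 or later (needed for explode). Instead of a helper column plus count(), it overwrites name with the split lists and calls value_counts() on the grouped name column, which counts each name within its industry directly. The variable names exploded and counts are illustrative only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "industry": ["Home", "Home", "Fashion", "Fashion"],
    "name": ["Mike", "Mike,Angela,Elliot", "Angela,Elliot", "Angela,Elliot"],
})

# Split each comma-separated string into a list, then give every name its own row
exploded = df.assign(name=df["name"].str.split(",")).explode("name")

# Count how often each name appears within each industry; the result is a
# Series indexed by (industry, name), sorted by count within each industry
counts = exploded.groupby("industry")["name"].value_counts()
print(counts)
```

Because value_counts() works on the name column itself, there is no need for the extra column that count() requires.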

You can use collections.Counter to return a Series of dicts, as follows:

from collections import Counter
s = df.name.str.split(',').groupby(df.industry).sum().agg(Counter)

Out[506]:
industry
Fashion               {'Angela': 2, 'Elliot': 2}
Home       {'Mike': 2, 'Angela': 1, 'Elliot': 1}
Name: name, dtype: object

Note: each cell is a Counter object. Counter is a subclass of dict, so you can apply any dict operation to it.
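To turn the counts into the exact layout the question asks for (Home Mike:2 Angela:1 Elliot:1), one option is to build a Counter per industry and join its items into a string. This sketch explodes the names first and applies Counter to each group's names, rather than summing lists as the answer above does; sort=False keeps industries in order of first appearance. The name lines is illustrative only.

```python
from collections import Counter

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "industry": ["Home", "Home", "Fashion", "Fashion"],
    "name": ["Mike", "Mike,Angela,Elliot", "Angela,Elliot", "Angela,Elliot"],
})

# One row per individual name
exploded = df.assign(name=df["name"].str.split(",")).explode("name")

lines = []
# sort=False keeps industries in the order they first appear (Home, then Fashion)
for industry, names in exploded.groupby("industry", sort=False)["name"]:
    counter = Counter(names)  # a dict subclass that keeps first-seen order of names
    lines.append(industry + " " + " ".join(f"{name}:{n}" for name, n in counter.items()))

print("\n".join(lines))
# Home Mike:2 Angela:1 Elliot:1
# Fashion Angela:2 Elliot:2
```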

df['name'] = df['name'].str.split(',')
df.explode('name').groupby(['industry', 'name'], as_index=False).count()
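In that one-liner, once industry and name are both group keys, count() has no other column left to count, which is the same problem the answer's code comment mentions. A possible fix, assuming pandas 1.1 or later, is to swap count() for size(): it keeps the as_index=False shape and adds a size column holding the number of rows in each group.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "industry": ["Home", "Home", "Fashion", "Fashion"],
    "name": ["Mike", "Mike,Angela,Elliot", "Angela,Elliot", "Angela,Elliot"],
})

# Same steps as the comment: split, then one row per name
df["name"] = df["name"].str.split(",")
exploded = df.explode("name")

# size() counts rows per group, so no extra value column is needed;
# with as_index=False the group keys stay as ordinary columns
result = exploded.groupby(["industry", "name"], as_index=False).size()
print(result)
```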

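@Marat Why not post this as an answer?
@sushanth It's too trivial (both the question and the answer). Also, I didn't want to fully debug it; right now it's more of a pointer in the right direction.
@Marat I agree with you, but in general this is a good answer.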