Python: how to extract and count unique words from a DataFrame
Consider the following DataFrame:
import pandas as pd
df = pd.DataFrame({'animals': [['dog','cat','snake','lion','tiger'],
                               ['dog','moose','alligator','lion','tiger'],
                               ['eagle','moose','alligator','lion','tiger'],
                               ['cat','alligator','lion']]})
I need to extract each unique animal and count how many times it occurs.
The output should look something like:
dog 2
cat 2
snake 1
lion 4
tiger 3
moose 2
alligator 3
eagle 1
Basically, something like what df.value_counts() does.
Many thanks.
You can use explode and value_counts:
df.animals.explode().value_counts()
Output:
lion 4
tiger 3
alligator 3
moose 2
cat 2
dog 2
eagle 1
snake 1
Name: animals, dtype: int64
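A self-contained version of the explode approach, for copy-pasting. It rebuilds the example DataFrame and, as an extra step not in the answer, converts the result to a plain dict. Series.explode was added in pandas 0.25, so older versions will not have it.

```python
import pandas as pd

df = pd.DataFrame({'animals': [['dog', 'cat', 'snake', 'lion', 'tiger'],
                               ['dog', 'moose', 'alligator', 'lion', 'tiger'],
                               ['eagle', 'moose', 'alligator', 'lion', 'tiger'],
                               ['cat', 'alligator', 'lion']]})

# explode() puts each list element on its own row (repeating the index);
# value_counts() then tallies each distinct animal, most frequent first.
counts = df['animals'].explode().value_counts()

# Optional: a plain dict of animal -> count
counts_dict = counts.to_dict()
print(counts_dict)
```

One thing to be aware of: an empty list in a cell explodes to NaN, and value_counts() drops NaN by default (dropna=True), so empty rows simply do not show up in the counts.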
One way, with Counter + chain:
import pandas as pd
from collections import Counter
from itertools import chain
pd.Series(Counter(chain.from_iterable(df['animals'])))
dog 2
cat 2
snake 1
lion 4
tiger 3
moose 2
alligator 3
eagle 1
dtype: int64
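The Series above keeps first-seen order rather than sorting by count. If you want the value_counts()-style ordering from the Counter approach, Counter.most_common() sorts by count, descending, and keeps first-encountered order for ties. A minimal sketch (the sorting step is an addition, not part of the answer):

```python
from collections import Counter
from itertools import chain

import pandas as pd

df = pd.DataFrame({'animals': [['dog', 'cat', 'snake', 'lion', 'tiger'],
                               ['dog', 'moose', 'alligator', 'lion', 'tiger'],
                               ['eagle', 'moose', 'alligator', 'lion', 'tiger'],
                               ['cat', 'alligator', 'lion']]})

# chain.from_iterable flattens the column's lists lazily (no intermediate list)
counts = Counter(chain.from_iterable(df['animals']))

# most_common() returns (animal, count) pairs sorted by count, highest first
top = counts.most_common()

# A dict preserves insertion order, so the Series keeps the sorted order
s = pd.Series(dict(top))
print(s)
```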
A map/reduce approach:
reduce(Counter.__add__, map(Counter, df.animals))
Or, doing the Counter conversion inside the reduce step:
reduce(lambda a,b: Counter(a) + Counter(b), df.animals)
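A self-contained sketch of the map/reduce idea with the imports it needs. It swaps in operator.add for the lambda and passes an empty Counter() as reduce's initial value; both are choices of this sketch, not of the answer. Without an initial value, reduce raises TypeError on an empty column, and the lambda variant returns the raw list unchanged when the column has only one row, because reduce never calls the function in that case.

```python
import operator
from collections import Counter
from functools import reduce

import pandas as pd

df = pd.DataFrame({'animals': [['dog', 'cat', 'snake', 'lion', 'tiger'],
                               ['dog', 'moose', 'alligator', 'lion', 'tiger'],
                               ['eagle', 'moose', 'alligator', 'lion', 'tiger'],
                               ['cat', 'alligator', 'lion']]})

# map() turns each row's list into a Counter; reduce() adds them pairwise,
# starting from an empty Counter so the result is always a Counter.
counts = reduce(operator.add, map(Counter, df['animals']), Counter())

# Edge case: a single-row input still yields a Counter, not a list
single = reduce(operator.add, map(Counter, [['dog', 'dog']]), Counter())
print(counts)
```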
(Remember to import reduce first with from functools import reduce, since it was removed as a built-in in Python 3.)
You can do this:
from collections import Counter
bb = [val for in_arr in df['animals'].tolist() for val in in_arr]
res = Counter(bb)
pd.Series(res)
dog 2
cat 2
snake 1
lion 4
tiger 3
moose 2
alligator 3
eagle 1
dtype: int64
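The intermediate list bb is not strictly needed: the same nested loop works as a generator expression passed straight to Counter. A minimal sketch of that variation (my rewording of the answer, same result):

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'animals': [['dog', 'cat', 'snake', 'lion', 'tiger'],
                               ['dog', 'moose', 'alligator', 'lion', 'tiger'],
                               ['eagle', 'moose', 'alligator', 'lion', 'tiger'],
                               ['cat', 'alligator', 'lion']]})

# Same flattening as the list comprehension, but without building bb first
res = Counter(val for in_arr in df['animals'] for val in in_arr)

# Series index follows first-seen order, as in the output above
s = pd.Series(res)
print(s)
```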