Python: how to count selected words that match a condition across a whole DataFrame column?


I have a DataFrame, and I want to count specific words in one column across the whole DataFrame.

Suppose shape is one of the columns of the DataFrame:

shape              color
circle rectangle   orange
square triangle
rombus
square oval        black
triangle circle
rectangle oval     white
triangle

I want to count how many circles, rectangles, ovals and triangles there are in the shape column of the DataFrame.

The output should be:

circle     2
rectangle  2
triangle   3
oval       2

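Use:

Explanation:

1. First split the values on whitespace (the default separator) to get a Series of words.
2. Filter that Series by the values in the list.
3. Count the remaining words.
4. If necessary, change the order or add missing values filled with 0.
5. If you need a DataFrame instead of a Series, convert the result.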
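A minimal sketch of these steps, assuming the sample data from the question. It is not the answerer's original code: the split(expand=True).stack() part follows the comments further down the page, while isin, value_counts, reindex(fill_value=0) and rename_axis/reset_index are my choices for the remaining steps, and the output column names 'shape' and 'count' are hypothetical.

```python
import pandas as pd

# Sample data from the question (only the 'shape' column matters here)
df = pd.DataFrame({'shape': ['circle rectangle', 'square triangle',
                             'rombus', 'square oval', 'triangle circle',
                             'rectangle oval', 'triangle']})

# Shapes to count; this list also fixes the order of the result
L = ['circle', 'rectangle', 'oval', 'triangle']

# 1. Split on whitespace into columns, then stack them into one long Series.
#    Depending on the pandas version, stack may keep the missing-value padding
#    from one-word rows; the isin filter below removes it either way.
s = df['shape'].str.split(expand=True).stack()

# 2-4. Keep only the wanted words, count them, then reorder to match L
#      and insert 0 for any shape that never appears
counts = s[s.isin(L)].value_counts().reindex(L, fill_value=0)

# 5. Optional: turn the Series into a two-column DataFrame
#    ('shape' and 'count' are hypothetical column names)
out = counts.rename_axis('shape').reset_index(name='count')
print(out)
```

Using reindex at the end, rather than sorting, keeps the rows in the order of the list and guarantees every requested shape appears exactly once.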

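You can join the 'shape' column with spaces and split the result. Pass that to the top-level function pandas.value_counts, and use reindex to subset it to the shapes you want to see.
The advantage of reindex is that if one of the desired shapes does not exist in the 'shape' column, it returns NaN for that shape.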

import pandas as pd

shapes = ['circle', 'rectangle', 'oval', 'triangle']
pd.value_counts(' '.join(df['shape']).split()).reindex(shapes)

circle       2
rectangle    2
oval         2
triangle     3
dtype: int64
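To see the NaN behaviour described above, here is a small sketch with one shape ('dodecagon', used only for illustration) that does not occur in the data. It assumes the same sample DataFrame and swaps the top-level pd.value_counts for pd.Series(...).value_counts(), because recent pandas releases deprecate the top-level function; the counting logic is otherwise the same.

```python
import pandas as pd

df = pd.DataFrame({'shape': ['circle rectangle', 'square triangle',
                             'rombus', 'square oval', 'triangle circle',
                             'rectangle oval', 'triangle']})

# 'dodecagon' never occurs in the data
shapes = ['circle', 'rectangle', 'oval', 'triangle', 'dodecagon']

# Join every cell with spaces, split into words, and count each word
words = pd.Series(' '.join(df['shape']).split())
counts = words.value_counts().reindex(shapes)

# The missing shape becomes NaN, which forces the dtype to float64
print(counts)
```

If NaN (and the float dtype) is unwanted, pass fill_value to reindex.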


If you expect that some shapes may be missing from the data, you can also give reindex a fill value. Below, I chose to fill with 0:
shapes = ['circle','rectangle','oval','triangle', 'dodecagon']
pd.value_counts(' '.join(df['shape']).split()).reindex(shapes, fill_value=0)

circle       2
rectangle    2
oval         2
triangle     3
dodecagon    0
dtype: int64

After splitting the strings, you can use collections.Counter together with itertools.chain:
from collections import Counter
from itertools import chain

import pandas as pd

df = pd.DataFrame({'shape': ['circle rectangle', 'square triangle',
                             'rombus', 'square oval', 'triangle circle',
                             'rectangle oval', 'triangle']})

c = Counter(chain.from_iterable(df['shape'].str.split()))

print(c)

Counter({'triangle': 3, 'circle': 2, 'rectangle': 2,
         'square': 2, 'oval': 2, 'rombus': 1})

This gives a Counter object, which is a subclass of dict. If you want to filter the keys, you can do so with a dictionary comprehension:
L = {'circle', 'rectangle', 'oval', 'triangle'}

res = {k: v for k, v in c.items() if k in L}

print(res)

{'circle': 2, 'oval': 2, 'rectangle': 2, 'triangle': 3}
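Because indexing a Counter with a missing key returns 0 instead of raising KeyError, you can also iterate over the shapes you want rather than over the Counter. This sketch reuses the same data; the extra shape 'dodecagon' is hypothetical and only demonstrates the missing-key case. The result also follows the order of the list.

```python
from collections import Counter
from itertools import chain

import pandas as pd

df = pd.DataFrame({'shape': ['circle rectangle', 'square triangle',
                             'rombus', 'square oval', 'triangle circle',
                             'rectangle oval', 'triangle']})

c = Counter(chain.from_iterable(df['shape'].str.split()))

# Look up each wanted shape; c[k] is 0 for shapes that never occur,
# and a lookup does not add the key to the Counter
shapes = ['circle', 'rectangle', 'oval', 'triangle', 'dodecagon']
res = {k: c[k] for k in shapes}

print(res)
# {'circle': 2, 'rectangle': 2, 'oval': 2, 'triangle': 3, 'dodecagon': 0}
```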

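TypeError: list indices must be integers or slices, not str
@tiru - I think you need s = df['shape'].astype(str).str.split(expand=True).stack() to convert the column to strings.
@tiru - can you check s = df['shape'].astype(str).str.split(expand=True).stack()?
@jezrael why do I get all counts as 0?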
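The thread does not show the asker's data, so the cause of the TypeError and of the zero counts is unclear. One situation worth guarding against (an assumption here) is a shape column that contains missing values: str.split returns NaN for those cells, and chain.from_iterable then fails with "'float' object is not iterable". A minimal sketch that drops such cells first, using hypothetical data:

```python
from collections import Counter
from itertools import chain

import numpy as np
import pandas as pd

# Hypothetical data: one cell is missing (NaN)
df = pd.DataFrame({'shape': ['circle rectangle', np.nan, 'triangle circle']})

shapes = ['circle', 'rectangle', 'oval', 'triangle']

# Drop missing cells before splitting; astype(str) alone would instead
# turn NaN into the literal word 'nan' and count it
words = df['shape'].dropna().astype(str).str.split()
c = Counter(chain.from_iterable(words))

print({k: c[k] for k in shapes})
# {'circle': 2, 'rectangle': 1, 'oval': 0, 'triangle': 1}
```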