Python: fastest way to count word occurrences in a Pandas column


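I have a list of strings. I want to count the occurrences of all of these words in each row of a Pandas column and add a new column containing that count.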

import pandas as pd

words = ["I", "want", "please"]
data = pd.DataFrame({"col": ["I want to find", "the fastest way", "to count occurrence",
                             "of words in a column", "Can you help please"]})
# Count the regex matches of any listed word in each row.
data["Count"] = data.col.str.count("|".join(words))
print(data)

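The code shown here does exactly what I want, but it takes a long time to run on a long text with a long list of words. Can you suggest a faster way to do the same thing?

Thanks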

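Maybe you could use Counter. If you want to test multiple sets of words against the same text, just save the intermediate step after applying Counter. Since the counted words are then stored in a dictionary keyed by the word, testing whether that dictionary contains a given word is an O(1) operation.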

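from collections import Counter

# Split each row on whitespace and count its tokens once, then look up each
# word in that row's Counter.
# Note: `word in counts` adds 1 for every listed word present in the row;
# use counts[word] instead to add up repeated occurrences.
data["Count"] = (
    data['col'].str.split()
    .apply(Counter)
    .apply(lambda counts: sum(word in counts for word in words))
)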
>>> data
                    col  Count
0        I want to find      2
1       the fastest way      0
2   to count occurrence      0
3  of words in a column      0
4   Can you help please      1
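
The "save the intermediate step" idea mentioned above could look roughly like the following. This is only a sketch: the names token_counts, count_present and count_occurrences are made up for illustration and are not part of pandas or of the original answer. The per-row Counter objects are built once and then reused for any number of word lists.

```python
from collections import Counter

import pandas as pd

data = pd.DataFrame({"col": ["I want to find", "the fastest way",
                             "to count occurrence", "of words in a column",
                             "Can you help please"]})

# Intermediate step: one Counter per row, computed once and kept around.
# (token_counts is a hypothetical name chosen for this sketch.)
token_counts = data["col"].str.split().apply(Counter)


def count_present(words):
    """For each row, how many of the given words appear at least once."""
    return token_counts.apply(lambda counts: sum(word in counts for word in words))


def count_occurrences(words):
    """For each row, total occurrences of the given words (missing words count as 0)."""
    return token_counts.apply(lambda counts: sum(counts[word] for word in words))


# Two different word lists tested against the same precomputed counters.
data["Count"] = count_present(["I", "want", "please"])
data["Count2"] = count_present(["the", "to", "of"])
print(data)
```

Only the cheap dictionary lookups are repeated per word list; the splitting and counting of the text happens once.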

I tested your solution and the run time was divided by 4. Thanks.
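
One thing worth checking before swapping one approach for the other: the regex version in the question matches substrings, while the Counter version only sees whitespace-separated tokens, so the two can disagree on some inputs. A small sketch of the difference, using a made-up sentence and word list that are not from the original question:

```python
import re

import pandas as pd

# Hypothetical inputs chosen to expose the difference.
words = ["in"]
text = pd.Series(["find in kindness"])

# Plain alternation also matches inside longer words ("find", "kindness").
substring_hits = text.str.count("|".join(words))

# Escaping each word and adding \b boundaries restricts matches to whole words.
pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
whole_word_hits = text.str.count(pattern)

# Whitespace tokenization, as in the Counter-based answer.
token_hits = text.str.split().apply(lambda tokens: sum(t in words for t in tokens))

print(substring_hits.tolist(), whole_word_hits.tolist(), token_hits.tolist())
```

Here the plain pattern reports 3 matches while the other two report 1. On the sample data in the question the approaches happen to agree, so which behavior is wanted depends on the real text.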