Python: fastest way to count word occurrences in pandas
I have a list of strings. I want to count the occurrences of all of these words in each row of a pandas column, and add a new column holding this count.
import pandas as pd

words = ["I", "want", "please"]
data = pd.DataFrame({"col": ["I want to find", "the fastest way",
                             "to count occurrence", "of words in a column",
                             "Can you help please"]})
# Count the regex matches of any listed word in each row
data["Count"] = data.col.str.count("|".join(words))
print(data)
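The code shown here does exactly what I want, but it takes a long time to run on a long text with a long list of words. Can you suggest a faster way to do the same thing?
Thanks

Maybe you can use Counter. If you want to test several sets of words against the same text, just save the intermediate step after applying Counter. Because the counted words now sit in a dictionary keyed on the word, testing whether that dictionary contains a given word is an O(1) operation.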
from collections import Counter

data["Count"] = (
    data['col'].str.split()
    .apply(Counter)
    .apply(lambda counts: sum(word in counts for word in words))
)
>>> data
                    col  Count
0        I want to find      2
1       the fastest way      0
2   to count occurrence      0
3  of words in a column      0
4   Can you help please      1
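
To make the "save the intermediate step" idea concrete, here is a minimal sketch in plain Python, without pandas, so that it runs on its own. The helper names `count_present` and `count_occurrences`, the `word_sets` dictionary, and the doubled "please" in the last row are assumptions added for illustration; they are not part of the original answer. It also shows that `word in counts` counts each listed word at most once per row, whereas summing `counts[word]` gives the total number of occurrences.

```python
from collections import Counter

# Same rows as in the question, except that the last row repeats "please"
# (an illustrative change) so the two counting rules give different results.
rows = [
    "I want to find",
    "the fastest way",
    "to count occurrence",
    "of words in a column",
    "Can you help please please",
]

# Intermediate step: split and count each row once, then reuse the result.
row_counts = [Counter(text.split()) for text in rows]


def count_present(counts, words):
    """Number of listed words that appear at least once in the row."""
    return sum(word in counts for word in words)


def count_occurrences(counts, words):
    """Total number of occurrences of the listed words in the row."""
    # A Counter returns 0 for a missing key, so no membership check is needed.
    return sum(counts[word] for word in words)


# Hypothetical word sets, all tested against the same precomputed counts.
word_sets = {
    "request": ["I", "want", "please"],
    "task": ["count", "words"],
}

present = {
    name: [count_present(counts, ws) for counts in row_counts]
    for name, ws in word_sets.items()
}
total = {
    name: [count_occurrences(counts, ws) for counts in row_counts]
    for name, ws in word_sets.items()
}

print(present["request"])  # [2, 0, 0, 0, 1]
print(total["request"])    # [2, 0, 0, 0, 2]
```

In pandas, the equivalent intermediate would be the Series returned by data['col'].str.split().apply(Counter), kept in a variable and passed to a further .apply for each word set. Note that splitting on whitespace only matches whole tokens, while the regex built with "|".join(words) also matches inside longer words (for example "I" inside "It"), so the two approaches can disagree on other data.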
I tested your solution, and the run time was divided by 4. Thanks!