Python: counting word frequencies in a list and removing unpopular words

My data is in a list:

data = [['Biz_Innovations', '#socialmedia'],
 ['ChantalGrange', '#aws'],
 ['beyonddevops', '#aws'],
 ['beyonddevops', '#socialmedia'],
 ['IBMNetezza', '#ibm'],
 ['IBMNetezza', '#analytics'],
 ['SandraFeinsmith', '#ibm'],
 ['SandraFeinsmith', '#analytics'],
 ['fleejack', '#healhcare'],
 ['bigdataweek', '#socialmedia'],
 ['sabumjung', '#aws']]
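I want to count how often each word in the second column occurs (for example, #socialmedia, #aws) and then select rows based on that frequency. If a word appears three or more times in the dataset, I want to keep the corresponding rows and drop all the others. The result should look like this: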

data = [['Biz_Innovations', '#socialmedia'],
 ['ChantalGrange', '#aws'],
 ['beyonddevops', '#aws'],
 ['beyonddevops', '#socialmedia'],
 ['bigdataweek', '#socialmedia'],
 ['sabumjung', '#aws']]

Any suggestions?

You can use collections.Counter for this:

In [16]: from collections import Counter

In [17]: keepers = [a[0] for a in Counter(d[1] for d in data).items() if a[1]>=3]

In [18]: [d for d in data if d[1] in keepers]
Out[18]: 
[['Biz_Innovations', '#socialmedia'],
 ['ChantalGrange', '#aws'],
 ['beyonddevops', '#aws'],
 ['beyonddevops', '#socialmedia'],
 ['bigdataweek', '#socialmedia'],
 ['sabumjung', '#aws']]
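
A possible variation on the snippet above, offered as a sketch rather than as part of the original answer: keeping the frequent tags in a set instead of a list makes each `in keepers` check constant time, which may matter when there are many distinct tags. The names `tag_counts` and `filtered` are illustrative assumptions; `data` and `keepers` come from the thread.

```python
from collections import Counter

# Rows from the question: [user, hashtag]
data = [['Biz_Innovations', '#socialmedia'],
        ['ChantalGrange', '#aws'],
        ['beyonddevops', '#aws'],
        ['beyonddevops', '#socialmedia'],
        ['IBMNetezza', '#ibm'],
        ['IBMNetezza', '#analytics'],
        ['SandraFeinsmith', '#ibm'],
        ['SandraFeinsmith', '#analytics'],
        ['fleejack', '#healhcare'],
        ['bigdataweek', '#socialmedia'],
        ['sabumjung', '#aws']]

# Count occurrences of each hashtag (second column)
tag_counts = Counter(row[1] for row in data)

# Set of hashtags that occur at least three times
keepers = {tag for tag, n in tag_counts.items() if n >= 3}

# Keep only rows whose hashtag is in that set; original order is preserved
filtered = [row for row in data if row[1] in keepers]
print(filtered)
```

For the sample data only #socialmedia and #aws reach the threshold, so six rows remain.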
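
import collections
# Count how often each tag occurs in the second column
counts = collections.Counter(tag for (_, tag) in data)
# Keep only the rows whose tag occurs at least three times
data = [[val, tag] for (val, tag) in data if counts[tag] >= 3]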

collections.Counter(map(operator.itemgetter(1), data)) will get you most of the way there:
>>> import collections, operator
>>> words = collections.Counter(map(operator.itemgetter(1), data))
>>> populars = [p for p in data if words[p[1]] >= 3]
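
If the same filtering is needed in more than one place, the Counter approach from the answers could be wrapped in a small helper. This is only a sketch: the function name `keep_frequent`, its `min_count` and `key` parameters, and the `sample` rows are assumptions made for illustration and do not appear in the thread.

```python
import collections
import operator


def keep_frequent(rows, min_count=3, key=operator.itemgetter(1)):
    """Return the rows whose key value occurs at least min_count times."""
    rows = list(rows)  # materialize so the rows can be traversed twice
    counts = collections.Counter(map(key, rows))
    return [row for row in rows if counts[key(row)] >= min_count]


# Hypothetical sample: '#x' occurs three times, '#y' twice
sample = [['a', '#x'], ['b', '#y'], ['c', '#x'], ['d', '#x'], ['e', '#y']]

print(keep_frequent(sample))               # only the '#x' rows
print(keep_frequent(sample, min_count=2))  # all five rows
```

Passing a different `key`, for example `operator.itemgetter(0)`, would filter on the first column instead.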