Counting the most common items in a list in Python


python list collections


I am trying to display the n most common items in a list, but I get the error: TypeError: unhashable type: 'list'

import collections

test = [[u'the\xa0official', u'MySQL'], [u'MySQL', u'repos'], [u'repos', u'for'], [u'for', u'Linux'], [u'Linux', u'a'], [u'a', u'little'], [u'little', u'over'], [u'over', u'a'], [u'a', u'year'], [u'year', u'ago,'], [u'ago,', u'the'], [u'the', u'offering'], [u'offering', u'has'], [u'has', u'grown'], [u'grown', u'steadily.\xa0Starting'], [u'steadily.\xa0Starting', u'off'], [u'off', u'with'], [u'with', u'support'], [u'support', u'for'], [u'for', u'the'], [u'the', u'Yum'], [u'Yum', u'based'], [u'based', u'family'], [u'family', u'of\xa0Red'], [u'of\xa0Red', u'Hat/Fedora/Oracle'], [u'Hat/Fedora/Oracle', u'Linux,'], [u'Linux,', u'we'], [u'we', u'added'], [u'added', u'Apt'], [u'Apt', u'repos'], [u'repos', u'for'], [u'for', u'Debian'], [u'Debian', u'and'], [u'and', u'Ubuntu'], [u'Ubuntu', u'in'], [u'in', u'late'], [u'late', u'spring,'], [u'spring,', u'and'], [u'and', u'throughout'], [u'throughout', u'all']]

print test[0]
print type(test)

print collections.Counter(test).most_common(3)

You need to convert the inner lists to tuples so that they are hashable:

>>> from collections import Counter
>>> c = Counter(tuple(i) for i in test)
>>> c.most_common(3)
[(('repos', 'for'), 2),
 (('Hat/Fedora/Oracle', 'Linux,'), 1),
 (('year', 'ago,'), 1)]
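The keys in the result above are tuples. If you need lists again for display (for example to match the output of the string-joining approach), a minimal sketch is to convert each key back after counting. The short `test` list here is a hypothetical excerpt, not the question's full data:

```python
from collections import Counter

# Hypothetical excerpt of the question's pair data.
test = [['repos', 'for'], ['for', 'Linux'], ['Linux', 'a'], ['repos', 'for']]

# Count with tuple keys (hashable), then turn each key back into a list.
top = [(list(pair), count)
       for pair, count in Counter(tuple(p) for p in test).most_common(1)]
print(top)  # [(['repos', 'for'], 2)]
```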

collections.Counter is based on a dictionary, so its keys must be hashable, and lists are not hashable.
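To make the hashability requirement concrete, here is a small sketch; `is_hashable` is a hypothetical helper written for illustration, not part of the `collections` module. It checks whether `hash()` succeeds and shows that `Counter` fails on list elements the same way the question's code does:

```python
from collections import Counter

def is_hashable(obj):
    """Hypothetical helper: True if obj can be used as a dict/Counter key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable(('repos', 'for')))  # True: a tuple of strings is hashable
print(is_hashable(['repos', 'for']))  # False: lists are mutable and unhashable

# Counter raises TypeError when its elements are lists, as in the question.
try:
    Counter([['repos', 'for']])
except TypeError as exc:
    print("TypeError:", exc)
```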
If you want to count individual strings, you can use a generator expression to pull the elements out of each list, like this:
c = collections.Counter(word for pair in test for word in pair)
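A runnable version of the line above, using a short hypothetical excerpt in place of the question's full `test` list:

```python
import collections

# Hypothetical excerpt of the question's data (overlapping pairs of words).
test = [['the', 'MySQL'], ['MySQL', 'repos'], ['repos', 'for'],
        ['for', 'Linux'], ['repos', 'for']]

# Flatten the pairs and count every word.
word_counts = collections.Counter(word for pair in test for word in pair)
print(word_counts['repos'], word_counts['for'], word_counts['MySQL'])  # 3 3 2
```

Because the question's pairs overlap (each pair starts with the previous pair's second word), a word in the middle of the text is counted once for each pair it appears in. Counting the words of the original text directly avoids that double counting.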

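If you want to count pairs, i.e. 2-grams, then you need to convert each inner list to a tuple (which is hashable) and pass that in; this can also be done with a generator expression: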
c2 = collections.Counter(tuple(pair) for pair in test)
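If the pairs have not been built yet, you can produce them as tuples from the start and skip the conversion. The sketch below is an illustration under stated assumptions: `bigram_counts` is a hypothetical helper, the sample sentence is made up, and tokens are split on whitespace. Note that in Python 3 `str.split()` with no arguments also splits on non-breaking spaces (`\xa0`), so a token such as `'the\xa0official'` from the question would become two words.

```python
import collections

def bigram_counts(text):
    """Hypothetical helper: count adjacent word pairs (2-grams) in text."""
    words = text.split()
    # zip(words, words[1:]) yields tuples, which are hashable Counter keys.
    return collections.Counter(zip(words, words[1:]))

counts = bigram_counts("repos for Linux and repos for Debian")
print(counts.most_common(1))  # [(('repos', 'for'), 2)]
```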

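As the error says, list is unhashable. Another way around this is to go through strings: join each list with a separator (a space seems like a good choice), count the strings, and then split them again. This only round-trips correctly if the separator never occurs inside the strings themselves: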
>>> [(i.split(' '),j) for i,j in collections.Counter(' '.join(i) for i in test).most_common(3)]
[([u'repos', u'for'], 2), ([u'grown', u'steadily.\xa0Starting'], 1), ([u'Linux', u'a'], 1)]
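The separator caveat matters when the strings themselves can contain spaces. This small sketch uses made-up pairs to show two different pairs collapsing into the same joined key, while tuple keys keep them apart:

```python
import collections

# Two different (made-up) pairs that join to the same string.
pairs = [['a b', 'c'], ['a', 'b c']]

joined = collections.Counter(' '.join(p) for p in pairs)
as_tuples = collections.Counter(tuple(p) for p in pairs)

print(joined)                # Counter({'a b c': 2}): the two pairs were merged
print(len(as_tuples))        # 2: each pair is counted separately
print('a b c'.split(' '))    # ['a', 'b', 'c']: splitting does not restore either pair
```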

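Do you want to count each string individually, or do you want to treat them as pairs? What is the desired output? The error message is clear: lists are not hashable, so you can't put them in a Counter (which is based on a dict). If you want to keep them as pairs, try: Counter(map(tuple, test)).

I want to treat them as pairs. They are 2-grams, so I want to show the most common pairs.

Hi JonSharpe: that works well. If you add it as an answer, I can accept it as the solution to my question.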
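The one-liner suggested in the comments, as a runnable sketch with a short hypothetical excerpt of the data. `map(tuple, test)` converts each inner list lazily, which is equivalent to the generator expression `tuple(pair) for pair in test` used in the answers:

```python
from collections import Counter

# Hypothetical excerpt of the question's pair data.
test = [['repos', 'for'], ['for', 'Linux'], ['repos', 'for']]

pair_counts = Counter(map(tuple, test))
print(pair_counts.most_common(1))  # [(('repos', 'for'), 2)]
```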