Extracting the most common probability items from a Python list


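I have a list [[1,2,7], [1,2,3], [1,2,3,7], [1,2,3,5,6,7]] and I need [1,2,3,7] as the final result (this is a kind of reverse engineering). One approach is to check the pairwise intersections: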

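    dlist1 = [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]
    dlistlen = len(dlist1)

    i = 0
    while i < dlistlen:
        j = i + 1
        while j < dlistlen:
            il = dlist1[i]
            jl = dlist1[j]

            # elements common to both sublists
            tmp = list(set(il) & set(jl))
            print(tmp)

            # print(i, j)
            j = j + 1
        i = i + 1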

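It looks like I am close to getting [1,2,3,7] as my final answer, but I can't work out how to get there. Note that the first list ([[1,2,7], [1,2,3], [1,2,3,7], [1,2,3,5,6,7]]) may contain more items, leading to more than one final answer besides [1,2,3,7]. For now, though, I only need to extract [1,2,3,7]. Please note that this is not homework; I am building my own clustering algorithm to fit my needs.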

You can use the collections.Counter class to track how often each element occurs:

>>> from itertools import chain
>>> from collections import Counter
>>> l =  [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]
>>> #use chain(*l) to flatten the lists into a single list
>>> c = Counter(chain(*l))
>>> print c
Counter({1: 4, 2: 4, 3: 3, 7: 3, 5: 1, 6: 1})
>>> #sort keys in order of descending frequency
>>> sortedValues = sorted(c.keys(), key=lambda x: c[x], reverse=True)
>>> #show the four most common values
>>> print sortedValues[:4]
[1, 2, 3, 7]
>>> #alternatively, show the values that appear in more than 50% of all lists
>>> print [value for value, freq in c.iteritems() if float(freq) / len(l) > 0.50]
[1, 2, 3, 7]
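
The session above is written for Python 2 (print statement, iteritems), where the order of the results depends on dictionary ordering. Below is a minimal sketch of the same frequency idea for Python 3. The function name items_above_threshold and its threshold parameter are illustrative assumptions, not part of the original answer. Unlike the session above, it counts each value at most once per sublist, so the ratio is the fraction of sublists that contain the value, and it sorts the result so the output order is deterministic.

```python
from collections import Counter
from itertools import chain


def items_above_threshold(lists, threshold=0.5):
    """Return, sorted, the values found in more than `threshold` of the sublists.

    Hypothetical helper for illustration; not from the original answer.
    """
    # set(sub) makes each value count at most once per sublist
    counts = Counter(chain.from_iterable(set(sub) for sub in lists))
    return sorted(v for v, n in counts.items() if n / len(lists) > threshold)


data = [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]
print(items_above_threshold(data))  # [1, 2, 3, 7]
```

Using a threshold rather than a fixed slice such as [:4] avoids having to know in advance how many items belong in the answer, though the cutoff itself is still a choice the caller has to make.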

It looks like you are trying to find the largest intersection of any two list elements. This will do it:

from itertools import combinations

# convert all list elements to sets for speed
dlist = [set(x) for x in dlist]

intersections = (x & y for x, y in combinations(dlist, 2))
longest_intersection = max(intersections, key=len)
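
The snippet above assumes that dlist already holds the question's data. Here is a small self-contained sketch of the same technique, assuming Python 3 and the list from the question. The wrapper name largest_pairwise_intersection is hypothetical.

```python
from itertools import combinations


def largest_pairwise_intersection(lists):
    """Return the largest intersection over all pairs of sublists (hypothetical helper)."""
    sets = [set(x) for x in lists]
    # default=set() covers the case of fewer than two sublists
    return max((x & y for x, y in combinations(sets, 2)), key=len, default=set())


dlist = [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]
print(sorted(largest_pairwise_intersection(dlist)))  # [1, 2, 3, 7]
```

This only considers pairs of sublists, and when several pairs tie for the largest intersection, max returns the first one it encounters.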

Take a look: this is not an intersection but a frequency distribution.

I don't understand how [1,2,3,7] is the "most common probability item". Can you explain how you got it from [1,2,7], [1,2,3], [1,2,3,7], [1,2,3,3,7], [1,2,3,5,6,7]?

@NPE realized... The question is strange, to say the least. Maybe all the items that are included more than once?

@ppeterka: I don't know about strange, but it is certainly incomplete.

+1, straightforward. The [:4] is quite arbitrary, but he did not explain what his criterion is.

@Adrino, 4 is not my criterion, but now we can think of something like the most frequently occurring items, and so on. Thanks. Now I will try it on a larger data set.

@DSM thanks, fixed now.