Python: find the most common element in a list

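What is an efficient way to find the most common element in a Python list?

My list items may not be hashable, so I can't use a dictionary. Also, in case of a tie, the item with the lowest index should be returned. For example: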

>>> most_common(['duck', 'duck', 'goose'])
'duck'
>>> most_common(['goose', 'duck', 'duck', 'goose'])
'goose'

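If the items are not hashable, you can sort them and make a single pass over the sorted result, counting items as you go (identical items will be adjacent). Making them hashable and using a dict would probably be faster, though.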

def most_common(lst):
    cur_length = 0
    max_length = 0
    cur_i = 0
    max_i = 0
    cur_item = None
    max_item = None
    for i, item in sorted(enumerate(lst), key=lambda x: x[1]):
        if cur_item is None or cur_item != item:
            if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
                max_length = cur_length
                max_i = cur_i
                max_item = cur_item
            cur_length = 1
            cur_i = i
            cur_item = item
        else:
            cur_length += 1
    if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
        return cur_item
    return max_item
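
The suggestion to make the items hashable can be sketched as follows. This is only a minimal sketch, assuming the unhashable items are lists whose contents are themselves hashable, so that tuple(item) can stand in as a dictionary key. The name most_common_hashed and its key parameter are made up for illustration.

```python
def most_common_hashed(lst, key=tuple):
    """Return the most common item, with the lowest index winning ties.

    `key` is a hypothetical parameter: it maps each item to a hashable stand-in.
    """
    counts = {}       # hashable key -> number of occurrences
    first_index = {}  # hashable key -> index of the first occurrence
    for i, item in enumerate(lst):
        k = key(item)
        counts[k] = counts.get(k, 0) + 1
        first_index.setdefault(k, i)
    # Highest count wins; among equal counts the smaller first index wins.
    best = max(counts, key=lambda k: (counts[k], -first_index[k]))
    return lst[first_index[best]]


print(most_common_hashed([[1, 2], [3], [3], [1, 2]]))  # [1, 2]
```

The whole thing is a single O(n) pass plus one max over the distinct keys, at the cost of building a hashable copy of every item.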

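Sort a copy of the list and find the longest run. You can decorate the list with each element's index before sorting it, and then, in case of a tie, pick the run that starts at the lowest index.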

If neither sorting nor hashing is feasible, but equality comparison (==) is available, this is an obviously slow solution (O(n^2)):

def most_common(items):
  if not items:
    raise ValueError
  fitems = [] 
  best_idx = 0
  for item in items:   
    item_missing = True
    i = 0
    for fitem in fitems:  
      if fitem[0] == item:
        fitem[1] += 1
        d = fitem[1] - fitems[best_idx][1]
        if d > 0 or (d == 0 and fitems[best_idx][2] > fitem[2]):
          best_idx = i
        item_missing = False
        break
      i += 1
    if item_missing:
      fitems.append([item, 1, i])
  return fitems[best_idx][0]  # best_idx indexes fitems, not items
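
To show when an equality-only approach is actually needed, here is a compact sketch of the same idea applied to dicts, which in Python 3 are neither hashable nor orderable. The function name most_common_by_equality and the sample records are made up for illustration. It leans on the fact that max returns the first maximal entry it meets.

```python
def most_common_by_equality(items):
    """O(n^2) search that relies only on ==, never on hashing or ordering."""
    if not items:
        raise ValueError("empty sequence has no most common element")
    seen = []  # [item, count] pairs, in order of first appearance
    for item in items:
        for entry in seen:
            if entry[0] == item:
                entry[1] += 1
                break
        else:
            # No equal item found so far: start a new counter
            seen.append([item, 1])
    # max() keeps the first maximal entry, i.e. the earliest-seen item on ties
    return max(seen, key=lambda entry: entry[1])[0]


# dicts support == but can be neither hashed nor sorted
records = [{"id": 1}, {"id": 2}, {"id": 2}, {"id": 1}]
print(most_common_by_equality(records))  # {'id': 1}
```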

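However, if the list length (n) is large, making the items hashable or sortable (as the other answers suggest) will almost always find the most common element faster. Hashing is O(n) on average, and sorting is O(n*log(n)) in the worst case.

Here: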

def most_common(l):
    max = 0
    maxitem = None
    for x in set(l):
        count =  l.count(x)
        if count > max:
            max = count
            maxitem = x
    return maxitem

I have a vague feeling that there is a method somewhere in the standard library that gives you the count of each element, but I can't find it.
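
The standard-library facility hinted at here is most likely collections.Counter (available since Python 2.7), which counts hashable items. A short sketch with a made-up sample list:

```python
from collections import Counter

li = ['goose', 'duck', 'duck']
counts = Counter(li)          # maps each distinct item to its count

print(counts['duck'])         # 2
print(counts['goose'])        # 1
print(counts.most_common(1))  # [('duck', 2)]
```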

>>> li  = ['goose', 'duck', 'duck']

>>> def foo(li):
...     st = set(li)
...     mx = -1
...     for each in st:
...         temp = li.count(each)
...         if mx < temp:
...             mx = temp
...             h = each
...     return h

>>> foo(li)
'duck'

A one-liner:

def most_common(lst):
    return max(((item, lst.count(item)) for item in set(lst)), key=lambda a: a[1])[0]

A simpler one-liner:

def most_common(lst):
    return max(set(lst), key=lst.count)

Here is an O(n) solution:

mydict   = {}
cnt, itm = 0, ''
for item in reversed(lst):
     mydict[item] = mydict.get(item, 0) + 1
     if mydict[item] >= cnt :
         cnt, itm = mydict[item], item

print(itm)

(reversed is used to make sure that it returns the lowest-index item)
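
To see why the reversed scan honors the tie-break, the same loop can be wrapped in a function and tried on the examples from the question. This is a minimal sketch. The name most_common_reversed is made up for illustration, and the items must be hashable.

```python
def most_common_reversed(lst):
    """O(n) dict-based scan; requires hashable items."""
    counts = {}
    best_count, best_item = 0, None
    # Walking backwards with >= lets an earlier item overtake a later one on ties
    for item in reversed(lst):
        counts[item] = counts.get(item, 0) + 1
        if counts[item] >= best_count:
            best_count, best_item = counts[item], item
    return best_item


print(most_common_reversed(['goose', 'duck', 'duck', 'goose']))  # goose
print(most_common_reversed(['duck', 'duck', 'goose']))           # duck
```

Among tied items, the one whose count reaches the maximum last in the backward scan is the one whose first occurrence is earliest, which is why >= rather than > is used.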

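With so many solutions proposed, I'm amazed that nobody has proposed what I would consider the obvious one (for non-hashable but comparable elements): itertools.groupby. itertools offers fast, reusable functionality and lets you delegate some tricky logic to well-tested standard library components. Consider, for example: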

import itertools
import operator

def most_common(L):
  # get an iterable of (item, iterable) pairs
  SL = sorted((x, i) for i, x in enumerate(L))
  # print('SL:', SL)
  groups = itertools.groupby(SL, key=operator.itemgetter(0))
  # auxiliary function to get "quality" for an item
  def _auxfun(g):
    item, iterable = g
    count = 0
    min_index = len(L)
    for _, where in iterable:
      count += 1
      min_index = min(min_index, where)
    # print('item %r, count %r, minind %r' % (item, count, min_index))
    return count, -min_index
  # pick the highest-count/earliest item
  return max(groups, key=_auxfun)[0]

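Of course this could be written more concisely, but I'm aiming for maximal clarity. The two print statements can be uncommented to better see the machinery in action; for example, with the prints uncommented: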

print(most_common(['goose', 'duck', 'duck', 'goose']))

emits:

SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]
item 'duck', count 2, minind 1
item 'goose', count 2, minind 0
goose

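As you can see, SL is a list of pairs, each pair being an item followed by that item's index in the original list (this implements the key condition that, if more than one item shares the same highest count, the result must be the one that occurs earliest).

groupby groups by the item only (via operator.itemgetter). The auxiliary function, called once per group during the max computation, receives and internally unpacks a group: a tuple with two items, (item, iterable), where the iterable's items are also two-item tuples, (item, original index) [the items of SL].

The auxiliary function then uses a loop to determine both the number of entries in the group's iterable and the minimum original index. It returns these combined as a "quality key", with the sign of the minimum index flipped so that the max operation treats items that occurred earlier in the original list as "better".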
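
The effect of the sign flip can be checked in isolation. The following small sketch hard-codes the (count, -min_index) keys for the goose/duck example; the dictionary name quality is made up for illustration.

```python
# (count, -min_index) keys for the two groups in the goose/duck example
quality = {
    'duck': (2, -1),   # appears twice, first at index 1
    'goose': (2, -0),  # appears twice, first at index 0
}

# Tuples compare element by element: the counts tie at 2, then -0 > -1
print(max(quality, key=quality.get))  # goose
print((2, 0) > (2, -1))               # True
print((3, -5) > (2, 0))               # True: a higher count always wins
```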

This code could be much simpler if it worried a little less about big-O issues in time and space, for example:

def most_common(L):
  groups = itertools.groupby(sorted(L))
  def _auxfun(g):
    # Python 3 dropped tuple parameters, so unpack explicitly
    item, iterable = g
    return len(list(iterable)), -L.index(item)
  return max(groups, key=_auxfun)[0]

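Same basic idea, just expressed more simply and compactly... but, alas, at the cost of an extra O(N) auxiliary space (to materialize the groups' iterables into lists) and O(N squared) time (to get the L.index of every item). While premature optimization is the root of all evil in programming, deliberately picking an O(N squared) approach when an O(N log N) one is available just goes too much against the grain of scalability! :-)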

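Finally, for those who prefer "one-liners" to clarity and performance, a bonus one-liner version with suitably mangled names :-)

You probably don't need this anymore, but this is what I did for a similar problem. (It looks longer than it is because of the comments.)

Borrowed from elsewhere, this can be used with Python 2.7: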

from collections import Counter

def Most_Common(lst):
    data = Counter(lst)
    return data.most_common(1)[0][0]

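This works around 4-6 times faster than Alex's solutions, and 50 times faster than the one-liner proposed by newacct.

To retrieve the element that occurs first in the list in case of ties: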

def most_common(lst):
    data = Counter(lst)
    return max(lst, key=data.get)
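
The tie-break here relies on max scanning lst from the left and keeping the first item whose count is maximal. A self-contained sketch of that behavior, with the function renamed most_common_first for illustration:

```python
from collections import Counter

def most_common_first(lst):
    data = Counter(lst)  # item -> count, built in one O(n) pass
    # max() scans lst left to right and keeps the first item whose
    # count is maximal, so the earliest tied element is returned
    return max(lst, key=data.get)


print(most_common_first(['goose', 'duck', 'duck', 'goose']))  # goose
print(most_common_first(['b', 'a', 'a', 'b', 'c']))           # b
```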

What you need is known in statistics as the mode, and Python of course has a built-in function that does exactly that for you:

>>> from statistics import mode
>>> mode([1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 6, 6, 6])
3

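Note that if there is no single "most common element", such as when the top two are tied, this raises StatisticsError on Python versions before 3.8, because statistically speaking there is no mode in that case. From Python 3.8 on, mode instead returns the first mode encountered.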
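
If ties need to be inspected explicitly, the statistics module also offers multimode. This is a short sketch, assuming Python 3.8 or later (where multimode was added); it returns every most-common value in order of first appearance, so the first entry matches the lowest-index tie-break asked for in the question.

```python
from statistics import multimode

tied = multimode(['goose', 'duck', 'duck', 'goose'])
print(tied)     # ['goose', 'duck']
print(tied[0])  # goose

print(multimode([1, 2, 2, 3, 3, 3]))  # [3]
```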

I needed to do this in a recent program. I'll admit it, I couldn't understand Alex's answer, so this is what I ended up with:

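def popular(L):
    C = {}
    # Map every distinct element to its number of occurrences
    for a in L:
        C[a] = L.count(a)
    # Return the first key whose count equals the highest count
    for b in C.keys():
        if C[b] == max(C.values()):
            return b

L = [2,3,5,3,6,3,6,3,6,3,7,467,4,7,4]
print(popular(L))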

def mostPopular(l):
    mpEl=None
    mpIndex=0
    mpCount=0
    curEl=None
    curCount=0
    for i, el in sorted(enumerate(l), key=lambda x: (x[1], x[0]), reverse=True):
        curCount=curCount+1 if el==curEl else 1
        curEl=el
        if curCount>mpCount \
        or (curCount==mpCount and i<mpIndex):
            mpEl=curEl
            mpIndex=i
            mpCount=curCount
    return mpEl, mpCount, mpIndex

Hi, this is a very simple solution. It returns the element that is repeated most often in the list, or False when no element is repeated. Note that calling count for every element makes it O(n^2) rather than O(n):

def most_common(lst):
    if max([lst.count(i)for i in lst]) == 1:
        return False
    else:
        return max(set(lst), key=lst.count)

L = [1, 4, 7, 5, 5, 4, 5]

def mode_f(L):
    # Track the highest count seen so far and the element that produced it
    counter = 0
    number = L[0]
    for i in L:
        amount_times = L.count(i)
        if amount_times > counter:
            counter = amount_times
            number = i

    return number

from statistics import mode, StatisticsError

def most_common(l):
    try:
        return mode(l)
    except StatisticsError as e:
        # will only return the first element if no unique mode found
        if 'no unique mode' in e.args[0]:
            return l[0]
        # this is for "StatisticsError: no mode for empty data"
        # after calling mode([])
        raise

>>> most_common(['a', 'b', 'b'])
'b'
>>> most_common([1, 2])
1
>>> most_common([])
StatisticsError: no mode for empty data

moc = max([(lst.count(item), item) for item in set(lst)])  # (count, item) pair with the highest count

def mostCommonElement(list):
  count = {}  # dict holder
  max = 0  # keep track of the highest count seen so far
  result = None  # holds the item whose count exceeded max
  for i in list:
    if i not in count:
      count[i] = 1
    else:
      count[i] += 1
    if count[i] > max:
      max = count[i]
      result = i
  return result

from collections import Counter

a = [1936, 2401, 2916, 4761, 9216, 9216, 9604, 9801] 

c = Counter(a)

print(c.most_common(1)) # the one most common element... 2 would mean the 2 most common
# prints [(9216, 2)]: a list holding one (element, count) tuple, the count being its count in 'a'

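import numpy as np
import scipy.stats
lst = [1,2,3,4,5,6,7,5]
# atleast_1d copes with both array results (older SciPy) and scalar results (newer SciPy)
most_freq_val = lambda x: np.atleast_1d(scipy.stats.mode(x)[0])[0]
print(most_freq_val(lst))  # prints 5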

ans  = [1, 1, 0, 0, 1, 1]
all_ans = {ans.count(ans[i]): ans[i] for i in range(len(ans))}
print(all_ans)

# prints {4: 1, 2: 0}; each count maps to an element with that count
max_key = max(all_ans.keys())

print(all_ans[max_key])

from collections import Counter

def majorityElement(arr):        
    majority_elem = Counter(arr)
    size = len(arr)
    for key, val in majority_elem.items():
        if val > size/2:
            return key
    return -1