Python: simplest way to find the most frequent element in each column


python list matrix


If I have

data =
[[a, a, c],
 [b, c, c],
 [c, b, b],
 [b, a, c]]

I want to get a list containing the element that appears most often in each column:

result = [b, a, c]

What is the simplest way to do this?

I am using Python 2.6.6.

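Use a list comprehension plus Counter() from the collections module. zip(*data) rearranges your list of rows into a list of columns, Counter() objects count how often each item appears in the input sequence, and .most_common(1) gives the most frequent element (plus its count).

Provided your input consists of single-character strings, that gives:

>>> from collections import Counter
>>> data = [['a', 'a', 'c'], ['b', 'c', 'c'], ['c', 'b', 'b'], ['b', 'a', 'c']]
>>> [Counter(col).most_common(1)[0][0] for col in zip(*data)]
['b', 'a', 'c']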

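Is the data hashable? If so, a Counter (from the collections module) will be helpful:

[Counter(col).most_common(1)[0][0] for col in zip(*data)]

It works because zip(*data) transposes the input data, yielding one column at a time. The Counter then counts the elements and stores them in a dictionary with the counts as values. Counter also has a most_common method, which returns a list of the items with the N highest counts (sorted from highest count to lowest). So you want the first element of the first item in the list returned by most_common, which is where the [0][0] comes from.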
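To make the steps above concrete, here is a minimal sketch that breaks the one-liner into its intermediate values. It uses the data from the question with the elements written as strings, which is an assumption because the question leaves a, b, and c undefined. The names columns, first_counts, and top are illustrative only.

```python
from collections import Counter  # part of the standard library from Python 2.7 onward

# Assumption: the question's a, b, c are single-character strings.
data = [['a', 'a', 'c'],
        ['b', 'c', 'c'],
        ['c', 'b', 'b'],
        ['b', 'a', 'c']]

# zip(*data) transposes the rows into columns.
columns = list(zip(*data))
# columns == [('a', 'b', 'c', 'b'), ('a', 'c', 'b', 'a'), ('c', 'c', 'b', 'c')]

# Counter maps each element of a column to its number of occurrences.
first_counts = Counter(columns[0])
# 'b' occurs twice in the first column, 'a' and 'c' once each

# most_common(1) returns a one-item list of (element, count) pairs.
top = first_counts.most_common(1)
# top == [('b', 2)]

# The first [0] selects the pair, the second [0] selects the element.
result = [Counter(col).most_common(1)[0][0] for col in columns]
print(result)  # ['b', 'a', 'c']
```

Wrapping zip in list() is only needed to index into the columns; the comprehension itself can iterate over zip(*data) directly.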

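Here is a solution that does not use the collections module. For each column it returns a list of all values tied for the highest count.

def get_most_common(data):

    common = []
    for col in zip(*data):
        # Start from an empty dict for every column so counts
        # from earlier columns do not leak into later ones.
        count_dict = {}
        for val in col:
            count_dict[val] = count_dict.get(val, 0) + 1
        max_count = max(count_dict.values())
        # Keep every value whose count equals the column maximum.
        common.append([k for k in count_dict if count_dict[k] == max_count])

    return common

if __name__ == "__main__":

    data = [['a','a','b'],
            ['b','c','c'],
            ['a','b','b'],
            ['b','a','c']]

    print(get_most_common(data))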

In statistics, what you are looking for is called the mode. The scipy library has a mode function in scipy.stats.

In [32]: import numpy as np

In [33]: from scipy.stats import mode

In [34]: data = np.random.randint(1,6, size=(6,8))

In [35]: data
Out[35]: 
array([[2, 1, 5, 5, 3, 3, 1, 4],
       [5, 3, 2, 2, 5, 2, 5, 3],
       [2, 2, 5, 3, 3, 2, 1, 1],
       [2, 4, 1, 5, 4, 4, 4, 5],
       [4, 4, 5, 5, 2, 4, 4, 4],
       [2, 4, 1, 1, 3, 3, 1, 3]])

In [36]: val, count = mode(data, axis=0)

In [37]: val
Out[37]: array([[ 2.,  4.,  5.,  5.,  3.,  2.,  1.,  3.]])

In [38]: count
Out[38]: array([[ 4.,  3.,  3.,  3.,  3.,  2.,  3.,  2.]])
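For plain Python lists, the standard library offers a comparable function. The sketch below assumes Python 3.4 or later, so it does not apply to the asker's Python 2.6.6, and it applies statistics.mode to each column. The string data is taken from the question. As far as I know, a tie returns the first mode encountered on Python 3.8 and later, while earlier versions raise StatisticsError; none of the columns used here has a tie for first place.

```python
from statistics import mode  # standard library, Python 3.4+

# Assumption: the question's a, b, c are single-character strings.
data = [['a', 'a', 'c'],
        ['b', 'c', 'c'],
        ['c', 'b', 'b'],
        ['b', 'a', 'c']]

# Apply mode() to each column produced by zip(*data).
result = [mode(col) for col in zip(*data)]
print(result)  # ['b', 'a', 'c']
```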

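Your title asks for "the max of each column"; your question asks for "the element that appears the most". I assume from your example that you want the latter? Do you care what happens in case of ties?

@DSM Yes, the latter. What do you mean by ties?

What do you want [a, a, b, b] to return?

@DSM It is not important; say the first element if they appear the same number of times.

Is the data numeric (e.g. integers or floats)? Or are there any other restrictions on the data type of the elements?

Also, what if we need the counts associated with the output as well: [b, a, c] and the counts [2, 2, 3]? Is that possible with Counter?

@shn -- See my explanation. Just remove one of the [0] indexes and you will get something like [(b, 2), (a, 2), (c, 3)].

Oh, I am using Python 2.6.6, and it seems Counter is not supported in that version.

@shn -- There is a version intended to work with Python 2.5.

@shn -- I have not tried it, but you could also use the Python 2.7 source for Counter directly on Python 2.6.6. See the link I posted above.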
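The two requests in these comments (return the counts as well, and let the first element win a tie) can be combined without Counter. The following is only a sketch under those assumptions: column_modes is a hypothetical helper name, not something from the answers, and the string data stands in for the question's a, b, c.

```python
def column_modes(data):
    """Return one (value, count) pair per column; on a tie the value seen first wins."""
    modes = []
    for col in zip(*data):
        counts = {}
        for val in col:
            counts[val] = counts.get(val, 0) + 1
        best = max(counts.values())
        # Scan the column in its original order so the earliest tied value wins.
        winner = next(val for val in col if counts[val] == best)
        modes.append((winner, best))
    return modes


# Assumption: the question's a, b, c are single-character strings.
data = [['a', 'a', 'c'],
        ['b', 'c', 'c'],
        ['c', 'b', 'b'],
        ['b', 'a', 'c']]

pairs = column_modes(data)
print(pairs)  # [('b', 2), ('a', 2), ('c', 3)]

# A single column [a, a, b, b] is a tie, so the first element is returned.
tie = column_modes([['a'], ['a'], ['b'], ['b']])
print(tie)  # [('a', 2)]
```

Because it relies only on dict, max, and next, this should also run on Python 2.6, though I have not tested it there.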
