Python: NumPy (or SciPy) frequency count along columns in a 2D array


I have a 2D array like this:

array([[ 1,  0, -1],
       [ 1,  1,  0],
       [-1,  0,  1],
       [ 0,  1,  0]])

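I want to get the most frequent value in each column. For the matrix above, I want [1, 0, 0] (or [1, 1, 0], since 0 and 1 each appear twice in the second column).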

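I have looked at numpy.unique, but it only takes 1D arrays. bincount does not work because my array contains negative numbers. I also need a vectorized implementation, because the matrix has thousands of rows.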

You could try the following:

import numpy as np
from collections import Counter

# Create your matrix
a = np.array([[ 1,  0, -1],
              [ 1,  1,  0],
              [-1,  0,  1],
              [ 0,  1,  0]])

# Loop over each column and print its most frequent element with its count
for i in range(a.shape[1]):
    # tolist() converts NumPy scalars to plain ints so the output prints cleanly
    count = Counter(a[:, i].tolist())
    print(count.most_common(1))

Output:

[(1, 2)] # In first column : 1 appears most often (twice)
[(0, 2)] # In second column : 0 appears twice
[(0, 2)] # In third column : 0 appears twice also
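
To collect the result as a single array such as [1, 0, 0], as the question asks, a minimal sketch could call np.unique with return_counts=True on each column. The helper name column_modes is hypothetical, and this still loops over columns, so it is not a fully vectorized solution.

```python
import numpy as np

a = np.array([[ 1,  0, -1],
              [ 1,  1,  0],
              [-1,  0,  1],
              [ 0,  1,  0]])

def column_modes(arr):
    """Return the most frequent value of each column (hypothetical helper)."""
    modes = []
    for col in arr.T:  # iterating over the transpose yields one column at a time
        values, counts = np.unique(col, return_counts=True)
        # argmax returns the first maximum, so ties go to the smallest value
        modes.append(values[np.argmax(counts)])
    return np.array(modes)

print(column_modes(a))  # [1 0 0]
```

Because np.unique sorts the values, negative numbers need no special handling here.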


There is a trick for using np.bincount with negative numbers:

>>> c = np.array([1, 1, -1, 0])  # array with a negative number
>>> d = c - c.min() + 1  # shifted copy whose minimum is 1; the offset is 1 - c.min()
>>> freq = np.bincount(d)  # freq[k] counts how often the original value k + c.min() - 1 occurs
>>> freq
array([0, 1, 1, 2])
>>> int(np.argmax(freq) + c.min() - 1)  # now undo the shift to recover the original value
1

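Now, with this trick, you can loop over the columns and find the most frequent element in each one. Admittedly, though, this solution is not vectorized. As @Jesse Butterfield pointed out, another post uses scipy.stats.mode for this case, but it has been criticized for being slow on large matrices with many unique elements. Which approach is best is probably something to settle by empirical testing.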
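
On the point that the column loop is not vectorized: one possible way to apply the same offset idea to all columns at once is to shift each column into its own block of bins and call np.bincount a single time. This is only a sketch under stated assumptions, not part of the original answer. It assumes an integer array, the helper name column_modes_bincount is hypothetical, and it allocates (max - min + 1) bins per column, so it suits data with a small value range.

```python
import numpy as np

a = np.array([[ 1,  0, -1],
              [ 1,  1,  0],
              [-1,  0,  1],
              [ 0,  1,  0]])

def column_modes_bincount(arr):
    """Most frequent value per column via one bincount call (hypothetical helper)."""
    lo = arr.min()
    shifted = arr - lo                       # all values are now >= 0
    span = int(shifted.max()) + 1            # number of bins needed per column
    n_cols = arr.shape[1]
    # Offset column j by j * span so the columns occupy disjoint bin ranges
    keys = shifted + span * np.arange(n_cols)
    counts = np.bincount(keys.ravel(), minlength=span * n_cols)
    counts = counts.reshape(n_cols, span)    # row j holds the counts for column j
    # argmax picks the first maximum, so ties go to the smallest value
    return counts.argmax(axis=1) + lo

print(column_modes_bincount(a))  # [1 0 0]
```

Whether this beats scipy.stats.mode on a given matrix would still need to be measured.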


This is a duplicate. Thanks for pointing it out.
I don't think Counter is vectorized. In many tests it has been very slow.
Yes, scipy.stats.mode solves the problem. Thanks for the clever trick, though.
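
For reference, a minimal sketch of the scipy.stats.mode call mentioned in the comments might look like the following. It assumes SciPy is installed. np.ravel is used because the shape of the returned arrays differs between SciPy versions.

```python
import numpy as np
from scipy import stats

a = np.array([[ 1,  0, -1],
              [ 1,  1,  0],
              [-1,  0,  1],
              [ 0,  1,  0]])

# axis=0 computes the mode down each column
result = stats.mode(a, axis=0)

# Flatten so the code behaves the same whether or not SciPy keeps the reduced axis
modes = np.ravel(result.mode)
counts = np.ravel(result.count)

print(modes)   # [1 0 0]
print(counts)  # [2 2 2]
```

When several values tie, as 0 and 1 do in the second column, the smallest one is returned.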