Python: comparing multiple columns in a numpy array

python numpy

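I have a 2D numpy array with about 12 columns and 1000+ rows, and each cell contains a number from 1 to 5. I'm searching for the best sextuple of columns according to my point system, where 1 and 2 generate -1 point and 4 and 5 give +1.

For example, if a row in a certain sextuple contains [1,4,5,3,4,3], the points for this row should be +2, because 3*1 + 1*(-1) = 2. The next row may be [1,2,2,3,3] and should be -3 points.

At first I tried a straightforward loop solution, but I realized there are 665,280 possible combinations of columns to compare, and when I also need to search for the best quintuple, quadruple and so on, the loop takes forever.

Is there a smarter way to solve my problem?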

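import numpy as np
import itertools

N_rows = 10
# Random test data with values 1..5 (randint's upper bound is exclusive).
# np.random.random_integers is deprecated, so randint is used instead.
arr = np.random.randint(1, 6, size=(N_rows, 12))
# Lookup table: x[v] is the score of cell value v (index 0 is unused).
x = np.array([0, -1, -1, 0, 1, 1])
y = x[arr]

print(y)

# Score every 6-column combination and keep the highest-scoring one.
score, best_sextuple = max((y[:, cols].sum(), cols)
                           for cols in itertools.combinations(range(12), 6))
print('''\
score: {s}
sextuple: {c}
'''.format(s=score, c=best_sextuple))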

For example, this prints:

score: 6
sextuple: (0, 1, 5, 8, 10, 11)

Explanation:

First, let's generate a random example with 12 columns and 10 rows:

N_rows = 10
arr = np.random.randint(1, 6, size=(N_rows, 12))

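Now we can use numpy indexing to convert the numbers 1, 2, ..., 5 in arr to the values -1, 0, 1 (according to your scoring system):

x = np.array([0, -1, -1, 0, 1, 1])
y = x[arr]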
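To see why the lookup works, here is a minimal sketch applied to the first example row from the question. The names `lookup` and `row` are made up for illustration; position 0 of the table is only a placeholder, on the assumption that the data never contains 0.

```python
import numpy as np

# Lookup table: position v holds the score for cell value v.
# Position 0 is a placeholder, assuming the data only contains 1..5.
lookup = np.array([0, -1, -1, 0, 1, 1])

# The first example row from the question.
row = np.array([1, 4, 5, 3, 4, 3])

# Integer-array indexing performs one table lookup per cell.
row_scores = lookup[row]
row_total = int(row_scores.sum())

print(row_scores.tolist())  # [-1, 1, 1, 0, 1, 0]
print(row_total)            # 2
```

The same indexing expression works unchanged on a 2D array, which is what `y = x[arr]` relies on.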

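Next, let's use itertools.combinations to generate all possible combinations of 6 columns:

for cols in itertools.combinations(range(12),6)

and

y[:,cols].sum()

then gives the score for cols, a particular choice of columns (a sextuple).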
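The search space is smaller than the 665,280 quoted in the question. That figure counts ordered selections of 6 columns out of 12, but a sextuple's score does not depend on column order. A quick check, assuming Python 3.8+ for `math.perm` and `math.comb`:

```python
import itertools
import math

n_cols, k = 12, 6

ordered = math.perm(n_cols, k)    # 12*11*10*9*8*7 ordered selections
unordered = math.comb(n_cols, k)  # 12! / (6! * 6!) unordered selections

# itertools.combinations yields exactly the unordered selections.
generated = sum(1 for _ in itertools.combinations(range(n_cols), k))

print(ordered)    # 665280
print(unordered)  # 924
print(generated)  # 924
```

With only 924 candidates, a plain loop over the combinations is cheap.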

Finally, use max to pick the sextuple with the best score:

score, best_sextuple = max((y[:,cols].sum(), cols)
                           for cols in itertools.combinations(range(12),6))
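The question also asks about quintuples, quadruples and so on. One way to cover every size is to wrap the search in a function that takes the subset size. This is only a sketch: `best_combination` and `best_by_size` are names invented here, and the seeded generator is a stand-in for real data.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)            # seeded only to make the sketch repeatable
arr = rng.integers(1, 6, size=(10, 12))   # stand-in data, values 1..5
y = np.array([0, -1, -1, 0, 1, 1])[arr]   # per-cell scores

def best_combination(scores, k):
    """Return (total score, column tuple) for the best k-column subset."""
    n_cols = scores.shape[1]
    return max((int(scores[:, list(cols)].sum()), cols)
               for cols in itertools.combinations(range(n_cols), k))

# Best subset for every size from single columns up to sextuples.
best_by_size = {k: best_combination(y, k) for k in range(1, 7)}
for k, (score, cols) in best_by_size.items():
    print(k, score, cols)
```

Ties are broken by the column tuple itself, because `max` compares the `(score, cols)` pairs lexicographically.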

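Expanding on unutbu's longer answer above, the masked array of scores can be generated automatically. Since the score for each value is the same on every pass through the loop, the scores only need to be computed once. Here is a slightly inelegant way to do it on an example 6x10 array, shown before and after the scores are applied.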

>>> import numpy
>>> values = numpy.random.randint(6, size=(6,10))
>>> values
array([[4, 5, 1, 2, 1, 4, 0, 1, 0, 4],
       [2, 5, 2, 2, 3, 1, 3, 5, 3, 1],
       [3, 3, 5, 4, 2, 1, 4, 0, 0, 1],
       [2, 4, 0, 0, 4, 1, 4, 0, 1, 0],
       [0, 4, 1, 2, 0, 3, 3, 5, 0, 1],
       [2, 3, 3, 4, 0, 1, 1, 1, 3, 2]])
>>> b = values.copy()
>>> b[ b<3 ] = -1
>>> b[ b==3 ] = 0
>>> b[ b>3 ] = 1
>>> b[ b>3 ] = 1
>>> b
array([[ 1,  1, -1, -1, -1,  1, -1, -1, -1,  1],
       [-1,  1, -1, -1,  0, -1,  0,  1,  0, -1],
       [ 0,  0,  1,  1, -1, -1,  1, -1, -1, -1],
       [-1,  1, -1, -1,  1, -1,  1, -1, -1, -1],
       [-1,  1, -1, -1, -1,  0,  0,  1, -1, -1],
       [-1,  0,  0,  1, -1, -1, -1, -1,  0, -1]])
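The three masked assignments depend on their order: running `b[b>3] = 1` first would leave 1s that the later `b[b<3] = -1` overwrites. As a hedged alternative (not part of the answer above), the same mapping can be written as the sign of the distance from the neutral value 3, which has no ordering pitfall. The sketch reuses the first two rows of the sample array.

```python
import numpy as np

# First two rows of the sample array shown above. The sample contains 0,
# which is outside the question's 1..5 range; both versions map it to -1.
values = np.array([[4, 5, 1, 2, 1, 4, 0, 1, 0, 4],
                   [2, 5, 2, 2, 3, 1, 3, 5, 3, 1]])

# In-place version, in the same order as the session above.
b = values.copy()
b[b < 3] = -1
b[b == 3] = 0
b[b > 3] = 1

# Order-independent alternative: sign of (value - 3).
c = np.sign(values - 3)

print(np.array_equal(b, c))  # True
```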

Incidentally, this thread claims that creating the combinations directly in numpy will yield around 5x faster performance than itertools, though perhaps at the expense of some readability.
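The following is not the method from that thread, only a sketch of how the scoring step can be moved into numpy once the combinations exist as an index array. It relies on the fact that a sextuple's total over the whole matrix is the sum of its six column totals. The names `combos`, `column_totals` and `combo_scores` are made up here, and the data is random stand-in data.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)              # seeded stand-in data
arr = rng.integers(1, 6, size=(1000, 12))   # values 1..5
y = np.array([0, -1, -1, 0, 1, 1])[arr]     # per-cell scores

# All 924 sextuples as one (924, 6) integer array.
combos = np.array(list(itertools.combinations(range(12), 6)))

# Score every sextuple at once: look up each column's total, then sum per row.
column_totals = y.sum(axis=0)                      # shape (12,)
combo_scores = column_totals[combos].sum(axis=1)   # shape (924,)

best = combo_scores.argmax()
best_score = int(combo_scores[best])
best_sextuple = tuple(int(c) for c in combos[best])
print(best_score, best_sextuple)
```

Whether this beats the generator version in practice depends on array size and would need timing on real data.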

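import numpy

# Random test data with values 1..5.
A = numpy.random.randint(1, 6, size=(1000, 12))
# 1 and 2 score -1, 4 and 5 score +1, 3 scores 0.
points = -1*(A == 1) + -1*(A == 2) + 1*(A == 4) + 1*(A == 5)
# Total score of each column over all rows.
columnsums = numpy.sum(points, 0)

def best6(row):
    # Indices of the six largest entries, in ascending order of score.
    return numpy.argsort(row)[-6:]

bestcolumns = best6(columnsums)
# list() is needed on Python 3, where map returns a lazy iterator.
allbestcolumns = list(map(best6, points))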

bestcolumns will now contain the best 6 columns, in ascending order. By similar logic, allbestcolumns will contain the best six columns in each row.
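The per-row step can also be done in a single call by sorting along axis 1. The sketch below compares the two forms on random stand-in data. It passes `kind="stable"` (an addition, not in the answer above) so that ties, which are frequent when scores are only -1, 0 or 1, are resolved the same way in both versions.

```python
import numpy as np

rng = np.random.default_rng(2)              # seeded stand-in data
A = rng.integers(1, 6, size=(1000, 12))     # values 1..5
points = -1 * (A <= 2) + 1 * (A >= 4)       # same scoring, written with ranges

def best6(row):
    # Indices of the six largest entries of one row.
    return np.argsort(row, kind="stable")[-6:]

# Row by row, as in the answer.
per_row_loop = np.array([best6(row) for row in points])

# One call: sort along axis 1, keep the last six indices of every row.
per_row_vectorised = np.argsort(points, axis=1, kind="stable")[:, -6:]

print(per_row_vectorised.shape)                           # (1000, 6)
print(np.array_equal(per_row_loop, per_row_vectorised))   # True
```

Because many rows contain ties, "the best six columns in a row" is not unique; a different tie-breaking rule gives different indices with the same row score.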

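Can you post your loop solution? Sometimes it's easier to optimize code that already works than to try to reinvent the wheel...

Another advantage of posting the loop solution is that it resolves ambiguity. For example, I'm not sure whether you want to find the six columns that give the largest total when you sum over those six columns for every row (which is easy), or something else.

It might also help to know more about the dataset. For example, it sounds like you're willing to accept any six answers in a row. If each row is an observation, why is it OK to reject the rest of the observations? Can your data array be restructured in some way to simplify the search space?

I can't think of an interpretation of this problem in which searching through the combinations would ever give a different result from sorting and taking the top N.

Are you looking for the combination of columns that maximizes only one row, or the whole matrix?

This is how I initially interpreted the question, but others have given equally plausible interpretations. I would use .argsort()[-6:].

I changed it to argsort, but I'm a bit new here, so I'm not sure whether it's proper etiquette to incorporate suggestions like this into my answer. This comment can serve as a disclosure.

You're welcome to do so! There's no need to give credit if it's obvious from the comments (unless it's something really clever, but this is just a small change). I've noticed that high-karma users tend to comment on the answer that is closest to the one they would have written themselves. Instead of best6, though, I'd probably use best and take 6 as a parameter.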
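The last suggestion could look like the sketch below. The function name `best` follows the comment, while the parameter name `n` and the sample `column_totals` values are made up for illustration.

```python
import numpy as np

def best(scores, n):
    """Indices of the n largest entries of a 1-D array, lowest score first."""
    return np.argsort(scores, kind="stable")[-n:]

# Hypothetical column totals for a 12-column array.
column_totals = np.array([3, -2, 7, 0, 5, -1, 4, 1, -6, 2, 6, -3])

print(best(column_totals, 6).tolist())  # [9, 0, 6, 4, 10, 2]
print(best(column_totals, 4).tolist())  # [6, 4, 10, 2]
```

Note that `n` must be at least 1, since the slice `[-0:]` would return every index rather than none.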
