Conditional selection in Python arrays

I have the following list of arrays:

[array([10,  1,  7,  3]),
 array([ 0, 14, 12, 13]),
 array([ 3, 10,  7,  8]),
 array([7, 5]),
 array([ 5, 12,  3]),
 array([14,  8, 10])]

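What I want is to label each row "1" or "0", depending on whether the row contains both "10" and "7", or both "10" and "3". My attempt returns

array(0)

What is the correct syntax for reaching into an array of arrays?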

Expected output:

[ 1, 0, 1, 0, 0, 1 ]

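Note, regarding "what is output?": after training a CountVectorizer/LDA topic classifier in scikit-learn, the script below assigns topic probabilities to new documents. The topics above a threshold of 0.2 are then stored in an array.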

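import numpy as np

def sortthreshold(x, thresh):
    # indices of the topics whose probability exceeds thresh
    idx = np.arange(x.size)[x > thresh]
    # those indices, ordered by increasing probability
    return idx[np.argsort(x[idx])]

# lda, bowvectorizer and newdoc come from the training step (not shown)
output = []
for x in newdoc:
    y = lda.transform(bowvectorizer.transform([x]))  # shape (1, n_topics)
    output.append(sortthreshold(y[0], 0.2))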
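To make the shape of output concrete, here is a minimal sketch that runs sortthreshold on made-up topic-probability vectors instead of real lda.transform results (the probabilities are hypothetical). Each call keeps a different number of topics, which is why output ends up as a list of unequal-length arrays rather than a 2D array.

```python
import numpy as np

def sortthreshold(x, thresh):
    # indices of the entries above the threshold
    idx = np.arange(x.size)[x > thresh]
    # ordered by increasing probability
    return idx[np.argsort(x[idx])]

# Hypothetical topic distributions standing in for lda.transform(...)[0]
fake_topic_probs = [
    np.array([0.05, 0.30, 0.25, 0.40]),
    np.array([0.90, 0.05, 0.03, 0.02]),
    np.array([0.10, 0.10, 0.10, 0.70]),
]

output = [sortthreshold(p, 0.2) for p in fake_topic_probs]
print(output)  # [array([2, 1, 3]), array([0]), array([3])]
```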

Thanks

You need to use np.any combined with np.where, and avoid Python's binary operators, using the boolean and / or operators instead:

import numpy as np

a = [np.array([10,  1,  7,  3]),
     np.array([ 0, 14, 12, 13]),
     np.array([ 3, 10,  7,  8]),
     np.array([7, 5]),
     np.array([ 5, 12,  3]),
     np.array([14,  8, 10])]

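for output in a:
    # any(...) reduces each elementwise comparison to a single bool,
    # so Python's and / or can combine the results safely
    print(np.where((any(output == 10) and any(output == 7)) or
                   (any(output == 10) and any(output == 3)) or
                   (any(output == 10) and any(output == 8)), 1, 0))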
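Why reduce each comparison with any() first: output == 10 is an elementwise boolean array, while Python's and / or need a single truth value. A small sketch using the first row of the question's data shows that combining the raw arrays with and raises ValueError, that reducing first works, and that np.where with a scalar condition returns a 0-d array (which is what the print calls above display).

```python
import numpy as np

row = np.array([10, 1, 7, 3])

# Elementwise comparisons give boolean arrays, not single bools
print(row == 10)  # [ True False False False]

raised = False
try:
    # 'and' asks for the truth value of a 4-element array: ambiguous
    (row == 10) and (row == 7)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Reduce each comparison to one bool first, then combine
has_pair = any(row == 10) and any(row == 7)
print(has_pair)  # True

# np.where with a scalar condition returns a 0-d array
flag = np.where(has_pair, 1, 0)
print(repr(flag), flag.ndim)  # array(1) 0
```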

If you want the result as a list, as shown in the edited question:

result = []
for output in a:
    # np.where on a scalar condition returns a 0-d array: array(1) or array(0)
    result.append(1 if np.where((any(output == 10) and any(output == 7)) or
                                (any(output == 10) and any(output == 3)) or
                                (any(output == 10) and any(output == 8)), 1, 0) == True else 0)

print(result)

Result:

[1, 0, 1, 0, 0, 1]

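Your input data is a plain Python list of Numpy arrays of unequal lengths, so it can't simply be converted to a 2D Numpy array, and therefore it can't be processed directly by Numpy. But it can be processed with the usual Python list-processing tools.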
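A quick sketch of what "can't simply be converted" means in practice. In NumPy 1.24 and later, plain np.array(data) on ragged input raises ValueError; earlier releases built an object array instead (with a deprecation warning in the versions leading up to the change). Passing dtype=object explicitly yields a 1-D array of row objects, not a 2D table.

```python
import numpy as np

data = [
    np.array([10, 1, 7, 3]), np.array([0, 14, 12, 13]),
    np.array([3, 10, 7, 8]), np.array([7, 5]),
    np.array([5, 12, 3]), np.array([14, 8, 10]),
]

# Row lengths differ, so there is no rectangular (n_rows, n_cols) shape
lengths = [len(row) for row in data]
print(lengths)  # [4, 4, 4, 2, 3, 3]

# dtype=object gives a 1-D array whose elements are the row arrays;
# vectorised operations cannot treat it as a 2D table
ragged = np.array(data, dtype=object)
print(ragged.shape)  # (6,)
```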

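Here's a list comprehension that tests whether a row contains 10 together with any of (3, 7, 8). We first use a simple == test to see whether the row contains 10, and only call isin if it does; if its first operand is false, Python's and operator doesn't evaluate its second operand.

We use np.any to see whether any item in the row passed each test. np.any returns a boolean False or True, but we can pass those values to int to convert them to 0 or 1.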

import numpy as np

data = [
    np.array([10, 1, 7, 3]), np.array([0, 14, 12, 13]),
    np.array([3, 10, 7, 8]), np.array([7, 5]),
    np.array([5, 12, 3]), np.array([14, 8, 10]),
]

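# values that must appear in a row alongside 10
mask = np.array([3, 7, 8])
# np.isin only runs for rows that already contain 10 (and short-circuits)
result = [int(np.any(row==10) and np.any(np.isin(row, mask)))
    for row in data]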

print(result)

Output

[1, 0, 1, 0, 0, 1]
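The short-circuit claim can be checked directly. This sketch wraps the isin test in a hypothetical helper, any_in_mask, that records every row reaching it; only the three rows containing 10 should get there.

```python
import numpy as np

data = [
    np.array([10, 1, 7, 3]), np.array([0, 14, 12, 13]),
    np.array([3, 10, 7, 8]), np.array([7, 5]),
    np.array([5, 12, 3]), np.array([14, 8, 10]),
]
mask = np.array([3, 7, 8])

# Hypothetical instrumentation: remember each row that reaches np.isin
isin_calls = []

def any_in_mask(row):
    isin_calls.append(row)
    return np.any(np.isin(row, mask))

result = [int(np.any(row == 10) and any_in_mask(row)) for row in data]
print(result)           # [1, 0, 1, 0, 0, 1]
print(len(isin_calls))  # 3: only the rows that contain 10
```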

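I just ran some timing tests. Curiously, on the data given in the question Reblochon Masque's code is faster than mine, possibly because of plain Python's short-circuit behaviour. Also, numpy.in1d seems to be faster than numpy.isin, even though the docs recommend using the latter in new code.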

Here's a new version that's about 10% slower than Reblochon's:

mask = np.array([3, 7, 8])
result = [int(any(row==10) and any(np.in1d(row, mask)))
    for row in data]
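np.in1d has since been deprecated (as of NumPy 2.0) in favour of np.isin. For a 1-D row the two return the same elementwise result, so swapping isin into the version above should be a drop-in change; the difference is that isin keeps the shape of its first argument, whereas in1d flattens it. A minimal sketch using only isin (speed relative to in1d is not measured here):

```python
import numpy as np

mask = np.array([3, 7, 8])
row = np.array([3, 10, 7, 8])

# Elementwise membership test; for 1-D input this matches in1d
print(np.isin(row, mask))  # [ True False  True  True]

# isin preserves the input's shape (in1d would return a flat result)
grid = np.array([[10, 1], [7, 3]])
print(np.isin(grid, mask).shape)  # (2, 2)

data = [
    np.array([10, 1, 7, 3]), np.array([0, 14, 12, 13]),
    np.array([3, 10, 7, 8]), np.array([7, 5]),
    np.array([5, 12, 3]), np.array([14, 8, 10]),
]

# The in1d version above, rewritten with isin
result = [int(any(r == 10) and any(np.isin(r, mask))) for r in data]
print(result)  # [1, 0, 1, 0, 0, 1]
```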

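Of course, real speeds on large amounts of real data may differ from my test results. And speed may not be an issue anyway: even on my slow 32-bit single-core 2 GHz machine, I can process the data in the question nearly 3000 times per second.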

hpaulj suggested a faster approach. Here's some code comparing the different versions. These tests were run on my old machine, so YMMV.

import numpy as np
from timeit import Timer

the_data = [
    np.array([10, 1, 7, 3]), np.array([0, 14, 12, 13]),
    np.array([3, 10, 7, 8]), np.array([7, 5]),
    np.array([5, 12, 3]), np.array([14, 8, 10]),
]

def rebloch0(data):
    result = []
    for output in data:
        result.append(1 if np.where((any(output == 10) and any(output == 7)) or
            (any(output == 10) and any(output == 3)) or
            (any(output == 10) and any(output == 8)), 1, 0) == True else 0)
    return result

def rebloch1(data):
    result = []
    for output in data:
        result.append(1 if np.where((any(output == 10) and any(output == 7)) or
            (any(output == 10) and any(output == 3)) or
            (any(output == 10) and any(output == 8)), 1, 0) else 0)
    return result

def pm2r0(data):
    mask = np.array([3, 7, 8])
    return [int(np.any(row==10) and np.any(np.isin(row, mask)))
        for row in data]

def pm2r1(data):
    mask = np.array([3, 7, 8])
    return [int(any(row==10) and any(np.in1d(row, mask)))
        for row in data]

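def hpaulj0(data):
    mask=np.array([3, 7, 8])
    # row[:, None] == mask broadcasts to a (len(row), 3) boolean table
    return [int(any(row==10) and any((row[:, None]==mask).flat))
        for row in data]

# Same as hpaulj0, but the mask is built once, as a default argument
def hpaulj1(data, mask=np.array([3, 7, 8])):
    return [int(any(row==10) and any((row[:, None]==mask).flat))
        for row in data]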

functions = (
    rebloch0,
    rebloch1,
    pm2r0,
    pm2r1,
    hpaulj0,
    hpaulj1,
)

# Verify that all functions give the same result
for func in functions:
    print('{:8}: {}'.format(func.__name__, func(the_data)))
print()

def time_test(loops, data):
    timings = []
    for func in functions:
        t = Timer(lambda: func(data))
        result = sorted(t.repeat(3, loops))
        timings.append((result, func.__name__))
    timings.sort()
    for result, name in timings:
        print('{:8}: {:.6f}, {:.6f}, {:.6f}'.format(name, *result))
    print()

time_test(1000, the_data)

Typical output

[1, 0, 1, 0, 0, 1]

rebloch0: [1, 0, 1, 0, 0, 1]
rebloch1: [1, 0, 1, 0, 0, 1]
pm2r0   : [1, 0, 1, 0, 0, 1]
pm2r1   : [1, 0, 1, 0, 0, 1]
hpaulj0 : [1, 0, 1, 0, 0, 1]
hpaulj1 : [1, 0, 1, 0, 0, 1]

hpaulj1 : 0.140421, 0.154910, 0.156105
hpaulj0 : 0.154224, 0.154822, 0.167101
rebloch1: 0.281700, 0.282764, 0.284599
rebloch0: 0.339693, 0.359127, 0.375715
pm2r1   : 0.367677, 0.368826, 0.371599
pm2r0   : 0.626043, 0.628232, 0.670199
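hpaulj's faster test replaces isin with broadcasting: row[:, None] has shape (n, 1), and comparing it with the 3-element mask produces an (n, 3) boolean table whose entry [i, j] says whether row[i] == mask[j]; .flat then lets the builtin any scan that table. A sketch of the intermediate values for the question's third row (note the table grows as len(row) * len(mask), which is fine for short rows and masks like these):

```python
import numpy as np

row = np.array([3, 10, 7, 8])
mask = np.array([3, 7, 8])

# (4, 1) compared with (3,) broadcasts to a (4, 3) table
table = row[:, None] == mask
print(table.shape)  # (4, 3)
print(table)
# [[ True False False]
#  [False False False]
#  [False  True False]
#  [False False  True]]

# Any True anywhere means some element of row is in mask
print(any(table.flat))  # True
print(table.any())      # same check as a single NumPy call: True

data = [
    np.array([10, 1, 7, 3]), np.array([0, 14, 12, 13]),
    np.array([3, 10, 7, 8]), np.array([7, 5]),
    np.array([5, 12, 3]), np.array([14, 8, 10]),
]
result = [int(any(r == 10) and any((r[:, None] == mask).flat)) for r in data]
print(result)  # [1, 0, 1, 0, 0, 1]
```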

Nice work, hpaulj

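Do you mean replace the values with 0 or 1? That looks like a plain Python list of Numpy arrays. What is output?

output is an array created by an LDA topic model. The numbers in the array correspond to the topics whose topic loadings are above a given threshold.

Why does [14,8,10] match? It has a 10, but no 7 or 3.

We (probably) don't need to see the original function that creates output, but we do need you to make the code you show us clear and self-consistent. You keep calling output an array, but it looks like a list.

According to the code you just added, output is a list, not an array. FWIW, in Numpy & and | can be used for logical operations. However, unlike Python's and / or, they don't short-circuit.

I didn't know about & and | with numpy, thanks @pm2ring

No worries. It is a bit surprising. Of course, Numpy can't use the traditional C operators && and ||, because the Python parser would reject them.

I just did some timeit tests. Your code is almost twice as fast as my original version on the OP's data.

hpaulj made a suggestion that really speeds it up. I've added a timeit test to my answer.

A mix of these ideas is faster still: int(any(arr==10) and any((arr[:,None]==[3,7,8]).flat))

@hpaulj Wow, that's impressive!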
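To illustrate the point about & and | from the comments: on the raw elementwise arrays, & compares position by position, which asks a different question than "does the row contain both values"; on the scalar results of np.any, & gives the same answer as and but always evaluates both operands. The contains helper below is hypothetical, used only to record which side gets evaluated.

```python
import numpy as np

row = np.array([10, 1, 7, 3])

# Elementwise &: is any single position both 10 and 7? Never.
# (& binds tighter than ==, so the parentheses are required)
print((row == 10) & (row == 7))  # [False False False False]

# Reduce first, then combine: does the row contain 10 and also 7?
print(np.any(row == 10) & np.any(row == 7))  # True

calls = []

def contains(value):
    calls.append(value)  # record that this operand was evaluated
    return np.any(row == value)

contains(99) and contains(7)  # first operand is False: second is skipped
and_calls = list(calls)
print(and_calls)  # [99]

calls.clear()
contains(99) & contains(7)    # & evaluates both operands
amp_calls = list(calls)
print(amp_calls)  # [99, 7]
```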