Finding consecutive 1s in a numpy array


How can I find the number of consecutive 1s (or any other value) in each row of the following numpy array? I need a pure numpy solution.

counts
Out[304]: 
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
       [0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])

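Desired output for the first question (maximum number of consecutive 1s in each row): amounts: array([2, 3, 2])

Second question (index at which a run of 2 consecutive 1s first starts in each row): indices: array([3, 9, 9])

In this example I used 2 in a row, but it should be possible to change that to 5 in a row; this is important.

The second part of the question: once I have found which rows contain 5 or more consecutive 1s (or any other value), I need the starting index of that run. Again, this should be done per row.

A similar question was answered with np.unique, but that only works for a single row, not for an array with multiple rows, because the results would have different lengths.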

Here's an approach based on -

Sample input, output -

Original sample case:

In [574]: counts
Out[574]: 
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
       [0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])

In [575]: out
Out[575]: array([2, 3, 2], dtype=int64)

Modified case:

In [577]: counts
Out[577]: 
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
       [0, 0, 1, 0, 0, 1, 2, 0, 1, 1, 1, 1],
       [0, 0, 0, 4, 1, 1, 1, 1, 1, 0, 1, 0]])

In [578]: out
Out[578]: array([2, 4, 5], dtype=int64)
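As a cross-check on the two sample outputs, a minimal non-vectorized reference might look like the sketch below. It relies on itertools.groupby rather than NumPy vectorization, so it would be far too slow for the speed requirement stated in the question; the function name max_run_reference and the variable name modified are made up for illustration.

```python
from itertools import groupby

import numpy as np


def max_run_reference(arr, value=1):
    """Longest run of `value` in each row, computed with plain Python loops."""
    out = []
    for row in arr:
        # groupby yields one (flag, group) pair per maximal block of equal flags
        runs = [sum(1 for _ in grp)
                for is_match, grp in groupby(row == value) if is_match]
        out.append(max(runs, default=0))  # 0 when the row has no match at all
    return np.array(out)


counts = np.array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
                   [0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
                   [0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
modified = np.array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
                     [0, 0, 1, 0, 0, 1, 2, 0, 1, 1, 1, 1],
                     [0, 0, 0, 4, 1, 1, 1, 1, 1, 0, 1, 0]])

print(max_run_reference(counts))    # [2 3 2]
print(max_run_reference(modified))  # [2 4 5]
```

A slow reference like this is mainly useful for testing a vectorized version against random inputs.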

Here's a pure NumPy version that is identical to the previous one up to the point where we have starts and stops -

import numpy as np

# Append a column of zeros on either side of counts
append1 = np.zeros((counts.shape[0],1),dtype=int)
counts_ext = np.column_stack((append1,counts,append1))

# Get start and stop indices with 1s as triggers
diffs = np.diff((counts_ext==1).astype(int),axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)

# Get intervals using differences between start and stop indices
intvs = stops[:,1] - starts[:,1]

# Store intervals as a 2D array for further vectorized ops.
# minlength keeps one entry per row even when trailing rows contain no 1s.
c = np.bincount(starts[:,0], minlength=counts.shape[0])
mask = np.arange(c.max()) < c[:,None]
intvs2D = mask.astype(float)
intvs2D[mask] = intvs

# Get max along each row as final output
out = intvs2D.max(1)
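The second part of the question (the starting index of a run of at least n matching values in each row) is not covered by the code above. One possible sketch reuses the same start/stop detection. The function name first_run_start and the convention of returning -1 for rows without a long enough run are assumptions made for this example, not part of the original answer.

```python
import numpy as np


def first_run_start(arr, n, value=1):
    """Column index of the first run of at least `n` consecutive `value`s per row.

    Rows without such a run get -1 (a convention chosen for this sketch).
    """
    is_val = (arr == value).astype(int)
    # Pad a zero column on both sides so runs touching the edges are closed
    padded = np.pad(is_val, ((0, 0), (1, 1)), mode="constant")
    diffs = np.diff(padded, axis=1)
    starts = np.argwhere(diffs == 1)   # (row, start column), row-major order
    stops = np.argwhere(diffs == -1)   # (row, one-past-end column)
    lengths = stops[:, 1] - starts[:, 1]

    out = np.full(arr.shape[0], -1)
    long_enough = lengths >= n
    rows = starts[long_enough, 0]
    cols = starts[long_enough, 1]
    # rows is sorted, so the first occurrence of each row is its leftmost run
    uniq_rows, first_idx = np.unique(rows, return_index=True)
    out[uniq_rows] = cols[first_idx]
    return out


counts = np.array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
                   [0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
                   [0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])

print(first_run_start(counts, 2))  # [3 9 9]
print(first_run_start(counts, 5))  # [-1 -1 -1]
```

With n=2 this matches the indices requested in the question; changing n to 5 only changes the length threshold.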


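I think a very similar problem is checking whether the element-wise difference within sorted rows equals a certain amount. A check for 5 consecutive elements that each differ by 1 would look as follows. The same can be done with a difference of 0 for two cards: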

cardAmount=cards[0,:].size
# Compare each card with the one four positions later; == 4 implies rows sorted in descending order
has4=cards[:,np.arange(0,cardAmount-4)]-cards[:,np.arange(4,cardAmount)]
isStraight=np.any(has4 == 4, axis=1)
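A self-contained sketch of the same idea, generalized to any window length, could look like this. It assumes each row is sorted in descending order and contains no duplicate ranks (with duplicates, a difference of 4 across five positions no longer guarantees consecutive values); the function name has_straight and the sample cards array are invented for illustration.

```python
import numpy as np


def has_straight(cards, length=5):
    """True for each row that holds `length` consecutive ranks.

    Assumes rows are sorted in descending order with no duplicate ranks.
    """
    span = length - 1
    # Compare every element with the one `span` positions to its right
    diff = cards[:, :-span] - cards[:, span:]
    return np.any(diff == span, axis=1)


cards = np.array([[14, 13, 12, 11, 10, 5, 2],
                  [14, 12, 10, 8, 6, 4, 2],
                  [9, 8, 7, 6, 5, 3, 2]])

print(has_straight(cards))  # [ True False  True]
```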

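What about the first row?

Corrected the desired output.

Are you okay with using the pandas module?

Unfortunately it needs to be pure numpy, because speed is very important. pandas would be much slower.

Would you use it if pandas weren't slower?

It might work, but I just timed it and it is about 75 times too slow. The speed difference between pandas and Numpy looks huge. The difficulty is vectorizing this in a pure Numpy solution, which is necessary in my case. I'm sure it's possible! I measured the speed difference against similar operations I run on Numpy arrays with 1m rows, which take less than 100ms for the 1m rows.

@nickpick Check whether the edited code works better?

Yes! Now this is numpy speed. What did you change?

@nickpick As noted in my comments, I have put the intervals into a regular 2D array for vectorized operations. I use this technique a lot in MATLAB, where I call it bsxfun's masking capability; bsxfun is MATLAB's broadcasting function. Note that this comes with more memory usage, depending on how much the data varies. If you're familiar with MATLAB, here are a few examples.

Another question, very similar (I think); if you still have some energy, please take a look.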
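The masking technique described in the comments can be shown in isolation. The sketch below packs ragged per-row values into a zero-padded 2D array with a broadcasted comparison; the names values, group_sizes and padded are illustrative only.

```python
import numpy as np

# Ragged data: 3 groups holding 2, 0 and 3 values, stored flat in group order
values = np.array([5, 7, 1, 2, 3])
group_sizes = np.array([2, 0, 3])

# mask[i, j] is True for the first group_sizes[i] slots of row i
mask = np.arange(group_sizes.max()) < group_sizes[:, None]

padded = np.zeros(mask.shape, dtype=values.dtype)
padded[mask] = values  # boolean assignment fills True slots in row-major order

print(padded)
# [[5 7 0]
#  [0 0 0]
#  [1 2 3]]
print(padded.max(axis=1))  # [7 0 3]
```

The padded array has as many columns as the largest group, which is where the extra memory use comes from when group sizes vary a lot.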
