Convert all elements positioned below a 0 in a matrix to 0 (Python)
Here is a matrix:
matrix = [[1, 1, 1, 0],
[0, 5, 0, 1],
[2, 1, 3, 10]]
I want to change every element that lies below a 0 (in the same column) to 0. The resulting matrix would be:
matrix = [[1, 1, 1, 0],
[0, 5, 0, 0],
[0, 1, 0, 0]]
This is what I have tried. The return value is empty:
import numpy as np
def transform(matrix):
    newmatrix = np.asarray(matrix)
    i = 0
    j = 0
    for j in range(0, len(matrix[0])-1):
        while i
Here is a simple (though not optimized) algorithm:
import numpy as np
from numba import jit
m = np.array([[1, 1, 1, 0],
              [0, 5, 0, 1],
              [2, 1, 3, 10]])
@jit(nopython=True)
def zeroer(m):
    a, b = m.shape
    for j in range(b):
        for i in range(a):
            if m[i, j] == 0:
                m[i:, j] = 0
                break
    return m
zeroer(m)
# [[1 1 1 0]
#  [0 5 0 0]
#  [0 1 0 0]]
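The loop above does not depend on numba for correctness, only for speed. As a minimal sketch, assuming numba is not installed, the same column scan can run as plain Python. The function name `zero_below_first_zero` is hypothetical, and unlike `zeroer` this version works on a copy so the caller's array is left untouched.

```python
import numpy as np

def zero_below_first_zero(m):
    """Return a copy of m where, in each column, every entry from the first 0 downward is 0."""
    out = np.array(m)  # np.array copies, so the input is not modified
    n_rows, n_cols = out.shape
    for j in range(n_cols):
        for i in range(n_rows):
            if out[i, j] == 0:
                out[i:, j] = 0  # zero the remainder of this column
                break           # nothing more to do in this column
    return out

m = np.array([[1, 1, 1, 0],
              [0, 5, 0, 1],
              [2, 1, 3, 10]])
result = zero_below_first_zero(m)
print(result)
# [[1 1 1 0]
#  [0 5 0 0]
#  [0 1 0 0]]
```

The double loop is slow for large arrays in pure Python, which is why the vectorized answers in this thread are usually preferable.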
Method 1 (original)
Then, with your example data, I get the following mat:
In [195]: matrix = [[1, 1, 1, 0],
     ...:           [0, 5, 0, 1],
     ...:           [2, 1, 3, 10]]
In [196]: transform(matrix)
Out[196]:
array([[1, 1, 1, 0],
       [0, 5, 0, 0],
       [0, 1, 0, 0]])
Method 2 (further optimized)
Method 3 (even more optimized)
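Explanation
Let's look at the main statement (in Method 1):
mat[np.logical_not(np.not_equal(mat, 0).cumprod(axis=0))] = 0
We can split it into several "basic" operations:
1. Create a boolean mask that is False (numeric 0) where the elements of mat are 0 and True (numeric 1) where they are non-zero: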
mask1 = np.not_equal(mat, 0)
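2. Using the fact that False is numerically 0, apply the cumprod() function:
mask2 = mask1.cumprod(axis=0)
Since 1*1==1 and 0*0 or 0*1 is 0, all elements of this "mask" are either 0 or 1. Because of the "cumulative nature" of the product along the columns (hence axis=0), they are 0 only at the positions where mask1 is zero and below (!) those positions.
3. Now we want to set to 0 those elements of mat that correspond to a 0 in mask2. To do this, we create a boolean mask that is True where mask2 is 0 and False elsewhere. This is easily achieved by applying a logical (or binary) NOT to mask2: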
mask3 = np.logical_not(mask2)
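Using the "logical" NOT here creates a boolean array, so we avoid an explicit type conversion.
4. Finally, select the elements of mat that need to be set to 0 and set them to 0: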
mat[mask3] = 0
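To make the four steps concrete, here is a small sketch that runs them one at a time on the example matrix from the question and prints the intermediate cumulative-product mask. The variable names follow the explanation; nothing else is assumed.

```python
import numpy as np

mat = np.array([[1, 1, 1, 0],
                [0, 5, 0, 1],
                [2, 1, 3, 10]])

mask1 = np.not_equal(mat, 0)    # step 1: True where mat is non-zero
mask2 = mask1.cumprod(axis=0)   # step 2: running product down each column
mask3 = np.logical_not(mask2)   # step 3: True at the first 0 of a column and below it
mat[mask3] = 0                  # step 4: boolean indexing assigns 0 at those positions

print(mask2)
# [[1 1 1 0]
#  [0 1 0 0]
#  [0 1 0 0]]
print(mat)
# [[1 1 1 0]
#  [0 5 0 0]
#  [0 1 0 0]]
```

Note how the second column of mask2 stays 1 all the way down because that column contains no 0, so it is left unchanged.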
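Optional optimization: if you think about it, we can drop steps 3 and 4 by doing the following:
mask2 = mask1.cumprod(axis=0, dtype=bool)  # keep the result boolean; the builtin bool also works where np.bool is unavailable
mat *= mask2  # combines steps 3 and 4
For the complete implementation, see the "Method 2" section above.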
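One consequence of the in-place `*=` is worth spelling out: `np.asarray` returns the input itself when it is already an ndarray, so the caller's array is overwritten. The sketch below contrasts that with a copying variant. The names `transform_inplace` and `transform_copy` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def transform_inplace(matrix):
    mat = np.asarray(matrix)  # no copy when matrix is already an ndarray
    mat *= (mat != 0).cumprod(axis=0, dtype=bool)
    return mat

def transform_copy(matrix):
    mat = np.array(matrix)    # always copies
    return mat * (mat != 0).cumprod(axis=0, dtype=bool)

a = np.array([[1, 1, 1, 0],
              [0, 5, 0, 1],
              [2, 1, 3, 10]])
b = a.copy()

out_copy = transform_copy(a)        # a keeps its original values
out_inplace = transform_inplace(b)  # b itself is overwritten

print(a[2, 0], b[2, 0])  # 2 0
print(out_inplace is b)  # True
```

When the input is a plain list of lists, as in the question, `np.asarray` has to build a new array anyway, so the difference only matters for ndarray inputs.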
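Performance
Several other answers use numpy.ufunc.accumulate(). Fundamentally, all of these methods revolve around the idea that 0 is a "special" value in the sense that 0*anything==0 or, in @DSM's answer, that False=0.
One variation on the cumprod approach is to use a cumulative minimum (or maximum). I slightly prefer this because, if you like, you can use it to avoid any arithmetic beyond comparisons, although it is hard to get excited about that: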
In [37]: m
Out[37]:
array([[ 1,  1,  1,  0],
       [ 0,  5,  0,  1],
       [ 2,  1,  3, 10]])
In [38]: m * np.minimum.accumulate(m != 0)
Out[38]:
array([[1, 1, 1, 0],
       [0, 5, 0, 0],
       [0, 1, 0, 0]])
In [39]: np.where(np.minimum.accumulate(m != 0), m, 0)
Out[39]:
array([[1, 1, 1, 0],
       [0, 5, 0, 0],
       [0, 1, 0, 0]])
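The cumulative product, cumulative minimum, and cumulative logical AND all build the same "no zero seen so far in this column" mask. The following sketch checks that on a small random matrix. The seed and shape are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed, for reproducibility only
m = rng.integers(0, 5, size=(6, 8))  # small values so that zeros are likely

nonzero = m != 0
mask_cumprod = nonzero.cumprod(axis=0).astype(bool)
mask_min = np.minimum.accumulate(nonzero, axis=0)
mask_and = np.logical_and.accumulate(nonzero, axis=0)

same = np.array_equal(mask_cumprod, mask_min) and np.array_equal(mask_min, mask_and)
print(same)  # True

result = np.where(mask_min, m, 0)    # comparison-only variant, no multiplication
```

On a boolean array the minimum of False and True is False, which is exactly what the product and the logical AND give, so the three masks agree for any input.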
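A more optimized version of @AGNGazer's solution, using np.logical_and.accumulate and the implicit boolean conversion of integers (which avoids a lot of multiplications).
Timings: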
%timeit transform2(m) # AGN's solution
The slowest run took 44.73 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.93 µs per loop
%timeit transform(m)
The slowest run took 9.00 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.99 µs per loop
m = np.random.randint(0,5,(100,100))
%timeit transform(m)
The slowest run took 6.03 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 43.9 µs per loop
%timeit transform2(m)
The slowest run took 4.09 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 50.4 µs per loop
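Looks like about a 15% speedup.
"I want to change all elements below 0 to 0 (on the same line)." But all elements are above 0, so why should the matrix change? Should it be matrix = [[1, 1, 1, 0], [0, 5, 0, -1], [-2, 1, -3, -10]]?
In fact, I need to multiply all numbers below a 0 in the column by 0.
@FritzFABO I have edited your question to what I think it should be. Can you take a look?
Why do you return print()? It is always None.
@kmario23 Not really: the binary NOT is applied to not_equal().cumprod().
@kmario23 I have rewritten the main statement in a more explicit way. Or, to be even more explicit, put parentheses around it: mat[~((mat != 0).cumprod(axis=0).astype(np.bool))] = 0, to avoid confusion.
Nice solution, by the way :)
@FritzFABO Basically, ~ is the binary NOT, but when the argument is of boolean type it can be used in place of the logical NOT. (mat != 0) is the same as not_equal().
@FritzFABO I have added 1) a detailed explanation and 2) a further simplified/optimized algorithm.
I have added a timing comparison section to my answer. It shows that, as posted, your method is slower than all the other methods except my Method 1, which is the slowest of the numpy-based methods.
m * np.minimum.accumulate(m, dtype=np.bool) can also be used.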
In [1]: import sys
   ...: import numpy as np
   ...:
In [2]: print(sys.version)
   ...:
3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:14:59)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
In [3]: print(np.__version__)
   ...:
1.12.1
In [4]: # Method 1 (Original)
   ...: def transform1(matrix):
   ...:     mat = np.asarray(matrix)
   ...:     mat[np.logical_not(np.not_equal(mat, 0).cumprod(axis=0))] = 0
   ...:     return mat
   ...:
In [5]: # Method 2:
   ...: def transform2(matrix):
   ...:     mat = np.asarray(matrix)
   ...:     mat *= (mat != 0).cumprod(axis=0, dtype=np.bool)
   ...:     return mat
   ...:
In [6]: # @DSM method:
   ...: def transform_DSM(matrix):
   ...:     mat = np.asarray(matrix)
   ...:     mat *= np.minimum.accumulate(mat != 0)
   ...:     return mat
   ...:
In [7]: # @DanielF method:
   ...: def transform_DanielF(matrix):
   ...:     mat = np.asarray(matrix)
   ...:     mat[~np.logical_and.accumulate(mat, axis = 0)] = 0
   ...:     return mat
   ...:
In [8]: # Optimized @DanielF method:
   ...: def transform_DanielF_optimized(matrix):
   ...:     mat = np.asarray(matrix)
   ...:     mat *= np.logical_and.accumulate(mat, dtype=np.bool)
   ...:     return mat
   ...:
In [9]: matrix = np.random.randint(0, 20000, (20000, 20000))
In [10]: %timeit -n1 transform1(matrix)
22.1 s ± 241 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [11]: %timeit -n1 transform2(matrix)
9.29 s ± 185 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [12]: %timeit -n1 transform3(matrix)
9.23 s ± 180 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [13]: %timeit -n1 transform_DSM(matrix)
9.24 s ± 195 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [14]: %timeit -n1 transform_DanielF(matrix)
10.3 s ± 219 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [15]: %timeit -n1 transform_DanielF_optimized(matrix)
9.27 s ± 187 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
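The session above relies on IPython's %timeit. As a rough sketch of how a similar comparison could be scripted with the standard-library timeit module, the code below times three non-mutating variants on a much smaller matrix. The 500 x 500 size, the function names, and the repeat counts are assumptions for this example, and the absolute numbers will differ from the ones reported above.

```python
import timeit
import numpy as np

# Non-mutating variants, so every timed call sees the same input.
def transform_cumprod(mat):
    return mat * (mat != 0).cumprod(axis=0, dtype=bool)

def transform_min(mat):
    return mat * np.minimum.accumulate(mat != 0)

def transform_and(mat):
    return mat * np.logical_and.accumulate(mat != 0, axis=0)

matrix = np.random.randint(0, 20000, (500, 500))  # far smaller than the 20000 x 20000 run above
candidates = {
    "cumprod": transform_cumprod,
    "minimum.accumulate": transform_min,
    "logical_and.accumulate": transform_and,
}

reference = transform_cumprod(matrix)
timings = {}
for name, func in candidates.items():
    # Check agreement first, so only equivalent functions are compared.
    assert np.array_equal(func(matrix), reference)
    best = min(timeit.repeat(lambda: func(matrix), number=10, repeat=3))
    timings[name] = best / 10  # seconds per call, best of 3 repeats
    print(f"{name}: {timings[name] * 1e3:.3f} ms per call")
```

Taking the minimum over the repeats is a common way to reduce noise from other processes; it does not remove it, so small differences between the variants should be read with caution.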
def transform(matrix):
    mat = np.asarray(matrix)
    mat[~np.logical_and.accumulate(mat, axis = 0)] = 0
    return mat
transform(m)
Out:
array([[1, 1, 1, 0],
       [0, 5, 0, 0],
       [0, 1, 0, 0]])