Python: resizing a 2D numpy array while excluding NaN

python arrays image numpy

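I'm trying to resize a 2D numpy array by a given factor, obtaining a smaller array as output.

The array is read from an image file, and some of its values should be NaN (Not a Number, np.nan in numpy): it is the result of satellite remote sensing measurements, and some pixels simply weren't measured.

The suitable function I found for this is scipy.misc.imresize, but every pixel in the output array that covers a NaN is set to NaN, even when some of the original pixels interpolated together hold valid data.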
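The effect can be reproduced in isolation. The snippet below is only a minimal sketch with a made-up 2x2 block, and it does not show the internals of imresize: it just illustrates that an ordinary mean propagates a single NaN, while numpy's NaN-aware `np.nanmean` averages the valid pixels only.

```python
import numpy as np

# One hypothetical 2x2 block with a single missing measurement
block = np.array([[1.0, 3.0],
                  [np.nan, 5.0]])

plain_mean = np.mean(block)    # the NaN propagates through the ordinary mean
nan_aware = np.nanmean(block)  # averages only the three valid pixels

print(plain_mean)
print(nan_aware)
```

The second value is the behaviour wanted here: the missing pixel is skipped instead of invalidating the whole block.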

My solution is appended below. What I do is essentially:

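create a new array based on the shape of the original array and the desired reduction factor
create an index array that addresses all the pixels of the original array to be averaged for each pixel of the new array
loop over the pixels of the new array and average all the non-NaN pixels to obtain the new pixel value; if there are only NaN, the output is NaN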

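I plan to add a keyword to choose between different outputs (mean, median, standard deviation of the input pixels, and so on).

It works as expected, but it takes about 3 seconds on a ~1 Mpx image. Because of my lack of experience with Python, I'm looking for improvements.

Does anyone have a suggestion on how to do this better and more efficiently?

Does anyone know of a library that already implements all of this?

Thanks.

Here is an example of input and output with random pixels, generated with the code below: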

import numpy as np
import pylab as plt
from scipy import misc

def resize_2d_nonan(array,factor):
    """
    Resize a 2D array by a different factor on each axis, skipping NaN values.
    If a new pixel contains only NaN, it will be set to NaN


    Parameters
    ----------

    array : 2D np array

    factor : int or tuple. If int, the x and y factors will be the same

    Returns
    -------
    array : 2D np array scaled by factor

    Created on Mon Jan 27 15:21:25 2014

    @author: damo_ma
    """
    xsize, ysize = array.shape

    if isinstance(factor,int):
        factor_x = factor
        factor_y = factor
    elif isinstance(factor,tuple):
        factor_x , factor_y = factor[0], factor[1]
    else:
        raise NameError('Factor must be a tuple (x,y) or an integer')

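    # both dimensions must be exactly divisible by their factor
    if xsize % factor_x != 0 or ysize % factor_y != 0:
        raise NameError('Array shape must be an integer multiple of the factors')

    # integer division, so the sizes stay ints under Python 3 as well
    new_xsize, new_ysize = xsize // factor_x, ysize // factor_y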

    new_array = np.empty([new_xsize, new_ysize])
    new_array[:] = np.nan # this saves us an assignment in the loop below

    # submatrix indexes : is the average box on the original matrix
    subrow, subcol  = np.indices((factor_x, factor_y))

    # new matrix indexes
    row, col  = np.indices((new_xsize, new_ysize))

    # some output for testing
    #for i, j, ind in zip(row.reshape(-1), col.reshape(-1),range(row.size)) :
    #    print '----------------------------------------------'
    #    print 'i: %i, j: %i, ind: %i ' % (i, j, ind)    
    #    print 'subrow+i*new_ysize, subcol+j*new_xsize :'    
    #    print i,'*',new_xsize,'=',i*factor_x
    #    print j,'*',new_ysize,'=',j*factor_y
    #    print subrow+i*factor_x,subcol+j*factor_y
    #    print '---'
    #    print 'array[subrow+i*factor_x,subcol+j*factor_y] : '    
    #    print array[subrow+i*factor_x,subcol+j*factor_y]

    for i, j, ind in zip(row.reshape(-1), col.reshape(-1),range(row.size)) :
        # extract the small sub_matrix (fancy indexing returns a copy of the block)
        sub_matrix = array[subrow+i*factor_x,subcol+j*factor_y]
        # modified from any(a) and all(a) to a.any() and a.all()
        # see https://stackoverflow.com/a/10063039/1435167
        if not (np.isnan(sub_matrix)).all(): # if the block is not all NaN
            if (np.isnan(sub_matrix)).any(): # if the block has some NaN
                msub_matrix = np.ma.masked_array(sub_matrix,np.isnan(sub_matrix))
                (new_array.reshape(-1))[ind] = np.mean(msub_matrix)
            else: # if the block has no NaN at all
                (new_array.reshape(-1))[ind] = np.mean(sub_matrix)
        # the all-NaN case needs no assignment, because new_array
        # was initialised with NaN

    return new_array


row , cols = 6, 4

a = 10*np.random.random_sample((row , cols))
a[0:3,0:2] = np.nan
a[0,2] = np.nan

factor_x = 2
factor_y = 2
# NOTE: scipy.misc.imresize was removed in SciPy 1.3; this call needs an older SciPy
a_misc = misc.imresize(a, .5, interp='nearest', mode='F')
a_2d_nonan = resize_2d_nonan(a,(factor_x,factor_y))

print(a)
print()
print(a_misc)
print()
print(a_2d_nonan)

plt.subplot(131)
plt.imshow(a,interpolation='nearest')
plt.title('original')
plt.xticks(np.arange(a.shape[1]))
plt.yticks(np.arange(a.shape[0]))
plt.subplot(132)
plt.imshow(a_misc,interpolation='nearest')
plt.title('scipy.misc')
plt.xticks(np.arange(a_misc.shape[1]))
plt.yticks(np.arange(a_misc.shape[0]))
plt.subplot(133)
plt.imshow(a_2d_nonan,interpolation='nearest')
plt.title('my.func')
plt.xticks(np.arange(a_2d_nonan.shape[1]))
plt.yticks(np.arange(a_2d_nonan.shape[0]))
plt.show()

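EDIT

I made some modifications to address a comment.

If I replace the NaN with some other value, say the average of the non-NaN pixels, it affects all the subsequent calculations: the difference between the resampled original array and the resampled array with the NaN substituted shows that 2 pixels changed their values.

My goal is simply to skip all the NaN pixels.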

# substitute NaN with the average value

nan_mask = np.isnan(a)
ind_nonan , ind_nan = np.where(~nan_mask), np.where(nan_mask)
a_substitute = np.copy(a)

a_substitute[ind_nan] = np.mean(a_substitute[ind_nonan]) # substitute the NaN with the average of the non-NaN values

a_substitute_misc = misc.imresize(a_substitute, .5, interp='nearest', mode='F') # needs SciPy < 1.3
a_substitute_2d_nonan = resize_2d_nonan(a_substitute,(factor_x,factor_y))

print(a_2d_nonan-a_substitute_2d_nonan)

[[        nan -0.02296697]
 [ 0.23143208  0.        ]
 [ 0.          0.        ]]

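**Second edit**

To address one of the answers, I added some extra code. It is an interesting idea; sadly, it interpolates new values over pixels that should be "empty" (NaN), and in my small example it generates more NaN than good values.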

X , Y  = np.indices((row , cols))
X_new , Y_new  = np.indices((row//factor_x , cols//factor_y))

from scipy.interpolate import CloughTocher2DInterpolator as intp
C = intp((X[ind_nonan],Y[ind_nonan]),a[ind_nonan])

a_interp = C(X_new , Y_new)

print(a)
print()
print(a_interp)

[[        nan,         nan],
 [        nan,         nan],
 [        nan,  6.32826577]])

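Interpolate the points on a different grid. Below I show a cubic interpolator (CloughTocher2DInterpolator), which is slower but probably more accurate. You'll notice that the corner pixels are missing with this function; you can then handle those remaining values with a different interpolation method.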

import numpy as np
import pylab as plt

# Test data
row = np.linspace(-3,3,50)
X,Y = np.meshgrid(row,row)
Z = np.sqrt(X**2+Y**2) + np.cos(Y) 

# Make some dead pixels, favor an edge
dead = np.random.random(Z.shape)
dead = (dead*X>.7)
Z[dead] =np.nan

from scipy.interpolate import CloughTocher2DInterpolator as intp
C = intp((X[~dead],Y[~dead]),Z[~dead])

new_row = np.linspace(-3,3,25)
xi,yi   = np.meshgrid(new_row,new_row)
zi = C(xi,yi)

plt.subplot(121)
plt.title("Original signal 50x50")
plt.imshow(Z,interpolation='nearest')

plt.subplot(122)
plt.title("Interpolated signal 25x25")
plt.imshow(zi,interpolation='nearest')

plt.show()
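One possible way to handle the missing corner values is sketched below. It is an assumption on my part rather than something the answer prescribes: points that the cubic interpolator leaves as NaN (those outside the convex hull of the valid samples) are filled with `scipy.interpolate.NearestNDInterpolator`. The grid sizes and the hard-coded dead corner pixel are made up to keep the example small and deterministic.

```python
import numpy as np
from scipy.interpolate import CloughTocher2DInterpolator, NearestNDInterpolator

# Small test surface on a 20x20 grid (illustrative data only)
row = np.linspace(-3, 3, 20)
X, Y = np.meshgrid(row, row)
Z = np.sqrt(X**2 + Y**2) + np.cos(Y)

# Hypothetical dead pixels: one grid corner plus a small interior patch
dead = np.zeros(Z.shape, dtype=bool)
dead[0, -1] = True
dead[8:10, 8:10] = True
Z[dead] = np.nan

# Build both interpolators from the valid samples only
points = np.column_stack((X[~dead], Y[~dead]))
values = Z[~dead]
cubic = CloughTocher2DInterpolator(points, values)
nearest = NearestNDInterpolator(points, values)

# Evaluate on the coarser 10x10 grid
new_row = np.linspace(-3, 3, 10)
xi, yi = np.meshgrid(new_row, new_row)
zi = cubic(xi, yi)

# The cubic interpolator returns NaN outside the convex hull of the
# valid points; fill only those positions with the nearest valid value
holes = np.isnan(zi)
zi_filled = zi.copy()
zi_filled[holes] = nearest(xi[holes], yi[holes])
```

Note that this fills every NaN in the output, so it only suits cases where interpolating over missing pixels is acceptable.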

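You are operating on small windows of the array. Instead of looping through the array to build the windows, you can restructure the array efficiently by manipulating its strides. The numpy library provides the as_strided() function to help with that. An example is provided in the SciPy CookBook.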
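As a minimal sketch of the stride idea, assuming a made-up 6x4 array and 2x2 non-overlapping blocks, `as_strided` can expose every block without copying any data. It performs no bounds checking, so the shape and strides have to be computed with care.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(24, dtype=float).reshape(6, 4)  # illustrative input
fr, fc = 2, 2                                 # block size along rows, columns
rows, cols = a.shape
s0, s1 = a.strides                            # bytes to step one row / one column

# Axes 0 and 1 select the block, axes 2 and 3 index inside the block.
# Stepping one block means stepping fr rows (or fc columns) in memory.
blocks = as_strided(a,
                    shape=(rows // fr, cols // fc, fr, fc),
                    strides=(s0 * fr, s1 * fc, s0, s1),
                    writeable=False)

print(blocks.shape)   # (3, 2, 2, 2)
print(blocks[0, 0])   # top-left 2x2 block of a
```

Here `blocks[i, j]` is a read-only view of the block whose top-left corner is `a[i*fr, j*fc]`.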

The following uses a generalized sliding window function, which I include at the end.

Determine the shape of the new array:

rows, cols = a.shape
new_shape = rows // 2, cols // 2

Restructure the array into the windows you need, and create an indexing array that identifies the NaNs:

# 2x2 windows of the original array
windows = sliding_window(a, (2,2))
# make a windowed boolean array for indexing
notNan = sliding_window(np.logical_not(np.isnan(a)), (2,2))

The new array can be created with a list comprehension or a generator expression.

# using a list comprehension
# make a list of the means of the windows, disregarding the Nan's
means = [window[index].mean() for window, index in zip(windows, notNan)]
new_array = np.array(means).reshape(new_shape)

# generator expression
# produces the means of the windows, disregarding the Nan's
means = (window[index].mean() for window, index in zip(windows, notNan))
new_array = np.fromiter(means, dtype = np.float32).reshape(new_shape)

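The generator expression should conserve memory. If memory is a problem, using itertools.izip() instead of zip would also help in Python 2 (in Python 3 the built-in zip is already lazy). I just used the list comprehension for your solution.

Your function: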

def resize_2d_nonan(array,factor):
    """
    Resize a 2D array by different factor on two axis skipping NaN values.
    If a new pixel contains only NaN, it will be set to NaN

    Parameters
    ----------
    array : 2D np array

    factor : int or tuple. If int, the x and y factors will be the same

    Returns
    -------
    array : 2D np array scaled by factor

    Created on Mon Jan 27 15:21:25 2014

    @author: damo_ma
    """
    xsize, ysize = array.shape

    if isinstance(factor,int):
        factor_x = factor
        factor_y = factor
        window_size = factor, factor
    elif isinstance(factor,tuple):
        factor_x , factor_y = factor
        window_size = factor
    else:
        raise NameError('Factor must be a tuple (x,y) or an integer')

    if (xsize % factor_x or ysize % factor_y) :
        raise NameError('Factors must be integer multiple of array shape')

    new_shape = xsize // factor_x, ysize // factor_y

    # non-overlapping windows of the original array
    windows = sliding_window(array, window_size)
    # windowed boolean array for indexing
    notNan = sliding_window(np.logical_not(np.isnan(array)), window_size)

    #list of the means of the windows, disregarding the Nan's
    means = [window[index].mean() for window, index in zip(windows, notNan)]
    # new array
    new_array = np.array(means).reshape(new_shape)

    return new_array

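I haven't done any timing comparisons with your original function, but it should be faster.

Many solutions I have seen here vectorize the operations to increase speed/efficiency. I don't quite have a handle on that and don't know whether it can be applied to your problem. Searching for window, array, moving average, vectorize, and numpy should turn up similar questions and answers for reference.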
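For non-overlapping blocks, one way the reduction might be vectorized is sketched below. This is an illustration rather than a tested replacement for the function above: the name `block_reduce_nan` and its `reducer` parameter are hypothetical, and it relies on numpy's NaN-aware reductions (`np.nanmean`, `np.nanmedian`, `np.nanstd`). The array is reshaped so that each block's pixels land on one axis, and that axis is then reduced in a single call.

```python
import warnings
import numpy as np

def block_reduce_nan(array, factor, reducer=np.nanmean):
    """Shrink a 2D array by integer factors, ignoring NaN inside each block.

    Hypothetical helper for illustration; a block of only NaN yields NaN.
    """
    if isinstance(factor, int):
        factor = (factor, factor)
    factor_x, factor_y = factor
    xsize, ysize = array.shape
    if xsize % factor_x or ysize % factor_y:
        raise ValueError('Array shape must be an integer multiple of the factors')
    new_x, new_y = xsize // factor_x, ysize // factor_y

    # Split each axis into (block index, position inside the block)
    blocks = array.reshape(new_x, factor_x, new_y, factor_y)
    # Put the two in-block axes last and flatten them into one axis
    blocks = blocks.transpose(0, 2, 1, 3).reshape(new_x, new_y, -1)

    # The NaN-aware reducers emit a RuntimeWarning for all-NaN blocks
    # and return NaN for them, which is the desired result here
    with warnings.catch_warnings():
        warnings.simplefilter('ignore', category=RuntimeWarning)
        return reducer(blocks, axis=-1)

a = np.array([[np.nan, np.nan, 1.0, 3.0],
              [np.nan, np.nan, 5.0, np.nan],
              [2.0, 4.0, 6.0, 6.0],
              [np.nan, 8.0, 6.0, 6.0]])
small = block_reduce_nan(a, 2)
small_median = block_reduce_nan(a, 2, reducer=np.nanmedian)
print(small)
```

Passing a different reducer covers the mean/median/standard deviation choice mentioned in the question, and no Python-level loop over the blocks is needed.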

sliding_window()

import numpy as np
from numpy.lib.stride_tricks import as_strided as ast

def norm_shape(shape):
    '''
    Normalize numpy array shapes so they're always expressed as a tuple,
    even for one-dimensional shapes.

    Parameters
        shape - an int, or a tuple of ints

    Returns
        a shape tuple
    '''
    try:
        i = int(shape)
        return (i,)
    except TypeError:
        # shape was not a number
        pass

    try:
        t = tuple(shape)
        return t
    except TypeError:
        # shape was not iterable
        pass

    raise TypeError('shape must be an int, or a tuple of ints')

def sliding_window(a,ws,ss = None,flatten = True):
    '''
    Return a sliding window over a in any number of dimensions

    Parameters:
        a  - an n-dimensional numpy array
        ws - an int (a is 1D) or tuple (a is 2D or greater) representing the size
             of each dimension of the window
        ss - an int (a is 1D) or tuple (a is 2D or greater) representing the
             amount to slide the window in each dimension. If not specified, it
             defaults to ws.
        flatten - if True, all slices are flattened, otherwise, there is an
                  extra dimension for each dimension of the input.

    Returns
        an array containing each n-dimensional window from a
    '''
    if None is ss:
        # ss was not provided. the windows will not overlap in any direction.
        ss = ws
    ws = norm_shape(ws)
    ss = norm_shape(ss)

    # convert ws, ss, and a.shape to numpy arrays so that we can do math in every
    # dimension at once.
    ws = np.array(ws)
    ss = np.array(ss)
    shape = np.array(a.shape)

    # ensure that ws, ss, and a.shape all have the same number of dimensions
    ls = [len(shape),len(ws),len(ss)]
    if 1 != len(set(ls)):
        raise ValueError(
            'a.shape, ws and ss must all have the same length. They were %s' % str(ls))

    # ensure that ws is smaller than a in every dimension
    if np.any(ws > shape):
        raise ValueError(
            'ws cannot be larger than a in any dimension. '
            'a.shape was %s and ws was %s' % (str(a.shape),str(ws)))

    # how many slices will there be in each dimension?
    newshape = norm_shape(((shape - ws) // ss) + 1)
    # the shape of the strided array will be the number of slices in each dimension
    # plus the shape of the window (tuple addition)
    newshape += norm_shape(ws)
    # the strides tuple will be the array's strides multiplied by step size, plus
    # the array's strides (tuple addition)
    newstrides = norm_shape(np.array(a.strides) * ss) + a.strides
    strided = ast(a,shape = newshape,strides = newstrides)
    if not flatten:
        return strided

    # Collapse strided so that it has one more dimension than the window. I.e.,
    # the new array is a flat list of slices.
    meat = len(ws) if ws.shape else 0
    # np.prod replaces np.product, which was removed in NumPy 2.0
    firstdim = (np.prod(newshape[:-meat]),) if ws.shape else ()
    dim = firstdim + (newshape[-meat:])
    # remove any dimensions with size 1 (a tuple, so reshape accepts it in Python 3)
    dim = tuple(i for i in dim if i != 1)
    return strided.reshape(dim)

sliding_window() attribution: I originally found this on a blog page about using Numpy, which is now a broken link. After a little searching, it looks like it now resides elsewhere. Thanks, John Vinyard.

Note that this post is quite old, and there are many later Q&As about sliding windows, rolling windows, and image patch extraction. There are a lot of one-offs using numpy's as_strided, but this function still seems to be the only one that handles n-d windowing. The scikit-learn sklearn.feature_extraction.image library seems to be cited often for extracting or viewing image patches.

Thanks, but this doesn't work in my small example, and I don't want interpolation over NaN pixels: if a pixel in the new array comes from a NaN-only subset of the original matrix, it must produce NaN. I edited the question to clarify.

@kidpixo If a block contains both NaN pixels and live pixels when resizing, what do you do? Do the NaNs always win, or do they only kick in above some critical percentage?

At the moment, the NaNs always lose. Within a small block I operate only on the valid values, skipping all the NaN. If I have only NaN, the result for that block must be NaN. I know this isn't perfect, but for me the valid values in a block are our best guess of the value in that particular block. I don't want to mix values from different blocks, which would mean smearing valid values over several pixels. Thanks for trying!

For a 2x2 cell/window with one NaN, do you expect the mean of the other three? If all the values in a cell/window are NaN, what value do you expect for that cell?

This is the best answer I have read so far. I did some tests:
my function (for loops):
- (6, 4) image > 1000 loops, best of 3: 636 µs per loop
- (720, 1440) image > 1 loops, best of 3: 20.9 s per loop
your mod (stride tricks):
- (6, 4) image > 1000 loops, best of 3: 422 µs per loop
- (720, 1440) image > 1 loops, best of 3: 9.24 s per loop
On larger images it is so