Python: randomly remove 30% of the values in a numpy array

Tags: python, arrays, numpy

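I have a 2D numpy array containing my values (some of which can be NaN). I want to remove 30% of the non-NaN values and replace them with the mean of the array. How can I do that? What I have tried so far: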

def spar_removal(array, mean_value, sparseness):
    array1 = deepcopy(array)
    array2 = array1
    spar_size = int(round(array2.shape[0]*array2.shape[1]*sparseness))
    for i in range (0, spar_size):
        index = np.random.choice(np.where(array2 != mean_value)[1])
        array2[0, index] = mean_value
    return array2

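But this only ever picks from the same row of the array. How can I remove values from the whole array? It seems that choice only works on one dimension. I think what I want is to compute the (x, y) pairs whose values I will replace with mean_value.

There may be a better way, but consider: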

import numpy as np

x = np.array([[1,2,3,4],
              [1,2,3,4],
              [np.nan, np.nan, np.nan, np.nan],  # np.NaN alias was removed in NumPy 2.0
              [1,2,3,4]])

# Get a vector of 1-d indexed indexes of non NaN elements
indices = np.where(np.isfinite(x).ravel())[0]

# Shuffle the indices, select the first 30% (rounded down with int())
to_replace = np.random.permutation(indices)[:int(indices.size * 0.3)]

# Replace those indices with the mean (ignoring NaNs)
x[np.unravel_index(to_replace, x.shape)] = np.nanmean(x)

print(x)
Sample output:

[[ 2.5  2.   2.5  4. ]
 [ 1.   2.   3.   4. ]
 [ nan  nan  nan  nan]
 [ 2.5  2.   3.   4. ]]

The NaNs are never changed, and floor(0.3 * number of non-NaN elements) values are set to the mean (computed ignoring NaNs).
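The same idea can be wrapped in a reusable function. This is only a sketch: the name replace_fraction_with_mean and the seed parameter are illustrative and not part of the answer, and it tests for NaN with np.isnan rather than np.isfinite, so infinities (if any) would count as replaceable values.

```python
import numpy as np

def replace_fraction_with_mean(x, fraction=0.3, seed=None):
    """Return a float copy of x in which floor(fraction * n_valid) randomly
    chosen non-NaN entries are set to the NaN-ignoring mean of x."""
    rng = np.random.default_rng(seed)
    out = np.array(x, dtype=float)            # copy, so the input is untouched
    valid = np.flatnonzero(~np.isnan(out))    # flat indices of non-NaN entries
    k = int(valid.size * fraction)            # rounded down, as in the answer
    chosen = rng.choice(valid, size=k, replace=False)
    out.flat[chosen] = np.nanmean(out)        # mean is evaluated before assignment
    return out

x = np.array([[1, 2, 3, 4],
              [1, 2, 3, 4],
              [np.nan, np.nan, np.nan, np.nan],
              [1, 2, 3, 4]], dtype=float)
y = replace_fraction_with_mean(x, 0.3, seed=0)
print(y)
```

Passing a seed makes the selection reproducible; leaving it as None gives a different selection on each call.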

Since np.where returns two arrays containing the indices, this is what you want:

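import copy
import numpy as np

def spar_removal(array, mean_value, sparseness):
    # Work on a copy so the caller's array is left untouched
    array2 = copy.deepcopy(array)
    # NaN != NaN, so this keeps only the (row, col) indices of non-NaN entries
    indexs = np.where(array2 == array2)
    indexsL = len(indexs[0])
    # Count the replacements against the non-NaN entries, not the whole array;
    # otherwise choice(..., replace=False) can be asked for more than exist
    spar_size = int(round(indexsL * sparseness))

    for i in np.random.choice(indexsL, spar_size, replace=False):
        indexX = indexs[0][i]
        indexY = indexs[1][i]
        array2[indexX, indexY] = mean_value

    return array2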
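The loop over the chosen positions can also be written as a single indexed assignment. The following is a sketch of that variant, not the answerer's code: spar_removal_vectorized and its seed parameter are hypothetical names, and the count is taken as a fraction of the non-NaN entries.

```python
import numpy as np

def spar_removal_vectorized(array, mean_value, sparseness, seed=None):
    rng = np.random.default_rng(seed)
    out = np.array(array, dtype=float)             # float copy of the input
    rows, cols = np.nonzero(~np.isnan(out))        # coordinates of non-NaN entries
    k = int(round(rows.size * sparseness))         # how many entries to replace
    pick = rng.choice(rows.size, size=k, replace=False)
    out[rows[pick], cols[pick]] = mean_value       # one fancy-indexed assignment
    return out

a = np.arange(20, dtype=float).reshape(4, 5)
a[1, 2] = np.nan
b = spar_removal_vectorized(a, mean_value=-1.0, sparseness=0.3, seed=1)
print(b)
```

Indexing with the paired rows[pick] and cols[pick] arrays selects exactly the chosen (row, column) positions, which is what picking from only one of the np.where arrays failed to do in the question.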

Does it need to be exactly 30% of the non-NaN values, or should each non-NaN value have a 30% chance of being replaced? For example, if we have 100 non-NaN values, do you need exactly 30 of them replaced, or are you fine with each value having a 30% chance of replacement, so that sometimes 27 are replaced and, rarely, 45?

Yes, I need to remove 30% of the non-NaN values.

There is a difference between remove and replace. Remove implies, at the very least, reducing the shape of the array, e.g. from (100, 100) to (90, 90) or some such. While it is easy to remove whole rows or columns, removing individual elements without making the array ragged is hard.
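For comparison, the other reading raised in the comments (each non-NaN value independently has a 30% chance of being replaced) could be sketched with a boolean mask. The function name replace_with_probability and the seed parameter are made up for this illustration; the asker wanted the exact-count version, so this is shown only to make the distinction concrete.

```python
import numpy as np

def replace_with_probability(x, p=0.3, seed=None):
    rng = np.random.default_rng(seed)
    out = np.array(x, dtype=float)                 # float copy of the input
    # True where the entry is not NaN AND its independent draw falls below p
    mask = ~np.isnan(out) & (rng.random(out.shape) < p)
    out[mask] = np.nanmean(out)                    # mean is evaluated before assignment
    return out, mask

x = np.arange(100, dtype=float).reshape(10, 10)
x[0, :5] = np.nan
y, mask = replace_with_probability(x, 0.3, seed=0)
print(mask.sum())  # number replaced; varies with the seed, only about 30% on average
```

Here the number of replaced values is random, whereas the permutation and choice(..., replace=False) approaches fix it in advance.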