Python: replacing all nan values in a large dataset of arrays

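I'm fitting a neural network model (an autoencoder) on a very large dataset of arrays, where each nested array has shape (1100, 4). From the very first epoch, both loss and val_loss are nan: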

Epoch 1/50
511948/511948 [==============================] - 267s 522us/step - loss: nan - acc: 0.5239 - val_loss: nan - val_acc: 0.5235
Epoch 2/50
511948/511948 [==============================] - 272s 530us/step - loss: nan - acc: 0.5234 - val_loss: nan - val_acc: 0.5233
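I changed all the hyperparameter values (optimizer, learning rate, etc.), but the same problem persists. On inspecting the dataset further, I found that it contains nan values, which is probably the cause of the nan loss: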

if np.isnan(Train_X).any():
  print(Train_X)

[[[[ 5.66440628e-03 -1.11057350e-02  5.35699731e-03  1.42108547e-14]
   [ 4.05186182e-03 -4.71546882e-03 -1.57709147e-03  9.35064891e+01]
   [ 3.92575255e-03 -1.45019307e-03 -7.44808370e-04  1.87012978e+02]
   ...
   [ 5.88266444e-03 -7.59219123e-03  2.22257658e-03  8.46522144e-06]
   [ 8.78427479e-04 -9.54657321e-04  2.68735736e-04  3.63856117e-06]
   [ 4.57741540e-04  0.00000000e+00  2.89454575e-03  4.30687537e-06]]]


 [[[ 5.81100709e+00 -6.76592913e-01 -1.31451089e+00  2.66544929e-04]
   [ 6.05009120e+00 -6.07611268e-03 -8.90299844e-01  5.74642441e-04]
   [ 6.40465738e+00  1.82869833e-01  6.22291158e-02  1.03689017e-03]
   ...
   [ 4.96069986e+00  1.04734007e-01 -2.17030850e-01  7.26117358e-05]
   [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00]
   [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00]]]

 [[[            nan             nan             nan  0.00000000e+00]
   [            nan             nan             nan  0.00000000e+00]
   [            nan             nan             nan -1.50999068e-05]
   ...
   [ 5.62468522e-03  4.27860671e-03 -2.06719201e-03  0.00000000e+00]
   [ 1.11051478e-02  3.74979015e-03  1.34607852e-03  0.00000000e+00]
   [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00]]]]
I can also confirm this from the first entry of Train_X:

Train_X[0]
array([[[ 5.66440628e-03, -1.11057350e-02,  5.35699731e-03,
          1.42108547e-14],
        [ 4.05186182e-03, -4.71546882e-03, -1.57709147e-03,
          9.35064891e+01],
        ...
        [ 7.10669020e-02,  4.91383899e-03, -1.43700407e-02,
          1.52228864e-04],
        [ 7.59807410e-02, -9.45620170e-03,             nan,
          1.35892100e-04],
        [ 6.65245393e-02,             nan,             nan,
          8.98521456e-05],
        [            nan,             nan,             nan,
          1.41090006e-05],
        [            nan,             nan,             nan,
          6.68319391e-06],
        [            nan,             nan,             nan,
         -3.27272689e+01],
        [            nan,             nan,             nan,
         -1.09090911e+01],
        [            nan,             nan,             nan,
          8.25973981e+01],
        [            nan,             nan,             nan,
          1.12207785e+02],
        [            nan,             nan,             nan,
          1.65194797e+02],
        [            nan,             nan,             nan,
          2.25974015e+02],
        [            nan,             nan,             nan,
          2.78961026e+02],
        [ 3.87926649e-03,  1.81274134e-04, -1.08764481e-03,
          3.41298685e+02]]])
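I want a way to find every nan value and replace it with the mean or median of its column. If an entire column happens to be all 0s or nan, I want to drop that particular array from Train_X. That way I can feed the network a dataset that contains no nan and see whether the loss changes from its current state.

How can I do this?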

You can use np.isnan, np.nanmean and indexing. The second x[np.isnan(x)] assignment sets the columns that are entirely nan to zero.

import numpy as np

x = np.random.randint(0,100,[2,1,4,4]).astype(float)
x[0][0][[0,1,3],[1,2,2]] = float('nan') # scatter a few nan values
x[1][0][[0,1,3],[1,3,2]] = float('nan')
x[0,0,:,1] = float('nan') # make one column entirely nan
x
array([[[[58., nan, 43., 56.],
         [88., nan, nan, 69.],
         [ 2., nan, 56., 21.],
         [65., nan, nan, 23.]]],


       [[[96., nan, 86., 19.],
         [33., 69., 83., nan],
         [93., 21.,  7.,  2.],
         [49., 21., nan, 84.]]]])
x.shape
(2, 1, 4, 4)
columnMean =  np.nanmean(x,axis=2) #get the mean value for each column
idc = np.where(np.isnan(x)) # get the indices of nan values
x[np.isnan(x)] = columnMean[idc[0],idc[1],idc[3]] # set nan values to corresponding mean
x[np.isnan(x)] = 0 # set nan columns to zero
x
array([[[[58.        ,  0.        , 43.        , 56.        ],
         [88.        ,  0.        , 49.5       , 69.        ],
         [ 2.        ,  0.        , 56.        , 21.        ],
         [65.        ,  0.        , 49.5       , 23.        ]]],


       [[[96.        , 37.        , 86.        , 19.        ],
         [33.        , 69.        , 83.        , 35.        ],
         [93.        , 21.        ,  7.        ,  2.        ],
         [49.        , 21.        , 58.66666667, 84.        ]]]])
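The question also asked to drop any array in which a whole column is nan, whereas the snippet above fills such columns with zero. A minimal sketch of the dropping variant follows, assuming the data has shape (samples, 1, rows, columns) as in the example. The helper name impute_or_drop and its use_median flag are hypothetical and not part of NumPy.

```python
import numpy as np

def impute_or_drop(x, use_median=False):
    """Fill nan with the per-column mean (or median) and drop samples
    that contain a column made up entirely of nan.

    Hypothetical helper; assumes x has shape (samples, 1, rows, columns).
    """
    x = np.asarray(x, dtype=float)
    nan_mask = np.isnan(x)
    # A column is "all nan" when every row of that column is nan.
    all_nan_col = nan_mask.all(axis=2)           # shape (samples, 1, columns)
    keep = ~all_nan_col.any(axis=(1, 2))         # shape (samples,)
    x, nan_mask = x[keep], nan_mask[keep]
    stat = np.nanmedian if use_median else np.nanmean
    col_stat = stat(x, axis=2, keepdims=True)    # shape (kept, 1, 1, columns)
    # Broadcasting lines each column statistic up with its nan positions.
    return np.where(nan_mask, col_stat, x), keep

# Small made-up input: sample 0 has an all-nan column, sample 1 has one nan.
demo = np.array([[[[1., np.nan], [3., np.nan]]],
                 [[[1., 2.], [np.nan, 4.]]]])
clean, keep = impute_or_drop(demo)
```

Returning the keep mask alongside the cleaned array makes it possible to filter any matching labels or validation targets with the same mask. Because the all-nan samples are removed before the statistic is computed, np.nanmean has at least one finite value per column to work with.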

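columnMean=np.nanmean(x,axis=2) gives an error: /usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:1: RuntimeWarning: Mean of empty slice """Entry point for launching an IPython kernel.
Ah, it worked after changing np.nanmean(x,axis=2) to np.nanmean(x,axis=3)
@arilwan I think the warning appears when all the values in a column are nan. Choosing axis=3 computes the mean along the last axis, so which axis to average over is a matter of preference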
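To make the axis choice in these comments concrete: for data shaped (samples, 1, rows, columns), axis=2 averages over the rows and gives one mean per column, while axis=3 gives one mean per row. The index array used to look the means up has to change with it (idc[3] for columns, idc[2] for rows). The sketch below uses a small made-up array and silences the "Mean of empty slice" RuntimeWarning with the standard warnings module. Note that silencing the warning does not fill the all-nan column; its mean is still nan and needs separate handling.

```python
import warnings
import numpy as np

# Made-up data of shape (1, 1, 2, 3); column 1 is entirely nan.
x = np.array([[[[1., np.nan, 3.],
                [4., np.nan, np.nan]]]])
idc = np.where(np.isnan(x))  # one index array per axis

with warnings.catch_warnings():
    # nanmean emits "Mean of empty slice" for the all-nan column.
    warnings.simplefilter("ignore", category=RuntimeWarning)
    col_mean = np.nanmean(x, axis=2)   # shape (1, 1, 3): one mean per column
row_mean = np.nanmean(x, axis=3)       # shape (1, 1, 2): one mean per row

by_col = x.copy()
by_col[np.isnan(x)] = col_mean[idc[0], idc[1], idc[3]]  # look up by column index
by_row = x.copy()
by_row[np.isnan(x)] = row_mean[idc[0], idc[1], idc[2]]  # look up by row index
```

Here by_col still contains nan in column 1, because that column has no finite values to average, while by_row is fully filled because every row has at least one finite value.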