Python: Is there a better way to convert an 'object' array to float by replacing 'na' with the mean?

Tags: python, arrays, string, numpy

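I have an array of strings in which some elements, such as 'na', cannot be converted to float with x.astype(np.float). The array is shown below.

Please suggest a better way than what I did. The process is below (a snippet from my Jupyter notebook; I show the intermediate steps to demonstrate the changes):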

In [4]:
val_inc

Out[4]:

array(['na', '38.012', '38.7816', '38.0736', '40.7118', '44.7382',
       '39.6416', '38.9177', '36.9031', 43.2611, '38.2732', 40.7129,
       '37.2844', '39.5835', 43.9194, '42.5485', '36.9052', 'na', 41.9264,
       45.3568, '44.6239', 38.1079, 45.2393, '32.785', '44.6239',
       '38.0216', '38.4608', '42.5644', '35.3127', 33.2936, '33.0556',
       '40.4476', 35.6581, '35.5574', '43.1096', '34.4751', 42.0554,
       40.3944, '40.2466', '32.2567', 'na', '38.8594', '43.947', 41.7973,
       '41.8105', 40.3797, 31.2868, '45.3644', '40.7177', '41.8558',
       '38.9249', '33.2077', '42.4053', '42.559'], dtype=object)
In [5]:
val_inc[val_inc == 'na'] = '0'

In [6]:
val_inc

Out[6]:

array(['0', '38.012', '38.7816', '38.0736', '40.7118', '44.7382',
       '39.6416', '38.9177', '36.9031', 43.2611, '38.2732', 40.7129,
       '37.2844', '39.5835', 43.9194, '42.5485', '36.9052', '0', 41.9264,
       45.3568, '44.6239', 38.1079, 45.2393, '32.785', '44.6239',
       '38.0216', '38.4608', '42.5644', '35.3127', 33.2936, '33.0556',
       '40.4476', 35.6581, '35.5574', '43.1096', '34.4751', 42.0554,
       40.3944, '40.2466', '32.2567', '0', '38.8594', '43.947', 41.7973,
       '41.8105', 40.3797, 31.2868, '45.3644', '40.7177', '41.8558',
       '38.9249', '33.2077', '42.4053', '42.559'], dtype=object)
In [7]:
val_inc = val_inc.astype(np.float)

In [8]:
val_inc

Out[8]:

array([  0.    ,  38.012 ,  38.7816,  38.0736,  40.7118,  44.7382,
        39.6416,  38.9177,  36.9031,  43.2611,  38.2732,  40.7129,
        37.2844,  39.5835,  43.9194,  42.5485,  36.9052,   0.    ,
        41.9264,  45.3568,  44.6239,  38.1079,  45.2393,  32.785 ,
        44.6239,  38.0216,  38.4608,  42.5644,  35.3127,  33.2936,
        33.0556,  40.4476,  35.6581,  35.5574,  43.1096,  34.4751,
        42.0554,  40.3944,  40.2466,  32.2567,   0.    ,  38.8594,
        43.947 ,  41.7973,  41.8105,  40.3797,  31.2868,  45.3644,
        40.7177,  41.8558,  38.9249,  33.2077,  42.4053,  42.559 ])
In [9]:
np.mean(val_inc[val_inc != 0.])

Out[9]:
39.587374509803915

In [10]:
val_inc[val_inc == 0.] = np.mean(val_inc[val_inc != 0.])

In [11]:
val_inc

Out[11]:

array([ 39.58737451,  38.012     ,  38.7816    ,  38.0736    ,
        40.7118    ,  44.7382    ,  39.6416    ,  38.9177    ,
        36.9031    ,  43.2611    ,  38.2732    ,  40.7129    ,
        37.2844    ,  39.5835    ,  43.9194    ,  42.5485    ,
        36.9052    ,  39.58737451,  41.9264    ,  45.3568    ,
        44.6239    ,  38.1079    ,  45.2393    ,  32.785     ,
        44.6239    ,  38.0216    ,  38.4608    ,  42.5644    ,
        35.3127    ,  33.2936    ,  33.0556    ,  40.4476    ,
        35.6581    ,  35.5574    ,  43.1096    ,  34.4751    ,
        42.0554    ,  40.3944    ,  40.2466    ,  32.2567    ,
        39.58737451,  38.8594    ,  43.947     ,  41.7973    ,
        41.8105    ,  40.3797    ,  31.2868    ,  45.3644    ,
        40.7177    ,  41.8558    ,  38.9249    ,  33.2077    ,
        42.4053    ,  42.559     ])

Replace 'na' with 'nan', which then converts to np.nan, and then use np.nanmean.

For example:

test = np.array(['0','1','nan'], dtype=float)
np.where(np.isnan(test), np.nanmean(test), test)

array([ 0. ,  1. ,  0.5])
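
Applied to an object array like the one in the question, the same idea might look like the sketch below. The five-element sample is a hypothetical stand-in for val_inc, and the names values and filled are introduced here only for illustration.

```python
import numpy as np

# Hypothetical sample standing in for the val_inc array from the question.
val_inc = np.array(['na', '38.012', '38.7816', 'na', '40.7118'], dtype=object)

# Replace the 'na' markers with the string 'nan', which float() can parse.
val_inc[val_inc == 'na'] = 'nan'

# Convert to a float array; each 'nan' string becomes np.nan.
values = val_inc.astype(float)

# Fill the missing entries with the mean of the remaining values.
filled = np.where(np.isnan(values), np.nanmean(values), values)
print(filled)
```

Unlike the '0' placeholder used in the question, this does not confuse a genuine 0.0 reading with a missing value.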

It is better to convert 'na' to a proper NaN first. You can then use the data however you like:

import numpy as np
val_inc[val_inc == 'na'] = np.nan   # 'na' to proper NaN or missing value
val_inc = val_inc.astype(float)     # no error now (np.float was an alias for float, removed in NumPy 1.24)
print(val_inc)
Output:

[     nan  38.012   38.7816  38.0736  40.7118  44.7382  39.6416  38.9177
  36.9031  43.2611  38.2732  40.7129  37.2844  39.5835  43.9194  42.5485
  36.9052      nan  41.9264  45.3568  44.6239  38.1079  45.2393  32.785
  44.6239  38.0216  38.4608  42.5644  35.3127  33.2936  33.0556  40.4476
  35.6581  35.5574  43.1096  34.4751  42.0554  40.3944  40.2466  32.2567
      nan  38.8594  43.947   41.7973  41.8105  40.3797  31.2868  45.3644
  40.7177  41.8558  38.9249  33.2077  42.4053  42.559 ]
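
Once the array holds real NaN values, the mean replacement from the question can be done with a boolean mask. This is a minimal sketch on a short hypothetical array; values, missing and mean_value are illustrative names, not part of the original post.

```python
import numpy as np

# Hypothetical float array that already contains NaN for the missing entries.
values = np.array([np.nan, 38.012, 38.7816, np.nan, 40.7118])

# NaN-aware reductions skip the missing entries.
mean_value = np.nanmean(values)

# Boolean mask marking the missing positions.
missing = np.isnan(values)

# Overwrite the missing entries in place with the mean of the valid ones.
values[missing] = mean_value
print(values)
```

Keeping NaN until the last step leaves the choice of fill strategy (mean, median, dropping the entries) open.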

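Comments:

Replace 'na' with 'nan', and it will convert to float.

@kazemakase Thanks for the suggestion. I didn't know that the string 'nan' can be converted directly to np.nan. Sorry that my question is a duplicate; I will work on improving my search skills.

No need to apologize. On the contrary, if your question is marked as a duplicate, it now serves as a signpost for other people who search with the same terms you used.

Of all the suggestions, yours was the quickest way to solve my problem. Thanks.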