Python: is there a better way to convert an 'object' array by replacing 'na' with the mean?
I have an array of strings that contains some elements, such as 'na', that cannot be converted to float with x.astype(np.float).
Please suggest a better way than what I did. The process is shown below (a snippet from my Jupyter notebook; I include the intermediate steps to show the changes):
In [4]: val_inc
Out[4]:
array(['na', '38.012', '38.7816', '38.0736', '40.7118', '44.7382',
'39.6416', '38.9177', '36.9031', 43.2611, '38.2732', 40.7129,
'37.2844', '39.5835', 43.9194, '42.5485', '36.9052', 'na', 41.9264,
45.3568, '44.6239', 38.1079, 45.2393, '32.785', '44.6239',
'38.0216', '38.4608', '42.5644', '35.3127', 33.2936, '33.0556',
'40.4476', 35.6581, '35.5574', '43.1096', '34.4751', 42.0554,
40.3944, '40.2466', '32.2567', 'na', '38.8594', '43.947', 41.7973,
'41.8105', 40.3797, 31.2868, '45.3644', '40.7177', '41.8558',
'38.9249', '33.2077', '42.4053', '42.559'], dtype=object)
In [5]: val_inc[val_inc == 'na'] = '0'
In [6]: val_inc
Out[6]:
array(['0', '38.012', '38.7816', '38.0736', '40.7118', '44.7382',
'39.6416', '38.9177', '36.9031', 43.2611, '38.2732', 40.7129,
'37.2844', '39.5835', 43.9194, '42.5485', '36.9052', '0', 41.9264,
45.3568, '44.6239', 38.1079, 45.2393, '32.785', '44.6239',
'38.0216', '38.4608', '42.5644', '35.3127', 33.2936, '33.0556',
'40.4476', 35.6581, '35.5574', '43.1096', '34.4751', 42.0554,
40.3944, '40.2466', '32.2567', '0', '38.8594', '43.947', 41.7973,
'41.8105', 40.3797, 31.2868, '45.3644', '40.7177', '41.8558',
'38.9249', '33.2077', '42.4053', '42.559'], dtype=object)
In [7]: val_inc = val_inc.astype(np.float)
In [8]: val_inc
Out[8]:
array([ 0. , 38.012 , 38.7816, 38.0736, 40.7118, 44.7382,
39.6416, 38.9177, 36.9031, 43.2611, 38.2732, 40.7129,
37.2844, 39.5835, 43.9194, 42.5485, 36.9052, 0. ,
41.9264, 45.3568, 44.6239, 38.1079, 45.2393, 32.785 ,
44.6239, 38.0216, 38.4608, 42.5644, 35.3127, 33.2936,
33.0556, 40.4476, 35.6581, 35.5574, 43.1096, 34.4751,
42.0554, 40.3944, 40.2466, 32.2567, 0. , 38.8594,
43.947 , 41.7973, 41.8105, 40.3797, 31.2868, 45.3644,
40.7177, 41.8558, 38.9249, 33.2077, 42.4053, 42.559 ])
In [9]: np.mean(val_inc[val_inc != 0.])
Out[9]: 39.587374509803915
In [10]: val_inc[val_inc == 0.] = np.mean(val_inc[val_inc != 0.])
In [11]: val_inc
Out[11]:
array([ 39.58737451, 38.012 , 38.7816 , 38.0736 ,
40.7118 , 44.7382 , 39.6416 , 38.9177 ,
36.9031 , 43.2611 , 38.2732 , 40.7129 ,
37.2844 , 39.5835 , 43.9194 , 42.5485 ,
36.9052 , 39.58737451, 41.9264 , 45.3568 ,
44.6239 , 38.1079 , 45.2393 , 32.785 ,
44.6239 , 38.0216 , 38.4608 , 42.5644 ,
35.3127 , 33.2936 , 33.0556 , 40.4476 ,
35.6581 , 35.5574 , 43.1096 , 34.4751 ,
42.0554 , 40.3944 , 40.2466 , 32.2567 ,
39.58737451, 38.8594 , 43.947 , 41.7973 ,
41.8105 , 40.3797 , 31.2868 , 45.3644 ,
40.7177 , 41.8558 , 38.9249 , 33.2077 ,
42.4053 , 42.559 ])
Replace 'na' with 'nan', which converts to np.nan, then use np.nanmean. For example:
test = np.array(['0','1','nan'], dtype=float)
np.where(np.isnan(test), np.nanmean(test), test)
array([ 0. , 1. , 0.5])
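Applied to an object array like the one in the question, the same idea might look like the sketch below. The sample values are a shortened, hypothetical stand-in for the asker's data, the names cleaned, as_float and filled are introduced only for illustration, and the builtin float is used in place of np.float (an alias that was removed in NumPy 1.24).

```python
import numpy as np

# Shortened stand-in for the asker's array (hypothetical sample values).
val_inc = np.array(['na', '38.012', 43.2611, 'na', '40.7118'], dtype=object)

# Work on a copy so the original object array is left untouched.
cleaned = val_inc.copy()
cleaned[cleaned == 'na'] = 'nan'   # the string 'nan' parses to a float NaN

# Convert to float, then fill the NaN slots with the mean of the valid entries.
as_float = cleaned.astype(float)
filled = np.where(np.isnan(as_float), np.nanmean(as_float), as_float)
print(filled)
```

np.where builds a new array here, so as_float still holds the NaN markers if they are needed later.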
It is best to convert 'na' to a proper NaN first. After that, you can use the data however you like:
import numpy as np
val_inc[val_inc == 'na'] = np.nan # 'na' to proper NaN or missing value
val_inc = val_inc.astype(float) # no error now; builtin float replaces np.float, removed in NumPy 1.24
print(val_inc)
Output:
[ nan 38.012 38.7816 38.0736 40.7118 44.7382 39.6416 38.9177
36.9031 43.2611 38.2732 40.7129 37.2844 39.5835 43.9194 42.5485
36.9052 nan 41.9264 45.3568 44.6239 38.1079 45.2393 32.785
44.6239 38.0216 38.4608 42.5644 35.3127 33.2936 33.0556 40.4476
35.6581 35.5574 43.1096 34.4751 42.0554 40.3944 40.2466 32.2567
nan 38.8594 43.947 41.7973 41.8105 40.3797 31.2868 45.3644
40.7177 41.8558 38.9249 33.2077 42.4053 42.559 ]
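If the goal is still to fill the gaps with the mean, as in the question, the NaN markers can be replaced in place afterwards. The sketch below uses a small hypothetical sample (including a genuine 0.0, which is an assumption added to illustrate the point) to show why NaN is a safer placeholder than '0': real zeros are neither overwritten nor excluded from the mean.

```python
import numpy as np

# Hypothetical sample that contains a genuine 0.0 alongside missing values.
val_inc = np.array(['na', 0.0, '38.012', 'na', '40.7118'], dtype=object)

val_inc[val_inc == 'na'] = np.nan       # mark missing entries as NaN
val_inc = val_inc.astype(float)         # object -> float64

missing = np.isnan(val_inc)             # boolean mask of the missing slots
val_inc[missing] = np.nanmean(val_inc)  # mean over the valid entries only
print(val_inc)
```

With the '0' placeholder from the question, the genuine 0.0 in this sample would have been treated as missing as well.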
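Replace 'na' with 'nan' and it will convert to float.
@kazemakase Thanks for the suggestion. I didn't know that the string 'nan' could be converted directly to np.nan.
Sorry that my question is a duplicate; I will work on improving my search skills.
No need to apologize. On the contrary, if your question is marked as a duplicate, it becomes a signpost for others who search with the same terms you used.
Of all the suggestions, yours was the quickest way to solve my problem. Thanks.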