Python Pandas: fill missing column values using the corresponding id column value
I want to fill the missing values in the name column using the corresponding code key from a JSON file. The code below raises TypeError: 'list' object is not callable. Here is the code I use to read the data and fill the missing values:
import json
import pandas as pd

with open('world_bank_projects.json') as f:
    data = json.load(f)
themecodes = pd.json_normalize(data, 'mjtheme_namecode')
# Build a code -> name lookup, then try to fill the missing names from it
d = themecodes.sort_values('name', na_position='last').set_index('code')['name'].to_dict()
themecodes.loc[themecodes['name'].isnull(), 'name'] = themecodes['code'].map(d)
themecodes.head(20)
code name
0 8 Human development
1 11
2 1 Economic management
3 6 Social protection and risk management
4 5 Trade and integration
5 2 Public sector governance
6 11 Environment and natural resources management
7 6 Social protection and risk management
8 7 Social dev/gender/inclusion
9 7 Social dev/gender/inclusion
10 5 Trade and integration
11 4 Financial and private sector development
12 6 Social protection and risk management
13 6
14 2 Public sector governance
15 4 Financial and private sector development
16 11 Environment and natural resources management
17 8
18 10 Rural development
19 7
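Before picking a fix, it may help to check how the blanks are actually stored, since `isnull()` only matches None/NaN and not empty strings. The following is a minimal sketch on a hypothetical toy frame (the rows are illustrative, not read from the JSON file) that counts both kinds of blank:

```python
import pandas as pd

# Hypothetical toy frame; the blank here is an empty string, not NaN.
themecodes = pd.DataFrame({
    "code": ["8", "11", "1", "11"],
    "name": ["Human development", "", "Economic management",
             "Environment and natural resources management"],
})

# Count true missing values (None/NaN) ...
n_null = int(themecodes["name"].isnull().sum())
# ... and values that are empty or whitespace-only strings.
n_blank = int((themecodes["name"].str.strip() == "").sum())

print(n_null, n_blank)
```

If the first count is zero and the second is not, a mask built with `isnull()` selects no rows, so the assignment has nothing to fill.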
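If the missing values are None or NaN, I think you need the groupby-based fill shown further down. If the blanks are instead empty or whitespace-only strings, build the lookup dictionary and replace them like this: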
d = themecodes.sort_values('name', na_position='first').set_index('code')['name'].to_dict()
themecodes.loc[themecodes['name'].str.strip() == '', 'name'] = themecodes['code'].map(d)
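To see why the dictionary approach works, here is a small self-contained sketch on a hypothetical toy frame (the rows are made up for illustration). It relies on two behaviours: blank strings sort before real names, and `to_dict()` keeps the last value for a repeated key, so each code ends up mapped to a real name.

```python
import pandas as pd

# Hypothetical toy data with empty-string blanks.
themecodes = pd.DataFrame({
    "code": ["8", "11", "6", "11", "6", "8"],
    "name": ["Human development", "",
             "Social protection and risk management",
             "Environment and natural resources management", "", ""],
})

# After sorting, blank names come first, so the last row for each code
# carries a real name; to_dict() keeps that last value per code.
d = (themecodes.sort_values("name", na_position="first")
     .set_index("code")["name"]
     .to_dict())

# Fill only the rows whose name is empty or whitespace-only.
blank = themecodes["name"].str.strip() == ""
themecodes.loc[blank, "name"] = themecodes["code"].map(d)

print(themecodes)
```

This assumes every code has at least one row with a real name; a code that only ever appears with a blank name would stay blank.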
Comments:
d holds a sorted list of codes and names, but using map does not fill the missing name values.
@Mr.Jibz - It is a dictionary. Can you explain why it does not work?
I think using d as a sorted dictionary is clean; ideally, I would fill the missing name values with the corresponding key from d.
It looks like your empty values are not NaN or None, so I added a solution for that case.
themecodes['name'] = (themecodes.sort_values('name', na_position='last')
                                .groupby('code')['name']
                                .transform(lambda x: x.fillna(x.iat[0]))
                                .sort_index())
print(themecodes)
code name
0 8 Human development
1 11 Environment and natural resources management
2 1 Economic management
3 6 Social protection and risk management
4 5 Trade and integration
5 2 Public sector governance
6 11 Environment and natural resources management
7 6 Social protection and risk management
8 7 Social dev/gender/inclusion
9 7 Social dev/gender/inclusion
10 5 Trade and integration
11 4 Financial and private sector development
12 6 Social protection and risk management
13 6 Social protection and risk management
14 2 Public sector governance
15 4 Financial and private sector development
16 11 Environment and natural resources management
17 8 Human development
18 10 Rural development
19 7 Social dev/gender/inclusion
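As a possible alternative that covers both kinds of blank in one pass, you could first turn empty or whitespace-only strings into NaN and then fill per code. This is only a sketch on a hypothetical toy frame, not the method used in the answer above. It assumes each code has at least one real name.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data mixing an empty string, a whitespace string and None.
themecodes = pd.DataFrame({
    "code": ["8", "11", "6", "11", "6", "8"],
    "name": ["Human development", "",
             "Social protection and risk management",
             "Environment and natural resources management", "  ", None],
})

# Normalise empty / whitespace-only strings to NaN.
names = themecodes["name"].replace(r"^\s*$", np.nan, regex=True)

# For each code, take the first non-null name and use it only where
# the name is missing.
first_name = names.groupby(themecodes["code"]).transform("first")
themecodes["name"] = names.fillna(first_name)

print(themecodes)
```

`transform("first")` returns the first non-null value in each group, so no prior sorting is needed in this variant.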