Python: fill missing zipcodes by randomly choosing a zipcode from the same neighbourhood
I have a pandas DataFrame like the one shown below, and I am trying to replace the missing values in the zipcode field with a random value taken from the same neighbourhood group. My attempt is below, but it does not work well. Please help.
zipcodes = a_df[['neighbourhood_group_cleansed', 'zipcode']].drop_duplicates().reset_index()
a_df['zipcode'] = a_df.apply(
    lambda row: np.random.choice(
        zipcodes[zipcodes['neighbourhood_group_cleansed'] == row['neighbourhood_group_cleansed']]['zipcode']
    ) if len(row.zipcode) == 0 else row.zipcode,
    axis=1,
)
     state      city smart_location neighbourhood_group_cleansed  zipcode
0       NY  New York   New York, NY                    Manhattan    10029
1       NY  Brooklyn   Brooklyn, NY                     Brooklyn    11221
2       NY  Brooklyn   Brooklyn, NY                     Brooklyn    11206
3       NY  New York   New York, NY                    Manhattan    10001
4       NY  New York   New York, NY                    Manhattan    10162
...    ...       ...            ...                          ...      ...
6492    NY  New York   New York, NY                    Manhattan  10004.0
6493    NY  Brooklyn   Brooklyn, NY                     Brooklyn  11229.0
6494    NY    Queens     Queens, NY                       Queens  11691.0
6495    NY  New York   New York, NY                    Manhattan  10044.0
6496    NY  Brooklyn   Brooklyn, NY                     Brooklyn  11234.0
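Two things in the attempt above are likely to cause trouble, though the question does not say which error actually appeared. pandas stores a missing value as a float NaN, so `len(row.zipcode) == 0` cannot detect it, and the lookup table built with `drop_duplicates()` still contains the NaN rows, so the random pick can itself be NaN. A minimal reproduction on a made-up three-row frame (the data here is an assumption, not the real dataset):

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for the real data
a_df = pd.DataFrame({
    "neighbourhood_group_cleansed": ["Manhattan", "Manhattan", "Brooklyn"],
    "zipcode": ["10029", np.nan, "11221"],
})

# Problem 1: the missing value is a float NaN, so len() raises TypeError
try:
    len(a_df.loc[1, "zipcode"])
    len_error = None
except TypeError as exc:
    len_error = str(exc)
print("len() on a missing zipcode:", len_error)

# Problem 2: drop_duplicates() keeps the NaN row, so NaN stays in the lookup table
zipcodes = a_df[["neighbourhood_group_cleansed", "zipcode"]].drop_duplicates()
nan_in_lookup = bool(zipcodes["zipcode"].isna().any())
print("NaN still in lookup table:", nan_in_lookup)
```

Checking with `pd.isna(...)` and calling `.dropna()` on the candidates before choosing avoids both issues.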
This should work:
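import random
import pandas as pd

# pd.isna handles both float NaN and string zipcodes (np.isnan raises TypeError on strings)
df['zipcode'] = df.apply(lambda x: random.choice(df[df['neighbourhood_group_cleansed'] == x['neighbourhood_group_cleansed']].zipcode.dropna().values) if pd.isna(x['zipcode']) else x['zipcode'], axis=1)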
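The row-wise `apply` above filters the whole frame once per row, which can get slow on larger data. As an alternative sketch (not the answerer's method), the same idea can be done once per group with `groupby(...).transform`. The helper name `fill_missing_zipcodes`, its `seed` parameter, and the sample frame are all assumptions made for illustration:

```python
import numpy as np
import pandas as pd

def fill_missing_zipcodes(df, group_col="neighbourhood_group_cleansed",
                          zip_col="zipcode", seed=None):
    """Hypothetical helper: fill missing zipcodes with random known ones per group."""
    rng = np.random.default_rng(seed)

    def fill_group(s):
        known = s.dropna().to_numpy()   # candidate zipcodes in this group
        missing = s.isna()
        # Nothing to draw from, or nothing to fill: leave the group unchanged
        if len(known) == 0 or not missing.any():
            return s
        s = s.copy()
        # Draw one random known zipcode for every missing entry in the group
        s[missing] = rng.choice(known, size=int(missing.sum()))
        return s

    out = df.copy()
    out[zip_col] = out.groupby(group_col)[zip_col].transform(fill_group)
    return out

# Made-up sample data; Queens has no known zipcode to draw from
df = pd.DataFrame({
    "neighbourhood_group_cleansed": ["Manhattan", "Manhattan", "Manhattan",
                                     "Brooklyn", "Brooklyn", "Queens"],
    "zipcode": ["10029", None, "10001", "11221", None, None],
})
filled = fill_missing_zipcodes(df, seed=0)
print(filled)
```

A group with no known zipcode at all (Queens in this sample) is left missing rather than raising an error, so such rows may still need separate handling.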
Can you tell me what error you got? Thanks, this was very helpful.