Python: map column values based on another column with pandas/numpy
I have input like this:
zip state
95648 CA
95683 CA
95648 NaN
95648 CA
95649 CA
I want to fill in the missing state values by deducing them from the zip code.
The output should be:
zip state
95648 CA
95683 CA
95648 **CA**
95648 CA
95649 CA
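Currently, I have tried the following:
1. Create a map from zip to state.
2. Copy the zip column to zip1.
3. Replace the values of zip with the mapped state.
4. Swap the columns back and delete zip1.
But I'm looking for a better way.
Load the values into data (as a DataFrame).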
print(map1) produces: {95838: 'CA', 95823: 'CA', 95815: 'CA', 95834: 'CA', 95828: 'CA'}
data['zip1'] = data['zip']
data = data.replace({"zip": map1})
print (data.head(10))
data['state'] = data['zip']
data['zip'] = data['zip1']
data = data.drop(['zip1'],axis=1)
print (data.head(10))
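For reference, here is a self-contained, runnable version of the four steps above, applied to the sample input from the question. How map1 was originally built is not shown, so constructing it from the rows that already have a state is an assumption.

```python
import numpy as np
import pandas as pd

# Sample input from the question (NaN marks the missing state)
data = pd.DataFrame({
    "zip": [95648, 95683, 95648, 95648, 95649],
    "state": ["CA", "CA", np.nan, "CA", "CA"],
})

# Step 1: zip -> state map from rows with a known state
# (assumption: map1 in the question was built in a similar way)
map1 = data.dropna(subset=["state"]).set_index("zip")["state"].to_dict()

# Step 2: keep a copy of the original zip codes
data["zip1"] = data["zip"]
# Step 3: overwrite each zip with the state looked up in map1
data = data.replace({"zip": map1})
# Step 4: move the looked-up states into 'state', restore zip, drop the helper
data["state"] = data["zip"]
data["zip"] = data["zip1"]
data = data.drop(["zip1"], axis=1)
print(data)
```

This works, but the copy/replace/swap sequence is roundabout. Also, any zip missing from map1 is left unchanged by replace(), so it would end up as a number in the state column.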
After creating the map, just use pd.Series.map(), which accepts a dictionary as its argument:
map1 = data.set_index('zip')['state'].dropna().to_dict()
data['state'] = data['zip'].map(map1)
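A minimal runnable sketch of the map approach on the question's sample data:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "zip": [95648, 95683, 95648, 95648, 95649],
    "state": ["CA", "CA", np.nan, "CA", "CA"],
})

# Build zip -> state from the rows that already have a state.
# With duplicate zips, to_dict() keeps the last value seen for each zip.
map1 = data.set_index("zip")["state"].dropna().to_dict()

# Series.map looks up every zip in the dict; zips absent from map1 become NaN.
data["state"] = data["zip"].map(map1)
print(data)
```

Note that this overwrites the whole state column. To fill only the gaps and keep existing values untouched, a variant is `data["state"] = data["state"].fillna(data["zip"].map(map1))`.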
Alternatively, if all the information about zip-state pairs comes from the df itself, you can also use a one-liner:
data['state'] = data.sort_values('state').groupby('zip')['state'].ffill()
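Below is a runnable sketch of the one-liner. It uses GroupBy.ffill(), because fillna(method='ffill') is deprecated in recent pandas releases. The groupby(...).transform("first") line is an alternative that is not from the original answer and does not need sorting.

```python
import numpy as np
import pandas as pd

original = pd.DataFrame({
    "zip": [95648, 95683, 95648, 95648, 95649],
    "state": ["CA", "CA", np.nan, "CA", "CA"],
})
data = original.copy()

# sort_values puts NaN last by default, so within each zip group a known
# state comes before the missing ones and ffill copies it forward.
# The assignment aligns on the index, so the original row order is kept.
data["state"] = data.sort_values("state").groupby("zip")["state"].ffill()

# Alternative: broadcast the first non-null state of each zip to every row.
alt = original.groupby("zip")["state"].transform("first")
print(data)
```

Both versions leave a zip as NaN if none of its rows has a known state.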