Python: fill empty cells based on another column
I want to match/map missing values in a DataFrame based on another column. For example:
City State Country
Chicago IL United States
Boston MA United States
San Diego
Los Angeles CA United States
San Francisco
Sacramento
Vancouver BC Canada
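So I would like to fill the empty State and Country cells for those three cities, the same way they are filled for Los Angeles. How can I do that?
Below is my code, but I'm completely stuck: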
CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']
df.loc[df['City'] == CA_cities, 'State' = 'CA' and 'Country' = 'United States']
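Any help would be greatly appreciated.
You can use groupby with a mask created by isin, then replace the NaNs by forward and backward filling within each group: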
CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']
# group_keys=False keeps the original row index in the result on pandas 2.0+
df = df.groupby(df['City'].isin(CA_cities), group_keys=False).apply(lambda x: x.ffill().bfill())
print (df)
City State Country
0 Chicago IL United States
1 Boston MA United States
2 San Diego CA United States
3 Los Angeles CA United States
4 San Francisco CA United States
5 Sacramento CA United States
6 Vancouver BC Canada
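If the values to write are already known, the attempt in the question can also be made to work with a boolean mask and .loc. This is only a minimal sketch: the sample frame is rebuilt from the table in the question, and mask is a name introduced here for illustration. Note that it overwrites State and Country for every matching city rather than filling only the missing cells.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    'City': ['Chicago', 'Boston', 'San Diego', 'Los Angeles',
             'San Francisco', 'Sacramento', 'Vancouver'],
    'State': ['IL', 'MA', np.nan, 'CA', np.nan, np.nan, 'BC'],
    'Country': ['United States', 'United States', np.nan, 'United States',
                np.nan, np.nan, 'Canada'],
})

CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']

# isin tests membership row by row; == against a list does not
mask = df['City'].isin(CA_cities)

# Assign each column separately for the selected rows
df.loc[mask, 'State'] = 'CA'
df.loc[mask, 'Country'] = 'United States'
print(df)
```

The groupby version above is still the better fit when the fill values should be taken from a row that is already complete.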
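A more general solution is to define the city groups, for example in a dictionary, swap the keys with the values, and map the City column: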
print (df)
City State Country
0 Chicago IL United States
1 Chicago1 NaN NaN
2 Boston MA United States
3 San Diego NaN NaN
4 Los Angeles CA United States
5 San Francisco NaN NaN
6 Sacramento NaN NaN
7 Vancouver BC Canada
cities = {'CA': ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento'],
'IL':['Chicago','Chicago1']}
d = {k: oldk for oldk, oldv in cities.items() for k in oldv}
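# group_keys=False keeps the original row index in the result on pandas 2.0+
df = df.groupby(df['City'].map(d).fillna(df['City']), group_keys=False).apply(lambda x: x.ffill().bfill())
# slower alternative
#df = df.groupby(df['City'].replace(d), group_keys=False).apply(lambda x: x.ffill().bfill())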
print (df)
City State Country
0 Chicago IL United States
1 Chicago1 IL United States
2 Boston MA United States
3 San Diego CA United States
4 Los Angeles CA United States
5 San Francisco CA United States
6 Sacramento CA United States
7 Vancouver BC Canada
Details:
print (df['City'].map(d).fillna(df['City']))
0 IL
1 IL
2 Boston
3 CA
4 CA
5 CA
6 CA
7 Vancouver
Name: City, dtype: object
print (d)
{'San Diego': 'CA', 'Los Angeles': 'CA', 'San Francisco': 'CA',
'Sacramento': 'CA', 'Chicago': 'IL', 'Chicago1': 'IL'}
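Because the inverted dictionary already maps each city to its state, the same idea can be sketched without groupby: fill State from the mapping, then fill Country from rows that are already complete. This is an alternative I am assuming fits the same data, not part of the solution above. The names city_to_state and state_to_country are made up for illustration, and it only works if every state has at least one row with a known Country.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the second example frame
df = pd.DataFrame({
    'City': ['Chicago', 'Chicago1', 'Boston', 'San Diego', 'Los Angeles',
             'San Francisco', 'Sacramento', 'Vancouver'],
    'State': ['IL', np.nan, 'MA', np.nan, 'CA', np.nan, np.nan, 'BC'],
    'Country': ['United States', np.nan, 'United States', np.nan,
                'United States', np.nan, np.nan, 'Canada'],
})

cities = {'CA': ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento'],
          'IL': ['Chicago', 'Chicago1']}

# Invert {state: [cities]} into {city: state}
city_to_state = {city: state for state, group in cities.items() for city in group}

# Fill only the missing State cells; existing values are left untouched
df['State'] = df['State'].fillna(df['City'].map(city_to_state))

# Build a State -> Country lookup from rows where both values are known
state_to_country = (df.dropna(subset=['State', 'Country'])
                      .drop_duplicates('State')
                      .set_index('State')['Country'])

# Fill only the missing Country cells from that lookup
df['Country'] = df['Country'].fillna(df['State'].map(state_to_country))
print(df)
```

Unlike ffill and bfill, this does not depend on row order inside a group, and it never copies a value from a neighbouring row that merely happens to share the group.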
Or simply split the frame into two parts and then use fillna:
CA_cities = ['SanDiego', 'LosAngeles', 'SanFrancisco', 'Sacramento']
s=df.loc[df.City.isin(CA_cities),:]
t=df.loc[~df.City.isin(CA_cities),:]
pd.concat([s.fillna({'State':'CA','Country':'UnitedStates'}),t])
Out[1023]:
City State Country
2 SanDiego CA UnitedStates
3 LosAngeles CA UnitedStates
4 SanFrancisco CA UnitedStates
5 Sacramento CA UnitedStates
0 Chicago IL UnitedStates
1 Boston MA UnitedStates
6 Vancouver BC Canada
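As the output shows, concatenating the two pieces puts the California rows first. If the original row order matters, one option is to sort by the index afterwards. This is a small sketch under my own assumptions: the frame is rebuilt from the question with spaces kept in the city names, and mask and out are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    'City': ['Chicago', 'Boston', 'San Diego', 'Los Angeles',
             'San Francisco', 'Sacramento', 'Vancouver'],
    'State': ['IL', 'MA', np.nan, 'CA', np.nan, np.nan, 'BC'],
    'Country': ['United States', 'United States', np.nan, 'United States',
                np.nan, np.nan, 'Canada'],
})

CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']
mask = df['City'].isin(CA_cities)

s = df.loc[mask]    # rows for the California cities
t = df.loc[~mask]   # all remaining rows

# Fill per column with a dict, glue the parts together, restore row order
out = pd.concat([s.fillna({'State': 'CA', 'Country': 'United States'}), t]).sort_index()
print(out)
```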
Thank you for providing multiple approaches. This is a great explanation. Thanks!
Thanks for your solution. Thanks!