Python: replace column values using a dictionary
I have a DataFrame in which Gender is expected to be either male or female:
from io import StringIO
import pandas as pd
audit_trail = StringIO('''
course_id AcademicYear_to months TotalFee Gender
260 2017 24 100 male
260 2018 12 140 male
274 2016 36 300 mail
274 2017 24 340 female
274 2018 12 200 animal
285 2017 24 300 bird
285 2018 12 200 maela
''')
df11 = pd.read_csv(audit_trail, sep=" " )
I can correct the misspellings with a dictionary:
corrections={'mail':'male', 'maela':'male', 'maae':'male'}
df11.Gender.replace(corrections)
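The limitation of this approach can be shown with a small sketch that reuses the sample data from the question. Series.replace returns a new Series, so the result has to be assigned back to be kept. Any value that is not a key in the dictionary ('animal', 'bird') passes through unchanged, which is why replace alone cannot produce an "other" category.

```python
from io import StringIO
import pandas as pd

# Same sample data as in the question
data = StringIO('''course_id AcademicYear_to months TotalFee Gender
260 2017 24 100 male
260 2018 12 140 male
274 2016 36 300 mail
274 2017 24 340 female
274 2018 12 200 animal
285 2017 24 300 bird
285 2018 12 200 maela
''')
df11 = pd.read_csv(data, sep=" ")

corrections = {'mail': 'male', 'maela': 'male', 'maae': 'male'}

# replace() returns a new Series; it does not modify df11 in place.
# Values that are not keys in the dict ('animal', 'bird') are left untouched.
fixed = df11['Gender'].replace(corrections)
print(fixed.tolist())
# ['male', 'male', 'male', 'female', 'animal', 'bird', 'male']
```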
But I'm looking for a way to keep only male/female and put every remaining value into an "other" category. Expected output:
0 male
1 male
2 male
3 female
4 other
5 other
6 male
Name: Gender, dtype: object
You can use:
corrections={'mail':'male', 'maela':'male', 'maae':'male', 'male':'male', 'female':'female'}
df11[['Gender']] = df11[['Gender']].applymap(corrections.get).fillna('other')
print (df11)
course_id AcademicYear_to months TotalFee Gender
0 260 2017 24 100 male
1 260 2018 12 140 male
2 274 2016 36 300 male
3 274 2017 24 340 female
4 274 2018 12 200 other
5 285 2017 24 300 other
6 285 2018 12 200 male
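Edit:
For replacing values in a single column, ᴏʟᴅsᴘᴇᴇᴅ's answer is better. If you need to replace values in multiple columns, applymap is the better choice.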
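For the multi-column case, a minimal sketch is below. It assumes a hypothetical second column, ParentGender, that needs the same cleaning. Instead of applymap (deprecated in pandas 2.1 in favour of DataFrame.map), it maps each column with Series.map via apply, which behaves the same across pandas versions. Keys missing from the dictionary become NaN, and fillna turns them into 'other'.

```python
import pandas as pd

# Hypothetical frame with two gender-like columns to clean
df = pd.DataFrame({
    'Gender': ['male', 'mail', 'female', 'animal'],
    'ParentGender': ['maela', 'female', 'bird', 'male'],
})

# Identity entries for the valid labels plus the known misspellings
corrections = {'male': 'male', 'female': 'female',
               'mail': 'male', 'maela': 'male', 'maae': 'male'}

cols = ['Gender', 'ParentGender']
# Series.map per column: unmapped values become NaN, then fillna -> 'other'
df[cols] = df[cols].apply(lambda s: s.map(corrections)).fillna('other')
print(df)
```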
Add two extra dummy (identity) entries to the corrections dictionary:
corrections = {'male' : 'male', # dummy entry for male
'female' : 'female', # dummy entry for female
'mail' : 'male',
'maela' : 'male',
'maae' : 'male'}
Now use map and fillna:
df11.Gender = df11.Gender.map(corrections).fillna('other')
df11
course_id AcademicYear_to months TotalFee Gender
0 260 2017 24 100 male
1 260 2018 12 140 male
2 274 2016 36 300 male
3 274 2017 24 340 female
4 274 2018 12 200 other
5 285 2017 24 300 other
6 285 2018 12 200 male
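The dummy entries are needed because Series.map returns NaN for every value that is not a key in the dictionary, so without them 'male' and 'female' would also end up as 'other'. As a sketch of an alternative (not one of the answers above), you can keep the dictionary limited to the misspellings: fix them with replace, then use Series.where with isin to send anything outside the allowed labels to 'other'.

```python
import pandas as pd

gender = pd.Series(['male', 'male', 'mail', 'female', 'animal', 'bird', 'maela'],
                   name='Gender')

# Only the misspellings; no identity entries required
corrections = {'mail': 'male', 'maela': 'male', 'maae': 'male'}

# 1) fix known typos  2) keep values that are allowed labels  3) everything else -> 'other'
cleaned = gender.replace(corrections).where(lambda s: s.isin(['male', 'female']), 'other')
print(cleaned.tolist())
# ['male', 'male', 'male', 'female', 'other', 'other', 'male']
```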
Yes, sometimes I overcomplicate things.