Python pandas Series: replace values
I have a pandas Series with the following values:
Bachelors Degree 639
Diploma 291
O - Level 264
Masters Degree 149
Certificate 126
A - Level 69
PGD 40
Bachelors Degree 28
A-Level 20
O-Level 15
Masters 10
Bachelors 6
diploma 5
certificate 5
Ph.D 4
A- Level 2
Post Graduate Diploma 1
Msc Environment 1
BBA 1
O- Level 1
Masters 1
PhD 1
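I get the data from Excel.
I want to use pandas for data cleaning, for example replacing every variant of a master's degree with "Master's Degree" (I could do this in Excel, but I am learning pandas).
I tried: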
mapp={"Bachelor's Degree":["Bachelors Degree","Bachelors","BBA","Bachelors Degree"],
"Ordinary Diploma":"diploma",
"Ordinary Level":["O - Level","O-Level","O- Level"],
"Master's Degree":["Masters Degree","Masters","Msc Environment","Masters"],
"Certificate":"certificate",
"Advanced Level":["A - Level","A-Level","- Level"],
"Post Graduate Diploma":["Post Graduate Diploma","PGD"],
"PHD":["Ph.D","PhD"]
}
df['EDUCATION_LEVEL']=df['EDUCATION_LEVEL'].map(mapp)
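This returns a result only for the Certificate key, which has a single value.
It seems I cannot use a list as the value for a dictionary key.
Any suggestions on how to replace these values would be highly appreciated.
Ronald
This is how the actual data appears in the Excel column.
I have added an image of the data in the column.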
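For context, here is a minimal sketch of why the attempt in the question returns mostly NaN. `Series.map` with a dict looks up each element of the Series as a dictionary key, so only elements that happen to equal a key of `mapp` get a result. The sample Series `s` and the shortened `mapp` below are illustrative assumptions, not the real Excel data.

```python
import pandas as pd

# Hypothetical sample values, chosen to mirror the question.
s = pd.Series(["Certificate", "certificate", "Masters", "PhD"])

# Shortened version of the question's dict: canonical name -> variant(s).
mapp = {
    "Certificate": "certificate",
    "Master's Degree": ["Masters Degree", "Masters"],
}

# map() treats each Series element as a KEY of mapp.
# "Certificate" is a key, so it becomes "certificate";
# every other element is not a key, so it becomes NaN.
result = s.map(mapp)
print(result)
```

This suggests the dict has to be inverted (variant -> canonical name) before it can be passed to `map`.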
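The challenge is how to replace the many variants of "Masters Degree".
One idea is to convert each single-element value into a one-element list, for example "diploma" into ["diploma"].
If that is not possible, use: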
d = {}
for k, v in mapp.items():
    if isinstance(v, list):
        for x in v:
            d[x.lower()] = k
    else:
        d[v.lower()] = k
df['EDUCATION_LEVEL'] = df['EDUCATION_LEVEL'].str.lower().map(d)
print(df)
EDUCATION_LEVEL VAL
0 Bachelor's Degree 639
1 Ordinary Diploma 291
2 Ordinary Level 264
3 Master's Degree 149
4 Certificate 126
5 Advanced Level 69
6 Post Graduate Diploma 40
7 Bachelor's Degree 28
8 Advanced Level 20
9 Ordinary Level 15
10 Master's Degree 10
11 Bachelor's Degree 6
12 Ordinary Diploma 5
13 Certificate 5
14 PHD 4
15 NaN 2
16 Post Graduate Diploma 1
17 Master's Degree 1
18 Bachelor's Degree 1
19 Ordinary Level 1
20 Master's Degree 1
21 PHD 1
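Row 15 in the output above is NaN because "A- Level" has no matching entry in the lookup dict (the list for "Advanced Level" contains "- Level" instead). One possible way to make the lookup less sensitive to such spacing differences is to normalize the strings first. The sketch below assumes a small sample Series and a shortened lookup dict `d` keyed by the normalized form. Both are illustrative, not the real data.

```python
import pandas as pd

# Hypothetical sample values with inconsistent spacing around the hyphen.
s = pd.Series(["A - Level", "A-Level", "A- Level", "O- Level", "diploma"])

# Trim, lowercase, and collapse any whitespace around hyphens,
# so "A - Level", "A-Level" and "A- Level" all become "a-level".
normalized = (
    s.str.strip()
     .str.lower()
     .str.replace(r"\s*-\s*", "-", regex=True)
)

# Shortened lookup dict keyed by the normalized form (assumption).
d = {
    "a-level": "Advanced Level",
    "o-level": "Ordinary Level",
    "diploma": "Ordinary Diploma",
}

# Map to the canonical names; keep the original text where nothing matched.
cleaned = normalized.map(d).fillna(s)
print(cleaned)
```

With normalization, each group needs only one key in the dict, and `fillna(s)` avoids losing values that have no entry.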
First, make a slight change to the mapp dict so that every value is a list:
mapp={"Bachelor's Degree":["Bachelors Degree","Bachelors","BBA","Bachelors Degree"],
"Ordinary Diploma":["diploma"],
"Ordinary Level":["O - Level","O-Level","O- Level"],
"Master's Degree":["Masters Degree","Masters","Msc Environment","Masters"],
"Certificate":["certificate"],
"Advanced Level":["A - Level","A-Level","- Level"],
"Post Graduate Diploma":["Post Graduate Diploma","PGD"],
"PHD":["Ph.D","PhD"]
}
mapp_new = [{l:k for l in v} for k,v in mapp.items()]
mapp_new = {k.lower(): v for d in mapp_new for k, v in d.items()}
df.EDUCATION_LEVEL.apply(lambda x: mapp_new.get(x.lower(), x))
0 Bachelor's Degree
1 Ordinary Diploma
2 Ordinary Level
3 Master's Degree
4 Certificate
5 Advanced Level
6 Post Graduate Diploma
7 Bachelor's Degree
8 Advanced Level
9 Ordinary Level
10 Master's Degree
11 Bachelor's Degree
12 Ordinary Diploma
13 Certificate
14 PHD
15 A- Level
16 Post Graduate Diploma
17 Master's Degree
18 Bachelor's Degree
19 Ordinary Level
20 Master's Degree
21 PHD
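Because `mapp_new.get(x.lower(), x)` falls back to the original value, unmatched entries such as "A- Level" in row 15 pass through unchanged rather than becoming NaN. A quick way to find such leftovers is to list the values that have no key in the lookup dict. The sketch below uses a small hypothetical Series and a shortened `mapp_new`; the "- level" key mirrors the entry in the dict above, which is presumably a typo for "A- Level".

```python
import pandas as pd

# Hypothetical sample values.
s = pd.Series(["Masters", "A- Level", "PhD", "BBA"])

# Shortened inverted dict: lowercase variant -> canonical name.
mapp_new = {
    "masters": "Master's Degree",
    "phd": "PHD",
    "bba": "Bachelor's Degree",
    "- level": "Advanced Level",
}

# Values whose lowercase form has no key in mapp_new will not be replaced.
unmapped = s[~s.str.lower().isin(list(mapp_new))].unique()
print(unmapped)
```

Any value reported here needs either a new entry in `mapp` or a fix to an existing one.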
I got NaN values with this method.