Replace strings with integers from a dictionary in a Python dataframe
I have the following dataframe:
epi_week state loc_type disease cases incidence
21835 200011 WY STATE MUMPS 2 0.40
21836 197501 WY STATE POLIO 3 0.76
21837 199607 WY STATE HEPATITIS 3 0.61
21838 197116 WY STATE MUMPS 6 1.73
21839 200048 WY STATE HEPATITIS 6 1.21
I am trying to replace each disease with a unique integer, for example 'MUMPS' == 1, 'POLIO' == 2, and so on. The final dataframe should look like this:
epi_week state loc_type disease cases incidence
21835 200011 WY STATE 1 2 0.40
21836 197501 WY STATE 2 3 0.76
21837 199607 WY STATE 3 3 0.61
21838 197116 WY STATE 1 6 1.73
21839 200048 WY STATE 3 6 1.21
I am using the following code:
# creating a dictionary
disease_dic = {'MUMPS':1, 'POLIO':2, 'MEASLES':3, 'RUBELLA':4,
'PERTUSSIS':5, 'HEPATITIS A':6, 'SMALLPOX':7,
'DIPHTHERIA':8}
data.disease = [disease_dic[item] for item in data.disease]
I get the following error:
KeyError                                  Traceback (most recent call last)
<ipython-input-115-52394901c90d> in <module>()
----> 1 cdc.disease = [disease_dic[item2] for item2 in cdc.disease]
KeyError: 1
Can anyone help me solve this problem? Thanks.
Use apply.
Ex:
disease_dic = {'MUMPS':1, 'POLIO':2, 'MEASLES':3, 'RUBELLA':4,
'PERTUSSIS':5, 'HEPATITIS A':6, 'SMALLPOX':7,
'DIPHTHERIA':8}
import pandas as pd
df = pd.DataFrame({"disease": disease_dic.keys()})
print(df["disease"].apply(lambda x: disease_dic.get(x)))
Output:
0 4
1 2
2 1
3 8
4 3
5 5
6 7
7 6
Name: disease, dtype: int64
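As a minimal sketch of the same lookup on the question's sample rows, `Series.map` accepts a dict directly and does the same job as the `apply` call. The small frame below is rebuilt by hand from the question and is an assumption about what `data` holds. Values that are not keys in the dictionary, such as `HEPATITIS` (the dictionary only contains `HEPATITIS A`), come back as NaN instead of raising a KeyError.

```python
import pandas as pd

disease_dic = {'MUMPS': 1, 'POLIO': 2, 'MEASLES': 3, 'RUBELLA': 4,
               'PERTUSSIS': 5, 'HEPATITIS A': 6, 'SMALLPOX': 7,
               'DIPHTHERIA': 8}

# Sample rows copied by hand from the question (only two columns kept)
data = pd.DataFrame({
    "disease": ["MUMPS", "POLIO", "HEPATITIS", "MUMPS", "HEPATITIS"],
    "cases": [2, 3, 3, 6, 6],
})

# map() looks each value up in the dict; values with no key become NaN
mapped = data["disease"].map(disease_dic)
print(mapped)
```

Because of the NaN entries the result is a float column; it only becomes an integer column once every label in the data has a key in the dictionary.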
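Use data.disease = data.disease.map(disease_dic)
It seems that the key represented by item2 does not exist in the dict at some point.
That was a typo when I wrote the question here. The correct line is data.disease = [disease_dic[item2] for item2 in data.disease]
@jezral Using data.disease = data.disease.map(disease_dic) gives NaN.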
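One way to narrow down the NaN result is to list the distinct column values that have no key in the dictionary. This is only a sketch under two assumptions that the thread does not confirm: the column may already have been converted by an earlier run (the `KeyError: 1` in the question shows that the integer `1` was being looked up), or it may hold labels that the dictionary lacks. The example column below is hypothetical and mixes both cases.

```python
import pandas as pd

disease_dic = {'MUMPS': 1, 'POLIO': 2, 'MEASLES': 3, 'RUBELLA': 4,
               'PERTUSSIS': 5, 'HEPATITIS A': 6, 'SMALLPOX': 7,
               'DIPHTHERIA': 8}

# Hypothetical column: one label missing from the dict and one value
# that was already converted to an integer by an earlier run
data = pd.DataFrame({"disease": ["MUMPS", "HEPATITIS", 1, "POLIO"]})

# Distinct values with no key in disease_dic (sorted by their string form
# so that mixed int/str values can be ordered)
missing = sorted(set(data["disease"]) - set(disease_dic), key=str)
print(missing)
```

If integers appear in that list, the column has probably been replaced already and the original data needs to be reloaded before mapping. If labels appear, they need their own entries in the dictionary.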