Python: an elegant way to replace multiple values in a dataframe without a mapping?
I have a dataframe as shown below:
import pandas as pd
df1 = pd.DataFrame({'ethnicity': ['AMERICAN INDIAN/ALASKA NATIVE', 'WHITE - BRAZILIAN', 'WHITE-RUSSIAN','HISPANIC/LATINO - COLOMBIAN',
'HISPANIC/LATINO - MEXICAN','ASIAN','ASIAN - INDIAN','ASIAN - KOREAN','PORTUGUESE','MIDDLE-EASTERN','UNKNOWN',
'USER DECLINED','OTHERS']})
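I want to replace the values in the ethnicity column. For example, if the value is ASIAN - INDIAN, I just want to replace it with ASIAN.
Similarly, I want to collapse the strings that contain AMERICAN, WHITE, or HISPANIC, and replace every other string with Others.
This is what I tried: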
df1.loc[df1.ethnicity.str.contains('WHITE'), 'ethnicity'] = "WHITE"
df1.loc[df1.ethnicity.str.contains('ASIAN'), 'ethnicity'] = "ASIAN"
df1.loc[df1.ethnicity.str.contains('HISPANIC'), 'ethnicity'] = "HISPANIC"
df1.loc[df1.ethnicity.str.contains('AMERICAN'), 'ethnicity'] = "AMERICAN"
# df1.loc[<all other ethnicities>, 'ethnicity'] = "Others"  # I don't know how to replace all other ethnicities at once with Others
I expect each value in the output to collapse to one of AMERICAN, WHITE, HISPANIC, ASIAN, or Others.
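Use Series.str.extract with the list values joined by |. It returns NaN for values that do not match, so add fillna.
Or you can write the pattern out directly: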
df1.ethnicity = (df1.ethnicity.str.extract('(WHITE|ASIAN|AMERICAN|HISPANIC)', expand=False)
.fillna('Others'))
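The list-based variant that the comments refer to (the list called L) is not preserved on this page. A minimal sketch of how it might look, assuming a keyword list named L and a helper variable pat, both chosen here for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'ethnicity': [
    'AMERICAN INDIAN/ALASKA NATIVE', 'WHITE - BRAZILIAN', 'WHITE-RUSSIAN',
    'HISPANIC/LATINO - COLOMBIAN', 'HISPANIC/LATINO - MEXICAN', 'ASIAN',
    'ASIAN - INDIAN', 'ASIAN - KOREAN', 'PORTUGUESE', 'MIDDLE-EASTERN',
    'UNKNOWN', 'USER DECLINED', 'OTHERS']})

# Assumed keyword list; any value without one of these becomes 'Others'.
L = ['WHITE', 'ASIAN', 'AMERICAN', 'HISPANIC']

# Join the keywords with | (regex OR) inside a single capture group.
pat = '(' + '|'.join(L) + ')'

# str.extract yields the matched keyword per row, or NaN when nothing matches;
# fillna then turns those NaN values into 'Others'.
df1['ethnicity'] = df1['ethnicity'].str.extract(pat, expand=False).fillna('Others')
```

Keeping the keywords in a list means new categories can be added without editing the regular expression by hand. If a keyword could contain regex metacharacters, wrapping each one in re.escape before joining would be a sensible precaution.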
Wow! Only one line. Upvoted. Does str.extract work like str.extract("WHITE"|"ASIAN"|"AMERICAN"|"HISPANIC")?
@SSMK - Yes, you are close: df1.ethnicity = df1.ethnicity.str.extract('(WHITE|ASIAN|AMERICAN|HISPANIC)', expand=False).fillna('Others')
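So L is used to extract the string (ASIAN - INDIAN) from the dataframe and replace it with the value from L (ASIAN)?
@SSMK - No, it is only used to create (WHITE|ASIAN|HISPANIC|AMERICAN) from the values in the list L.
@SSMK - The solutions are the same: the first creates (WHITE|ASIAN|HISPANIC|AMERICAN) from the list and then passes it to extract; the second passes (WHITE|ASIAN|HISPANIC|AMERICAN) to extract directly.
What is wrong with .map()? You can always use np.select to chain your conditions.
I may not always know which other ethnicity values my real data contains; it has more than a million rows.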
print(df1)
    ethnicity
0    AMERICAN
1       WHITE
2       WHITE
3    HISPANIC
4    HISPANIC
5       ASIAN
6       ASIAN
7       ASIAN
8      Others
9      Others
10     Others
11     Others
12     Others
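One of the comments suggests np.select as an alternative. A rough sketch of that idea, assuming the same four keywords and illustrative variable names (keywords, conditions); this is not code from the original thread:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'ethnicity': [
    'AMERICAN INDIAN/ALASKA NATIVE', 'WHITE - BRAZILIAN', 'WHITE-RUSSIAN',
    'HISPANIC/LATINO - COLOMBIAN', 'HISPANIC/LATINO - MEXICAN', 'ASIAN',
    'ASIAN - INDIAN', 'ASIAN - KOREAN', 'PORTUGUESE', 'MIDDLE-EASTERN',
    'UNKNOWN', 'USER DECLINED', 'OTHERS']})

# Assumed keywords; the order matters because the first true condition wins.
keywords = ['WHITE', 'ASIAN', 'HISPANIC', 'AMERICAN']

# One boolean mask per keyword (plain substring test, no regex).
conditions = [df1['ethnicity'].str.contains(k, regex=False) for k in keywords]

# Rows matching no condition fall through to the default, so unknown
# categories do not need to be listed in advance.
df1['ethnicity'] = np.select(conditions, keywords, default='Others')
```

Because of the default argument, values that were never anticipated still end up as Others, which addresses the concern about not knowing every category in a large dataset.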