Python: how to create a new column from a dictionary built on DataFrame column names
Tags: python, pandas
I have a df that looks like this:
id Canada USA France UK Egypt Sudan age_type
1 True False False True False False adult
2 False True True True False True pediatric
3 False False False False True False pediatric
and this dictionary:
code = {"adult": {"America": {"Canada", "USA"},
                  "Europe": {"France", "UK"},
                  "Africa": {"Egypt", "Sudan"}},
        "pediatric": {"America": {"Canada", "USA"},
                      "Europe": {"France", "UK"},
                      "Africa": {"Egypt", "Sudan"}}}
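I want to create a new column that contains the values "America", "Europe", or "Africa" based on this dictionary. I tried df.map(), but it did not work properly.
Do you have another solution, or a new approach that replaces the code dictionary? The final output would be: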
id Canada USA France UK Egypt Sudan age_type continent
1 True True False False False False adult America
2 False False True True False False pediatric Europe
3 False False False False True True pediatric Africa
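One possible reason a direct map call does not work with the nested code dictionary is that Series.map expects a flat key-to-value mapping, while code is keyed by age type, then continent, then a set of countries. The sketch below flattens it into a lookup keyed by (age_type, country). The name lookup is hypothetical and only for illustration.

```python
# The nested dictionary from the question (with the quoting fixed).
code = {
    "adult": {"America": {"Canada", "USA"},
              "Europe": {"France", "UK"},
              "Africa": {"Egypt", "Sudan"}},
    "pediatric": {"America": {"Canada", "USA"},
                  "Europe": {"France", "UK"},
                  "Africa": {"Egypt", "Sudan"}},
}

# Flatten to {(age_type, country): continent}; "lookup" is a hypothetical name.
lookup = {
    (age, country): continent
    for age, continents in code.items()
    for continent, countries in continents.items()
    for country in countries
}

print(lookup[("adult", "Canada")])
print(lookup[("pediatric", "Sudan")])
```

With a flat lookup like this, each (age_type, country) pair resolves to a single continent, which is the shape a map-style lookup needs.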
A simple solution using apply:
import pandas as pd

d = {
    'Canada': [True, False, False],
    'USA': [True, False, False],
    'France': [False, True, False],
    'UK': [False, True, False],
    'Egypt': [False, False, True],
    'Sudan': [False, False, True],
}

def mapToContinent(x):
    # x is one row; read the flags by column name rather than by position
    if x['Canada'] or x['USA']: return 'America'
    if x['France'] or x['UK']: return 'Europe'
    if x['Egypt'] or x['Sudan']: return 'Africa'
    # a row with no True flag falls through and yields None

df = pd.DataFrame(d)
df['continent'] = df.apply(mapToContinent, axis=1)
print(df.head())
Prints:
Canada USA France UK Egypt Sudan continent
0 True True False False False False America
1 False False True True False False Europe
2 False False False False True True Africa
You can try this flexible solution:
import pandas as pd
import io

# create the dataframe
s_e = '''
id Canada USA France UK Egypt Sudan age_type
1 True True False False False False adult
2 False False True True False False pediatric
3 False False False False True True pediatric
'''
# fields are separated by runs of whitespace
df = pd.read_csv(io.StringIO(s_e), sep=r'\s+', engine='python')
print(df)

dct = {"America": {"Canada", "USA"},
       "Europe": {"France", "UK"},
       "Africa": {"Egypt", "Sudan"}}

# approach to the solution
delimiter = ", "
# country columns only: everything between 'id' and 'age_type'
tmp = df[df.columns[1:-1]].rename(columns=lambda x: x + delimiter)
# the dot product of the boolean flags with the column names joins the names of the True columns
df['Continent'] = tmp.dot(tmp.columns).str[:-len(delimiter)].apply(
    lambda x: [k for k, v in dct.items() if len(v & set(x.split(delimiter))) >= 1][0])
print(df)
Output:
df
id Canada USA France UK Egypt Sudan age_type
0 1 True True False False False False adult
1 2 False False True True False False pediatric
2 3 False False False False True True pediatric
newdf
id Canada USA France UK Egypt Sudan age_type Continent
0 1 True True False False False False adult America
1 2 False False True True False False pediatric Europe
2 3 False False False False True True pediatric Africa
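If you want a flexible solution, you can use:
def map_code(row):
    # pick the continent mapping that belongs to this row's age type
    age_dict = code[row['age_type']]
    # names of the country columns that are True in this row
    countries = [col for col in row.keys() if row[col] and col not in ['id', 'age_type']]
    for k, v in age_dict.items():
        # exact match: the True countries must equal the continent's whole set
        if v == set(countries):
            return k
    # no exact match: the function returns None

df['continent'] = df.apply(map_code, axis=1)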
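Row-wise apply calls a Python function once per row, which can become slow on large dataframes. A column-wise alternative is sketched below using np.select. It assumes, as in the question, that the continent mapping is the same for every age type and that a row belongs to a continent when any of that continent's country columns is True (a looser rule than the exact set match above). The names continents and "unknown" are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data matching the expected output in the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "Canada": [True, False, False],
    "USA": [True, False, False],
    "France": [False, True, False],
    "UK": [False, True, False],
    "Egypt": [False, False, True],
    "Sudan": [False, False, True],
    "age_type": ["adult", "pediatric", "pediatric"],
})

# Assumed flat mapping: continent -> list of country columns.
continents = {
    "America": ["Canada", "USA"],
    "Europe": ["France", "UK"],
    "Africa": ["Egypt", "Sudan"],
}

# One boolean mask per continent: True where any of its countries is flagged.
conditions = [df[cols].any(axis=1).to_numpy() for cols in continents.values()]

# The first matching condition wins; rows with no flag get the default label.
df["continent"] = np.select(conditions, list(continents.keys()), default="unknown")
print(df)
```

If the mapping did differ by age type, the same idea could be applied once per age_type group, but that is not shown here.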
What do you want to join? Can you provide your desired output?
@kait The original post has been edited for more clarification. Thanks.
Thanks @Jacek, but this does not work for large dataframes. I should make the solution more flexible.