Python: How to create a new column from a dictionary built on DataFrame column names

python pandas

I have a DataFrame df that looks like this:

id  Canada   USA  France  UK   Egypt Sudan age_type
 1   True   False False  True  False False  adult
 2   False  True  True   True  False True   pediatric
 3   False  False False  False True  False  pediatric

and this dictionary:

code = {"adult":{"America":{"Canada","USA"},
                 "Europe":{"France","UK"},
                 "Africa":{"Egypt","Sudan"}},
        "pediatric":{"America":{"Canada","USA"},
                     "Europe":{"France","UK"},
                     "Africa":{"Egypt","Sudan"}}}

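I want to create a new column that contains the value "America", "Europe", or "Africa" based on this dictionary. I tried df.map(), but it does not work properly. Is there another solution, or a different approach that avoids the code dictionary altogether? The final output should be: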

id  Canada   USA  France  UK    Egypt  Sudan  age_type   continent
 1   True   True   False  False  False False  adult      America
 2   False  False  True   True   False False  pediatric  Europe
 3   False  False  False  False  True  True   pediatric  Africa

A simple solution using apply:

import pandas as pd

d = {
    'Canada': [True, False, False],
    'USA': [True, False, False],
    'France': [False, True, False],
    'UK': [False, True, False],
    'Egypt': [False, False, True],
    'Sudan': [False, False, True],
    }

def mapToContinent(x):
    # x is one row; look up the flags by column name rather than by position
    if x['Canada'] or x['USA']: return 'America'
    if x['France'] or x['UK']: return 'Europe'
    if x['Egypt'] or x['Sudan']: return 'Africa'
    # rows with no flag set fall through and return None

df = pd.DataFrame(d)
df['continent'] = df.apply(mapToContinent, axis=1)
print(df.head())

Output:

   Canada    USA  France     UK  Egypt  Sudan continent
0    True   True   False  False  False  False   America
1   False  False    True   True  False  False    Europe
2   False  False   False  False   True   True    Africa
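
Row-wise apply calls a Python function once per row, which can become slow on large frames. As a rough sketch of a column-wise alternative (not taken from any of the answers), the continent dictionary can be inverted into a country-to-continent lookup and combined with idxmax. The names continents, country_to_continent, and flags are made up for this illustration, and the sketch assumes each row has flags set for a single continent only.

```python
import pandas as pd

# Sample frame matching the expected output in the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "Canada": [True, False, False],
    "USA": [True, False, False],
    "France": [False, True, False],
    "UK": [False, True, False],
    "Egypt": [False, False, True],
    "Sudan": [False, False, True],
    "age_type": ["adult", "pediatric", "pediatric"],
})

# Hypothetical flat mapping: continent -> set of country columns.
continents = {
    "America": {"Canada", "USA"},
    "Europe": {"France", "UK"},
    "Africa": {"Egypt", "Sudan"},
}

# Invert it: country name -> continent name.
country_to_continent = {
    country: continent
    for continent, countries in continents.items()
    for country in countries
}

# Boolean sub-frame holding only the country flag columns.
country_cols = [c for c in df.columns if c in country_to_continent]
flags = df[country_cols]

# idxmax(axis=1) gives the first column that is True in each row;
# where() blanks out rows in which no flag is set at all.
first_true = flags.idxmax(axis=1)
df["continent"] = first_true.map(country_to_continent).where(flags.any(axis=1))

print(df)
```

If a row could have flags set for more than one continent, this sketch silently picks the continent of the leftmost True column, so it would need a different tie-breaking rule in that case.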

You can try this flexible solution:

import pandas as pd
import io

#creation of dataframe
s_e='''
id  Canada   USA  France  UK    Egypt  Sudan  age_type
 1   True   True   False  False  False False  adult
 2   False  False  True   True   False False  pediatric
 3   False  False  False  False  True  True   pediatric
'''
# any run of whitespace separates fields; the raw string avoids an invalid-escape warning
df = pd.read_csv(io.StringIO(s_e), sep=r'\s+')
print(df)


dct={"America":{"Canada","USA"},
     "Europe":{"France","UK"},
     "Africa":{"Egypt","Sudan"}}

#Approach to solution
delimiter = ", "
# country flag columns: everything between 'id' (first) and 'age_type' (last)
tmp = df[df.columns[1:-1]].rename(columns=lambda x: x+delimiter)
# dot() joins the names of the True columns in each row, e.g. "Canada, USA, "
active = tmp.dot(tmp.columns).str[:-len(delimiter)]
# pick the first continent whose country set intersects the row's active countries
df['Continent'] = active.apply(lambda x: [k for k,v in dct.items() if v & set(x.split(delimiter))][0])
print(df)

Output:

df
   id  Canada    USA  France     UK  Egypt  Sudan   age_type
0   1    True   True   False  False  False  False      adult
1   2   False  False    True   True  False  False  pediatric
2   3   False  False   False  False   True   True  pediatric

newdf
   id  Canada    USA  France     UK  Egypt  Sudan   age_type Continent
0   1    True   True   False  False  False  False      adult   America
1   2   False  False    True   True  False  False  pediatric    Europe
2   3   False  False   False  False   True   True  pediatric    Africa

If you want a flexible solution, you can use:

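def map_code(row):
    # continent -> set of countries for this row's age group
    age_dict = code[row['age_type']]
    # names of the country columns that are True in this row
    countries = [col for col in row.keys() if row[col] and col not in ['id', 'age_type']]
    for k, v in age_dict.items():
        # exact match: every country of the continent, and no other, must be True
        if v == set(countries):
            return k
    # no exact match: falls through and returns None


df['continent'] = df.apply(map_code, axis=1)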
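
For larger frames, the same nested code dictionary can also be applied with boolean masks instead of a per-row function. The following is only a sketch under one loudly stated assumption: it assigns a continent when any of that continent's countries is True, which is looser than the exact set match used in map_code. The sample frame is copied from the expected output in the question.

```python
import pandas as pd

code = {
    "adult": {"America": {"Canada", "USA"},
              "Europe": {"France", "UK"},
              "Africa": {"Egypt", "Sudan"}},
    "pediatric": {"America": {"Canada", "USA"},
                  "Europe": {"France", "UK"},
                  "Africa": {"Egypt", "Sudan"}},
}

df = pd.DataFrame({
    "id": [1, 2, 3],
    "Canada": [True, False, False],
    "USA": [True, False, False],
    "France": [False, True, False],
    "UK": [False, True, False],
    "Egypt": [False, False, True],
    "Sudan": [False, False, True],
    "age_type": ["adult", "pediatric", "pediatric"],
})

# Start with an empty (NaN-filled) object column.
df["continent"] = pd.Series(index=df.index, dtype="object")

for age_type, continents in code.items():
    for continent, countries in continents.items():
        cols = sorted(countries)  # column labels for this continent
        # ASSUMPTION: "any country True" counts as a match (not an exact set match).
        mask = (df["age_type"] == age_type) & df[cols].any(axis=1)
        df.loc[mask, "continent"] = continent

print(df)
```

The loops run once per (age_type, continent) pair rather than once per row, so the Python-level work stays small even when the frame is large. If a row matches several continents, the one visited last in the dictionary wins.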

What are you trying to join on? Can you provide your desired output?

@kait The original post has been edited for more clarification. Thanks.

Thanks @Jacek, but this doesn't work for big DataFrames. I should make the solution more flexible.