Python pandas: merging multiple values in df1 into a unique value from df2
I have a data frame (df1) that contains gene details together with the list of organs associated with each gene, and a mapping data frame (df2) that maps those organs to unique organ types. How can I collapse the multiple organs listed for each gene in df1 into a single organ type taken from df2?
Below is an example of how you could build this logic with pandas.
Setup
import pandas as pd
df1 = pd.DataFrame({"Gene_name": ("Gene1", "Gene2", "Gene3", "Gene4"),
"Organ_name": ("Skin, Stomach, Eyes, Hair", "Lungs, Mouth, Oesophagus",
"Pharynx, Lungs, Throat, Skin", "Stomach, Small intestine")})
df2 = pd.DataFrame({"Type": ("External", "External", "External", "External", "Internal", "Internal", "Internal"),
"Organ": ("Skin", "Eyes", "Hair", "Legs", "Lungs", "Small intestine", "Oesophagus")})
Solution
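# Lookup Series: organ name -> type
t = df2.set_index('Organ')['Type']
# Split the comma-separated organ string into a list
df1['Organ_list'] = df1['Organ_name'].str.split(', ')
# Map each organ to its type, dropping organs that are missing from df2
df1['Int_Ext'] = [list(filter(None, map(t.get, x))) for x in df1['Organ_list']]
# Flag a gene as Internal when at least half of its mapped organs are internal
df1['Int_Ext_Flag'] = df1['Int_Ext'].apply(
    lambda x: 'Internal' if x.count('Internal') / len(x) >= 0.5 else 'External')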
Result
Gene_name Organ_name Organ_list \
0 Gene1 Skin, Stomach, Eyes, Hair [Skin, Stomach, Eyes, Hair]
1 Gene2 Lungs, Mouth, Oesophagus [Lungs, Mouth, Oesophagus]
2 Gene3 Pharynx, Lungs, Throat, Skin [Pharynx, Lungs, Throat, Skin]
3 Gene4 Stomach, Small intestine [Stomach, Small intestine]
Int_Ext Int_Ext_Flag
0 [External, External, External] External
1 [Internal, Internal] Internal
2 [Internal, External] Internal
3 [Internal] Internal
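One thing to watch: the flag rule in the solution divides by `len(x)`, so a gene whose organs are all missing from df2 would end up with an empty list and raise `ZeroDivisionError`. That does not happen with the sample data, but a guarded version could look like the sketch below. The helper name `majority_flag` and the `"Unknown"` fallback are my own assumptions, not part of the original answer.

```python
def majority_flag(types, default="Unknown"):
    """Return 'Internal' when at least half of the mapped types are internal.

    `default` is a hypothetical fallback for genes with no mapped organ.
    """
    if not types:  # none of the gene's organs were found in the mapping
        return default
    share_internal = types.count("Internal") / len(types)
    return "Internal" if share_internal >= 0.5 else "External"


# Three illustrative inputs: all external, a 50/50 tie, and nothing mapped.
flags = [
    majority_flag(x)
    for x in (["External", "External", "External"], ["Internal", "External"], [])
]
print(flags)
```

With the columns from the solution it could be applied as `df1['Int_Ext'].apply(majority_flag)`.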
Explanation
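- Create a mapping from organ to type using df2.
- Split the strings in df1['Organ_name'] into a list, df1['Organ_list'].
- Map the elements of this list to their types, then add logic via pd.Series.apply to decide whether the gene is "Internal" or "External".
- In this example, organs that have no type in the mapping are filtered out of the list with filter(None, ...).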
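This works in R (once you fix the quoting issues), but not in pandas. If you are looking for a pandas solution, it would be very helpful to post data frames that can be used in Python.
Thank you very much, this is exactly what I was looking for.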