Python pandas: combining multiple values in df1 into a unique value from df2


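I have a data frame (df1) that holds gene details together with the list of organs associated with each gene, and a second, mapping data frame (df2) that maps those organs to unique organ types.

Below is an example of how the logic can be structured with pandas.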

Setup

import pandas as pd

df1 = pd.DataFrame({"Gene_name": ("Gene1", "Gene2", "Gene3", "Gene4"),   
                    "Organ_name": ("Skin, Stomach, Eyes, Hair", "Lungs, Mouth, Oesophagus",
                                  "Pharynx, Lungs, Throat, Skin", "Stomach, Small intestine")})

df2 = pd.DataFrame({"Type": ("External", "External", "External", "External", "Internal", "Internal", "Internal"),
                    "Organ": ("Skin", "Eyes", "Hair", "Legs", "Lungs", "Small intestine", "Oesophagus")})
Solution

t = df2.set_index('Organ')['Type']

df1['Organ_list'] = df1['Organ_name'].str.split(', ')

df1['Int_Ext'] = [list(filter(None, map(t.get, x))) for x in df1['Organ_list']]

df1['Int_Ext_Flag'] = df1['Int_Ext'].apply(lambda x: 'Internal' if \
                      x.count('Internal') / len(x) >= 0.5 else 'External')
Result

  Gene_name                    Organ_name                      Organ_list  \
0     Gene1     Skin, Stomach, Eyes, Hair     [Skin, Stomach, Eyes, Hair]   
1     Gene2      Lungs, Mouth, Oesophagus      [Lungs, Mouth, Oesophagus]   
2     Gene3  Pharynx, Lungs, Throat, Skin  [Pharynx, Lungs, Throat, Skin]   
3     Gene4      Stomach, Small intestine      [Stomach, Small intestine]   

                          Int_Ext Int_Ext_Flag  
0  [External, External, External]     External  
1            [Internal, Internal]     Internal  
2            [Internal, External]     Internal  
3                      [Internal]     Internal 
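
The same flags can also be reached without building Python lists row by row. The following is only a sketch of an alternative, not the method used in the answer: it swaps the list comprehension for `Series.explode`, `Series.map` and a group-wise mean. The intermediate names `exploded`, `types` and `share_internal` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Gene_name": ("Gene1", "Gene2", "Gene3", "Gene4"),
                    "Organ_name": ("Skin, Stomach, Eyes, Hair", "Lungs, Mouth, Oesophagus",
                                   "Pharynx, Lungs, Throat, Skin", "Stomach, Small intestine")})

df2 = pd.DataFrame({"Type": ("External", "External", "External", "External",
                             "Internal", "Internal", "Internal"),
                    "Organ": ("Skin", "Eyes", "Hair", "Legs",
                              "Lungs", "Small intestine", "Oesophagus")})

# Organ -> Type lookup, as in the original solution.
t = df2.set_index("Organ")["Type"]

# One row per (gene, organ); the original row index is repeated.
exploded = df1["Organ_name"].str.split(", ").explode()

# Look up each organ's type; organs missing from df2 become NaN and are dropped.
types = exploded.map(t).dropna()

# Fraction of mapped organs that are 'Internal', per original row.
share_internal = types.eq("Internal").groupby(level=0).mean()

# Same 0.5 threshold as the original lambda. Assignment aligns on the index,
# so a gene with no mapped organs would end up with NaN instead of an error.
df1["Int_Ext_Flag"] = share_internal.ge(0.5).map({True: "Internal", False: "External"})

print(df1[["Gene_name", "Int_Ext_Flag"]])
```

One difference in behaviour worth noting: with the list-based version, a gene whose organs are all unmapped would raise a ZeroDivisionError, whereas here it is simply left as NaN.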
Explanation

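  • Use df2 to create a mapping from organ to type.
  • Split the strings in df1['Organ_name'] into lists, stored in df1['Organ_list'].
  • Map the elements of each list to their types, then add the logic that decides between 'Internal' and 'External' via pd.Series.apply.
  • In this example, list(filter(None, ...)) filters out the organs that have no mapping to a type.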
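
The steps above can be traced on a single string. This is a minimal sketch, assuming the same df2 as in the setup; the helper name `classify` is hypothetical and not part of the answer, and the guard for the case where no organ is mapped is an addition (the original lambda would divide by zero there).

```python
import pandas as pd

df2 = pd.DataFrame({"Type": ("External", "External", "External", "External",
                             "Internal", "Internal", "Internal"),
                    "Organ": ("Skin", "Eyes", "Hair", "Legs",
                              "Lungs", "Small intestine", "Oesophagus")})

# Series indexed by organ name: t.get(organ) returns the type, or None if unknown.
t = df2.set_index("Organ")["Type"]


def classify(organ_string, mapping=t):
    """Hypothetical helper: label one comma-separated organ string."""
    # Step 1: split the string into a list of organ names.
    organs = organ_string.split(", ")
    # Step 2: look up each organ and keep only those that have a type.
    types = [mapping.get(organ) for organ in organs]
    types = [x for x in types if x is not None]
    # Added guard (assumption): nothing was mapped, so no label can be given.
    if not types:
        return None
    # Step 3: 'Internal' if at least half of the mapped organs are internal.
    share_internal = types.count("Internal") / len(types)
    return "Internal" if share_internal >= 0.5 else "External"


print(classify("Skin, Stomach, Eyes, Hair"))
print(classify("Pharynx, Lungs, Throat, Skin"))
print(classify("Mouth, Throat"))
```

The last call uses two organs that do not appear in df2, which is the case the guard is meant to cover.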

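I think you'll find this collection useful. FYI, the way you built your data frames might work in R (once you fix the quoting problems), but not in pandas. If you are looking for a pandas solution, it would be very helpful to post data frames that can be used in Python.

Thank you very much, this is exactly what I was looking for.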