Python Pandas: repeating DataFrame columns based on a reference dict

python pandas dataframe

I need to rename and repeat the columns of my DataFrame based on a reference dict. I created a dummy DataFrame below:

import pandas as pd

rawdata = {'id':['json','molly','tina','jake','molly'],'entity':['present','absent','absent','present','present'],'entity2':['present','present','present','absent','absent'],'entity3':['absent','absent','absent','present','absent']}
df = pd.DataFrame(rawdata)
df = df.set_index('id')  # set_index returns a new frame, so assign it back
df

        entity  entity2  entity3
id                              
json   present  present   absent
molly   absent  present   absent
tina    absent  present   absent
jake   present   absent  present
molly  present   absent   absent

Now I have the reference dict below:

ref_dict= {'entity':['entity_exp1'],'entity2':['entity2_exp1','entity2_exp2'],'entity3':['entity3_exp1','entity3_exp2','entity3_exp3']}

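Now I need to replace the column names according to the dict values, and if a key maps to more than one value, the column should be repeated once per value. This is the DataFrame I want: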

      entity_exp1 entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3
id
json      present      present      present       absent       absent       absent
molly      absent      present      present       absent       absent       absent
tina       absent      present      present       absent       absent       absent
jake      present       absent       absent      present      present      present
molly     present       absent       absent       absent       absent       absent

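For each column in df, you can look up its new column names in ref_dict, create a new column for each of them, and finally delete the old column. You can try the following: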

# ref_dict maps each old column name (key) to its list of new column names (value)
for old_column, new_columns in ref_dict.items():
    for new_column in new_columns:  # for each new column defined for this old column
        df[new_column] = df[old_column]  # the content stays the same as the old column
    del df[old_column]  # now remove the old column
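
If you would rather not modify df in place, the same loop can be wrapped in a small function that builds a new frame. This is only a sketch: the name expand_columns is hypothetical, and raising a KeyError for a key that is not a column is my own assumption about how missing columns should be handled.

```python
import pandas as pd


def expand_columns(df, ref_dict):
    """Return a new DataFrame with each source column copied once per new name."""
    out = pd.DataFrame(index=df.index)
    for old_column, new_columns in ref_dict.items():
        if old_column not in df.columns:
            # Assumption: a missing source column is treated as an error.
            raise KeyError(f"column {old_column!r} not found in DataFrame")
        for new_column in new_columns:
            # Copy values positionally; the index contains duplicate labels.
            out[new_column] = df[old_column].to_numpy()
    return out


rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
df = pd.DataFrame(rawdata).set_index('id')
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

result = expand_columns(df, ref_dict)
print(result)
```

The original df keeps its entity, entity2 and entity3 columns, so the function can be called again with a different ref_dict.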

You can simply loop:

import pandas as pd

rawdata = {'id':['json','molly','tina','jake','molly'],
           'entity':['present','absent','absent','present','present'],
           'entity2':['present','present','present','absent','absent'],
           'entity3':['absent','absent','absent','present','absent']}
# keep 'id' as a regular column here; it is copied into df2 below
df = pd.DataFrame(rawdata)
ref_dict = {'entity':['entity_exp1'],
            'entity2':['entity2_exp1','entity2_exp2'],
            'entity3':['entity3_exp1','entity3_exp2','entity3_exp3']}

# here comes the new part:
df2 = pd.DataFrame()
for key, val in sorted(ref_dict.items()):
    for subval in val:
        df2[subval] = df[key]  # copy the old column under each new name

df2['id'] = df['id']
df2.set_index('id', inplace=True)

print(df2)
      entity_exp1 entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2  entity3_exp3  
id                                                                      
json      present      present      present       absent       absent        absent   
molly      absent      present      present       absent       absent        absent   
tina       absent      present      present       absent       absent        absent   
jake      present       absent       absent      present      present       present    
molly     present       absent       absent       absent       absent        absent

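You can reindex df using the dict keys as column names (each key repeated once per value), then rename the columns using the dict values: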

df_new = df.reindex(columns=sum([[k]*len(v) for k,v in ref_dict.items()],[]))
df_new.columns=sum(ref_dict.values(),[])
df_new
Out[573]: 
  entity_exp1 entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3
0     present      present      present       absent       absent       absent
1      absent      present      present       absent       absent       absent
2      absent      present      present       absent       absent       absent
3     present       absent       absent      present      present      present
4     present       absent       absent       absent       absent       absent
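
The sum(..., []) calls above are only there to flatten a list of lists. As a possible alternative (not part of the answer above), the same two lists can be built with a comprehension and itertools.chain, which avoids repeated list concatenation. The names source_columns and target_columns are mine.

```python
from itertools import chain

import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
df = pd.DataFrame(rawdata)
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

# Each dict key repeated once per new name it maps to.
source_columns = [key for key, names in ref_dict.items() for _ in names]
# All new names, flattened in the same order.
target_columns = list(chain.from_iterable(ref_dict.values()))

# Select the (repeated) source columns, then relabel them positionally.
df_new = df.reindex(columns=source_columns)
df_new.columns = target_columns
print(df_new)
```

Both lists are derived from the same pass over ref_dict, so position i in target_columns always names the copy of position i in source_columns.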

Option 1
Use pd.concat with a dict comprehension

pd.concat({k: df[v] for v, l in ref_dict.items() for k in l}, axis=1)

      entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3 entity_exp1
id                                                                                
json       present      present       absent       absent       absent     present
molly      present      present       absent       absent       absent      absent
tina       present      present       absent       absent       absent      absent
jake        absent       absent      present      present      present     present
molly       absent       absent       absent       absent       absent     present
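
Note that the columns in the output above come out in alphabetical order rather than in the order of ref_dict. If the order matters, one possible variation (my own sketch, not part of the answer above) is to pass pd.concat a list of renamed Series instead of a dict, so the columns follow the list order. The name pieces is hypothetical.

```python
import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
df = pd.DataFrame(rawdata).set_index('id')
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

# One renamed copy of the source column for every new name, in ref_dict order.
pieces = [df[source].rename(target)
          for source, targets in ref_dict.items()
          for target in targets]

# Concatenating Series side by side uses each Series name as the column label.
result = pd.concat(pieces, axis=1)
print(result)
```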

Option 2
Slice the DataFrame and rename the columns

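# number of new names for each existing column (df must have 'id' as its index)
repeats = [len(ref_dict[col]) for col in df.columns]
# reindex_axis is no longer available in recent pandas; reindex(columns=...) does the same job
d1 = df.reindex(columns=df.columns.repeat(repeats))
# flatten the lists of new names, in column order
d1.columns = [name for col in df.columns for name in ref_dict[col]]
d1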

      entity_exp1 entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3
id                                                                                
json      present      present      present       absent       absent       absent
molly      absent      present      present       absent       absent       absent
tina       absent      present      present       absent       absent       absent
jake      present       absent       absent      present      present      present
molly     present       absent       absent       absent       absent       absent
