Python Pandas: repeat DataFrame columns based on a reference dict
I need to rename and repeat the columns of my DataFrame based on a reference dict. Below I create a dummy DataFrame:
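import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'], 'entity': ['present', 'absent', 'absent', 'present', 'present'], 'entity2': ['present', 'present', 'present', 'absent', 'absent'], 'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
df = pd.DataFrame(rawdata)
# displays the frame with 'id' as the index; the result is not assigned, so df itself is unchanged
df.set_index('id')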
entity entity2 entity3
id
json present present absent
molly absent present absent
tina absent present absent
jake present absent present
molly present absent absent
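Now I have the following reference dict:
ref_dict = {'entity': ['entity_exp1'], 'entity2': ['entity2_exp1', 'entity2_exp2'], 'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}
Now I need to replace the column names according to the dict values, and if a column maps to more than one value, that column should be repeated once per value. This is the DataFrame I want: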
entity_exp1 entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3
id
json present present present absent absent absent
molly absent present present absent absent absent
tina absent present present absent absent absent
jake present absent absent present present present
molly present absent absent absent absent absent
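For each column in df, you can look up its new column names in ref_dict, create those new columns, and finally delete the old column. You can try the following: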
# each key of ref_dict is an old column, each value is the list of its new columns
for old_column, new_columns in ref_dict.items():
    for new_column in new_columns:  # for each new column defined for this old column
        df[new_column] = df[old_column]  # the content stays the same as the old column
    del df[old_column]  # now remove the old column
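For reference, here is a self-contained sketch of the same loop that works on a copy, so the original frame is left untouched. The function name expand_columns is made up for this illustration, and the sketch assumes Python 3.7+ (dict insertion order) and that no new name collides with an old column name.

```python
import pandas as pd


def expand_columns(df, ref_dict):
    """Hypothetical helper: copy each old column under its new names, then drop the old one."""
    out = df.copy()  # work on a copy so the caller's frame is not modified
    for old_column, new_columns in ref_dict.items():
        for new_column in new_columns:
            out[new_column] = out[old_column]  # same content under the new name
        del out[old_column]  # the old column is no longer needed
    return out


rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

df = pd.DataFrame(rawdata).set_index('id')
result = expand_columns(df, ref_dict)
print(result)
```

Because new columns are appended at the end, the final column order follows the order of the keys and values in ref_dict.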
You can simply loop:
import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
df = pd.DataFrame(rawdata)  # 'id' stays a regular column here; it is used below
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}
# here comes the new part:
df2 = pd.DataFrame()
for key, val in sorted(ref_dict.items()):  # sorted by old column name
    for subval in val:
        df2[subval] = df[key]  # copy the old column under each new name
df2['id'] = df['id']
df2.set_index('id', inplace=True)
print(df2)
entity_exp1 entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3
id
json present present present absent absent absent
molly absent present present absent absent absent
tina absent present present absent absent absent
jake present absent absent present present present
molly present absent absent absent absent absent
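You can reindex df using the dict keys as column names (each key repeated once per new name), then rename the columns with the dict values: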
df_new = df.reindex(columns=sum([[k]*len(v) for k,v in ref_dict.items()],[]))
df_new.columns=sum(ref_dict.values(),[])
df_new
Out[573]:
entity_exp1 entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3
0 present present present absent absent absent
1 absent present present absent absent absent
2 absent present present absent absent absent
3 present absent absent present present present
4 present absent absent absent absent absent
Option 1: use pd.concat with a dict comprehension
pd.concat({k: df[v] for v, l in ref_dict.items() for k in l}, axis=1)
entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3 entity_exp1
id
json present present absent absent absent present
molly present present absent absent absent absent
tina present present absent absent absent absent
jake absent absent present present present present
molly absent absent absent absent absent present
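The column order in the output above differs from the requested one, and it may depend on the pandas version in use. A minimal sketch that pins the order explicitly is given below; the names pieces and wanted_order are introduced only for this example, and 'id' is attached as the index at the end rather than up front, which is a choice made here for simplicity.

```python
import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

df = pd.DataFrame(rawdata)  # default integer index; 'id' is a regular column

# map every new name to the Series of the old column it copies
pieces = {new: df[old] for old, names in ref_dict.items() for new in names}
wanted_order = list(pieces)  # dict insertion order (Python 3.7+)

# concatenate side by side, then select the columns in the wanted order
result = pd.concat(pieces, axis=1)[wanted_order]
result = result.set_index(df['id'])  # use the ids as the row index
print(result)
```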
Option 2: slice the DataFrame and rename the columns
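# assumes df is already indexed by 'id', i.e. df = df.set_index('id')
repeats = [len(ref_dict[col]) for col in df.columns]
# reindex_axis was removed in pandas 1.0; reindex(columns=...) does the same job
d1 = df.reindex(columns=df.columns.repeat(repeats))
d1.columns = [name for col in df.columns for name in ref_dict[col]]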
d1
entity_exp1 entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3
id
json present present present absent absent absent
molly absent present present absent absent absent
tina absent present present absent absent absent
jake present absent absent present present present
molly present absent absent absent absent absent
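The same slicing idea can be expressed by position instead of by label. This is only a sketch under the assumption that every column of df has an entry in ref_dict; it uses np.repeat to build the positions and iloc to take them, and it derives the new names from the order of df.columns, so it does not rely on the key order of the dict.

```python
import numpy as np
import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

df = pd.DataFrame(rawdata).set_index('id')

# how many copies each existing column needs
repeats = [len(ref_dict[col]) for col in df.columns]
# column positions with repetition, e.g. [0, 1, 1, 2, 2, 2]
positions = np.repeat(np.arange(df.shape[1]), repeats)

d1 = df.iloc[:, positions]  # take the columns by position
d1.columns = [name for col in df.columns for name in ref_dict[col]]
print(d1)
```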