Python 3.x: How do I separate inconsistent columns in Python?


python-3.x pandas


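I have attached some dummy data in which the columns are inconsistent: some fields are missing from some records. How can I split the data into columns and fill the missing fields with None? Assume that all of these columns are features.

Dummy DataFrame: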

feature1 : abc

feature2 : bfj

feature4 : 

feature5 : re


feature1 : werq

feature3 : kgh

feature4 : hjyj

feature5 : re


feature1 : hg

feature2 : jhgyj

feature3 : ytyitli

feature4 : guyhk

feature5 : yyjhj


feature2 : tyty

feature3 : ytrtf

feature4 : ewhgf

feature5 : ihyty

This is what I expect:

feature1   feature2   feature3   feature4   feature5
abc        bfj        None       None       re
werq       None       kgh        hjyj       re
hg         jhgyj      ytyitli    guyhk      yyjhj
None       tyty       ytrtf      ewhgf      ihyty

Thanks

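Is the last feature, feature5, always present in each group?

Can you tell us how the data is available? For example, is it a CSV file that you want to create a DataFrame from? How is each record identified, etc.?

@jezrael: Actually, this is a large dataset, so sometimes "feature5" is not present, and the columns are inconsistent.

@mujjiga: This is dummy data; I have more than 20 feature columns. The actual data looks like this, and it is a comma-separated "csv" file.

The idea is to create a dictionary that maps each unique feature name to its position in the final column order; here the order comes from sorting the unique values. A new helper column is then built by taking the difference between consecutive positions, marking the rows where the position drops, and taking the cumulative sum, so that every record gets its own group number. Finally, the data is pivoted on that group number.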
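The answer's code works on a DataFrame `df` with a name column `col1` and a value column `col2`, but the thread never shows how that frame is built. The following is a minimal sketch under the assumption that the input is plain text with one `name : value` pair per line, as in the dummy data; the variable `raw` and the parsing loop are illustrative only, and the real comma-separated file may need different parsing.

```python
import io
import pandas as pd

# Hypothetical input: the dummy data from the question, one pair per line.
raw = """feature1 : abc
feature2 : bfj
feature4 : 
feature5 : re
feature1 : werq
feature3 : kgh
feature4 : hjyj
feature5 : re
feature1 : hg
feature2 : jhgyj
feature3 : ytyitli
feature4 : guyhk
feature5 : yyjhj
feature2 : tyty
feature3 : ytrtf
feature4 : ewhgf
feature5 : ihyty
"""

rows = []
for line in io.StringIO(raw):
    line = line.strip()
    if not line:
        continue  # skip blank separator lines
    # Split on the first colon only, so values containing ':' stay intact.
    name, _, value = line.partition(":")
    # An empty value (as in "feature4 : ") is stored as missing.
    rows.append((name.strip(), value.strip() or None))

df = pd.DataFrame(rows, columns=["col1", "col2"])
print(df.head())
```

Because the names are already stripped here, the `str.strip(' :')` step in the answer becomes a no-op on this frame.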

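# Assumes df has two string columns: col1 (feature name) and col2 (value).
# Remove the spaces and the colon left around the feature name.
df['col1'] = df['col1'].str.strip(' :')

# Map each feature name to its position in the sorted list of unique names.
d = {v: k for k, v in enumerate(sorted(df['col1'].unique()))}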
print (d)
{'feature1': 0, 'feature2': 1, 'feature3': 2, 'feature4': 3, 'feature5': 4}
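One caveat for the 20+ feature columns mentioned in the comments: plain string sorting places `feature10` before `feature2`, which would give the wrong positions. The following is a sketch of a natural-sort key that compares the numeric suffix as a number; `natural_key` is a hypothetical helper, and it assumes every name has the same `text + digits` shape.

```python
import re

def natural_key(name):
    # "feature12" -> ["feature", 12, ""], so digit runs compare numerically.
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]

names = ["feature10", "feature2", "feature1", "feature20"]

# Plain sorting is lexicographic and misorders the two-digit names.
print(sorted(names))
# ['feature1', 'feature10', 'feature2', 'feature20']

# Natural sorting gives the intended column order.
order = {name: pos for pos, name in enumerate(sorted(names, key=natural_key))}
print(order)
# {'feature1': 0, 'feature2': 1, 'feature10': 2, 'feature20': 3}
```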

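# Any drop in position, even by one, means a new record starts;
# the cumulative sum of those markers numbers the records 0, 1, 2, ...
df['g'] = df['col1'].map(d).diff().lt(0).cumsum()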
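To see how the grouping column is formed, here is a small sketch that applies the same diff and cumulative-sum idea to the positions of the dummy data (using the mapping printed above). The series `pos` is typed in by hand for illustration, and the sketch treats any negative difference as the start of a new record, so a drop of a single position also counts.

```python
import pandas as pd

# Positions of the 17 feature names from the dummy data, in file order.
pos = pd.Series([0, 1, 3, 4,  0, 2, 3, 4,  0, 1, 2, 3, 4,  1, 2, 3, 4])

# diff() is negative exactly where the position drops, i.e. where a new
# record begins; the first row has no predecessor and compares as False.
starts = pos.diff().lt(0)

# The running count of record starts is the group number of each row.
g = starts.cumsum()
print(g.tolist())
# [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3]
```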

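# One row per record (g), one column per feature name.
# pandas 2.0 and later accept these arguments by keyword only.
df1 = df.pivot(index='g', columns='col1', values='col2')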
print (df1)
col1 feature1 feature2 feature3 feature4 feature5
g                                                
0         abc      bfj      NaN     None       re
1        werq      NaN      kgh     hjyj       re
2          hg    jhgyj  ytyitli    guyhk    yyjhj
3         NaN     tyty    ytrtf    ewhgf    ihyty
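The pivot result mixes NaN and None, while the question asks for None in every missing cell. As an alternative that is not the answer's method, the records can be assembled in plain Python before building the DataFrame. This is a sketch under the assumptions that the column order is known in advance and that a record ends whenever a name's position does not increase; `pairs`, `columns`, and `records` are illustrative names.

```python
import pandas as pd

# (name, value) pairs in file order, taken from the dummy data.
pairs = [
    ("feature1", "abc"), ("feature2", "bfj"), ("feature4", ""), ("feature5", "re"),
    ("feature1", "werq"), ("feature3", "kgh"), ("feature4", "hjyj"), ("feature5", "re"),
    ("feature1", "hg"), ("feature2", "jhgyj"), ("feature3", "ytyitli"),
    ("feature4", "guyhk"), ("feature5", "yyjhj"),
    ("feature2", "tyty"), ("feature3", "ytrtf"), ("feature4", "ewhgf"), ("feature5", "ihyty"),
]
columns = ["feature1", "feature2", "feature3", "feature4", "feature5"]
position = {name: i for i, name in enumerate(columns)}

records = []
last = len(columns)  # larger than any position, so the first pair opens a record
for name, value in pairs:
    if position[name] <= last:
        # The position did not increase: start a new record, all fields None.
        records.append(dict.fromkeys(columns))
    records[-1][name] = value or None  # an empty string counts as missing
    last = position[name]

print(records[0])
# {'feature1': 'abc', 'feature2': 'bfj', 'feature3': None, 'feature4': None, 'feature5': 're'}

result = pd.DataFrame(records, columns=columns)
print(result)
```

The list of dictionaries holds None for every missing field; how the DataFrame displays those cells can depend on the pandas version and the column dtype.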