Python: how to parse repeated list elements into a DataFrame


For example, given a list named "main":

main = [('date', '2020-04-21'), ('oldname', 'Tap'), ('newname', 'Tapnew'), ('icon_url', '3'),
        ('date', '2020-04-21'), ('oldname', 'Nod'), ('newname', 'Nodnew'), ('icon_url', '4'),
        ('date', '2020-04-21'), ('oldname', 'Mik'), ('newname', 'Miknew'), ('icon_url', '5')]

I tried to parse and transpose it directly with this:

df = pd.DataFrame(main)
test = df.T
test.columns = test.iloc[0]
a = test.drop(test.index[0])

However, the resulting DataFrame is still a single wide row with repeated columns:

date        oldname  newname  icon_url  date        oldname  newname  icon_url  date        oldname  newname  icon_url
2020-04-21  Tap      Tapnew   3         2020-04-21  Nod      Nodnew   4         2020-04-21  Mik      Miknew   5

The expected output is:

 date      oldname     newname    icon_url     
2020-04-21    Tap     Tapnew        3     
2020-04-21    Nod     Nodnew        4      
2020-04-21    Mik     Miknew        5  

I have been struggling with this all day. Can anyone explain? Thanks in advance.

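Read the data into a DataFrame the way you already do. Then create an index for each group of rows by checking where the word "date" occurs and taking the cumulative sum. At that point, it is just a pivot: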

df = pd.DataFrame(main)
df['index'] = df[0].eq('date').cumsum()
df = df.pivot(index='index', columns=0, values=1).rename_axis(None, axis=1)

             date icon_url newname oldname
index                                     
1      2020-04-21        3  Tapnew     Tap
2      2020-04-21        4  Nodnew     Nod
3      2020-04-21        5  Miknew     Mik
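
To see why the cumulative sum works as a group index, it can help to look at the intermediate column before pivoting. The sketch below repeats the steps above on the "main" list from the question. The names long_df, group and wide, and the final reindex call (which restores the input column order, since pivot sorts the new columns alphabetically), are additions for illustration and not part of the answer.

```python
import pandas as pd

main = [('date', '2020-04-21'), ('oldname', 'Tap'), ('newname', 'Tapnew'), ('icon_url', '3'),
        ('date', '2020-04-21'), ('oldname', 'Nod'), ('newname', 'Nodnew'), ('icon_url', '4'),
        ('date', '2020-04-21'), ('oldname', 'Mik'), ('newname', 'Miknew'), ('icon_url', '5')]

long_df = pd.DataFrame(main, columns=['key', 'value'])

# Every 'date' row starts a new record, so the running count of 'date'
# rows seen so far labels each row with its record number (1, 2, 3).
long_df['group'] = long_df['key'].eq('date').cumsum()
print(long_df)

wide = long_df.pivot(index='group', columns='key', values='value')

# pivot sorts the new columns alphabetically; reindex puts them back in
# the order in which the keys first appear in the input.
original_order = list(dict.fromkeys(long_df['key']))
wide = (wide.reindex(columns=original_order)
            .rename_axis(None, axis=1)
            .reset_index(drop=True))
print(wide)
```

This approach assumes every record begins with a 'date' entry; if a record could start with a different key, the marker used in eq(...) would need to change accordingly.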

This is just a pivot on two columns. The answer starts from:

df = pd.DataFrame(main)

Output:

col        date icon_url newname oldname
idx                                     
0    2020-04-21        3  Tapnew     Tap
1    2020-04-21        4  Nodnew     Nod
2    2020-04-21        5  Miknew     Mik
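
The steps between df = pd.DataFrame(main) and the output above are not shown here. One way to produce a frame with the same col and idx labels is sketched below. The column names col and value, the idx counter, and the use of groupby(...).cumcount() are assumptions chosen to match the printed output, not necessarily what the answer used.

```python
import pandas as pd

main = [('date', '2020-04-21'), ('oldname', 'Tap'), ('newname', 'Tapnew'), ('icon_url', '3'),
        ('date', '2020-04-21'), ('oldname', 'Nod'), ('newname', 'Nodnew'), ('icon_url', '4'),
        ('date', '2020-04-21'), ('oldname', 'Mik'), ('newname', 'Miknew'), ('icon_url', '5')]

# 'col' and 'value' are assumed names for the key and value columns.
df = pd.DataFrame(main, columns=['col', 'value'])

# Number each occurrence of a key: the first 'date' gets 0, the second 1,
# and so on. Rows with the same number belong to the same record.
df['idx'] = df.groupby('col').cumcount()

# Pivot on the two helper columns: one row per idx, one column per key.
result = df.pivot(index='idx', columns='col', values='value')
print(result)
```

Unlike the cumulative-sum approach, this does not depend on which key comes first in each record, but it does assume every key appears once per record.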

Convert the list of tuples into a dictionary:

In [62]: def tuple_to_dict(some_list):
    ...:     result = {}
    ...:     for k, v in some_list:
    ...:         result.setdefault(k, []).append(v)
    ...:
    ...:     return result
    ...:

In [63]: tuple_to_dict(main)
Out[63]:
{'date': ['2020-04-21', '2020-04-21', '2020-04-21'],
 'oldname': ['Tap', 'Nod', 'Mik'],
 'newname': ['Tapnew', 'Nodnew', 'Miknew'],
 'icon_url': ['3', '4', '5']}

In [64]: df = pd.DataFrame(tuple_to_dict(main))

In [65]: df
Out[65]:
         date oldname newname icon_url
0  2020-04-21     Tap  Tapnew        3
1  2020-04-21     Nod  Nodnew        4
2  2020-04-21     Mik  Miknew        5
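
A dictionary of lists only lines up if every key occurs the same number of times; with lists of unequal length the DataFrame constructor raises a ValueError. If some records might be missing a field, one alternative is to build one dictionary per record instead. The sketch below is an illustration under that assumption: the helper tuples_to_records, its first_key parameter and the sample data are hypothetical and not part of the answers.

```python
import pandas as pd

def tuples_to_records(pairs, first_key='date'):
    """Group (key, value) pairs into one dict per record.

    A new record starts whenever `first_key` is seen.
    """
    records = []
    for key, value in pairs:
        if key == first_key or not records:
            records.append({})
        records[-1][key] = value
    return records

# Same layout as `main`, except the second record has no 'icon_url'.
sample = [('date', '2020-04-21'), ('oldname', 'Tap'), ('newname', 'Tapnew'), ('icon_url', '3'),
          ('date', '2020-04-21'), ('oldname', 'Nod'), ('newname', 'Nodnew'),
          ('date', '2020-04-21'), ('oldname', 'Mik'), ('newname', 'Miknew'), ('icon_url', '5')]

# A list of dicts becomes one row per dict; missing keys are filled with NaN.
df = pd.DataFrame(tuples_to_records(sample))
print(df)
```

With the original "main" list, where no field is missing, this gives the same three rows as the dictionary-of-lists version.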

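The solution further down that slices the values in groups of n = 4 uses plain Python and should be more efficient for large data. It relies on the fact that Python lists are ordered and that the tuples come in groups of 4 (the variable n in that solution).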

You can use a defaultdict to build the dictionary, then read it into a DataFrame:

from collections import defaultdict
d = defaultdict(list)
for k,v in main:
    d[k].append(v)

pd.DataFrame(d)

         date oldname newname icon_url
0  2020-04-21     Tap  Tapnew        3
1  2020-04-21     Nod  Nodnew        4
2  2020-04-21     Mik  Miknew        5
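
Alternatively, starting from the wide DataFrame a built in the question, melt it back to long form, number the occurrences of each key, and pivot:

s = a.melt()
s['i'] = s.groupby(0).cumcount()
s = s.pivot(index='i', columns=0, values='value')
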
0        date icon_url newname oldname
i                                     
0  2020-04-21        3  Tapnew     Tap
1  2020-04-21        4  Nodnew     Nod
2  2020-04-21        5  Miknew     Mik

import pandas as pd

main = [('date', '2020-04-21'), ('oldname', 'Tap'), ('newname', 'Tapnew'), ('icon_url', '3'),
        ('date', '2020-04-21'), ('oldname', 'Nod'), ('newname', 'Nodnew'), ('icon_url', '4'),
        ('date', '2020-04-21'), ('oldname', 'Mik'), ('newname', 'Miknew'), ('icon_url', '5')]

n = 4  # number of fields per record

# Extract column names from the first record
main_columns = [item[0] for item in main[:n]]
# Extract values
main_values = [item[1] for item in main]
# Reshape the flat list of values into a list of rows, n values per row
# (uses n rather than a hard-coded 4 so that changing n is enough)
main_reshaped = [main_values[i*n:(i+1)*n] for i in range(len(main_values)//n)]

# Call DataFrame constructor
pd.DataFrame(main_reshaped, columns=main_columns)


    date        oldname newname icon_url
0   2020-04-21  Tap     Tapnew  3
1   2020-04-21  Nod     Nodnew  4
2   2020-04-21  Mik     Miknew  5
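
Slicing by position is only correct if the keys really do repeat in the same order with no gaps. A minimal sketch of how that assumption could be checked, and how n could be derived from the data rather than hard-coded, is shown below; the variable names and the error message are illustrative additions, not part of the answer.

```python
import pandas as pd

main = [('date', '2020-04-21'), ('oldname', 'Tap'), ('newname', 'Tapnew'), ('icon_url', '3'),
        ('date', '2020-04-21'), ('oldname', 'Nod'), ('newname', 'Nodnew'), ('icon_url', '4'),
        ('date', '2020-04-21'), ('oldname', 'Mik'), ('newname', 'Miknew'), ('icon_url', '5')]

keys = [k for k, _ in main]
values = [v for _, v in main]

# The record length is the distance from the first key to its next occurrence
# (or the whole list if the first key never repeats).
n = keys.index(keys[0], 1) if keys.count(keys[0]) > 1 else len(keys)
columns = keys[:n]

# Check the assumption that positional slicing relies on: the same keys,
# in the same order, repeated a whole number of times.
if len(keys) % n != 0 or keys != columns * (len(keys) // n):
    raise ValueError("keys do not repeat in a fixed order; positional slicing is unsafe")

rows = [values[i:i + n] for i in range(0, len(values), n)]
df = pd.DataFrame(rows, columns=columns)
print(df)
```

If the check fails, one of the key-based approaches above (pivot or a dictionary) is the safer choice, since those do not depend on position.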