Python: how to parse repeated list elements into a DataFrame
For example, given a list named "main":
main = [('date', '2020-04-21'), ('oldname', 'Tap'), ('newname', 'Tapnew'), ('icon_url', '3'), ('date', '2020-04-21'), ('oldname', 'Nod'), ('newname', 'Nodnew'), ('icon_url','4'), ('date', '2020-04-21'), ('oldname', 'Mik'), ('newname', 'Miknew'), ('icon_url','5')]
I tried to parse and convert it directly like this:
import pandas as pd
df = pd.DataFrame(main)
test = df.T
test.columns = test.iloc[0]
a = test.drop(test.index[0])
However, the resulting DataFrame is still a single wide row in which the columns repeat:
date oldname newname icon_url date oldname newname icon_url date oldname newname icon_url
2020-04-21 Tap Tapnew 3 2020-04-21 Nod Nodnew 4 2020-04-21 Mik Miknew 5
The desired output is:
date oldname newname icon_url
2020-04-21 Tap Tapnew 3
2020-04-21 Nod Nodnew 4
2020-04-21 Mik Miknew 5
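I have struggled with this all day. Can someone explain how to do it? Thanks in advance.
Read the data into a DataFrame the way you did. Then check where the key "date" appears in the first column and take the cumulative sum of those matches; this gives every row the number of the record it belongs to. At that point, all that is left is a pivot: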
df = pd.DataFrame(main)
df['index'] = df[0].eq('date').cumsum()
df = df.pivot(index='index', columns=0, values=1).rename_axis(None, axis=1)
date icon_url newname oldname
index
1 2020-04-21 3 Tapnew Tap
2 2020-04-21 4 Nodnew Nod
3 2020-04-21 5 Miknew Mik
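The steps above can be sketched as a self-contained script. The column names key, value and record are illustrative choices, not names from the answer, and the final reordering step is an addition: pivot sorts the column labels alphabetically, so the sketch restores the order in which the keys first appear. It assumes every record starts with a "date" entry.

```python
import pandas as pd

main = [('date', '2020-04-21'), ('oldname', 'Tap'), ('newname', 'Tapnew'), ('icon_url', '3'),
        ('date', '2020-04-21'), ('oldname', 'Nod'), ('newname', 'Nodnew'), ('icon_url', '4'),
        ('date', '2020-04-21'), ('oldname', 'Mik'), ('newname', 'Miknew'), ('icon_url', '5')]

# Illustrative column names (assumption): 'key' holds the field name, 'value' its content.
df = pd.DataFrame(main, columns=['key', 'value'])

# Each 'date' row opens a new record, so the running count of 'date' rows
# labels every row with its record number: 1, 1, 1, 1, 2, 2, ...
df['record'] = df['key'].eq('date').cumsum()

# One row per record, one column per key.
wide = df.pivot(index='record', columns='key', values='value')

# pivot sorts the column labels; put them back in first-seen order.
original_order = list(dict.fromkeys(df['key']))
wide = wide[original_order].rename_axis(None, axis=1).reset_index(drop=True)

print(wide)
```

If a record could be missing its "date" entry, the cumulative sum would merge it into the previous record, so this approach depends on "date" always being present.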
Starting from df = pd.DataFrame(main), it is just a pivot on two columns:
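A minimal sketch of such a pivot follows. The names col, val and idx are assumptions chosen to match the labels in the output shown for this answer; the row counter is built here with groupby(...).cumcount(), which numbers the repeats of each key starting from 0.

```python
import pandas as pd

main = [('date', '2020-04-21'), ('oldname', 'Tap'), ('newname', 'Tapnew'), ('icon_url', '3'),
        ('date', '2020-04-21'), ('oldname', 'Nod'), ('newname', 'Nodnew'), ('icon_url', '4'),
        ('date', '2020-04-21'), ('oldname', 'Mik'), ('newname', 'Miknew'), ('icon_url', '5')]

# 'col' and 'val' are assumed names for the two tuple positions.
df = pd.DataFrame(main, columns=['col', 'val'])

# Number the occurrences of each key: the first 'date' gets 0, the second 1, and so on.
df['idx'] = df.groupby('col').cumcount()

# Pivot on the two columns: 'idx' becomes the row label, 'col' the column label.
out = df.pivot(index='idx', columns='col', values='val')

print(out)
```

Unlike the cumulative-sum approach, this does not rely on any particular key marking the start of a record, but it does assume every key appears once per record.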
Output:
col date icon_url newname oldname
idx
0 2020-04-21 3 Tapnew Tap
1 2020-04-21 4 Nodnew Nod
2 2020-04-21 5 Miknew Mik
Convert the list of tuples to a dictionary of lists, then pass it to the DataFrame constructor:
In [62]: def tuple_to_dict(some_list):
    ...:     result = {}
    ...:     for k, v in some_list:
    ...:         result.setdefault(k, []).append(v)
    ...:
    ...:     return result
    ...:
In [63]: tuple_to_dict(main)
Out[63]:
{'date': ['2020-04-21', '2020-04-21', '2020-04-21'],
'oldname': ['Tap', 'Nod', 'Mik'],
'newname': ['Tapnew', 'Nodnew', 'Miknew'],
'icon_url': ['3', '4', '5']}
In [64]: df = pd.DataFrame(tuple_to_dict(main))
In [65]: df
Out[65]:
date oldname newname icon_url
0 2020-04-21 Tap Tapnew 3
1 2020-04-21 Nod Nodnew 4
2 2020-04-21 Mik Miknew 5
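The solution further down that uses the variable n is written in plain Python and may be more efficient for large data. It relies on the fact that Python lists are ordered and that the tuples arrive in groups of 4 (the variable n in that solution).
You can use collections.defaultdict to build a dictionary, then read it into a DataFrame: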
from collections import defaultdict
d = defaultdict(list)
for k, v in main:
    d[k].append(v)
pd.DataFrame(d)
date oldname newname icon_url
0 2020-04-21 Tap Tapnew 3
1 2020-04-21 Nod Nodnew 4
2 2020-04-21 Mik Miknew 5
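# a is the one-row frame built in the question; melt it into long form,
# number the repeats of each key, then pivot back to one row per record
s=a.melt()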
s['i']=s.groupby(0).cumcount()
s=s.pivot(index='i',columns=0,values='value')
0 date icon_url newname oldname
i
0 2020-04-21 3 Tapnew Tap
1 2020-04-21 4 Nodnew Nod
2 2020-04-21 5 Miknew Mik
main = [('date', '2020-04-21'), ('oldname', 'Tap'), ('newname', 'Tapnew'), ('icon_url', '3'), ('date', '2020-04-21'), ('oldname', 'Nod'), ('newname', 'Nodnew'), ('icon_url','4'), ('date', '2020-04-21'), ('oldname', 'Mik'), ('newname', 'Miknew'), ('icon_url','5')]
n = 4
# Extract column names
main_columns = [item[0] for item in main[:n]]
# Extract values
main_values = [item[1] for item in main]
# Reshape values to return list of lists
main_reshaped = [main_values[(i-1)*n:(i*n)] for i in range(1, len(main_values)//n+1)]
# Call DataFrame constructor
pd.DataFrame(main_reshaped, columns = main_columns)
date oldname newname icon_url
0 2020-04-21 Tap Tapnew 3
1 2020-04-21 Nod Nodnew 4
2 2020-04-21 Mik Miknew 5
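The chunking approach silently produces wrong rows if a pair is missing or out of order. A hedged sketch of the same idea with basic validation follows; the function name records_to_frame is hypothetical and not part of the original answer.

```python
import pandas as pd

def records_to_frame(pairs, n):
    """Build a DataFrame from a flat list of (key, value) pairs that repeat every n items."""
    if n <= 0 or len(pairs) % n != 0:
        raise ValueError("the number of pairs must be a multiple of a positive n")

    # The keys of the first record define the column names.
    columns = [key for key, _ in pairs[:n]]

    rows = []
    for start in range(0, len(pairs), n):
        chunk = pairs[start:start + n]
        keys = [key for key, _ in chunk]
        # Every record must repeat the same keys in the same order.
        if keys != columns:
            raise ValueError(f"record starting at position {start} has keys {keys}")
        rows.append([value for _, value in chunk])

    return pd.DataFrame(rows, columns=columns)

main = [('date', '2020-04-21'), ('oldname', 'Tap'), ('newname', 'Tapnew'), ('icon_url', '3'),
        ('date', '2020-04-21'), ('oldname', 'Nod'), ('newname', 'Nodnew'), ('icon_url', '4'),
        ('date', '2020-04-21'), ('oldname', 'Mik'), ('newname', 'Miknew'), ('icon_url', '5')]

df = records_to_frame(main, 4)
print(df)
```

Slicing with range(0, len(pairs), n) is equivalent to the index arithmetic in the answer above; the only behavioral difference is that malformed input raises ValueError instead of yielding misaligned rows.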