Python: Is there a faster way to convert a large list of dictionaries to a DataFrame?


python python-3.x pandas list dictionary


I have a list of dictionaries named list_of_dict, in the following format:

[{'id': 123, 'date': '202001', 'variable_x': 3},
 {'id': 345, 'date': '202101', 'variable_x': 4}, ... ]

To convert it to a DataFrame, I simply do the following:

df = pd.DataFrame(list_of_dict)

This works, but when a user tried it with a list of 20 million dictionaries, it took about an hour to run.

Is there a faster way to do this in Python?

There are many ways to build a DataFrame, and a list of dictionaries is one of the fastest. The timings below show this.

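Fundamentally, reading 20 million rows into memory means heavy use of virtual memory and swapping. The main optimization I would expect comes from chunking the data, so that not all of it has to be in memory at once.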
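The chunking idea can be sketched roughly as follows. This is only a minimal illustration, assuming the records can be produced lazily (for example from a file or a generator) and that the downstream work can be done one chunk at a time. The helper names `iter_chunks` and `process_in_chunks`, the chunk size, and the per-chunk summary are all hypothetical and not part of the original question.

```python
from itertools import islice

import pandas as pd


def iter_chunks(records, chunk_size):
    """Yield successive lists of at most chunk_size items from any iterable."""
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def process_in_chunks(records, chunk_size=1_000_000):
    """Build one small DataFrame per chunk and keep only a running summary."""
    total_rows = 0
    total_x = 0
    for chunk in iter_chunks(records, chunk_size):
        df = pd.DataFrame(chunk)  # only this chunk is materialized
        total_rows += len(df)
        total_x += int(df["variable_x"].sum())
    return total_rows, total_x


# Hypothetical record source: a generator, so the dicts never all exist at once.
records = ({"id": i, "date": "202001", "variable_x": i % 5} for i in range(10))
rows, x_sum = process_in_chunks(records, chunk_size=4)
print(rows, x_sum)
```

If the full DataFrame is still needed at the end, the per-chunk frames could instead be written out (for example to disk) and combined later. Whether that helps depends on where the 20 million dictionaries come from in the first place.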

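import numpy as np
import pandas as pd

d = [{'id': 123, 'date': '202001', 'variable_x': 3},
     {'id': 345, 'date': '202101', 'variable_x': 4}]

c = list(d[0].keys())  # column names
r = 2*10**5            # repeat count: 2 rows x 200,000 = 400,000 rows
# 2D array of row values (NumPy coerces the mixed int/str values to strings)
a = np.tile([list(l.values()) for l in d], (r, 1))
# 1-D object ndarray holding 400,000 dicts (no longer a plain Python list)
d = np.tile(d, r)

# %timeit is an IPython magic: run this in IPython or Jupyter
%timeit pd.DataFrame(d)
%timeit pd.DataFrame(a, columns=c)
%timeit pd.DataFrame(a)
print(f"2D array size: {len(a):,}\ndict array size: {len(d):,}")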

Output

53.4 µs ± 238 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
90.6 ms ± 400 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
90.4 ms ± 1.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
2D array size: 400,000
dict array size: 400,000
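When reading these timings, it may be worth checking what each call actually builds. As far as I can tell, `np.tile` applied to a list of dicts returns a 1-D object array, and pandas wraps such an array as a single column of dict objects instead of expanding the keys into columns, which would explain the microsecond-scale first timing. The small check below uses a plain Python list for comparison. The variable names and the reduced repeat count are my own assumptions, chosen only for inspection.

```python
import numpy as np
import pandas as pd

rows = [{'id': 123, 'date': '202001', 'variable_x': 3},
        {'id': 345, 'date': '202101', 'variable_x': 4}]
r = 1_000  # smaller repeat count than the benchmark, just for inspection

as_list = rows * r           # plain Python list of dicts
as_array = np.tile(rows, r)  # 1-D object ndarray of dicts

df_list = pd.DataFrame(as_list)    # keys are expanded into columns
df_array = pd.DataFrame(as_array)  # dicts stay as objects in one column

print(df_list.shape)
print(df_array.shape)
print(as_array.dtype)
```

If the two shapes differ on your pandas version as well, timing `pd.DataFrame(as_list)` on the plain list is probably the fairer comparison against the 2D-array variants.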