Memory error when concatenating dataframes in Python


I have a large 680 MB csv file that I need to read into a dataframe.

I split the file into chunks and add those chunks to a list.

Then I try to build a consolidated dataframe using pd.concat().

I use the following code to do this:

import pandas as pd

temp_list = []
chunksize = 10 ** 5

for chunk in pd.read_csv('./data/properties_2016.csv', chunksize=chunksize, low_memory=False):
    temp_list.append(chunk)

properties_df = temp_list[0]

for df in temp_list[1:]:
    properties_df = pd.concat([properties_df, df], ignore_index=True)
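For context, the pattern above can be reproduced on a tiny in-memory CSV (a sketch with hypothetical column names, since the real file's schema is not shown): each pd.concat call allocates a brand-new frame holding everything accumulated so far, so the total copying grows roughly quadratically with the number of chunks, and peak memory use can far exceed the size of the file itself.

```python
import io

import pandas as pd

# Tiny in-memory CSV standing in for properties_2016.csv
# (columns "a" and "b" are made up for illustration).
csv_text = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

result = None
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=3):
    if result is None:
        result = chunk
    else:
        # Each call copies *all* rows accumulated so far into a new
        # frame -- this is the source of the quadratic memory cost.
        result = pd.concat([result, chunk], ignore_index=True)
```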

I am running this inside a Docker image.

I get the following memory error:

Traceback (most recent call last):
  File "dataIngestion.py", line 53, in <module>
    properties_df = pd.concat([properties_df, df], ignore_index=True)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 206, in concat
    copy=copy)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 266, in __init__
    obj._consolidate(inplace=True)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 3156, in _consolidate
    self._consolidate_inplace()
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 3138, in _consolidate_inplace
    self._protect_consolidate(f)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 3127, in _protect_consolidate
    result = f()
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 3136, in f
    self._data = self._data.consolidate()
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals.py", line 3573, in consolidate
    bm._consolidate_inplace()
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals.py", line 3578, in _consolidate_inplace
    self.blocks = tuple(_consolidate(self.blocks))
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals.py", line 4525, in _consolidate
    _can_consolidate=_can_consolidate)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/internals.py", line 4548, in _merge_blocks
    new_values = new_values[argsort]
MemoryError

Please help.

Concatenating dataframes doesn't work that way. I think this will help.

Here is the correct approach:

import pandas as pd

chunksize = 10 ** 5
frames = []

for chunk in pd.read_csv('./data/properties_2016.csv', chunksize=chunksize, low_memory=False):
    frames.append(chunk)

# A single concat call avoids repeatedly copying the accumulated frame.
properties_df = pd.concat(frames, ignore_index=True)

I tried it on a small file and it worked. Let me know if you still get the same error.
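If memory is still tight even with a single concat, another lever is to parse the csv with narrower dtypes. A minimal sketch, assuming hypothetical column names (the real columns of properties_2016.csv are not shown in the question):

```python
import io

import pandas as pd

# Synthetic CSV standing in for the real file; column names are made up.
csv_text = "parcel_id,tax_value,pool\n" + "\n".join(
    f"{i},{i * 1000.5},{i % 2}" for i in range(1000)
)

# Default parse: pandas infers int64 / float64 for every numeric column.
default_df = pd.read_csv(io.StringIO(csv_text))

# Explicit narrower dtypes can cut per-column memory substantially
# (int32 = 4 bytes, float32 = 4 bytes, int8 = 1 byte per value).
slim_df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"parcel_id": "int32", "tax_value": "float32", "pool": "int8"},
)

print(default_df.memory_usage(deep=True).sum())
print(slim_df.memory_usage(deep=True).sum())
```

The same dtype mapping can be passed alongside chunksize, so each chunk is already compact before the final concat.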

I tried the approach you suggested. Instead of the error above, the process now gets killed. I tried increasing my Docker memory limit and it ran successfully!!

Have you tried reading the whole csv without chunking? 680 MB doesn't sound that big for pandas' read_csv.

Yes, I tried that, but the process got killed.