
Python: How can I keep pandas memory-efficient?


I have a dataset with a text data column of roughly 600k rows.

So I tried to save just the text data in H5 format for faster loading later, and I tried to use the garbage collector.

Here is my code:

import pandas as pd
import numpy as np
import gc

# load the full dataset, then keep only the text column
df = pd.read_csv('Reviews.csv')
text = df['Text']

# drop the DataFrame reference and force garbage collection to free memory
df = None
gc.collect()

# write the text column to HDF5, then release it as well
text.to_hdf('text.h5', 'data', format='table')
text = None
gc.collect()

print("Done")
But unfortunately this raises a MemoryError even though I have 16 GB of RAM. How can I do this without running out of memory?

  • Read the large csv file in chunks (tune the chunksize empirically)
  • Use append=True mode to append each chunk (a group of rows) to the specified HDFStore, as in the snippet below

# read the csv 100,000 rows at a time and append each chunk's Text column to the HDF store
for chunk in pd.read_csv('Reviews.csv', chunksize=10**5):
    chunk['Text'].to_hdf('text.h5', 'data', format='table', append=True)
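
Once text.h5 has been written this way, the saved column can be loaded back in a single call, which is the faster loading the question is after. A minimal sketch, assuming the file name and 'data' key used above:

import pandas as pd

# load the previously saved text column back into a pandas Series
text = pd.read_hdf('text.h5', 'data')
print(text.shape)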