Pandas: how do I debug/resolve a MemoryError caused by a data error?


pandas scikit-learn


I have a pandas DataFrame, call it data.

On a 32-bit laptop with 2 GB of RAM, I am performing the following operation:

>>> data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000209 entries, 0 to 1000208
Data columns (total 5 columns):
UserID        1000209 non-null int32
MovieID       1000209 non-null int32
Ratings       1000209 non-null int32
Age           1000209 non-null int32
Occupation    1000209 non-null int32
dtypes: int32(5)
memory usage: 58.7 MB

But it throws the following error:

MemoryError: could not allocate 50331648 bytes

I suspect this has to do with the specs of the laptop I am using, but I still don't understand why it happens. Is there any way to solve this?
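The byte count in the error message is easier to judge once it is converted to MiB. The snippet below is only a small arithmetic sketch: the number is copied from the error above, and the 4-byte and 8-byte element sizes are the standard sizes of int32 and int64/float64 values.

```python
# Size of the allocation that failed, copied from the error message.
failed_bytes = 50331648

MIB = 1024 ** 2  # bytes per mebibyte

print(failed_bytes / MIB)   # size of the failed request in MiB
print(failed_bytes // 4)    # number of int32 values that fit in such a block
print(failed_bytes // 8)    # number of int64/float64 values that fit in such a block
```

The failed request is exactly 48 MiB, which must be available as a single block.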

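The best approach is to profile your script's memory usage. To do that:

Install memory_profiler:

pip install --user memory_profiler

Put all of your code in a function and profile it line by line, like this: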

from memory_profiler import profile

@profile
def main_model_training():
    # put all the code in here
    pass

# the function must be called, otherwise nothing is profiled
main_model_training()

Then start the profiler as follows:

python -m memory_profiler script_name.py

Here is an example. Given the following script:

from memory_profiler import profile
import pandas as pd
import numpy as np

@profile
def something_to_profile():
    df = pd.DataFrame(np.random.randn(1000, 4), columns=list('ABCD'))
    df.count()

something_to_profile()

Run the profiler as follows:

python -m memory_profiler memory_profiling_test.py

This produces the following line-by-line memory profile:

Line #    Mem usage    Increment   Line Contents
================================================
     5     64.3 MiB     64.3 MiB   @profile
     6                             def something_to_profile():
     7     64.3 MiB      0.0 MiB       df = pd.DataFrame(np.random.randn(1000, 4), columns=list('ABCD'))
     8     64.3 MiB      0.0 MiB       df.count()
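If installing memory_profiler is not an option on the machine, the standard library's tracemalloc module can give a rough picture instead. This is a different tool, not what the answer above uses: it reports memory allocated through Python's tracked allocators rather than total process memory, so its numbers are not directly comparable to the "Mem usage" column. The names build_rows and peak_mib are hypothetical, and build_rows merely stands in for the real workload.

```python
import tracemalloc


def build_rows(n):
    # Hypothetical stand-in for the real workload: a list of n small tuples.
    return [(i, i * 2) for i in range(n)]


def peak_mib(func, *args):
    # Run func(*args) under tracemalloc and return (result, peak traced MiB).
    tracemalloc.start()
    try:
        result = func(*args)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 1024 ** 2


rows, peak = peak_mib(build_rows, 100_000)
print(f"peak traced memory: {peak:.1f} MiB")
```

Wrapping the suspect operation in a function like this makes it easy to compare the peak of several candidate steps one at a time.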

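Just to be sure, how much of the 2 GB of RAM is actually free? You may be running some memory-intensive processes that limit the amount of RAM available to your Python script.

@black_fm Windows Task Manager shows that about 250-300 MB of memory is available.

Yes, that is definitely not enough. I believe Python is trying to allocate roughly 50 MB of contiguous RAM, and in this situation you probably don't have a free block that large. Try closing as many applications as possible and run it again. Better yet, restart your laptop and don't open anything else before running the script.

In addition to the answer, make sure you are using 64-bit Python.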
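To act on the last comment, it helps to confirm which build of Python is actually running. A minimal sketch using only the standard library; the variable names are illustrative. A 32-bit interpreter has a much smaller address space to place large contiguous blocks in, regardless of how much RAM is installed.

```python
import struct
import sys

# Pointer size in bits: 32 on a 32-bit interpreter, 64 on a 64-bit one.
pointer_bits = struct.calcsize("P") * 8

# sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
is_64bit = sys.maxsize > 2 ** 32

print(f"Python is a {pointer_bits}-bit build (64-bit: {is_64bit})")
```

Note that a 64-bit interpreter can only be installed on a 64-bit operating system, so on a 32-bit system this check will report a 32-bit build.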
