
Python pd.iterrows() consumes all the memory and gives an error (Process finished with exit code 137 (interrupted by signal 9: SIGKILL))


I have a csv file with more than 750,000 rows and 2 columns (SN, state).
SN is a serial number from 0 to 750,000, and state is either 0 or 1.
I read the csv in pandas, then for each row load the .npy file whose name matches the SN, and append it to one of two lists named x_train and x_val.
x_val should hold 2,000 elements, of which 700 should have state=1 and the rest state=0. Everything else goes to x_train.
The problem is that after reading about 190,000 rows the process stops because all the RAM has been consumed (the PC has 32 GB of RAM).

My code is:

import gc
import os
import sys

import numpy as np
import pandas

# expanduser so np.load gets an absolute path; unlike read_csv, np.load does not expand "~"
nodules_path = os.path.expanduser("~/cropped_nodules/")
nodules_csv = pandas.read_csv("~/cropped_nodules_2.csv")

positive = 0
negative = 0
x_val = []
x_train = []
y_train = []
y_val = []

# iterrows() yields (index, row) tuples, so the row must be unpacked
for _, nodule in nodules_csv.iterrows():

    if nodule.state == 1 and positive <= 700 and len(x_val) <= 2000:
        positive += 1
        x_val_img = str(nodule.SN) + ".npy"
        x_val.append(np.load(os.path.join(nodules_path, x_val_img)))
        y_val.append(nodule.state)

    elif nodule.state == 0 and negative <= 1300 and len(x_val) <= 2000:
        x_val_img = str(nodule.SN) + ".npy"
        negative += 1
        x_val.append(np.load(os.path.join(nodules_path, x_val_img)))
        y_val.append(nodule.state)

    else:
        if len(x_train) % 10000 == 0:
            gc.collect()
            print("gc done")
        x_train_img = str(nodule.SN) + ".npy"
        x_train.append(np.load(os.path.join(nodules_path, x_train_img)))
        y_train.append(nodule.state)
        print("x_train len= ", len(x_train))
        # sys.getsizeof reports only the list object itself, not the arrays it holds
        print("Size of list1: " + str(sys.getsizeof(x_train)) + "bytes")
Tracing the memory with tracemalloc gives:

[ Top 10 ]
/home/mustafa/.local/lib/python3.8/site-packages/numpy/lib/format.py:741: size=11.3 GiB, count=204005, average=58.2 KiB
/home/mustafa/.local/lib/python3.8/site-packages/numpy/lib/format.py:771: size=4781 KiB, count=102002, average=48 B
/usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py:4855: size=2391 KiB, count=102000, average=24 B
/home/mustafa/home/mustafa/project/LUNAMASK/nodule_3D_CNN.py:84: size=806 KiB, count=2, average=403 KiB
/home/mustafa/home/mustafa/project/LUNAMASK/nodule_3D_CNN.py:85: size=805 KiB, count=1, average=805 KiB
/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py:2056: size=78.0 KiB, count=2305, average=35 B
/usr/lib/python3.8/abc.py:102: size=42.5 KiB, count=498, average=87 B
/home/mustafa/.local/lib/python3.8/site-packages/numpy/core/_asarray.py:83: size=41.6 KiB, count=757, average=56 B
/usr/local/lib/python3.8/dist-packages/pandas/core/series.py:512: size=37.5 KiB, count=597, average=64 B
/usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py:1880: size=16.5 KiB, count=5, average=3373 B
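If the 58.2 KiB average from the top line (numpy's .npy reader in format.py) is representative, holding all 750,000 arrays would need roughly 750,000 × 58.2 KiB ≈ 41.6 GiB, well above the 32 GB of RAM; the SIGKILL is therefore consistent with the Linux out-of-memory killer terminating the process (exit code 137 = 128 + signal 9).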

I suggest you have a look at this.

@XtianP Thanks for the suggestion, I tried implementing it as follows: with pandas.read_csv("/home/mustafa/project/LUNA16/cropped_nodules_2.csv", chunksize=chunksize) as reader: for chunk in reader: for index, nodule in chunk.iterrows(): ... but the same problem remains: x_train len= 190142, Size of list1: 1671784 bytes, Process finished with exit code 137 (interrupted by signal 9: SIGKILL).

Could you replace the appends with appending to a file?

@accumulation No problem, but what type of file should I save it to? I need to load it again and feed it to the CNN. I am reading 3D images from the .npy files and appending them to lists to create the training and test data.

If I understand correctly, your nodules are already in separate files. Instead of loading them all just to add them to x_val, why not put only the paths of the objects into x_val, and then load the objects lazily, one by one, when they are actually needed? (A sketch of this follows the chunked-read snippet below.)
# chunksize (rows per chunk) must be defined beforehand; chunked reading bounds
# the CSV memory, but every loaded .npy array is still kept in the lists
with pandas.read_csv("~/cropped_nodules_2.csv", chunksize=chunksize) as reader:
    for chunk in reader:
        for index, nodule in chunk.iterrows():
            if nodule.state == 1 and positive <= 700 and len(x_val) <= 2000:
........
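Following the last comment, here is a minimal sketch of the path-based approach (the batches helper and the batch size of 32 are illustrative names of mine, not from the thread): keep only file paths in the lists, and load the arrays lazily, one batch at a time, when feeding the CNN.

import os

import numpy as np
import pandas

nodules_path = os.path.expanduser("~/cropped_nodules/")
nodules_csv = pandas.read_csv("~/cropped_nodules_2.csv")

x_val_paths, y_val = [], []
x_train_paths, y_train = [], []
positive = negative = 0

for _, nodule in nodules_csv.iterrows():
    path = os.path.join(nodules_path, str(nodule.SN) + ".npy")
    if nodule.state == 1 and positive < 700 and len(x_val_paths) < 2000:
        positive += 1
        x_val_paths.append(path)
        y_val.append(nodule.state)
    elif nodule.state == 0 and negative < 1300 and len(x_val_paths) < 2000:
        negative += 1
        x_val_paths.append(path)
        y_val.append(nodule.state)
    else:
        x_train_paths.append(path)
        y_train.append(nodule.state)

def batches(paths, labels, batch_size=32):
    # only batch_size arrays are in memory at any time, not all 750,000
    for i in range(0, len(paths), batch_size):
        # np.stack assumes every cropped nodule has the same shape
        xs = np.stack([np.load(p) for p in paths[i:i + batch_size]])
        yield xs, np.asarray(labels[i:i + batch_size])

Each path is only a short string, so the two lists stay small; peak memory is then bounded by one batch of arrays instead of the whole dataset.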
The [ Top 10 ] listing above comes from instrumenting the script with tracemalloc, which reports the largest allocation sites:

import tracemalloc

tracemalloc.start()

# ... run your application ...

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')

print("[ Top 10 ]")
for stat in top_stats[:10]:
    print(stat)
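To see where memory grows over the course of the loop rather than in one final snapshot, snapshots can also be diffed; a small sketch (taking a snapshot every 10,000 rows is an arbitrary choice of mine):

import tracemalloc

tracemalloc.start()
previous = tracemalloc.take_snapshot()

# ... inside the loading loop, e.g. every 10000 rows:
current = tracemalloc.take_snapshot()
for stat in current.compare_to(previous, 'lineno')[:5]:
    print(stat)  # the allocation sites that grew most since the last snapshot
previous = current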
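If the arrays themselves really have to go into the lists, one further option (my suggestion, not from the thread) is to memory-map the uncompressed .npy files so the data stays on disk until it is actually indexed:

# same append as in the question, but np.load now returns a read-only memmap
x_train.append(np.load(os.path.join(nodules_path, x_train_img), mmap_mode="r"))

The saving disappears as soon as the memmaps are stacked into one dense array for training, so the path/generator approach above is usually the more robust fix.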