Python: incrementally appending numpy.arrays to a save file


python arrays numpy


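I tried the approach outlined by Hpaulji, but it doesn't seem to work:

Basically, I am iterating over a generator, making some changes to an array, and then trying to save the array from each iteration.

Here is my sample code: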

filename = 'testing.npy'

current_iteration = 0
with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        np.save(filename, prediction)

        current_iteration += 1
        if current_iteration == 5:
            break

Here I go through 5 iterations, so I expect 5 different arrays to be saved.

For debugging purposes, I printed part of each array:

[ 0.  0.  0.  0.  0.]
[ 0.          3.37349415  0.          0.          1.62561738]
[  0.          20.28489304   0.           0.           0.        ]
[ 0.  0.  0.  0.  0.]
[  0.          21.98013496   0.           0.           0.        ]

But when I try to load the arrays by calling np.load multiple times, as described here, I get an EOFError:

file = r'testing.npy'

with open(file,'rb') as f:
    arr = np.load(f)
    print(arr[0,0,0,0:5])
    arr = np.load(f)
    print(arr[0,0,0,0:5])


It only prints the last array and then raises an EOFError:


[  0.          21.98013496   0.           0.           0.        ]
EOFError: Ran out of input


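I expected all 5 arrays to be saved, but when I load the saved .npy file multiple times, I only get the last array.

So how should I save new arrays and append them to the file?

Edit: a test with '.npz' also saves only the last array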

filename = 'testing.npz'

current_iteration = 0
with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        np.savez(f, prediction)

        current_iteration += 1
        if current_iteration == 5:
            break

# loading
file = 'testing.npz'

with open(file,'rb') as f:
    arr = np.load(f)
    print(arr.keys())


>>>['arr_0']

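Every call to np.save passes the filename rather than the file handle. Because the file handle is not reused, each save overwrites the file instead of appending the array to it.

This should work: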

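import numpy as np

filename = 'testing.npy'
current_iteration = 0

with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        # pass the open handle, not the filename, so each array
        # is written after the previous one
        np.save(f, prediction)

        current_iteration += 1
        if current_iteration == 5:
            break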
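
To read such a file back, np.load has to be called once per saved array on the same open handle. The following is a minimal, self-contained sketch of the full round trip. The helper names `save_arrays` and `load_arrays`, the temporary path and the stand-in data are assumptions for illustration and do not come from the original code. The reader stops when it reaches the file size instead of catching the EOFError shown in the question.

```python
import os
import tempfile

import numpy as np


def save_arrays(path, arrays):
    """Write each array to the same open handle, one after another."""
    with open(path, 'wb') as f:
        for arr in arrays:
            np.save(f, arr)


def load_arrays(path):
    """Read back every array in the file, in the order it was written."""
    loaded = []
    size = os.path.getsize(path)
    with open(path, 'rb') as f:
        # Each np.load call consumes one saved array from the handle.
        while f.tell() < size:
            loaded.append(np.load(f))
    return loaded


# Stand-in for the model predictions: five small, distinguishable batches.
batches = [np.full((2, 3), i, dtype=float) for i in range(5)]

path = os.path.join(tempfile.mkdtemp(), 'testing.npy')
save_arrays(path, batches)
restored = load_arrays(path)
print(len(restored))
```

Note that a file written this way is no longer a plain single-array .npy file: a single np.load(path) call returns only the first array.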

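While there may be advantages to storing multiple arrays in one .npy file (I imagine it helps when memory is limited), .npy files are meant to hold a single array. To store multiple arrays you can use .npz files (np.savez or np.savez_compressed):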

filename = 'testing.npz'
predictions = []
for (x, _), index in zip(train_generator, range(5)):
    prediction = base_model.predict(x)
    predictions.append(prediction)
np.savez(filename, predictions) # will name it arr_0
# np.savez(filename, predictions=predictions) # would name it predictions
# np.savez(filename, *predictions) # would name it arr_0, arr_1, …, arr_4
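
As a rough, self-contained check of how the three call styles differ, the sketch below saves a short list of stand-in arrays each way and prints the resulting keys. The temporary file names and the `batches` data are assumptions for illustration. The first two variants need equally shaped arrays, because the list is converted into one stacked array.

```python
import os
import tempfile

import numpy as np

# Stand-in data: five equally shaped batches.
batches = [np.full((2, 3), i, dtype=float) for i in range(5)]
tmpdir = tempfile.mkdtemp()

# One positional argument: the list becomes a single array named arr_0.
stacked_path = os.path.join(tmpdir, 'stacked.npz')
np.savez(stacked_path, batches)

# Keyword argument: the same stacked array, stored under the chosen name.
named_path = os.path.join(tmpdir, 'named.npz')
np.savez(named_path, predictions=batches)

# Unpacked list: one entry per array, arr_0 through arr_4.
separate_path = os.path.join(tmpdir, 'separate.npz')
np.savez(separate_path, *batches)

with np.load(stacked_path) as data:
    stacked_keys = list(data.files)
    stacked_shape = data['arr_0'].shape
with np.load(named_path) as data:
    named_keys = list(data.files)
with np.load(separate_path) as data:
    separate_keys = sorted(data.files)
    third = data['arr_2']

print(stacked_keys, stacked_shape)
print(named_keys)
print(separate_keys)
```

In every variant savez is called exactly once, so all arrays must be held in memory at the time of the call.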

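By the way, I don't know how large your data is, but have you tried HDF5, or are you tied to .npy storage?

I haven't tried HDF5. It seems to be the better choice (my data is around 100,000 images), but I'll have to dig into the documentation more, since I'm not very familiar with HDF5.

OK. Unfortunately I can't help answer your question, but do look at the h5py documentation. The syntax is easy to pick up for storing and appending numeric data, and it is fast when used correctly.

@jp_data_analysis Thanks, I think I may switch to HDF5 since it is more widely used.

Ah! Thank you very much. I'll test it when I get a chance. I just tried .npz (testing.npz) with np.savez(f, prediction), but it seems to save only the last array. I load the arrays the same way as in the code in the OP, but I only see one key: ['arr_0']. I'll update the OP in case I made a mistake.

I've added an example for npz files. For that you only call savez once (with a list of arrays or with multiple arrays).

Thank you for listing multiple approaches.

@yself, this is a great answer, because the documentation doesn't mention saving a list of np arrays. I tried calling np.savez(filename, next_array) as if it appended to the file, but apparently it doesn't work that way.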
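
For reference, the HDF5 route suggested in the comments could look roughly like the sketch below, which appends each batch to a resizable h5py dataset. This is only an illustration under stated assumptions: the dataset name `predictions`, the file name, the batch shape and the stand-in data are all made up here, and h5py must be installed separately.

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-in data: five batches of shape (2, 3); real predictions would replace these.
batches = [np.full((2, 3), i, dtype='float32') for i in range(5)]
path = os.path.join(tempfile.mkdtemp(), 'testing.h5')

with h5py.File(path, 'w') as f:
    # maxshape=(None, 3) leaves the first axis unlimited so the dataset can grow.
    dset = f.create_dataset('predictions', shape=(0, 3), maxshape=(None, 3),
                            dtype='float32', chunks=True)
    for batch in batches:
        start = dset.shape[0]
        dset.resize(start + batch.shape[0], axis=0)
        dset[start:] = batch  # write the new rows at the end

with h5py.File(path, 'r') as f:
    stored = f['predictions'][:]

print(stored.shape)
```

Only one batch needs to be in memory at a time, which is the main difference from collecting everything in a list before a single savez call.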