Python: MemoryError when reading a large .h5 file

I have created an .h5 file from a numpy array:

h5f = h5py.File('/data/debo/jetAnomaly/AtlasData/dijets/mergedRoot/miniTrees/JZ3W.h5', 'w')
h5f.create_dataset('JZ3WPpxpypz', data=all, compression="gzip")
HDF5 dataset "JZ3WPpxpypz": shape (19494500376), type "f8"

But I get a memory error when reading the .h5 file back into a numpy array:

filename = '/data/debo/jetAnomaly/AtlasData/dijets/mergedRoot/miniTrees/JZ3W.h5'
h5 = h5py.File(filename,'r')

h5.keys()
[u'JZ3WPpxpypz']

data = h5['JZ3WPpxpypz']

If I try to view the array, it gives me a memory error:

data[:]

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-33-629f56f97409> in <module>()
----> 1 data[:]

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

/home/debo/env_autoencoder/local/lib/python2.7/site-packages/h5py/_hl/dataset.pyc in __getitem__(self, args)
    560         single_element = selection.mshape == ()
    561         mshape = (1,) if single_element else selection.mshape
--> 562         arr = numpy.ndarray(mshape, new_dtype, order='C')
    563 
    564         # HDF5 has a bug where if the memory shape has a different rank

MemoryError: 

Is there a memory-efficient way to read the .h5 file into a numpy array?

Thanks,
Debo.

You don't need to call numpy.ndarray() to get an array. Try this:

arr = h5['JZ3WPpxpypz'][:]
# or
arr = data[:]
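
Adding [:] returns an array (unlike the data variable, which only references the HDF5 dataset). Either form should give you an array with the same dtype and shape as the original array. You can also use numpy slicing operations to get a subset of the array, which reads only the selected part of the dataset into memory.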
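
To illustrate the slicing suggestion, here is a minimal sketch that walks through a dataset in blocks of rows so that only one block is in memory at a time. The function name column_means_in_blocks, the block_rows parameter, and the choice of computing means are illustrative assumptions, not part of the original question; swap in whatever per-block processing you actually need.

```python
import h5py
import numpy as np


def column_means_in_blocks(path, name, block_rows=100000):
    """Mean along the first axis of a dataset, read block by block.

    `block_rows` is a hypothetical tuning knob: smaller values use less
    memory per read, larger values mean fewer reads.
    """
    with h5py.File(path, "r") as h5:
        dset = h5[name]                    # a handle only; nothing is loaded yet
        n_rows = dset.shape[0]
        totals = np.zeros(dset.shape[1:], dtype=np.float64)
        for start in range(0, n_rows, block_rows):
            stop = min(start + block_rows, n_rows)
            block = dset[start:stop]       # reads just these rows into RAM
            totals += block.sum(axis=0)
    return totals / n_rows
```

With the file from the question this might be called as column_means_in_blocks(filename, 'JZ3WPpxpypz'). Since the dataset was written with gzip compression, each slice is decompressed as it is read, so very small blocks may be slow.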

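A clarification is in order. I overlooked that numpy.ndarray() is called in the course of printing data[:]. Here is a type check that shows the difference between the return values of the two calls: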

# check type for each variable:
data = h5['JZ3WPpxpypz']
print (type(data))
# versus
arr = data[:]
print (type(arr))

The output will look like this:

<class 'h5py._hl.dataset.Dataset'>
<class 'numpy.ndarray'>

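He is already using data[:]. Is the original array still in memory? There may not be enough memory to hold two (or three) arrays of this size. During loading, h5py may be creating an additional buffer, but you don't have any control over that.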
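
One way to check this before calling data[:] is to work out how many bytes the uncompressed array needs from its shape and dtype alone. The sketch below is only an assumption-laden illustration: bytes_needed and fits_in_memory are hypothetical helpers, and available_bytes is a figure you would have to supply yourself for your machine.

```python
import numpy as np


def bytes_needed(shape, dtype):
    """Bytes required to hold an uncompressed array of this shape and dtype."""
    n_items = 1
    for dim in shape:
        n_items *= int(dim)        # plain Python ints, so huge shapes do not overflow
    return n_items * np.dtype(dtype).itemsize


def fits_in_memory(shape, dtype, available_bytes, copies=1):
    """True if `copies` arrays of this shape and dtype fit in `available_bytes`."""
    return copies * bytes_needed(shape, dtype) <= available_bytes
```

For an h5py dataset you could pass data.shape and data.dtype, neither of which triggers a read. Keep in mind that the size of a gzip-compressed file on disk understates how much memory the array takes once loaded.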