Python h5py (HDF5) - Random errors with large data arrays - IOError: Can't prepare for writing data


I'm running into a very strange problem when trying to create fairly large numpy ndarray datasets.

e.g.
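Something along these lines (a minimal sketch, assuming the dataset is created with h5py's create_dataset and chunks=(256, 256); the file name, dataset name, and 'f4' dtype are illustrative assumptions, while the variable a and the chunk-sized write are taken from the traceback below):

    import h5py
    import numpy as np

    # Create an HDF5 file with a single chunked dataset.
    # The shape below is the failing case; smaller shapes such as (512, 512) work.
    f = h5py.File('h5test.h5', 'w')
    a = f.create_dataset('data', shape=(3055693983, 3055693983),
                         dtype='f4', chunks=(256, 256))
    print(a.shape)

    # Write one chunk-sized block of ones into the top-left corner
    # (this is the assignment that raises the IOError in the traceback).
    a[0:256, 0:256] = np.ones((256, 256))

    f.close()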

with a chunk_size of (256, 256).

If the above ndarray is set to (512, 512), everything works fine.

If the above ndarray is set to (100000000000, 100000000000), everything works fine.

Ideally, I would like an ndarray of size (3055693983, 3055693983), which fails with:

(3055693983, 3055693983)
Traceback (most recent call last):
  File "h5.py", line 16, in <module>
    a[0:256,0:256]=np.ones((256,256))
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2696)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2654)
  File "/home/user/anaconda2/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 618, in __setitem__
    self.id.write(mspace, fspace, val, mtype, dxpl=self._dxpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2696)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2654)
  File "h5py/h5d.pyx", line 221, in h5py.h5d.DatasetID.write (/home/ilan/minonda/conda-bld/work/h5py/h5d.c:3527)
  File "h5py/_proxy.pyx", line 132, in h5py._proxy.dset_rw (/home/ilan/minonda/conda-bld/work/h5py/_proxy.c:1889)
  File "h5py/_proxy.pyx", line 93, in h5py._proxy.H5PY_H5Dwrite (/home/ilan/minonda/conda-bld/work/h5py/_proxy.c:1599)
IOError: Can't prepare for writing data (Can't retrieve number of elements in file dataset)

Setting the ndarray to a few random sizes produced mixed results. Some work, some don't... I thought it might be something simple, such as the ndarray size not being evenly divisible by the chunk_size, but that does not seem to be the problem.


What am I missing?

Did you actually calculate the size of your array in bytes? Or in EB?

I'm not sure that's the problem. I never hold the whole np.array in memory; I only touch the array in slices and load the data chunk by chunk. Memory usage can be tuned at any point by changing how many chunks are processed at a time. I'm currently working with np.int64 at size 2^32 (4294967296) and it works fine. It must be something internal... It's not a buffer-overflow problem; I can write in and then re-extract all of my data.

I asked whether you had done the actual calculation, which you apparently have not, so let me do it for you. You are creating an NxN matrix with N = 1e10. That means 1e20 numbers in one array, right? At four bytes per number that is 4e20 bytes = 400 EB. If I understand correctly, current HDF5 does not support files larger than 4 EB, so I bet this is an overflow and some arrays just happen to end up with reasonable dimensions.
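For reference, the back-of-the-envelope calculation from the comments can be written out directly (a sketch assuming 4-byte elements, as in the comment; the 4 EB file-size limit is the commenter's figure, not verified here):

    # Rough on-disk size of the failing dataset, assuming 4-byte elements.
    shape = (3055693983, 3055693983)
    bytes_per_element = 4
    total_bytes = shape[0] * shape[1] * bytes_per_element
    print(total_bytes)         # ~3.7e19 bytes
    print(total_bytes / 1e18)  # ~37 EB

For the failing shape this comes to roughly 37 EB, so it already exceeds the 4 EB limit cited in the comments even without rounding N up to 1e10.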