Python h5py: How to read selected rows of an HDF5 file?

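Is it possible to read a given set of rows from an HDF5 file without loading the whole file? I have quite large HDF5 files with many datasets. Here is an example where I want to reduce time and memory usage: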

#! /usr/bin/env python

import numpy as np
import h5py

infile = 'field1.87.hdf5'
f = h5py.File(infile,'r')
group = f['Data']

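mdisk = group['mdisk'][:]    # reads the whole mdisk dataset into memory

val = 2.*pow(10.,10.)
ind = np.where(mdisk>val)[0]    # integer indices of the matching rows

m = group['mcold'][ind]    # raises the TypeError below under h5py 2.3.1
print(m)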
ind does not give contiguous rows, but scattered ones.

The code above fails, even though it follows the standard way of slicing an HDF5 dataset. The error message I get is:

Traceback (most recent call last):
  File "./read_rows.py", line 17, in <module>
    m = group['mcold'][ind]
  File "/cosma/local/Python/2.7.3/lib/python2.7/site-packages/h5py-2.3.1-py2.7-linux-x86_64.egg/h5py/_hl/dataset.py", line 425, in __getitem__
    selection = sel.select(self.shape, args, dsid=self.id)
  File "/cosma/local/Python/2.7.3/lib/python2.7/site-packages/h5py-2.3.1-py2.7-linux-x86_64.egg/h5py/_hl/selections.py", line 71, in select
    sel[arg]
  File "/cosma/local/Python/2.7.3/lib/python2.7/site-packages/h5py-2.3.1-py2.7-linux-x86_64.egg/h5py/_hl/selections.py", line 209, in __getitem__
    raise TypeError("PointSelection __getitem__ only works with bool arrays")
TypeError: PointSelection __getitem__ only works with bool arrays

I have a sample h5py file containing:

data = f['data']
#  <HDF5 dataset "data": shape (3, 6), type "<i4">
# is arange(18).reshape(3,6)
ind=np.where(data[:]%2)[0]
# array([0, 0, 0, 1, 1, 1, 2, 2, 2], dtype=int32)
data[ind]  # getitem only works with boolean arrays error
data[ind.tolist()] # can't read data (Dataset: Read failed) error
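Indexing does work with a single index array of unique values along one dimension, combined with slices for the others: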

In [157]: data[ind[[0,3,6]],:]
Out[157]: 
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17]])
In [165]: f['data'][:2,np.array([0,3,5])]
Out[165]: 
array([[ 0,  3,  5],
       [ 6,  9, 11]])
In [166]: f['data'][[0,1],np.array([0,3,5])]  
# error: only one indexing array is allowed
So if the indexing is right (unique values, matching the array dimensions), it should work.
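A minimal sketch of that rule, written as a hypothetical helper read_rows (not part of h5py). It assumes rows are selected along the first axis, removes duplicates and sorts the indices with np.unique (the h5py documentation states that fancy-indexing coordinates must be given in increasing order), and then passes a single plain list to the dataset.

```python
import h5py
import numpy as np


def read_rows(dset, row_indices):
    """Read selected rows (axis 0) of an h5py Dataset.

    Hypothetical helper for illustration. h5py accepts only one index list
    per selection, and its docs require the coordinates to be increasing,
    so duplicates are dropped and the indices sorted before reading.
    """
    rows = np.unique(np.asarray(row_indices, dtype=np.int64))  # sorted, unique
    if rows.size == 0:
        # Avoid handing h5py an empty selection; return an empty result instead.
        return np.empty((0,) + dset.shape[1:], dtype=dset.dtype)
    # One plain list of ints along axis 0; the remaining axes are read whole.
    return dset[rows.tolist()]
```

With the sample file above, read_rows(data, ind) returns all three rows, because ind = [0, 0, 0, 1, 1, 1, 2, 2, 2] collapses to [0, 1, 2] before h5py sees it.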


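My simple example does not test how much of the array gets loaded. The documentation makes it sound as though the elements are selected from the file without loading the whole array into memory.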
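To keep peak memory bounded regardless of how h5py handles a scattered selection internally, one option is to walk both datasets in fixed-size contiguous slices and apply the comparison block by block. This is a sketch under assumptions: select_rows_blockwise and its block parameter are hypothetical names, and mdisk and mcold are assumed to be 1-D datasets of equal length, as they appear to be in the question.

```python
import h5py
import numpy as np


def select_rows_blockwise(mask_dset, value_dset, threshold, block=100_000):
    """Return value_dset[i] for every i where mask_dset[i] > threshold.

    Hypothetical helper. Assumes both datasets are 1-D with the same length.
    Peak memory is roughly one block of each dataset plus the selected output.
    """
    if mask_dset.shape != value_dset.shape:
        raise ValueError("datasets must have the same shape")
    n = mask_dset.shape[0]
    pieces = []
    for start in range(0, n, block):
        stop = min(start + block, n)
        keep = mask_dset[start:stop] > threshold   # boolean mask for this block only
        if keep.any():
            # Read the matching block of values, then filter it in NumPy.
            pieces.append(value_dset[start:stop][keep])
    if not pieces:
        return np.empty(0, dtype=value_dset.dtype)
    return np.concatenate(pieces)
```

For the question's file this would be called as something like select_rows_blockwise(group['mdisk'], group['mcold'], 2e10). The block size is a tuning guess; matching it to the dataset's chunk layout (dset.chunks) is a reasonable place to start.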

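Saying it "fails" without showing the error message, or what kind of error it is, is a big no-no.

You are loading the whole mdisk array into memory. I'd have to dig into the docs to determine how much of mcold gets loaded. It may depend on whether ind is a compact slice or values scattered across the array.

Yes! Thanks, it really was a matter of matching the array dimensions. In the example code above, it was enough to change the where statement to: ind = (mdisk > val). Of course, selecting rows is easy once the h5 file has been converted to an array, but the question is: can we remove rows without creating an array? In my case that would be very useful, because I cannot load the whole array into memory, so I want to extract rows directly from the h5 file. Thanks a lot.

@Tbertin, my data is a dataset, not a loaded array, so I did demonstrate how to load selected rows. Slice indexing works too. Even though data is a dataset, as soon as you write data[index] you create an array and load all the selected rows into memory.

@Tbertin, by "remove" or "extract", do you mean changing the data in the file itself? If so, you need to look at the underlying HDF5 code, not the Python interface.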
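The fix Tbertin describes, passing the boolean mask itself instead of the np.where indices, can be sketched as follows. The function name select_mcold and its path argument are illustrative; the group and dataset names come from the question, and the h5py documentation describes boolean mask arrays of the dataset's shape as a valid selection that returns the matching elements as a 1-D array. Note that mdisk is still read in full here.

```python
import h5py
import numpy as np


def select_mcold(path, threshold=2e10):
    """Return Data/mcold where Data/mdisk > threshold (illustrative helper).

    Assumes 1-D datasets 'mdisk' and 'mcold' of equal length in group
    'Data', as in the question's file layout. 2e10 == 2.*pow(10.,10.).
    """
    with h5py.File(path, "r") as f:
        group = f["Data"]
        mdisk = group["mdisk"][:]     # reads all of mdisk into memory
        mask = mdisk > threshold      # boolean array with the same shape as mcold
        return group["mcold"][mask]   # boolean-mask selection on the dataset
```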