
Python: Recovering data from a corrupted file


I have an HDF5 file that has been corrupted for some reason. I am trying to retrieve the parts of the file that are still intact. I can read all datasets from groups that do not contain a corrupted field. However, I cannot read any of the uncorrupted datasets from a group that also contains a corrupted dataset.

Interestingly, I can read those same datasets without any problem using HDFView. That is, I can open them and see all the numerical values. With HDFView it is only the corrupted datasets themselves that I cannot read.

My question is: how can I exploit this and retrieve as much data as possible?

When reading with h5py:

Traceback (most recent call last):
  File "repair.py", line 44, in <module>
    print(data['/dt_yield/000000'][...])
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/usr/local/lib/python3.6/site-packages/h5py/_hl/group.py", line 167, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: 'Unable to open object (bad heap free list)'
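
For a quick overview of which top-level entries are affected before attempting a full recovery, each entry can be probed individually. A minimal sketch (assuming the broken file is called data.hdf5, as in the recovery script below):

import h5py

# probe each top-level entry of the broken file and report whether it opens
with h5py.File('data.hdf5', 'r') as broken:
    for name in broken:
        try:
            obj = broken[name]
            print(name, '-> readable (%s)' % type(obj).__name__)
        except Exception as err:
            print(name, '-> unreadable:', err)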
Recovery script (using h5py)

So far I have implemented at least recovering everything that h5py can read directly:

import numpy as np
import h5py

def getdatasets(key, archive):
  """Recursively collect the paths of all datasets below `key` that h5py can open."""

  if key[-1] != '/': key += '/'

  out = []

  for name in archive[key]:

    path = key + name

    if isinstance(archive[path], h5py.Dataset):
      out += [path]
    else:
      # broken sub-groups raise on access: skip them, keep everything readable
      try:
        out += getdatasets(path, archive)
      except Exception:
        pass

  return out

data  = h5py.File('data.hdf5' , 'r')
fixed = h5py.File('fixed.hdf5', 'w')

# all dataset paths that h5py can reach in the broken file
datasets = getdatasets('/', data)

# unique parent groups of those datasets (strip the dataset name after the last '/')
groups = list(set([i[::-1].split('/', 1)[1][::-1] for i in datasets]))
groups = [i for i in groups if len(i) > 0]

# sort the groups by depth so that parents are created before their children
idx    = np.argsort(np.array([len(i.split('/')) for i in groups]))
groups = [groups[i] for i in idx]

for group in groups:
  fixed.create_group(group)

for path in datasets:

  # - check path
  if path not in data: continue

  # - try reading, skip the dataset if it turns out to be broken after all
  try:
    data[path]
  except Exception:
    continue

  # - get the parent group's name
  group = path[::-1].split('/', 1)[1][::-1]

  # - minimum group name
  if len(group) == 0: group = '/'

  # - copy data
  data.copy(path, fixed[group])

data.close()
fixed.close()
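
Once both files are closed, a quick check of what actually made it into fixed.hdf5 is possible by walking the repaired file. A minimal sketch:

import h5py

# list every recovered dataset together with its shape and dtype
with h5py.File('fixed.hdf5', 'r') as fixed:

    def report(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)

    fixed.visititems(report)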

I found a simple way to recover everything from all top-level groups that do not contain a broken node. It can easily be extended to lower-level groups by calling it recursively.

import h5py as h5

def RecoverFile(f1, f2):
    """recover read-open HDF5 file f1 to write-open HDF5 file f2"""
    names = []
    f1.visit(names.append)   # collect every reachable object name below f1
    for n in names:
        try:
            f2.create_dataset(n, data=f1[n][()])
        except Exception:
            pass             # skip anything that cannot be read


# file_broken / file_recover hold the paths of the corrupted and of the new file
with h5.File(file_broken, 'r') as fb:
    with h5.File(file_recover, 'w') as fr:
        for key in fb.keys():
            try:
                # top-level datasets can be copied directly
                fr.create_dataset(key, data=fb[key][()])
            except Exception:
                try:
                    # otherwise treat the entry as a group and recover its contents
                    fr.create_group(key)
                    RecoverFile(fb[key], fr[key])
                except Exception:
                    del fr[key]   # the group itself is broken: drop it again

Maybe this is silly, but you could export the data from HDFView (right-click on the dataset > Export). Depending on the number of datasets it may be tedious, but it is an option. @pablo_worker Thanks, yes, that works; I was looking for an automated tool, though.

Thanks for your answer. I think it may need to be generalized: what if a dataset is stored at the root, i.e. what if /a is a dataset? Also, the .value method appears to be deprecated. Thanks for the suggestions! I changed the answer accordingly. Great! FYI, there is now also a command-line script to do this.
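
As an illustration only (this is not the script referenced above), a minimal command-line wrapper around RecoverFile can be built with argparse; the argument names are hypothetical:

import argparse
import h5py as h5

def RecoverFile(f1, f2):
    """recover read-open HDF5 file f1 to write-open HDF5 file f2"""
    names = []
    f1.visit(names.append)
    for n in names:
        try:
            f2.create_dataset(n, data=f1[n][()])
        except Exception:
            pass

if __name__ == '__main__':
    # hypothetical interface: python recover.py broken.hdf5 recovered.hdf5
    parser = argparse.ArgumentParser(description='copy all readable datasets from a broken HDF5 file')
    parser.add_argument('broken', help='path of the corrupted HDF5 file')
    parser.add_argument('recovered', help='path of the new file to write')
    args = parser.parse_args()

    with h5.File(args.broken, 'r') as fb, h5.File(args.recovered, 'w') as fr:
        for key in fb.keys():
            try:
                fr.create_dataset(key, data=fb[key][()])
            except Exception:
                try:
                    fr.create_group(key)
                    RecoverFile(fb[key], fr[key])
                except Exception:
                    del fr[key]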