Python: forcing the data type in an HDF5 file with h5py

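I have a CSV file with "Date", "Time" and about 10 other columns.

I am trying to load it into an HDF5 file with the Date and Time columns stored as strings rather than as 32-bit integers. This is what I am doing: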

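import h5py
import numpy as np

# dtype=None lets genfromtxt infer each column's type; names=True takes field names from the header row
my_data = np.genfromtxt("/tmp/data.txt", delimiter=",", dtype=None, names=True)
myFile = "/tmp/data.h5"
with h5py.File(myFile, "a") as f:
    dset = f.create_dataset('foo', data=my_data)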

I want "Date" and "Time" to be stored in the HDF5 file as a string type, not as Int32.
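To see why the columns end up as integers, it can help to inspect the dtype that `genfromtxt` infers. The sketch below uses a small in-memory sample in place of `/tmp/data.txt`; the column names and values are assumptions made for illustration, not the asker's real data.

```python
import io

import numpy as np

# Hypothetical stand-in for /tmp/data.txt (names and values are assumed).
sample = io.StringIO(
    "Date,Time,C\n"
    "20150101,093000,1\n"
    "20150102,101500,2\n"
)

# Same call as in the question: dtype=None infers a type per column.
my_data = np.genfromtxt(sample, delimiter=",", dtype=None, names=True)

# Each field gets its own inferred type; purely numeric text becomes an integer.
print(my_data.dtype)

# Parsing as an integer discards any leading zero in the original text.
print(my_data["Time"][0])
```

With data like this, both `Date` and `Time` are inferred as integer fields, which is what h5py then writes to the file unless the array is converted first.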

A simple solution is to change the dtype of my_data before writing it to the file:

newtype = np.dtype([('Date', 'S8'), ('Time', 'S8'), ('C', '<i8')])
dset2 = f.create_dataset('foo2', data=my_data.astype(newtype))
Note that I still have to convert my_data to newtype before writing. h5py does not seem able to handle the type conversion itself:

In [15]: dset3[:] = my_data
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-15-6e62dae3d59a> in <module>()
----> 1 dset3[:] = my_data

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-aayglkf0/h5py/h5py/_objects.c:2579)()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-aayglkf0/h5py/h5py/_objects.c:2538)()

/home/alistair/.venvs/core3/lib/python3.4/site-packages/h5py/_hl/dataset.py in __setitem__(self, args, val)
    584         mspace = h5s.create_simple(mshape_pad, (h5s.UNLIMITED,)*len(mshape_pad))
    585         for fspace in selection.broadcast(mshape):
--> 586             self.id.write(mspace, fspace, val, mtype)
    587 
    588     def read_direct(self, dest, source_sel=None, dest_sel=None):

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-aayglkf0/h5py/h5py/_objects.c:2579)()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-aayglkf0/h5py/h5py/_objects.c:2538)()

h5py/h5d.pyx in h5py.h5d.DatasetID.write (/tmp/pip-build-aayglkf0/h5py/h5py/h5d.c:3421)()

h5py/_proxy.pyx in h5py._proxy.dset_rw (/tmp/pip-build-aayglkf0/h5py/h5py/_proxy.c:1794)()

h5py/_proxy.pyx in h5py._proxy.H5PY_H5Dwrite (/tmp/pip-build-aayglkf0/h5py/h5py/_proxy.c:1501)()

OSError: Can't prepare for writing data (No appropriate function for conversion path)
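Putting the working path together, the following is a minimal sketch rather than a definitive recipe. The sample array stands in for the output of `np.genfromtxt`, and its field names, values, and the file name `sketch.h5` are assumptions. It uses h5py's in-memory `core` driver so nothing is written to disk.

```python
import h5py
import numpy as np

# Hypothetical stand-in for the array returned by np.genfromtxt in the question.
my_data = np.array(
    [(20150101, 93000, 1), (20150102, 101500, 2)],
    dtype=[("Date", "<i4"), ("Time", "<i4"), ("C", "<i8")],
)

# Target layout: fixed-length 8-byte strings for Date and Time.
newtype = np.dtype([("Date", "S8"), ("Time", "S8"), ("C", "<i8")])

# Do the conversion in NumPy first, then hand the converted array to h5py.
converted = my_data.astype(newtype)

# driver="core" with backing_store=False keeps the file in memory only.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("foo2", data=converted)
    stored_dtype = dset.dtype  # dtype as stored in the dataset
    first_row = dset[0]        # read one record back

print(stored_dtype)
print(first_row["Date"])
```

The dataset still has a single dtype, but that dtype is a compound type whose `Date` and `Time` members are fixed-length strings.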

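I don't think this is possible. According to the h5py documentation:
Datasets are very similar to NumPy arrays. They are homogeneous collections of data elements, with an immutable datatype and (hyper)rectangular shape.
This means that all columns must have the same dtype.

Do you want to change how the data is stored in the HDF5 file, or do you want to be able to convert those columns to strings after reading them from the file?

I want to change how the data is stored. I want to store them as strings rather than integers.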
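For comparison, the other option raised in the comments (leave the stored integers alone and convert after reading) could look roughly like this. It is a NumPy-only sketch: the `stored` array is a hypothetical stand-in for a record array read back from the file.

```python
import numpy as np

# Hypothetical stand-in for data read back from the HDF5 file as integers.
stored = np.array(
    [(20150101, 93000, 1), (20150102, 101500, 2)],
    dtype=[("Date", "<i4"), ("Time", "<i4"), ("C", "<i8")],
)

# Convert individual integer columns to text only when it is needed.
date_text = stored["Date"].astype("U8")
time_text = stored["Time"].astype("U8")

print(date_text)
print(time_text)
```

This leaves the file untouched, so it does not give the asker the on-disk string type they want. It only shows what the read-side alternative amounts to.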