Storing numpy arrays as PyTables cell elements


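I have 4 files containing data in the following formats: 3 files contain numpy arrays of different dimensions, e.g. 20, 30 and 25. The number of records in each file is the same, say 10000. The fourth file contains 10000 floats (one per array in each file). I am trying to build a table from these files with the following structure: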

+-----------------------------------------------------------+
| VecsFile #0   | VecsFile #1   | VecsFile #2   | FloatFile |
+-----------------------------------------------------------+
|np.ndarray(20,)|np.ndarray(30,)|np.ndarray(25,)|   0.1     |
+-----------------------------------------------------------+
|np.ndarray(20,)|np.ndarray(30,)|np.ndarray(25,)|   0.2     |
                               ...

So far, I have run into the problem that PyTables does not accept a numpy array as a valid type for cell data.

Code:

import tables
import numpy as np

def create_table_def(n_files):
    table_def = dict()
    for rnum in range(n_files):
        table_def['VecsFile #'+str(rnum)] = tables.Col.from_atom(tables.Float64Atom())
    table_def['FloatFile'] = tables.Col.from_atom(tables.Float64Atom())

    return table_def

r0 = np.load('file0.npy')
r1 = np.load('file1.npy')
r2 = np.load('file2.npy')
s = np.random.rand(*r0.shape)


with tables.open_file('save.hdf', 'w') as saveFile:
    table_def = create_table_def(3)
    table = saveFile.create_table(saveFile.root, 'que_vectors', table_def)
    tablerow = table.row
    for i in range(r0.shape[0]):
        print(r0[i])
        tablerow['VecsFile #0'] = r0[i]
        tablerow['VecsFile #1'] = r1[i]
        tablerow['VecsFile #2'] = r2[i]
        tablerow['FloatFile'] = s[i]
        tablerow.append()
    table.flush()

I get the following traceback:

Traceback (most recent call last):
  File "C:/scratch_6.py", line 27, in <module>
    tablerow['VecsFile #0'] = r0[i]
  File "tables\tableextension.pyx", line 1591, in tables.tableextension.Row.__setitem__
TypeError: invalid type (<class 'numpy.ndarray'>) for column ``VecsFile #0``


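Am I doing something wrong? Or is there a way to store these vectors and the column of floats in one file without appending all the vectors into one numpy matrix? In the future I would like to use it to add rows consisting of the vectors and one float, select ranges of them, and delete them.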

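I'm more familiar with the h5py interface to HDF5, which maps almost one-to-one onto numpy arrays. pytables is more complex, but I'm surprised that it has a problem saving arrays as cell elements.

The cells of a dataframe have object dtype, and h5py cannot save object dtype arrays. Can you save r0, r1, etc. as their own datasets in the h5 file? I know how to do that with h5py.

I suggest saving a simple dataframe and looking at the file with h5dump (or another generic viewer) to see the structure that pytables uses. h5py makes HDF5 groups behave like dictionaries and datasets like numpy arrays.

In other SO questions, people have had trouble saving this kind of dataframe to a csv file. Pandas converts the array elements to strings and saves those as the column values. Pandas load cannot (easily) convert the strings back to arrays.

I mistakenly thought you were using Pandas, although Pandas does use pytables to write HDF5. In any case, I don't see the object dtype among the listed types.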
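The suggestion above, saving r0, r1, etc. as their own datasets, could look roughly like the following with h5py. This is only a sketch: the random arrays stand in for the contents of file0.npy, file1.npy and file2.npy, and the file name vectors.h5, the dataset names and the row count of 100 are assumptions made for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-ins for the arrays loaded from the .npy files (shapes are assumed).
n_rows = 100
r0 = np.random.rand(n_rows, 20)
r1 = np.random.rand(n_rows, 30)
r2 = np.random.rand(n_rows, 25)
s = np.random.rand(n_rows)  # one float per record

# Hypothetical output path in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'vectors.h5')

# Each array becomes its own dataset in the root group.
with h5py.File(path, 'w') as f:
    f.create_dataset('r0', data=r0)
    f.create_dataset('r1', data=r1)
    f.create_dataset('r2', data=r2)
    f.create_dataset('s', data=s)

# Reading back: the file behaves like a dict, datasets slice like numpy arrays.
with h5py.File(path, 'r') as f:
    names = sorted(f.keys())
    r1_back = f['r1'][:]
    # "Row" 5 is the same index taken from every dataset.
    row5 = (f['r0'][5], f['r1'][5], f['r2'][5], f['s'][5])
```

With this layout a logical row is spread over four datasets, so keeping them aligned when rows are added or deleted is left to the caller.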

import numpy as np
import tables as tb


class NumpyTable(tb.IsDescription):
    """Table description with one column whose cells are 84 x 84 float32 arrays."""
    numpy_cell = tb.Float32Col(shape=(84, 84))


# Open a file and create the table
fileh = tb.open_file('numpy_cell.h5', mode='w')
group = fileh.create_group(fileh.root, 'group')
filters = tb.Filters(complevel=5, complib='zlib')
np_table = fileh.create_table('/group', 'numpy_table', NumpyTable, "group: NumpyTable",
                              filters=filters)

# Get the Row accessor used to fill in and append new rows
row = np_table.row

# Add a row
row['numpy_cell'] = np.zeros((84, 84), dtype=np.float32)
row.append()

# Add another row
row['numpy_cell'] = np.ones((84, 84), dtype=np.float32)
row.append()

# Write to disk and close the file
np_table.flush()
fileh.close()

# Check it
fileh = tb.open_file('numpy_cell.h5', mode='r')
assert np.allclose(
    fileh.root.group.numpy_table[0]['numpy_cell'],
    np.zeros((84, 84), dtype=np.float32)
)
assert np.allclose(
    fileh.root.group.numpy_table[1]['numpy_cell'],
    np.ones((84, 84), dtype=np.float32)
)
fileh.close()
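Applied to the table from the question, the same idea is to give each vector column a shape. The following is a sketch under assumptions, not a tested drop-in: the random arrays stand in for the .npy files, the row count of 100 is arbitrary, and the column names vecs0, vecs1, vecs2 and float_val are hypothetical replacements for the original 'VecsFile #0' style names, chosen so they are valid Python identifiers.

```python
import os
import tempfile

import numpy as np
import tables as tb

# Stand-ins for the arrays loaded from the .npy files (shapes are assumed).
n_rows = 100
r0 = np.random.rand(n_rows, 20)
r1 = np.random.rand(n_rows, 30)
r2 = np.random.rand(n_rows, 25)
s = np.random.rand(n_rows)  # one float per record


def create_table_def(widths):
    """Build a description with one fixed-shape Float64 column per vector file."""
    table_def = {}
    for pos, width in enumerate(widths):
        # shape=(width,) makes every cell of this column a 1-D array of that length
        table_def['vecs%d' % pos] = tb.Float64Col(shape=(width,), pos=pos)
    table_def['float_val'] = tb.Float64Col(pos=len(widths))
    return table_def


# Hypothetical output path in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'save.h5')

# Write one row per record.
with tb.open_file(path, 'w') as save_file:
    table = save_file.create_table(save_file.root, 'que_vectors',
                                   create_table_def([20, 30, 25]))
    row = table.row
    for i in range(n_rows):
        row['vecs0'] = r0[i]
        row['vecs1'] = r1[i]
        row['vecs2'] = r2[i]
        row['float_val'] = s[i]
        row.append()
    table.flush()

# Reopen in append mode to read, query on the float column, and delete rows.
with tb.open_file(path, 'a') as save_file:
    table = save_file.root.que_vectors
    stored = table.read()                               # structured numpy array
    selected = table.read_where('float_val > 0.5')      # rows matching a condition
    n_removed = table.remove_rows(0, 10)                # delete the first 10 rows
    n_left = table.nrows
```

The vector length is fixed per column when the table is created, so this fits the case where every array in a given file has the same dimension.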