Python: How to store data by column in a structured numpy array

python arrays numpy


I have a list of tuples that looks like this:

>>> y
[(0,1,2,3,4,...,10000), ('a', 'b', 'c', 'd', ...), (3.2, 4.1, 9.2, 12., ...), ]

and so on.

y has 7 tuples, each with 10000 values. All 10000 values in a given tuple share the same data type, and I also have a list of those data types:

>>>dt
[('0', dtype('int64')), ('1', dtype('<U')), ('2', dtype('<U')), ('3', dtype('int64')), ('4', dtype('<U')), ('5', dtype('float64')), ('6', dtype('<U'))]

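When I pass y and dt to np.array, it fails (traceback below). I understand this is because the dtype says that the first value in each tuple must be an int64, the second a string, and so on, and I have only 7 dtypes for a tuple that holds 10000 values.

How do I tell the code that I mean all values of the first tuple are int64s, all values of the second tuple are strings, and so on?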

I also tried making y a list of lists instead of a list of tuples:

>>>y
[[0,1,2,3,4,...,10000], ['a', 'b', 'c', 'd', ...], ...]


and so on, but I get an error for the same reason as above:

>>> x = np.array(y, dtype=dt)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'Supplier#000000001'


Thanks for any help.

Edit: my goal is for x to be a numpy array.

Probably not the most elegant solution, but a list comprehension works:

x = [np.array(tup, dtype=typ[1]) for tup, typ in zip(y, dt)]
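A minimal sketch of what this comprehension produces, assuming small made-up stand-ins for y and dt (the sample values and the by_name dictionary are illustrative and not from the question). The result is a plain Python list of independently typed 1-D arrays, one per column, rather than a single numpy array:

```python
import numpy as np

# Hypothetical, shortened stand-ins for the question's y and dt.
y = [(0, 1, 2, 3, 4), tuple("abcde"), (0.1, 0.2, 0.4, 0.6, 0.8)]
dt = [("0", np.dtype("int64")), ("1", np.dtype("<U")), ("2", np.dtype("float64"))]

# One 1-D array per tuple, each converted with its own dtype.
# typ is a (name, dtype) pair, so typ[1] is the dtype.
columns = [np.array(tup, dtype=typ[1]) for tup, typ in zip(y, dt)]

# Optional convenience (an addition, not part of the answer):
# look columns up by their field name.
by_name = {name: col for (name, _), col in zip(dt, columns)}

for name, col in by_name.items():
    print(name, col.dtype, col[:3])
```

Because each column keeps its own dtype, the data stays stored column by column, at the cost of x being a list instead of one array.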

Use the zip(*...) idiom to "transpose" the list of tuples:


In [150]: alist = [(0,1,2,3,4),tuple('abcde'),(.1,.2,.4,.6,.8)]
In [151]: alist
Out[151]: [(0, 1, 2, 3, 4), ('a', 'b', 'c', 'd', 'e'), (0.1, 0.2, 0.4, 0.6, 0.8)]
In [152]: dt = np.dtype([('0',int),('1','U3'),('2',float)])


In [153]: list(zip(*alist))
Out[153]: [(0, 'a', 0.1), (1, 'b', 0.2), (2, 'c', 0.4), (3, 'd', 0.6), (4, 'e', 0.8)]
In [154]: np.array(_, dt)
Out[154]: 
array([(0, 'a', 0.1), (1, 'b', 0.2), (2, 'c', 0.4), (3, 'd', 0.6),
       (4, 'e', 0.8)], dtype=[('0', '<i8'), ('1', '<U3'), ('2', '<f8')])
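As a rough illustration of how the transposed form can still be read column by column, the sketch below rebuilds the array from the session above and pulls fields back out by name. The explicit "i8" and "f8" codes are my choice, made so the dtypes do not depend on the platform:

```python
import numpy as np

# Same sample data as in the session above.
alist = [(0, 1, 2, 3, 4), tuple("abcde"), (0.1, 0.2, 0.4, 0.6, 0.8)]
dt = np.dtype([("0", "i8"), ("1", "U3"), ("2", "f8")])

# zip(*alist) turns 3 columns of 5 values into 5 records of 3 values,
# which is the layout np.array expects for a structured dtype.
records = list(zip(*alist))
arr = np.array(records, dtype=dt)

# Indexing with a field name returns that column as a 1-D array.
ints = arr["0"]
strings = arr["1"]
print(ints.dtype, strings.dtype, arr.shape)
```

Note that the structured array itself is laid out record by record in memory; field indexing only gives a column-shaped view of it.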

There is also a numpy.lib.recfunctions module (imported separately) that has functions for working with recarrays and structured arrays.
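A related helper that may be worth a look for column-wise input is np.rec.fromarrays. To be clear, it lives in np.rec, not in numpy.lib.recfunctions, and the answer does not mention it; this is only a sketch under the assumption that a record array is acceptable as the result:

```python
import numpy as np

# Same sample columns as in the session above.
alist = [(0, 1, 2, 3, 4), tuple("abcde"), (0.1, 0.2, 0.4, 0.6, 0.8)]
dt = np.dtype([("0", "i8"), ("1", "U3"), ("2", "f8")])

# fromarrays takes one sequence per field, so no transposition is needed.
rec = np.rec.fromarrays(alist, dtype=dt)

# Fields are read by name, as with any structured array.
print(rec["1"])
print(rec.dtype)
```

The number of sequences passed in has to match the number of fields in the dtype, which fits the "7 tuples, 7 dtypes" shape of the question.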

As noted in the comments:

In [169]: np.fromiter(zip(*alist),dt)
Out[169]: 
array([(0, 'a', 0.1), (1, 'b', 0.2), (2, 'c', 0.4), (3, 'd', 0.6),
       (4, 'e', 0.8)], dtype=[('0', '<i8'), ('1', '<U3'), ('2', '<f8')])
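A small variation on the call above, offered as a sketch: np.fromiter accepts an optional count argument, and since the column length is known up front it can be passed so the output is allocated once. The n_rows name is mine, and the "i8"/"f8" codes are chosen only to keep the dtypes platform-independent:

```python
import numpy as np

alist = [(0, 1, 2, 3, 4), tuple("abcde"), (0.1, 0.2, 0.4, 0.6, 0.8)]
dt = np.dtype([("0", "i8"), ("1", "U3"), ("2", "f8")])

# Every column has the same length, so the first one gives the row count.
n_rows = len(alist[0])

# zip(*alist) is consumed lazily; no intermediate list of records is built.
arr = np.fromiter(zip(*alist), dtype=dt, count=n_rows)
print(arr)
```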

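Not exactly what I'm looking for, since I want x to be a numpy array. My fault for leaving that out. I'll edit my post to mention it.

list(zip(*y)) will turn it into the right list of tuples for your dt.

Just for the record, a 1D array of typed tuples may not be the simplest way to get what you need. Why not use something like a 2D 7*20000 array?

@DavidZarebski Part of the project requires that records be stored by column rather than by row. Initially I tried using a structured array, but found the format unsuitable.

To avoid the list in np.array(list(zip(… you can also do: np.fromiter(zip(*y), dtype=dt)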