Python: creating a numpy array with various data types from a list of lists

python arrays python-2.7 numpy

I want to create a numpy array from a list of lists. The data types should be float, float, string. Why doesn't this work? (Note: I have already read this.)

Output:

[[(4.2245014868923476e-39, 7.006492321624085e-44, '')
  (4.2245014868923476e-39, 7.146622168056567e-44, '')
  (9.275530846997402e-39, 9.918384925297198e-39, '')]
 [(4.2245014868923476e-39, 7.286752014489049e-44, '')
  (4.2245014868923476e-39, 7.42688186092153e-44, '')
  (9.642872831629367e-39, 0.0, '')]]

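As I stressed in my earlier answer and comments, the normal input for a compound dtype is a list of tuples. Frankly, that is simply how np.array is designed. With a list of lists, the call fails: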

In [308]: numpy.array([[u'1.2', u'1.3', u'hello'], [u'1.4', u'1.5', u'hi']], dtype='f,f,str')
TypeError: a bytes-like object is required, not 'str'

With a list of tuples and an improved dtype:

In [311]: numpy.array([(u'1.2', u'1.3', u'hello'), (u'1.4', u'1.5', u'hi')], dtype='f8,f8,U10')
Out[311]: 
array([( 1.2,  1.3, 'hello'), ( 1.4,  1.5, 'hi')],
      dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<U10')])
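
The dtype string can also be inspected on its own. The sketch below is only an illustration (the names rows and arr are not from the original post): it shows the auto-generated field names, shows that a bare 'f' means a 32-bit float, and pulls out single columns by field name.

```python
import numpy as np

# 'f8,f8,U10' is shorthand for two 64-bit floats and a unicode string
# of up to 10 characters.
dt = np.dtype('f8,f8,U10')
print(dt.names)            # ('f0', 'f1', 'f2')

# A bare 'f' is a 32-bit float, which is why 'f8' is used here instead.
print(np.dtype('f').itemsize, np.dtype('f8').itemsize)   # 4 8

# Each tuple becomes one record; the numeric strings are parsed into floats.
rows = [('1.2', '1.3', 'hello'), ('1.4', '1.5', 'hi')]
arr = np.array(rows, dtype=dt)

print(arr.shape)           # (2,): a 1-D array of records
print(arr['f0'])           # first float column
print(arr['f2'])           # string column
```

Note that the result is one-dimensional: each record counts as a single element, and the columns are reached by field name rather than by a second index.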

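Looping over a few fields is usually faster than looping over many records.

However, converting a list of lists to a list of tuples should not be that expensive.

Building the array from a list of tuples: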

In [317]: np.array([tuple(a) for a in alist], dtype=dt)
Out[317]: 
array([( 1.2,  1.3, 'hello'), ( 1.4,  1.5, 'hi')],
      dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<U10')])
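
The session above uses alist and dt without showing where they are defined. A self-contained version might look like the following, assuming alist is the list of lists from the question and dt is the 'f8,f8,U10' dtype used earlier (both definitions are guesses from context):

```python
import numpy as np

# Assumed definitions; the original session does not show them.
alist = [['1.2', '1.3', 'hello'], ['1.4', '1.5', 'hi']]
dt = np.dtype('f8,f8,U10')

# Convert each inner list to a tuple so np.array treats it as one record.
records = np.array([tuple(row) for row in alist], dtype=dt)

print(records.dtype.names)   # ('f0', 'f1', 'f2')
print(records[0])            # first record
```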

But even with many rows, the tuple conversion is faster than filling the array field by field:

In [334]: arr = np.random.randint(0,100,(100000,3)).astype('U10')
In [335]: alist = arr.tolist()
In [336]: timeit np.array([tuple(a) for a in alist], dtype=dt)
93.5 ms ± 322 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [337]: %%timeit
     ...: res = np.zeros(len(alist), dtype=dt)
     ...: temp = np.array(alist)
     ...: for i,n in enumerate(dt.names):
     ...:     res[n] = temp[:,i]
     ...: 
124 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Pulling the tuple comprehension out of the timed loop saves some time:

In [341]: %%timeit temp = [tuple(a) for a in alist]
     ...: np.array(temp, dtype=dt)
     ...: 
65.4 ms ± 98.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Pulling the string array creation out of the timing:

In [342]: %%timeit temp = np.array(alist)
     ...: res = np.zeros(len(alist), dtype=dt)
     ...: for i,n in enumerate(dt.names):
     ...:     res[n] = temp[:,i]
     ...: 
71 ms ± 447 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Simply creating the string array from the list is more expensive than the tuple conversion.
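
To repeat the comparison outside IPython, the two approaches can be wrapped in functions and timed with the standard timeit module. This is a sketch under assumptions: the function names from_tuples and from_fields are made up for illustration, and the test data is smaller than in the session above. Timings will differ by machine and NumPy version, so no numbers are given here.

```python
import timeit
import numpy as np

dt = np.dtype('f8,f8,U10')

def from_tuples(alist, dt):
    """Build the structured array by converting each row to a tuple."""
    return np.array([tuple(row) for row in alist], dtype=dt)

def from_fields(alist, dt):
    """Build a 2-D string array first, then copy it in one column per field."""
    temp = np.array(alist)                 # 2-D array of strings
    res = np.zeros(len(alist), dtype=dt)
    for i, name in enumerate(dt.names):
        res[name] = temp[:, i]             # cast from string happens here
    return res

# Smaller test data than in the session above, to keep the run short.
alist = np.random.randint(0, 100, (1000, 3)).astype('U10').tolist()

a = from_tuples(alist, dt)
b = from_fields(alist, dt)
# Both routes should produce the same records.
assert all(np.array_equal(a[n], b[n]) for n in dt.names)

for fn in (from_tuples, from_fields):
    t = timeit.timeit(lambda: fn(alist, dt), number=10)
    print(fn.__name__, t / 10, "s per call")
```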

As described in this post, it works with dtype='object':

print(numpy.array([[u'1.2', u'1.3', u'hello'], [u'1.4', u'1.5', u'hi']], dtype='object'))

(Works on Python 3.7.1.)
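
One thing to keep in mind with this approach: an object array stores the Python objects as given, so the result is an ordinary 2-D array in which the numbers are still strings. A short sketch (the names data, obj and floats are illustrative) shows this and one way to convert the numeric columns afterwards:

```python
import numpy as np

data = [['1.2', '1.3', 'hello'], ['1.4', '1.5', 'hi']]
obj = np.array(data, dtype=object)

print(obj.shape)          # (2, 3): a plain 2-D array, not records
print(type(obj[0, 0]))    # <class 'str'>: '1.2' is still a string

# Numeric columns have to be converted explicitly when needed.
floats = obj[:, :2].astype(float)
print(floats)
```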

Note: my array is actually very large, so I am looking for an efficient solution.

Try using object instead of string. Also, don't pass a list of lists; pass a list of tuples.

@Dark How exactly? Perhaps like np.array([('1.2', '1.3', 'hello'), ('1.4', '1.5', 'hi')], dtype='f,f,object')?

@Dark I noticed that even with only two columns, [[u'1.2', u'1.3'], …, ] and dtype='f,f', it already doesn't work.
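
The suggestion from the comments can be tried directly. This is a minimal sketch, not a confirmed answer from the thread: it writes the dtype with the short type codes f4 and O (32-bit float and Python object) in place of 'f' and 'object', and rows is an illustrative name.

```python
import numpy as np

rows = [('1.2', '1.3', 'hello'), ('1.4', '1.5', 'hi')]

# Two 32-bit float fields plus one object field holding the Python strings.
arr = np.array(rows, dtype='f4,f4,O')

print(arr.dtype)      # field types of the structured array
print(arr['f0'])      # float32 column parsed from the strings
print(arr['f2'])      # object column with the original str objects
```

With an object field the strings have no fixed maximum length, but each element is then a reference to a separate Python object rather than data stored inside the array.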