NumPy dtype issues in Python genfromtxt(): strings are read as bytestrings


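I want to read a standard ASCII CSV file into NumPy; it consists of floats and strings.

No matter what I tried, the resulting array looks like this, for example: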

all_data = np.genfromtxt(csv_file, dtype=None, delimiter=',')


[(b'ZINC00043096', b'C.3', b'C1', -0.154, b'methyl')
 (b'ZINC00043096', b'C.3', b'C2', 0.0638, b'methylene')
 (b'ZINC00043096', b'C.3', b'C4', 0.0669, b'methylene')

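However, I would like to save myself the bytes-to-string conversion step, and I am wondering how to read the string columns directly as regular strings.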
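For reference, the conversion step the question wants to avoid can be sketched roughly as follows. The array here is a hand-built, hypothetical stand-in for what genfromtxt returned (two of the rows shown above), so no CSV file is needed; np.char.decode and np.char.startswith are existing NumPy functions, everything else is illustration only.

```python
import numpy as np

# Hypothetical stand-in for the array genfromtxt returned in the question:
# string columns stored as fixed-width bytestrings ('S'), plus one float column.
all_data = np.array(
    [(b'ZINC00043096', b'C.3', b'C1', -0.154, b'methyl'),
     (b'ZINC00043096', b'C.3', b'C2', 0.0638, b'methylene')],
    dtype=[('f0', 'S12'), ('f1', 'S3'), ('f2', 'S2'), ('f3', '<f8'), ('f4', 'S9')])

# The extra step: decode one bytestring column into a unicode ('U') array.
ids = np.char.decode(all_data['f0'], 'ascii')
print(ids.dtype)

# Without decoding, element-wise string functions need bytes arguments.
is_zinc = np.char.startswith(all_data['f0'], b'ZINC')
print(is_zinc)
```

Each string column would need the same treatment, which is why reading the columns as str in the first place is attractive.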

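I tried a few things from the numpy.genfromtxt() documentation, e.g.

dtype='S,S,S,f,S'

or

dtype='a25,a25,a25,f,a25'

but nothing really helped here.

I'm sorry, but I think I just don't understand how the dtype conversion works... It would be great if you could give me some hints here.

Thanks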

Alternatively, you can use the usecols parameter to select the columns known to be strings:

np.genfromtxt(csv_file, dtype=None, delimiter=',',usecols=(0,1,2,4))

In Python 2.7:

array([('ZINC00043096', 'C.3', 'C1', -0.154, 'methyl'),
       ('ZINC00043096', 'C.3', 'C2', 0.0638, 'methylene'),
       ('ZINC00043096', 'C.3', 'C4', 0.0669, 'methylene'),
       ('ZINC00090377', 'C.3', 'C7', 0.207, 'methylene')], 
      dtype=[('f0', 'S12'), ('f1', 'S3'), ('f2', 'S2'), ('f3', '<f8'), ('f4', 'S9')])

Specifying a unicode dtype for the string columns produces:

array([('ZINC00043096', 'C.3', 'C1', -0.154, 'methyl'),
       ('ZINC00043096', 'C.3', 'C2', 0.0638, 'methylene'),
       ('ZINC00043096', 'C.3', 'C4', 0.0669, 'methylene'),
       ('ZINC00090377', 'C.3', 'C7', 0.207, 'methylene')], 
      dtype=[('f0', '<U12'), ('f1', '<U3'), ('f2', '<U2'), ('f3', '<f8'), ('f4', '<U9')])

all_data_u is 448 bytes, because numpy allocates 4 bytes for each unicode character. Each U4 item is 16 bytes long.
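The byte counts can be checked from the dtypes alone, without reading any file. This is a small sketch that assumes the four records and the field widths shown in the arrays above; the variable names are made up for illustration.

```python
import numpy as np

n_rows = 4  # the four records shown above

bytes_dtype = np.dtype(
    [('f0', 'S12'), ('f1', 'S3'), ('f2', 'S2'), ('f3', '<f8'), ('f4', 'S9')])
unicode_dtype = np.dtype(
    [('f0', 'U12'), ('f1', 'U3'), ('f2', 'U2'), ('f3', '<f8'), ('f4', 'U9')])

# 'S' fields take 1 byte per character, 'U' fields take 4 bytes per character,
# and the float64 field takes 8 bytes in both layouts.
bytes_per_record = bytes_dtype.itemsize      # 12 + 3 + 2 + 8 + 9 = 34
unicode_per_record = unicode_dtype.itemsize  # 4 * (12 + 3 + 2 + 9) + 8 = 112

print(n_rows * bytes_per_record)    # 136
print(n_rows * unicode_per_record)  # 448
print(np.dtype('U4').itemsize)      # 16
```

So the unicode version costs a bit more than three times the memory of the bytestring version for this data.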

With the changes in v1.14, in Python 3.6:

all_data = np.genfromtxt('csv_file.csv', delimiter=',', dtype='unicode')

works fine.
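One thing to keep in mind with a single string dtype is that every column, including the numeric one, comes back as text. The sketch below assumes NumPy 1.14 or later (for the encoding argument) and uses dtype=str rather than the 'unicode' alias; csv_text is a hypothetical in-memory stand-in for csv_file.csv, rebuilt from the rows shown above.

```python
import io
import numpy as np

# Hypothetical stand-in for csv_file.csv, rebuilt from the rows shown above.
csv_text = (
    "ZINC00043096,C.3,C1,-0.154,methyl\n"
    "ZINC00043096,C.3,C2,0.0638,methylene\n"
    "ZINC00043096,C.3,C4,0.0669,methylene\n"
    "ZINC00090377,C.3,C7,0.207,methylene\n"
)

# A single string dtype gives a plain 2-D array of text, not a structured array.
all_text = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                         dtype=str, encoding='utf-8')

print(all_text.shape)  # one row per record, one column per field
print(all_text.dtype)  # a unicode dtype

# The numeric column is text here and has to be converted explicitly.
charges = all_text[:, 3].astype(float)
print(charges)
```

If the float column is needed as numbers right away, a structured dtype is probably the more convenient choice.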

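Why do you hate np.bytes so much?

An aside: in my experience, when people want to put both text and numbers into one numpy array, they are usually better off using a DataFrame.

@zhangxaochen - If I remember correctly (I can't test on Python 3 at the moment), keeping the columns as bytes will not allow the use of numpy's vectorized string operations. I may be misremembering, though.

Works like a charm!

This requires you to know the dtype beforehand, whereas the Python 2.7 solution lets you specify dtype=None. Is there any similar behavior in Python 3 that will force the conversion to unicode?

dtype=None creates bytestrings in both py2 and py3. "Similar behavior" can be interpreted two ways: as a bytestring, or as the default str.

@kadrlica Version 1.14 gives us more flexibility, with the encoding parameter. See recent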
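As a rough illustration of the encoding parameter mentioned in the last comment: assuming NumPy 1.14 or later, passing an explicit encoding together with dtype=None should let genfromtxt infer the column types and store the text columns as str. The csv_text string is a hypothetical stand-in for the CSV file, rebuilt from the rows shown above.

```python
import io
import numpy as np

# Hypothetical stand-in for the CSV file, rebuilt from the rows shown above.
csv_text = (
    "ZINC00043096,C.3,C1,-0.154,methyl\n"
    "ZINC00043096,C.3,C2,0.0638,methylene\n"
    "ZINC00043096,C.3,C4,0.0669,methylene\n"
    "ZINC00090377,C.3,C7,0.207,methylene\n"
)

# dtype=None lets genfromtxt guess each column's type; an explicit encoding
# asks for str ('U') fields instead of bytestring ('S') fields.
all_data = np.genfromtxt(io.StringIO(csv_text), dtype=None,
                         delimiter=',', encoding='utf-8')

print(all_data.dtype)
print(all_data['f4'])
```

This keeps the convenience of dtype=None, so the column widths do not have to be known in advance.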

In Python 3, the same dtype=None call displays the strings as bytestrings:

array([(b'ZINC00043096', b'C.3', b'C1', -0.154, b'methyl'),
       (b'ZINC00043096', b'C.3', b'C2', 0.0638, b'methylene'),
       (b'ZINC00043096', b'C.3', b'C4', 0.0669, b'methylene'),
       (b'ZINC00090377', b'C.3', b'C7', 0.207, b'methylene')], 
      dtype=[('f0', 'S12'), ('f1', 'S3'), ('f2', 'S2'), ('f3', '<f8'), ('f4', 'S9')])

A unicode string type can be specified in the dtype instead:

alttype = np.dtype([('f0', 'U12'), ('f1', 'U3'), ('f2', 'U2'), ('f3', '<f8'), ('f4', 'U9')])
all_data_u = np.genfromtxt(csv_file, dtype=alttype, delimiter=',')

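Writing such a unicode dtype by hand requires knowing the string lengths beforehand. One possible workaround, sketched here under the assumption that the data has already been read once as bytestrings, is to derive the unicode dtype from the inferred one. The helper to_unicode_dtype is hypothetical and not part of NumPy, and the input array is a hand-built stand-in for the genfromtxt result.

```python
import numpy as np

def to_unicode_dtype(dtype):
    """Return a copy of a structured dtype with every 'S' field changed to 'U'."""
    fields = []
    for name in dtype.names:
        field = dtype[name]
        if field.kind == 'S':
            # For 'S' fields the itemsize equals the character count,
            # so 'S12' maps to 'U12'.
            fields.append((name, 'U%d' % field.itemsize))
        else:
            fields.append((name, field))
    return np.dtype(fields)

# Hand-built stand-in for an array read with dtype=None as bytestrings.
all_data = np.array(
    [(b'ZINC00043096', b'C.3', b'C1', -0.154, b'methyl'),
     (b'ZINC00043096', b'C.3', b'C2', 0.0638, b'methylene')],
    dtype=[('f0', 'S12'), ('f1', 'S3'), ('f2', 'S2'), ('f3', '<f8'), ('f4', 'S9')])

alttype = to_unicode_dtype(all_data.dtype)

# Copy field by field; NumPy casts ASCII bytestrings to unicode on assignment.
all_data_u = np.empty(all_data.shape, dtype=alttype)
for name in all_data.dtype.names:
    all_data_u[name] = all_data[name]

print(all_data_u.dtype)
```

This still makes a second pass over the data, so it trades the manual dtype for an extra copy.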

In Python 2.7, a record of all_data_u displays as:

(u'ZINC00043096', u'C.3', u'C1', -0.154, u'methyl')
