Python: Using numpy.loadtxt to load a text file containing floats and strings

python python-2.7 python-3.x numpy


I have a text file, data.txt, which contains:

5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica

How can I use numpy.loadtxt() to load this data so that I end up with a NumPy array such as

[['5.1' '3.5' '1.4' '0.2' 'Iris-setosa'] ['4.9' '3.0' '1.4' '0.2' 'Iris-setosa'] ...]

I tried:

np.loadtxt(open("data.txt"), 'r',
           dtype={
               'names': (
                   'sepal length', 'sepal width', 'petal length',
                   'petal width', 'label'),
               'formats': (
                   np.float, np.float, np.float, np.float, np.str)},
           delimiter= ',', skiprows=0)

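If you use np.genfromtxt, you can specify dtype=None, which tells genfromtxt to guess the data type of each column. Most conveniently, it relieves you of having to specify the number of bytes required for the string column. (Omitting the number of bytes, for example by specifying np.str, does not work.)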
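As a minimal sketch of the genfromtxt approach described above, the snippet below reads the sample rows from an in-memory buffer instead of data.txt so that it is self-contained. The encoding="utf-8" argument is an addition not mentioned in the answer; it is assumed here so that the label column comes back as str rather than bytes on Python 3.

```python
import io

import numpy as np

# Sample rows copied from data.txt in the question.
raw = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
"""

# dtype=None asks genfromtxt to infer one type per column.
# encoding="utf-8" (an assumption, see above) yields str labels instead of bytes.
iris = np.genfromtxt(io.StringIO(raw), delimiter=",", dtype=None, encoding="utf-8")

print(iris.dtype.names)  # fields get default names f0 ... f4
print(iris["f4"])        # the label column
```

The result is a one-dimensional structured array with one record per row, so columns are selected by field name rather than by a second index.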

If you stay with loadtxt, the main difference is changing np.str to '|S15' (a 15-byte string).

Also note that open("data.txt"), 'r' should be open("data.txt", 'r'). But since np.loadtxt accepts a filename, you do not need to use open at all.
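A sketch of the filename-based call, under a few assumptions: the sample rows are written to a temporary file standing in for data.txt, the builtin float is used because the np.float alias was removed in NumPy 1.24, and the string field is declared as 'U15' (15 characters of text) instead of '|S15' so that labels come back as str on Python 3. That last swap is my choice for illustration, not part of the original answer.

```python
import os
import tempfile

import numpy as np

rows = [
    "5.1,3.5,1.4,0.2,Iris-setosa",
    "4.9,3.0,1.4,0.2,Iris-setosa",
    "5.8,2.7,4.1,1.0,Iris-versicolor",
    "6.2,2.2,4.5,1.5,Iris-versicolor",
    "6.4,3.1,5.5,1.8,Iris-virginica",
    "6.0,3.0,4.8,1.8,Iris-virginica",
]

with tempfile.TemporaryDirectory() as tmp:
    # Temporary stand-in for the data.txt file from the question.
    path = os.path.join(tmp, "data.txt")
    with open(path, "w") as fh:
        fh.write("\n".join(rows) + "\n")

    # Pass the path directly; no open() call is needed.
    iris = np.loadtxt(
        path,
        delimiter=",",
        dtype={
            "names": ("sepal length", "sepal width", "petal length",
                      "petal width", "label"),
            # 'U15' is an assumption: a text field wide enough for the longest label.
            "formats": (float, float, float, float, "U15"),
        },
    )

print(iris["label"])
print(iris["sepal length"])
```

With '|S15' in place of 'U15' the same call should also work, but the label field then holds bytes objects such as b'Iris-setosa'.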

Keeping the numbers and the text together seems to be causing you a lot of trouble. If you end up deciding to separate them, my workaround is:

values = np.loadtxt('data', delimiter=',', usecols=[0,1,2,3])
labels = np.loadtxt('data', delimiter=',', usecols=[4])
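A self-contained sketch of this split, reading from an in-memory copy of the sample rows rather than a file. It passes the builtin str as the dtype for the label column (the np.str alias was removed in NumPy 1.24). The last call is an extra illustration, not from the answer: reading every column with dtype=str gives the all-string 2-D array shown in the question.

```python
import io

import numpy as np

# Sample rows copied from data.txt in the question.
raw = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
"""

# Numeric columns only: a plain 2-D float array.
values = np.loadtxt(io.StringIO(raw), delimiter=",", usecols=[0, 1, 2, 3])

# Label column only: dtype=str avoids the float conversion error.
labels = np.loadtxt(io.StringIO(raw), delimiter=",", usecols=[4], dtype=str)

# Extra illustration: every column as text, matching the array in the question.
all_text = np.loadtxt(io.StringIO(raw), delimiter=",", dtype=str)

print(values.shape, labels.shape, all_text.shape)
```

Keeping values as a float array means arithmetic works on it directly, while labels can be used for grouping or masking.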

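For the labels, I think you need to add dtype=np.str for it to work. Otherwise you will get ValueError: could not convert string to float: Iris-setosa, or something similar.

np.genfromtxt is generally more robust and has saved me a lot of heartache.

Since I originally answered this question, I would also consider looking at pandas. You will never use .loadtxt or .genfromtxt again!

When I try this, my strings are read in as byte literals, like b'ADT1_'. Do I have to convert them manually, or is there a way to have it read them as strings like in your example?

@mattgabor: If you change dtype=None to dtype=[' [...]

For me, loadtxt() did not work when the text file contained '#'. For some reason it returned the wrong number of columns. I replaced '#' with another character, '', using line.replace('#', '').

@vlad: genfromtxt and loadtxt have a comments parameter, which is set to '#' by default. All characters after the comment character are discarded. That is probably why the parser found the wrong number of columns. You can avoid this problem by setting comments to something other than '#'.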
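The effect of the comments parameter can be sketched as follows. The two sample rows are made up for illustration and are not from the thread. Passing comments=None, which the loadtxt documentation describes as meaning no comments, is one way to keep '#' as ordinary data.

```python
import io

import numpy as np

# Hypothetical rows in which '#' is part of the data, not a comment marker.
raw = "1.0,x#y,3.0\n2.0,z#w,4.0\n"

# Default comments='#': everything after '#' on each line is dropped,
# so only two columns survive.
truncated = np.loadtxt(io.StringIO(raw), delimiter=",", dtype=str)

# comments=None disables comment handling, so all three columns are kept.
full = np.loadtxt(io.StringIO(raw), delimiter=",", dtype=str, comments=None)

print(truncated.shape)
print(full.shape)
```

This avoids editing the file with line.replace('#', '') before loading it.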
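import numpy as np

# Corrected loadtxt call: '|S15' declares a 15-byte string field.
# The builtin float replaces np.float, an alias removed in NumPy 1.24.
np.loadtxt("data.txt",
   dtype={'names': ('sepal length', 'sepal width', 'petal length', 'petal width', 'label'),
          'formats': (float, float, float, float, '|S15')},
   delimiter=',', skiprows=0)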
