Python: reading a text file with numpy while skipping lines recursively

python arrays numpy

I have a data file like this:

# some text
# some text
# some text
100000 3 4032
 1 0.0125 101.27 293.832
 2 0.0375 108.624 292.285
 3 0.0625 84.13 291.859
200000 3 4032
 4 0.0125 101.27 293.832
 5 0.0375 108.624 292.285
 6 0.0625 84.13 291.859
300000 3 4032
 7 0.0125 101.27 293.832
 8 0.0375 108.624 292.285
 9 0.0625 84.13 291.859
........

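I want to read this data into an array for further processing. However, I only need the rows that have four columns, so the three-column rows should either be skipped or stored in a separate array. Because the file is large and repeats the same pattern throughout, it would be easier to read it in a single pass. I tried numpy.genfromtxt(file) together with itertools.islice(file, 4, 7), but I could not find a way to collect all the four-column rows into a single array, because the three-column rows sit in between. Any help would be appreciated. Thanks.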

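import itertools as IT
import numpy as np

arr = []

with open('data.txt', 'rb') as f:
    # Skip the 3 comment lines and the first block header, then read 3 rows.
    ln = IT.islice(f, 4, 7)
    arr.append(np.genfromtxt(ln))
    # Skip the next block header, then read the next 3 rows.
    ln = IT.islice(f, 1, 4)
    arr.append(np.genfromtxt(ln))
    ln = IT.islice(f, 1, 4)
    arr.append(np.genfromtxt(ln))
print(arr)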

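This code works, but my data file is much larger than the example above. I do not want to repeat the islice/genfromtxt pair once per block, since that would not be efficient. Is there a more elegant way to do this?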

This seems to be what you want:

from io import StringIO
dataFile = StringIO('''\
# some text
# some text
# some text
100000 3 4032
 1 0.0125 101.27 293.832
 2 0.0375 108.624 292.285
 3 0.0625 84.13 291.859
200000 3 4032
 4 0.0125 101.27 293.832
 5 0.0375 108.624 292.285
 6 0.0625 84.13 291.859
300000 3 4032
 7 0.0125 101.27 293.832
 8 0.0375 108.624 292.285
 9 0.0625 84.13 291.859''')

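def wantedLines():
    # Yield only the four-column rows: drop '#' comment lines and the
    # header that opens each group of four non-comment lines.
    count = -1
    with dataFile as data:
        for line in data:
            line = line.strip()
            if line.startswith('#'):
                continue
            count += 1
            if count % 4 == 0:
                continue  # block header such as "100000 3 4032"
            yield line.encode()

import numpy as np

# genfromtxt accepts any iterable of lines, including a generator.
result = np.genfromtxt(wantedLines())
print(result)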

Result:

[[  1.00000000e+00   1.25000000e-02   1.01270000e+02   2.93832000e+02]
 [  2.00000000e+00   3.75000000e-02   1.08624000e+02   2.92285000e+02]
 [  3.00000000e+00   6.25000000e-02   8.41300000e+01   2.91859000e+02]
 [  4.00000000e+00   1.25000000e-02   1.01270000e+02   2.93832000e+02]
 [  5.00000000e+00   3.75000000e-02   1.08624000e+02   2.92285000e+02]
 [  6.00000000e+00   6.25000000e-02   8.41300000e+01   2.91859000e+02]
 [  7.00000000e+00   1.25000000e-02   1.01270000e+02   2.93832000e+02]
 [  8.00000000e+00   3.75000000e-02   1.08624000e+02   2.92285000e+02]
 [  9.00000000e+00   6.25000000e-02   8.41300000e+01   2.91859000e+02]]
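
The repeated islice calls from the question can also be folded into a loop. The sketch below is one possible way to do that, not the answerer's method. It assumes that the second field of each header line (the 3 in "100000 3 4032") is the number of data rows in that block, which the thread does not confirm. The name read_blocks and the inline SAMPLE string are made up for illustration, and passing str lines to genfromtxt assumes a reasonably recent NumPy (1.14 or later).

```python
import io
from itertools import islice

import numpy as np

# Shortened copy of the sample data from the question.
SAMPLE = """\
# some text
# some text
# some text
100000 3 4032
 1 0.0125 101.27 293.832
 2 0.0375 108.624 292.285
 3 0.0625 84.13 291.859
200000 3 4032
 4 0.0125 101.27 293.832
 5 0.0375 108.624 292.285
 6 0.0625 84.13 291.859
"""


def read_blocks(lines):
    """Read every block of data rows and stack them into one 2-D array."""
    # Lazily drop blank lines and '#' comments.
    rows = (ln for ln in lines
            if ln.strip() and not ln.lstrip().startswith("#"))
    blocks = []
    for header in rows:
        # ASSUMPTION: the second header field is the block's row count.
        nrows = int(header.split()[1])
        # islice consumes exactly nrows lines from the shared iterator,
        # so the next loop iteration starts at the next header.
        block = np.genfromtxt(islice(rows, nrows))
        blocks.append(np.atleast_2d(block))
    return np.vstack(blocks)


result = read_blocks(io.StringIO(SAMPLE))
print(result.shape)
```

For a real file, the same function could be called as read_blocks(open('data.txt')). Unlike the count % 4 test, this version does not hard-code the block length, but it only works if the header really carries the row count.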

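Show us your code and describe how it falls short of what you expect. Please insert the relevant part of the data into the question as text rather than linking to an external image.

Could you write a file reader that reads every line but passes on only the four-column ones? It is much like reading a file while skipping comment lines. genfromtxt is happy with anything that gives it lines: a file, a generator, or a list of lines.

I have edited the question.

Thanks, Bill. I appreciate your help.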
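
The reader suggested in the comments, one that passes on only rows with a given number of columns, might look like the sketch below. It is an illustration under stated assumptions rather than code from the thread: rows_with and the inline SAMPLE string are hypothetical names, and feeding str lines to genfromtxt assumes NumPy 1.14 or later. Running the filter a second time with ncols=3 also covers the question's alternative of storing the three-column rows in a separate array.

```python
import io

import numpy as np

# Shortened copy of the sample data from the question.
SAMPLE = """\
# some text
# some text
# some text
100000 3 4032
 1 0.0125 101.27 293.832
 2 0.0375 108.624 292.285
 3 0.0625 84.13 291.859
200000 3 4032
 4 0.0125 101.27 293.832
 5 0.0375 108.624 292.285
 6 0.0625 84.13 291.859
"""


def rows_with(lines, ncols):
    """Yield only the non-comment lines that have exactly ncols fields."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if len(stripped.split()) == ncols:
            yield stripped


# Four-column data rows in one array, three-column headers in another.
data = np.genfromtxt(rows_with(io.StringIO(SAMPLE), 4))
headers = np.genfromtxt(rows_with(io.StringIO(SAMPLE), 3))
print(data.shape, headers.shape)
```

Filtering on the field count does not depend on every block having the same length, at the cost of splitting each line once before genfromtxt parses it. With a real file, the two passes would need the file to be reopened or rewound with seek(0) in between.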