Python: need tips on reading blocks of data from a text file

python parsing loops

I have a file containing data like this:

# 0 867.691994 855.172889 279.230411 -78.951239 55.994189 -164.824148
# 0 872.477810 854.828159 279.690170 -78.950558 55.994391 -164.823700
...
893.270609 1092.179289 184.692319
907.682255 1048.809187 112.538457
...
# 0 877.347791 854.481104 280.214892 -78.949869 55.994596 -164.823240
...
893.243290 1091.395104 184.726720
907.682255 1048.809187 112.538457
...
# 0 882.216053 854.135168 280.745489 -78.948443 55.996206 -164.821887

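I want to read only the lines between the comment lines, in the following way: read all the lines between two adjacent comments into an array (without saving them to a file), process it, then read the next block into the array, and so on.

I managed to get it to read one block: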

def main():
    sourceFile = 'test.asc'
    print 'Extracting points ...'
    extF = open(sourceFile, 'r')
    block, cursPos = readBlock(extF)
    extF.close()
    print 'Finished extraction'

def readBlock(extF):
    countPnts = 0
    extBlock = []
    line = extF.readline()
    while not line.startswith('#'):
        extPnt = Point(*[float(j) for j in line.split()])
        countPnts += 1
        extBlock.append(extPnt)
        line = extF.readline()

    cursPos = extF.tell()
    print 'Points:', countPnts
    print 'Cursor position:', cursPos
    return extBlock, cursPos

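It works perfectly, but only for one block of data. I can't get it to iterate from one block to the next across the comment lines. I was thinking about using the cursor position, but couldn't work out how. Please give me some advice. Thanks, everyone.

Update: I implemented MattH's idea as follows: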

def blocks(seq):
    buff = []
    for line in seq:
        if line.startswith('#'):
            if buff:
                #yield "".join(buff)
                buff = []
        else:
            # I need to make those numbers float
            line_spl = line.split()
            pnt = [float(line_spl[k]) for k in range(len(line_spl))]
            #print pnt
            buff.append(Point(*pnt))
    if buff:
        yield "".join(buff)

Then, if I run it:

for block in blocks(extF.readlines()):
    print 'p'

Although print 'p' is inside the for loop, all I get is an empty window. So I have a few questions:

What does

if buff:
    yield "".join(buff)

do? When I comment it out, nothing changes.

Why don't the commands inside the for loop run?

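This function is a generator, so I can't access previously processed lines, right?

Solution

I worked it out myself using the ideas from MattH and Ashwini Chaudhari. In the end, I got this: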

def readBlock(extF):
    countPnts = 0
    extBlock = []
    line = extF.readline()
    if line.startswith('#'):
        line = extF.readline()
    else:
        while not line.startswith('#'):
            extPnt = Point(*[float(j) for j in line.split()])
            countPnts += 1
            extBlock.append(extPnt)
            line = extF.readline()

    return extBlock, countPnts

And I run it with:

while extF.readline():
    block, pntNum = readBlock(extF)

It does exactly what I need.

Thanks, everyone.
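A note on the loop above: every evaluation of extF.readline() in the while condition consumes one line that never reaches readBlock, and readline() returns an empty string at end of file, which does not start with '#'. The following is a minimal sketch of a readline-based reader that checks for both cases explicitly. It is written in Python 3 syntax, unlike the Python 2 code in this thread, and rests on some assumptions: Point is a hypothetical namedtuple (its definition is never shown in the question), and read_block and read_all_blocks are illustrative names. Unlike the original, it collects every run of data lines, including one before the first comment or after the last.

```python
import io
from collections import namedtuple

# Hypothetical stand-in: the question never shows how Point is defined.
Point = namedtuple("Point", ["x", "y", "z"])


def read_block(ext_f):
    """Read points up to the next comment line or the end of the file.

    Returns (points, at_eof).
    """
    points = []
    while True:
        line = ext_f.readline()
        if line == "":               # readline() returns '' only at end of file
            return points, True
        if line.startswith("#"):     # a comment line closes the current block
            return points, False
        if line.strip():             # ignore blank lines
            points.append(Point(*(float(tok) for tok in line.split())))


def read_all_blocks(ext_f):
    """Collect every non-empty block in the file."""
    blocks = []
    at_eof = False
    while not at_eof:
        block, at_eof = read_block(ext_f)
        if block:                    # consecutive comments give empty blocks
            blocks.append(block)
    return blocks


# io.StringIO stands in for the opened 'test.asc' file.
demo = io.StringIO("# c\n1 2 3\n4 5 6\n# c\n7 8 9\n# c\n")
for block in read_all_blocks(demo):
    print("Points:", len(block))
```

Returning an explicit end-of-file flag keeps the loop condition from reading the file itself, so no line is dropped between blocks.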

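Here are two simple generators: one yields all non-comment blocks, the other yields only the non-comment blocks that lie between comments. Updated to cover the two different possibilities, and to do the line splitting and joining inside the same function for consistency.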

sample = """Don't yield this
# 0 867.691994 855.172889 279.230411 -78.951239 55.994189 -164.824148
# 0 872.477810 854.828159 279.690170 -78.950558 55.994391 -164.823700
...
893.270609 1092.179289 184.692319
907.682255 1048.809187 112.538457
...
# 0 877.347791 854.481104 280.214892 -78.949869 55.994596 -164.823240
...
893.243290 1091.395104 184.726720
907.682255 1048.809187 112.538457
...
# 0 882.216053 854.135168 280.745489 -78.948443 55.996206 -164.821887
Don't yield this either"""

def blocks1(text):
  """All non-comment blocks"""
  buff = []
  for line in text.split('\n'):
    if line.startswith('#'):
      if buff:
        yield "\n".join(buff)
        buff = []
    else:
      buff.append(line)
  if buff:
    yield "\n".join(buff)

def blocks2(text):
  """Only non-comment blocks *between* comments"""
  buff = None
  for line in text.split('\n'):
    if line.startswith('#'):
      if buff is None:
        buff = []
      if buff:
        yield "\n".join(buff)
        buff = []
    else:
      if buff is not None:
        buff.append(line)

for block in blocks2(sample):
  print "Block:\n%s" % (block,)

Produces:

Block:
...
893.270609 1092.179289 184.692319
907.682255 1048.809187 112.538457
...
Block:
...
893.243290 1091.395104 184.726720
907.682255 1048.809187 112.538457
...
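Since the question wants each block as a list of points rather than as joined text, the second generator could be adapted along these lines. This is only a sketch in Python 3 syntax under stated assumptions: Point is a hypothetical namedtuple standing in for the class used in the question, point_blocks is an illustrative name, and the sample omits the "..." placeholder lines because they would not parse as floats.

```python
import io
from collections import namedtuple

# Hypothetical stand-in: the question never shows how Point is defined.
Point = namedtuple("Point", ["x", "y", "z"])


def point_blocks(lines):
    """Yield one list of Points per run of data lines enclosed by comments."""
    buff = None                      # stays None until the first comment is seen
    for line in lines:
        if line.startswith("#"):
            if buff:                 # non-empty block closed by this comment
                yield buff
            buff = []                # start collecting a fresh block
        elif buff is not None and line.strip():
            buff.append(Point(*(float(tok) for tok in line.split())))
    # Data after the last comment is deliberately never yielded.


# io.StringIO stands in for the opened data file; iterating it gives lines.
sample_file = io.StringIO(
    "# 0 867.691994 855.172889 279.230411 -78.951239 55.994189 -164.824148\n"
    "893.270609 1092.179289 184.692319\n"
    "907.682255 1048.809187 112.538457\n"
    "# 0 877.347791 854.481104 280.214892 -78.949869 55.994596 -164.823240\n"
    "893.243290 1091.395104 184.726720\n"
    "# 0 882.216053 854.135168 280.745489 -78.948443 55.996206 -164.821887\n"
    "1.0 2.0 3.0\n"
)

for block in point_blocks(sample_file):
    print("Block with", len(block), "points")
```

Each yielded block is a complete list, so all of its points are available at once inside the for loop; only earlier blocks are gone unless the caller keeps them.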

data.txt:

123456
1234
# 0 867.691994 855.172889 279.230411 -78.951239 55.994189 -164.824148
# 0 872.477810 854.828159 279.690170 -78.950558 55.994391 -164.823700
...
893.270609 1092.179289 184.692319
907.682255 1048.809187 112.538457
...
# 0 877.347791 854.481104 280.214892 -78.949869 55.994596 -164.823240
...
893.243290 1091.395104 184.726720
907.682255 1048.809187 112.538457
...
# 0 882.216053 854.135168 280.745489 -78.948443 55.996206 -164.821887
1234
12345

Program:

with open('data.txt') as f:
    lines=[x.strip() for x in f if x.strip()]
    for i,x in enumerate(lines):  #loop to find the first comment line
        if x.startswith('#'):
            ind=i
            break
    for i,x in enumerate(lines[::-1]): #loop to find the first comment line from the end
        if x.startswith('#'):
            ind1=i
            break
    for x in lines[ind+1:-ind1-1]:
        if not x.startswith('#'):
            print x

Output:

...
893.270609 1092.179289 184.692319
907.682255 1048.809187 112.538457
...
...
893.243290 1091.395104 184.726720
907.682255 1048.809187 112.538457
...
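The program above prints the lines between the first and last comment as one stream, without separating the blocks. One possible way to keep the blocks apart is itertools.groupby, which groups consecutive lines by whether they are comments. This is a sketch of a swapped-in technique in Python 3 syntax, not the answerer's method; blocks_between_comments is an illustrative name, and the sample data omits the "..." placeholder lines.

```python
from itertools import groupby


def blocks_between_comments(lines):
    """Return the runs of data lines that have a comment on both sides."""
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    # Consecutive lines with the same "is a comment" flag form one group,
    # so comment groups and data groups strictly alternate.
    groups = [
        (is_comment, list(run))
        for is_comment, run in groupby(cleaned, key=lambda ln: ln.startswith("#"))
    ]
    last = len(groups) - 1
    # A data group that is neither first nor last is enclosed by comments.
    return [
        run
        for i, (is_comment, run) in enumerate(groups)
        if not is_comment and 0 < i < last
    ]


data = """123456
1234
# 0 867.691994 855.172889 279.230411 -78.951239 55.994189 -164.824148
# 0 872.477810 854.828159 279.690170 -78.950558 55.994391 -164.823700
893.270609 1092.179289 184.692319
907.682255 1048.809187 112.538457
# 0 877.347791 854.481104 280.214892 -78.949869 55.994596 -164.823240
893.243290 1091.395104 184.726720
907.682255 1048.809187 112.538457
# 0 882.216053 854.135168 280.745489 -78.948443 55.996206 -164.821887
1234
12345
"""

for block in blocks_between_comments(data.splitlines()):
    print("Block:", block)
```

This reads the whole input into memory, as the program above does, so it suits files that fit comfortably in RAM.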

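If there is a line above the first comment and/or below the last comment, this program will also treat those lines as blocks, whereas the OP only wants the lines between two neighboring comments.

Interesting point. I've added another generator that does not yield results that lack a leading or trailing comment.

@MattH I don't quite understand what this piece does: if buff: yield "\n".join(buff).

If the boolean value of buff is True (in this code, that is roughly equivalent to buff is not None and len(buff) > 0), then yield the contents of buff joined with \n. In this code, buff is either None or a list of zero or more strings.

@MattH Why is it needed? When I remove it, nothing changes.

The OP asked to process the non-comment blocks one block at a time; your solution does not provide them separately.

Probably generators won't work for me, because I need access to all the lines of a block.

Well, if you're happy with it… I would still suggest taking a look at …; that way, you can find the positions of the '#'. Then just read the part of the array you need…

@PierreGM Thanks for your comment. That is too complicated for me. I prefer a solution I can understand, even if it is not the smartest one. Maybe once I've learned Python properly...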
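On the repeated question of why removing the final if buff: yield ... seems to change nothing: that trailing check only matters when the input ends with data lines rather than a comment. If the last line is a comment, the buffer has already been flushed and is empty by the time the loop ends. A small sketch to illustrate, in Python 3 syntax; the flush_at_end parameter is hypothetical and exists only to switch the trailing yield on and off.

```python
def blocks(lines, flush_at_end=True):
    """Yield lists of consecutive non-comment lines."""
    buff = []
    for line in lines:
        if line.startswith("#"):
            if buff:                 # an empty list is falsy, so nothing is
                yield buff           # yielded between two adjacent comments
                buff = []
        else:
            buff.append(line)
    if flush_at_end and buff:        # the trailing check under discussion
        yield buff


ends_with_comment = ["# a", "1 2 3", "# b"]
ends_with_data = ["# a", "1 2 3", "# b", "4 5 6"]

# Input ends with a comment: the trailing check makes no difference.
print(list(blocks(ends_with_comment, True)) == list(blocks(ends_with_comment, False)))

# Input ends with data: without the trailing check the last block is lost.
print(len(list(blocks(ends_with_data, True))), len(list(blocks(ends_with_data, False))))
```

The same reasoning may explain the empty window in the question's update: with the yield inside the loop commented out, the trailing yield is the only one left, and it never fires when the file ends with a comment line.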