Python: extract data from specific columns of a line if those columns contain data

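I have a file containing lines of data, and I need to pull out the characters at positions 74-79 and 122-124. Some lines have no characters at 74-79, and I want to skip those lines.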

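import re

def main():
    # read all lines from the data file
    with open("CCDATA.TXT", "r") as file:
        lines = file.readlines()

    for line in lines:
        # collapse runs of spaces into a single space
        # (note: this shifts the fixed column positions)
        collapsed = re.sub(r" +", " ", line)
        print(collapsed)


main()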
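Collapsing runs of spaces, as the attempt above does, moves everything that follows the first collapsed run, so fixed column offsets no longer line up afterwards. A small illustration on a made-up line (the sample text is hypothetical):

```python
import re

# Hypothetical fixed-width record: "AB" at offsets 0-1, "CD" starting at offset 10.
line = "AB" + " " * 8 + "CD"
assert line[10:12] == "CD"

# Collapsing the 8 spaces to 1 moves "CD" from offset 10 to offset 3.
collapsed = re.sub(r" +", " ", line)
print(repr(collapsed))          # 'AB CD'
print(repr(collapsed[10:12]))   # '' because the old offsets now point past the end
```

This is why the column slicing has to be done on the raw line, before any whitespace normalization.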

Edit: Thanks for the comments. Evidently, it should be:

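with open("CCDATA.TXT") as file:
    for line in file:
        first = line[74:79]
        second = line[122:124]
        # skip lines where either field is missing or only spaces
        if first.strip() and second.strip():
            do_something_with(first, second)  # placeholder for your own processing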
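The question does not say whether "74-79" means Python offsets or the 1-based column numbers a text editor displays. If it is the latter, the inclusive range 74-79 corresponds to the slice line[73:79]; note also that line[122:124] is only two characters wide, while an inclusive range 122-124 is three. A sketch with a hypothetical helper, cols, assuming 1-based inclusive columns:

```python
def cols(line: str, first: int, last: int) -> str:
    """Return the text in 1-based, inclusive columns first..last (editor-style numbering)."""
    # Column 1 is index 0, so start at first - 1; the inclusive end becomes an exclusive stop of `last`.
    return line[first - 1:last]


# Test line where column n holds the last digit of n.
sample = "".join(str(i % 10) for i in range(1, 131))
print(cols(sample, 74, 79))    # 456789
print(cols(sample, 122, 124))  # 234
```

If the positions are already 0-based offsets, the plain slices in the edit above are correct as written.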

Short answer:

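Just take line[74:79] and so on. Since the lines in the input are always 230 characters long, there will never be an IndexError (slicing never raises one anyway; an out-of-range slice simply comes back shorter or empty), so what you need to do is check whether the result is all whitespace with isspace().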
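A minimal sketch of that check. isspace() is a str method (there is no built-in isspace() function), and it returns False for an empty string, so a slice taken past the end of a short line needs its own test. The helper name field_or_none and the sample lines are hypothetical:

```python
def field_or_none(line: str, start: int, stop: int):
    """Return line[start:stop], or None if that slice is empty or all whitespace."""
    field = line[start:stop]
    # ''.isspace() is False, so an empty slice has to be caught explicitly.
    if not field or field.isspace():
        return None
    return field


line = " " * 74 + "12345" + " " * 151   # 230 characters, data at 74:79
blank = " " * 230                        # nothing at 74:79
print(field_or_none(line, 74, 79))       # 12345
print(field_or_none(blank, 74, 79))      # None
print(field_or_none("short", 74, 79))    # None
```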
Parsing the whole line then looks like this:
def parse_line(line, fields, end):
    """Split a fixed-width line into {name: text}, with None for blank fields."""
    result = {}
    # for whitespace validation
    # prev_ecol = 0
    for fname, (scol, ecol) in fields:
        # optionally validate delimiting whitespace
        # assert prev_ecol == scol or line[prev_ecol:scol].isspace()
        # for valid input every line is `end` characters wide, so each slice is the full field
        field = line[scol:ecol]
        # optionally do conversion and such; this is completely up to you
        field = field.rstrip(' ')
        if not field:
            field = None
        result[fname] = field
        # for whitespace validation
        # prev_ecol = ecol
    # optionally validate the line end
    # assert ecol == end or line[ecol:end].isspace()
    return result
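The commented-out asserts in parse_line describe an optional sanity check: every column between two fields, and after the last field up to end, should be blank. A standalone sketch of that check; the helper gaps_are_blank and the two-field spec are hypothetical:

```python
def gaps_are_blank(line: str, fields, end: int) -> bool:
    """Return True if every column outside the (start, stop) spans in `fields` is whitespace."""
    prev_stop = 0
    for _name, (start, stop) in fields:
        gap = line[prev_stop:start]
        # An empty gap (adjacent fields) is fine; otherwise it must be all whitespace.
        if gap and not gap.isspace():
            return False
        prev_stop = stop
    tail = line[prev_stop:end]
    return not tail or tail.isspace()


spec = [("a", (0, 3)), ("b", (5, 8))]
print(gaps_are_blank("abc  def  ", spec, 10))  # True
print(gaps_are_blank("abcXXdef  ", spec, 10))  # False
```

Returning a bool instead of asserting lets the caller decide whether a malformed line is skipped, logged, or treated as fatal.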

All that remains is to skip lines where a required field is empty:
for line in lines:
    data = parse_line(line,fields,line_end)
    if any(data[fname] is None for fname in ('num2','id4')): continue

    #handle the data  
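Putting the pieces together, here is a self-contained sketch that reads from any iterable of lines (an io.StringIO stands in for CCDATA.TXT), keeps only two of the fields for brevity, and skips records where either is blank. The record contents and the make_line helper are made up for illustration:

```python
import io

FIELDS = [("num2", (74, 79)), ("id4", (106, 130))]
LINE_END = 230


def parse_line(line, fields):
    """Map field name -> right-stripped text, or None when the field is blank."""
    result = {}
    for name, (start, stop) in fields:
        value = line[start:stop].rstrip(" ")
        result[name] = value or None
    return result


def make_line(num2="", id4=""):
    """Build a hypothetical 230-column record with num2 at offset 74 and id4 at offset 106."""
    chars = [" "] * LINE_END
    chars[74:74 + len(num2)] = num2
    chars[106:106 + len(id4)] = id4
    return "".join(chars) + "\n"


data_file = io.StringIO(
    make_line("12345", "ID-0001") + make_line("", "ID-0002") + make_line("67890", "")
)

kept = []
for line in data_file:
    data = parse_line(line, FIELDS)
    if any(data[name] is None for name in ("num2", "id4")):
        continue  # skip records with a blank num2 or id4
    kept.append((data["num2"], data["id4"]))

print(kept)  # [('12345', 'ID-0001')]
```

With a real file, replacing data_file with open("CCDATA.TXT") should be the only change needed.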

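Your code formatting is strange.

Possible duplicate.

So you want to retrieve the substrings at relative indices 74-79 and 122-124 of each line?

Maybe I don't understand. Looking at the lines in an editor, even some of the lines that have data at 74-79 have nothing at position 124, and in your example one has nothing at position 122. Only lines 1, 2 and 4 actually have data at 122-124.

His lines are always 230 characters long, so there will never be an IndexError.

Oh, misread that.

import re
def main():
    file = open("CCDATA.TXT", "r")
    lines = file.readlines()
    file.close()
    for line in lines:
        try:
            first = line[69:74]
            second = line[117:119]
        except IndexError:
            continue  # skip line
        else:
            print(first, second)
main()

Correct: catching IndexError will not drop the lines that contain only spaces.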
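The try/except IndexError in the comment above never fires because slicing clamps out-of-range bounds instead of raising; only indexing a single character past the end raises IndexError. A quick check on a hypothetical short line:

```python
line = "only 20 characters.."   # hypothetical short line
assert len(line) == 20

print(repr(line[69:74]))  # '' because a slice past the end is simply empty
try:
    line[69]              # indexing a single position past the end does raise
except IndexError as exc:
    print("IndexError:", exc)
```

So the blank-field test has to look at the slice's contents, not rely on an exception.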
field = line[74:79]
# <...>
if field.isspace(): continue

fields = [
    ("id1", (0, 39)),
    ("cname_text", (40, 73)),
    ("num2", (74, 79)),
    ("num3", (96, 105)),
    # whether to introduce a separate field at [122:125]
    # or parse "id4" further after getting it is up to you.
    # I'd suggest you follow the official format spec.
    ("id4", (106, 130)),
    ("num5", (134, 168)),
]
line_end=230
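Because a typo in the column table silently produces wrong data, it can be worth checking the spec once at startup: spans should be non-empty, in order, non-overlapping, and within line_end. A sketch with a hypothetical check_fields helper, applied to the table above:

```python
def check_fields(fields, end):
    """Raise ValueError if any span is empty, out of order, overlapping, or past `end`."""
    prev_stop = 0
    for name, (start, stop) in fields:
        if not (0 <= start < stop <= end):
            raise ValueError(f"{name}: bad span ({start}, {stop}) for width {end}")
        if start < prev_stop:
            raise ValueError(f"{name}: starts at {start}, before the previous field ends at {prev_stop}")
        prev_stop = stop


fields = [
    ("id1", (0, 39)),
    ("cname_text", (40, 73)),
    ("num2", (74, 79)),
    ("num3", (96, 105)),
    ("id4", (106, 130)),
    ("num5", (134, 168)),
]
check_fields(fields, 230)  # passes silently for this table
```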
