Python: extract data from specific columns only if the line has data in those columns


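I have a file containing lines of data, and I need to pull out the characters at positions 74-79 and 122-124. Some lines have no characters at 74-79, and I want to skip those lines.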

import re

def main():
    file = open("CCDATA.TXT", "r")
    lines = file.readlines()
    file.close()

    for line in lines:
        # collapse runs of spaces into one (this shifts the column positions)
        squeezed = re.sub(r" +", " ", line)
        print(squeezed)


main()
Edit: thanks for the comments. Evidently it should be:

for line in file:
    first = line[74:79]
    second = line[122:124]
    if set(first) != set(' ') and set(second) != set(' '):
        do_something_with(first, second)
Short answer:

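Just take line[74:79] and the like. Since lines in the input are always 230 characters long, slicing will never raise an IndexError, so you cannot rely on an exception to detect missing data. Instead, check with isspace() whether the extracted text is all whitespace, and skip the line if it is.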
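
The short answer can be sketched as a small generator. The function name extract_columns and the synthetic test lines are assumptions made for illustration; only the column ranges come from the question. Note that str.isspace() returns False for an empty string, so the empty case is checked separately to cover lines shorter than expected.

```python
def extract_columns(lines, first_cols=(74, 79), second_cols=(122, 124)):
    """Yield (first, second) for lines that have data in the first column range."""
    for line in lines:
        first = line[first_cols[0]:first_cols[1]]
        second = line[second_cols[0]:second_cols[1]]
        # Slicing never raises IndexError; a short line just gives '' here.
        if not first or first.isspace():
            continue  # nothing at 74-79, skip this line
        yield first, second


# Two synthetic 230-character lines: one with data, one entirely blank.
with_data = ((" " * 74 + "12345").ljust(122) + "AB").ljust(230)
blank = " " * 230
print(list(extract_columns([with_data, blank])))
```

The blank line is dropped and only the pair taken from the first line is yielded.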

Parsing goes as follows, where fields is a list of (name, (start, end)) column specifications:

def parse_line(line,fields,end):
    result={}
    #for whitespace validation
    # prev_ecol=0
    for fname,(scol,ecol) in fields:
        #optionally validate delimiting whitespace
        # assert prev_ecol==scol or line[prev_ecol:scol].isspace()
        #lines in the input are always `end' symbols wide, so IndexError will never happen for a valid input
        field=line[scol:ecol]
        #optionally do conversion and such, this is completely up to you
        field=field.rstrip(' ')
        if not field: field=None
        result[fname]=field
        #for whitespace validation
        # prev_ecol=ecol
    #optionally validate line end
    # assert ecol==end or line[ecol:end].isspace()
    return result
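
The optional whitespace validation that is commented out above could look roughly like the sketch below. The helper name check_gaps and the tiny two-field layout are hypothetical, and the sketch assumes the fields are listed in column order.

```python
def check_gaps(line, fields, end):
    """Return the (start, stop) gaps between fields that hold non-whitespace text."""
    bad = []
    prev_end = 0
    for _name, (start, stop) in fields:
        gap = line[prev_end:start]
        # An empty gap means the fields are adjacent, which is fine.
        if gap and not gap.isspace():
            bad.append((prev_end, start))
        prev_end = stop
    # Also check the tail between the last field and the end of the line.
    tail = line[prev_end:end]
    if tail and not tail.isspace():
        bad.append((prev_end, end))
    return bad


demo_fields = [("a", (0, 3)), ("b", (5, 8))]  # hypothetical layout
print(check_gaps("abc  def  ", demo_fields, 10))
print(check_gaps("abcX def  ", demo_fields, 10))
```

Returning the offending ranges rather than asserting lets the caller decide whether to skip the line, log it, or abort.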
All that remains is to skip the lines in which the required fields are empty:

for line in lines:
    data = parse_line(line,fields,line_end)
    if any(data[fname] is None for fname in ('num2','id4')): continue

    #handle the data  
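
Put together, the approach might be exercised end to end as in the following sketch. The names iter_records and make_line are hypothetical helpers introduced for this example, the layout is cut down to the two fields that are checked, and the sample lines are synthetic rather than taken from the real file.

```python
FIELDS = [
    ("num2", (74, 79)),
    ("id4", (106, 130)),
]


def parse_line(line, fields):
    """Slice each field out of the line; blank fields become None."""
    result = {}
    for name, (start, end) in fields:
        value = line[start:end].rstrip(" ")
        result[name] = value or None
    return result


def iter_records(lines, fields, required):
    """Yield parsed records, skipping lines where a required field is blank."""
    for line in lines:
        data = parse_line(line.rstrip("\n"), fields)
        if any(data[name] is None for name in required):
            continue
        yield data


def make_line(values, width=230):
    """Hypothetical test helper: place each text at its offset in a blank line."""
    chars = [" "] * width
    for offset, text in values.items():
        chars[offset:offset + len(text)] = text
    return "".join(chars)


sample = [
    make_line({74: "00042", 106: "ID-7"}),  # both fields present
    make_line({106: "ID-8"}),               # num2 missing, skipped
    make_line({74: "00043"}),               # id4 missing, skipped
]
records = list(iter_records(sample, FIELDS, ("num2", "id4")))
print(records)
```

Only the first sample line survives, because the other two each lack one of the required fields.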

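Your code is oddly formatted.

Possible duplicate.

So you want to retrieve the substrings at relative indices 74-79 and 122-124 of each line?

Maybe I don't understand. I'm looking at the lines in an editor, and even some of the lines that have data at 74-79 have no data at position 124; in your example there is one with no data at position 122. Only lines 1, 2 and 4 have actual data at 122-124.

His lines are always 230 characters long, so there will never be an IndexError.

Oh, I misunderstood.

import re

def main():
    file = open("CCDATA.TXT", "r")
    lines = file.readlines()
    file.close()
    for line in lines:
        try:
            first = line[69:74]
            second = line[117:119]
        except IndexError:
            continue  # skip line
        else:
            print(first, second)

main()

Corrected. An IndexError cannot filter out the lines that contain only spaces.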
#short answer: skip a line whose field is all whitespace
field=line[74:79]
<...>
if field.isspace(): continue
#field layout for parse_line: (name, (start, end)) pairs, end exclusive
fields=[
    ("id1",(0,39)),
    ("cname_text",(40,73)),
    ("num2",(74,79)),
    ("num3",(96,105)),
    #whether to introduce a separate field at [122:125]
    # or parse "id4" further after getting it is up to you.
    # I'd suggest you follow the official format spec.
    ("id4",(106,130)),
    ("num5",(134,168))
]
line_end=230