Python: extract data from specific columns of a line only if those columns contain data
I have a file containing lines of data. I need to pull out the characters at positions 74-79 and 122-124. Some lines have no characters at 74-79, and I want to skip those lines.
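import re

def main():
    # Read every line of the fixed-width data file
    with open("CCDATA.TXT", "r") as file:
        lines = file.readlines()
    for line in lines:
        # Collapse runs of spaces; note that this destroys the column positions
        collapsed = re.sub(r" +", " ", line)
        print(collapsed)

main()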
Edit: thanks for the comments. Clearly, it should be:
for line in file:
    first = line[74:79]
    second = line[122:124]
    # Process the line only when neither field consists solely of spaces
    if set(first) != set(' ') and set(second) != set(' '):
        do_something_with(first, second)
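Short answer: just take line[74:79] and so on. Since lines in the input are always 230 characters long, there will never be an IndexError to catch, so you need to check whether the result is all whitespace, for example with field.isspace().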
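A minimal sketch of that check, assuming the column ranges from the question. The helper names `pick_columns` and `make_line` and the sample lines are invented for illustration. It tests `not first.strip()` rather than `isspace()` because `''.isspace()` is False, which matters if a line ever turns out shorter than expected: a slice past the end of a string returns an empty string instead of raising.

```python
def pick_columns(lines, first_span=(74, 79), second_span=(122, 124)):
    """Yield (first, second) slices for lines whose first field is not blank."""
    for line in lines:
        first = line[first_span[0]:first_span[1]]
        second = line[second_span[0]:second_span[1]]
        # strip() gives '' for both all-blank and empty slices;
        # isspace() alone would return False for an empty slice
        if not first.strip():
            continue
        yield first, second


def make_line(first, second, width=230):
    """Build a made-up fixed-width line with data starting at columns 74 and 122."""
    chars = [" "] * width
    chars[74:74 + len(first)] = first
    chars[122:122 + len(second)] = second
    return "".join(chars)


# Hypothetical sample: one full line, one blank at 74-79, one too short
sample = [make_line("12345", "AB"), make_line("", "CD"), "short line"]
print(list(pick_columns(sample)))  # [('12345', 'AB')]
```

Only the first sample line survives; the other two are skipped without any exception handling.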
Parsing a whole line looks like this:
def parse_line(line, fields, end):
    result = {}
    # for whitespace validation
    # prev_ecol = 0
    for fname, (scol, ecol) in fields:
        # optionally validate delimiting whitespace
        # assert prev_ecol == scol or line[prev_ecol:scol].isspace()
        # lines in the input are always `end` characters wide, so no slice comes up short for a valid input
        field = line[scol:ecol]
        # optionally do conversion and such, this is completely up to you
        field = field.rstrip(' ')
        if not field:
            field = None
        result[fname] = field
        # for whitespace validation
        # prev_ecol = ecol
    # optionally validate line end
    # assert ecol == end or line[ecol:end].isspace()
    return result
All that remains is to skip the lines where the fields are empty:
for line in lines:
    data = parse_line(line, fields, line_end)
    if any(data[fname] is None for fname in ('num2', 'id4')):
        continue
    # handle the data
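The two pieces above can be put together into one runnable sketch. This is an illustration under assumptions, not the answerer's exact code: the field name `code`, the helper `fixed_width`, and the sample rows are all invented, and only the two column ranges from the question are included in the table.

```python
FIELDS = [
    ("num2", (74, 79)),    # first column range from the question
    ("code", (122, 124)),  # hypothetical name for the second range
]
LINE_END = 230


def parse_line(line, fields):
    """Slice each named field out of a fixed-width line; blank fields become None."""
    result = {}
    for fname, (scol, ecol) in fields:
        field = line[scol:ecol].rstrip(" ")
        result[fname] = field or None
    return result


def parse_lines(lines, fields, required):
    """Yield parsed records, skipping lines where any required field is blank."""
    for line in lines:
        data = parse_line(line.rstrip("\n"), fields)
        if any(data[fname] is None for fname in required):
            continue
        yield data


def fixed_width(values, width=LINE_END):
    """Build an invented test line; values maps a start column to its text."""
    chars = [" "] * width
    for start, text in values.items():
        chars[start:start + len(text)] = text
    return "".join(chars) + "\n"


lines = [
    fixed_width({0: "ROW1", 74: "00042", 122: "XY"}),
    fixed_width({0: "ROW2", 122: "ZZ"}),  # blank at 74-79, so it is skipped
    fixed_width({0: "ROW3", 74: "7", 122: "Q"}),
]
for record in parse_lines(lines, FIELDS, required=("num2", "code")):
    print(record)
```

Keeping the column positions in one table means a change in the file layout only touches `FIELDS`, not the parsing loop.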
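Comments:
Your code is oddly formatted.
Possibly a duplicate.
So you want to retrieve the substrings at relative indexes 74-79 and 122-124 of each line?
Maybe I don't understand. I'm looking at the lines in an editor, and even some of the lines that have data at 74-79 have nothing at position 124, and in your example there is one with nothing at position 122. Only lines 1, 2 and 4 have actual data at 122-124.
His lines are always 230 characters long, so there will never be an IndexError.
Oh, I misread.
import re
def main():
    file = open("CCDATA.TXT", "r")
    lines = file.readlines()
    file.close()
    for line in lines:
        try:
            first = line[69:74]
            second = line[117:119]
        except IndexError:
            continue  # skip line
        else:
            print(first, second)
main()
Correction: catching IndexError cannot remove the lines that contain only spaces.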
The whitespace check from the short answer:
field = line[74:79]
<...>
if field.isspace(): continue
The field table and line width used by parse_line:
fields = [
    ("id1", (0, 39)),
    ("cname_text", (40, 73)),
    ("num2", (74, 79)),
    ("num3", (96, 105)),
    # whether to introduce a separate field at [122:125]
    # or parse "id4" further after getting it is up to you.
    # I'd suggest you follow the official format spec.
    ("id4", (106, 130)),
    ("num5", (134, 168)),
]
line_end = 230
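Because every column position lives in that table, a typo in one span silently shifts the data. One possible safeguard, sketched here with an invented helper name `check_fields`, is to verify that the spans are in order, do not overlap, and stay within the line width before parsing anything.

```python
def check_fields(fields, line_end):
    """Return a list of problems found in a (name, (start, end)) field table."""
    problems = []
    prev_end = 0
    for name, (start, end) in fields:
        if start >= end:
            problems.append(f"{name}: empty or reversed span ({start}, {end})")
        if start < prev_end:
            problems.append(f"{name}: overlaps the previous field")
        if end > line_end:
            problems.append(f"{name}: runs past column {line_end}")
        prev_end = max(prev_end, end)
    return problems


fields = [
    ("id1", (0, 39)),
    ("cname_text", (40, 73)),
    ("num2", (74, 79)),
    ("num3", (96, 105)),
    ("id4", (106, 130)),
    ("num5", (134, 168)),
]
print(check_fields(fields, 230))                            # []
print(check_fields([("a", (0, 10)), ("b", (5, 8))], 230))   # ['b: overlaps the previous field']
```

Gaps between fields are allowed here, since the table above leaves unused columns between several spans.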