Python: f.readline() vs. f.read() print output


I'm new to Python (using Python 3.6). I have a read.txt file that contains information about a company. The file starts with several report characteristics:

CONFORMED PERIOD REPORT:             20120928 #this is 1 line
DATE OF REPORT:                      20121128 #this is another line

and then all the text about the firm begins... #lots of lines here
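I'm trying to extract the two dates (['20120928', '20121128']) as well as some strings from the text (that is, if a string is present, I need a '1'). In the end, I want a vector that gives me the two dates plus a 1 or 0 for each of the different strings, i.e., something like ['20120928', '20121128', '1', '0']. My code is as follows: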

import re

exemptions = [] #vector I want

with open('read.txt', 'r') as f:
    line2 = f.read()  # read the txt file
    for line in f:
        if "CONFORMED PERIOD REPORT" in line:
            exemptions.append(line.strip('\n').replace("CONFORMED PERIOD REPORT:\t", ""))  # add the line without "CONFORMED PERIOD REPORT", just the date
        elif "DATE OF REPORT" in line:
            exemptions.append(line.rstrip('\n').replace("DATE OF REPORT:\t", "")) # idem above

    var1 = re.findall("string1", line2, re.I)  # find string1 in line2, case-insensitive
    if len(var1) > 0:  # if the string appears, it will have length>0
        exemptions.append('1')
    else:
        exemptions.append('0')
    var2 = re.findall("string2", line2, re.I)
    if len(var2) > 0:
        exemptions.append('1')
    else:
        exemptions.append('0')

print(exemptions)
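If I run this code, I get ['1', '0']: the dates are omitted, but the file is read correctly, since var1 is present (correctly '1') and var2 is not (correctly '0'). What I don't understand is why it doesn't report the dates. Importantly, when I change line2 to "line2 = f.readline()", I get ['20120928', '20121128', '0', '0']. Now the dates are fine, but I know var1 is present, so it seems it doesn't read the rest of the file? If I omit "line2 = f.read()", it outputs a vector with a 0 for every line, in addition to the output I want. How can I get rid of those 0s?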
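To see why the two variants behave so differently, here is a small sketch that uses io.StringIO as a stand-in for read.txt (the sample text is hypothetical; the real file is much longer). f.readline() returns only the current line, f.read() returns everything from the current position to the end, and f.readlines() returns the remaining lines as a list. So with line2 = f.readline(), line2 holds just one line and a search for string1 in it misses anything further down, while the for loop still sees the remaining lines.

```python
import io

# Hypothetical stand-in for read.txt; a real text file object behaves the same way.
SAMPLE = (
    "CONFORMED PERIOD REPORT:\t20120928\n"
    "DATE OF REPORT:\t20121128\n"
    "\n"
    "Text about the firm that mentions string1.\n"
)

f = io.StringIO(SAMPLE)
first = f.readline()   # just the first line, trailing newline included
rest = f.read()        # everything from the current position to the end

g = io.StringIO(SAMPLE)
lines = g.readlines()  # every line as a list, newlines included

print(repr(first))
print(len(lines))      # 4
```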

The output I expect is: ['20120928', '20121128', '1', '0']

Sorry for the bother, and thank you anyway.

line2 = f.read() reads the entire file into line2, so there is nothing left for your for line in f: loop to read.
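A file object keeps a read position, and f.read() moves it to the end. The sketch below (again using io.StringIO in place of read.txt) shows that a loop after f.read() gets nothing, and that f.seek(0) rewinds the position if you really do need to read the file twice.

```python
import io

f = io.StringIO("line one\nline two\n")  # stands in for open('read.txt', 'r')

whole = f.read()        # consumes everything; the position is now at the end
after_read = list(f)    # iterating now yields no lines at all

f.seek(0)               # move the position back to the start
after_seek = list(f)    # both lines are available again

print(after_read)       # []
print(after_seek)       # ['line one\n', 'line two\n']
```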

f.read() reads the entire file into the variable line2. If you want to read line by line, you can skip f.read() and iterate like this:
with open('read.txt', 'r') as f:
    for line in f:
        pass  # process each line here

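Otherwise, once you have called .read() and stored the result in line2, there is no more text left to read from f, because it is all contained in the line2 variable.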
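If you want the whole text for the substring searches and the individual lines for the dates, one option is to read the file once and loop over the string with str.splitlines() instead of over the exhausted file object. This is a sketch with a hypothetical inline sample; splitting each header once on ':' is an assumption that works whether the label is followed by tabs or spaces.

```python
import io

# Hypothetical sample in place of open('read.txt', 'r').
f = io.StringIO(
    "CONFORMED PERIOD REPORT:\t20120928\n"
    "DATE OF REPORT:             20121128\n"
    "Text about the firm.\n"
)

text = f.read()                   # whole file, kept for the string searches
dates = []
for line in text.splitlines():    # loop over the string, not over f
    if "CONFORMED PERIOD REPORT" in line or "DATE OF REPORT" in line:
        # Keep what follows the first colon, minus surrounding tabs/spaces.
        dates.append(line.split(":", 1)[1].strip())

print(dates)                      # ['20120928', '20121128']
```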

I ended up doing it this way:

import re

exemptions = [] #vector I want

with open('read.txt', 'r') as f:
    line2 = "" # create an empty string variable outside the "for line" loop
    for line in f:
        line2 = line2 + line #append each line to the above created empty string
        if "CONFORMED PERIOD REPORT" in line:
            exemptions.append(line.strip('\n').replace("CONFORMED PERIOD REPORT:\t", ""))  # add the line without "CONFORMED PERIOD REPORT", just the date
        elif "DATE OF REPORT" in line:
            exemptions.append(line.rstrip('\n').replace("DATE OF REPORT:\t", "")) # idem above

    var1 = re.findall("string1", line2, re.I)  # find string1 in line2, case-insensitive
    if len(var1) > 0:  # if the string appears, it will have length>0
        exemptions.append('1')
    else:
        exemptions.append('0')
    var2 = re.findall("string2", line2, re.I)
    if len(var2) > 0:
        exemptions.append('1')
    else:
        exemptions.append('0')

print(exemptions)

That's what I have so far. It works for me, although I suspect using BeautifulSoup could make the code more efficient. Next step :)
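For comparison, here is a sketch of the same idea packaged as a function. The name extract_exemptions and its patterns argument are hypothetical, and the sample string stands in for read.txt. It computes each 1/0 flag exactly once over the whole text, uses re.search (which stops at the first match) instead of re.findall, and splits the header lines on the first ':' rather than removing a literal "LABEL:\t" prefix.

```python
import re


def extract_exemptions(text, patterns):
    """Return [period, report_date, flag1, flag2, ...] for the given text.

    Hypothetical helper: assumes each header line looks like
    "LABEL:<tabs or spaces>DATE" and that `patterns` are regular expressions.
    """
    result = []
    for line in text.splitlines():
        if "CONFORMED PERIOD REPORT" in line or "DATE OF REPORT" in line:
            result.append(line.split(":", 1)[1].strip())
    for pattern in patterns:
        # One flag per pattern, case-insensitive, over the whole text.
        found = re.search(pattern, text, re.IGNORECASE) is not None
        result.append("1" if found else "0")
    return result


sample = (
    "CONFORMED PERIOD REPORT:             20120928\n"
    "DATE OF REPORT:\t20121128\n"
    "\n"
    "Text about the firm that mentions STRING1.\n"
)
print(extract_exemptions(sample, ["string1", "string2"]))
# ['20120928', '20121128', '1', '0']
```

With the real file you would call extract_exemptions(f.read(), ["string1", "string2"]) inside the with open('read.txt', 'r') as f: block. If the search terms are plain text rather than regular expressions, wrapping them in re.escape() avoids surprises with characters such as '.' or '('.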

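It's better to use f.readlines() and then for line in lines, rather than splitting on \n, since that might not give you the result you expect.

I'm not sure the first snippet is worth mentioning as a suggestion; the second approach is clearly the one to use.

Thanks, everyone. I fully understand what you mean. However, when I omit "line2 = f.read()" from my original code, I get a large number of 0s in the exemptions vector. I've edited my original post with this information. Any suggestions?

If you want to "skip" two lines before the for loop, you can call f.readline() twice before reading the rest, to "pop" those two lines.

I'm not sure I follow; this isn't about skipping two lines. By removing "line2 = f.read()" from the original code, I get something like ['20120928', '20121128', '0', '0', '0', '0', '0', '0', '0', '0', '0', and thousands more]. It looks like there is a 0 for every line in the read.txt file, which is not what I want...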
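The edited code is not reproduced here, so this is an assumption: a 0 for every line is the typical symptom of the re.findall/append block being indented inside the for line in f: loop, where it runs once per line instead of once per file. The sketch below contrasts the two placements on a hypothetical three-line text.

```python
import io

text = "header line\nline mentioning string1\nanother line\n"

# Check inside the loop: one flag per line, mostly '0'.
per_line = []
for line in io.StringIO(text):
    per_line.append("1" if "string1" in line.lower() else "0")

# Check after the loop, over the accumulated text: exactly one flag.
whole = ""
for line in io.StringIO(text):
    whole += line
once = "1" if "string1" in whole.lower() else "0"

print(per_line)  # ['0', '1', '0']
print(once)      # 1
```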