Python: print output of f.readline vs. f.read

python parsing readfile

I am new to Python (using Python 3.6). I have a read.txt file containing information about a company. The file starts with several report attributes:

CONFORMED PERIOD REPORT:             20120928 #this is 1 line
DATE OF REPORT:                      20121128 #this is another line

and then starts all the text about the firm..... #lots of lines here

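I am trying to extract the two dates (['20120928', '20121128']) as well as some strings from the text (that is, if a string is present, I want a '1'). In the end, I want a vector with the two dates plus the 1s and 0s for the different strings, something like ['20120928', '20121128', '1', '0']. My code is as follows: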

import re

exemptions = []  # vector I want

with open('read.txt', 'r') as f:
    line2 = f.read()  # read the txt file
    for line in f:
        if "CONFORMED PERIOD REPORT" in line:
            exemptions.append(line.strip('\n').replace("CONFORMED PERIOD REPORT:\t", ""))  # add line without stating CONFORMED PERIOD REPORT, just with the date)
        elif "DATE OF REPORT" in line:
            exemptions.append(line.rstrip('\n').replace("DATE OF REPORT:\t", "")) # idem above

    var1 = re.findall("string1", line2, re.I)  # find string1 in line2, case-insensitive
    if len(var1) > 0:  # if the string appears, it will have length>0
        exemptions.append('1')
    else:
        exemptions.append('0')
    var2 = re.findall("string2", line2, re.I)
    if len(var2) > 0:
        exemptions.append('1')
    else:
        exemptions.append('0')

print(exemptions)

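If I run this code, I get ['1', '0']: the dates are missing, although the file is read correctly, var1 is present (correctly '1') and var2 is not (correctly '0'). What I don't understand is why the dates are not reported. Importantly, when I change line2 to "line2 = f.readline()", I get ['20120928', '20121128', '0', '0']. Now the dates are fine, but I know var1 is present, so it seems the rest of the file is not being read? If I omit "line2 = f.read()", it outputs a vector with a 0 for every line, in addition to the output I want. How can I get rid of those 0s?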

My expected output is: ['20120928', '20121128', '1', '0']

Sorry for the bother, and thanks in any case.

line2 = f.read() reads the entire file into line2, so there is nothing left to read for your "for line in f:" loop.
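The behaviour described in this answer can be checked without touching a real file. The sketch below uses io.StringIO as an in-memory stand-in for read.txt (the sample contents are made up for illustration). It also shows f.seek(0), which is not mentioned in this thread but is one standard way to rewind a file object before iterating over it again, and it shows why f.readline() only consumes a single line.

```python
import io

# Illustrative stand-in for read.txt; the contents are an assumption.
sample = (
    "CONFORMED PERIOD REPORT:\t20120928\n"
    "DATE OF REPORT:\t20121128\n"
    "body text mentioning string1\n"
)

f = io.StringIO(sample)
whole = f.read()                   # consumes everything; position is now at the end
leftover = [line for line in f]    # the loop has nothing left to iterate over

f.seek(0)                          # rewind to the start of the file
rewound = [line for line in f]     # now all lines are available again

g = io.StringIO(sample)
first = g.readline()               # consumes only the first line
rest = [line for line in g]        # iteration resumes at the second line

print(leftover)
print(len(rewound), len(rest))
```

This matches what the question reports: after f.read() the loop body never runs, so no dates are appended, while with f.readline() the loop still runs but line2 holds only one line, so the string searches find nothing.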
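The line f.read() reads the entire file into the variable line2. If you want to read line by line, you can skip f.read() and iterate like this:
with open('read.txt', 'r') as f:
    for line in f:
        ...  # process each line here

Otherwise, once you have called .read() and stored the result in line2, there is no more text to read from f, because it is all contained in the line2 variable.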
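Building on the point that everything ends up in one string after .read(), here is a minimal sketch of a single-read approach: read the text once, split it into lines for the header dates, and search the full text for the strings. The function name summarize_report and its parameters are hypothetical and not from this thread; the labels are taken from the question.

```python
import re

def summarize_report(text, needles):
    """Return the header dates followed by a '1'/'0' flag per search string.

    `text` is the whole file as one string (for example the result of f.read()).
    """
    result = []
    labels = ("CONFORMED PERIOD REPORT:", "DATE OF REPORT:")
    for line in text.splitlines():
        for label in labels:
            if label in line:
                # Keep what follows the label, minus spaces or tabs.
                result.append(line.split(label, 1)[1].strip())
    # Each string is checked once, against the full text, after the loop.
    for needle in needles:
        found = re.search(re.escape(needle), text, re.I)
        result.append('1' if found else '0')
    return result

sample = (
    "CONFORMED PERIOD REPORT:             20120928\n"
    "DATE OF REPORT:\t\t20121128\n"
    "\n"
    "Some text about the firm mentioning String1.\n"
)
print(summarize_report(sample, ["string1", "string2"]))
```

With a real file this would be called as summarize_report(f.read(), [...]) inside the with block. Note that this sketch appends a date for every line containing a label, so a label repeated in the body text would produce extra entries.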
I ended up doing it the following way:
import re

exemptions = []  # vector I want

with open('read.txt', 'r') as f:
    line2 = ""  # empty string created before the "for line" loop
    for line in f:
        line2 = line2 + line  # append each line to the string created above
        if "CONFORMED PERIOD REPORT" in line:
            # keep only the date, without the "CONFORMED PERIOD REPORT:" label
            exemptions.append(line.strip('\n').replace("CONFORMED PERIOD REPORT:\t", ""))
        elif "DATE OF REPORT" in line:
            # same as above
            exemptions.append(line.rstrip('\n').replace("DATE OF REPORT:\t", ""))

    # these checks run once, after the loop, on the full text
    var1 = re.findall("string1", line2, re.I)  # find string1 in line2, case-insensitive
    if len(var1) > 0:  # if the string appears, the list is non-empty
        exemptions.append('1')
    else:
        exemptions.append('0')
    var2 = re.findall("string2", line2, re.I)
    if len(var2) > 0:
        exemptions.append('1')
    else:
        exemptions.append('0')

print(exemptions)

This is what I have so far. It works well for me, though I think using beautifulsoup could make the code more efficient. That is the next step :)
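It is better to use f.readlines() and then "for line in lines" rather than splitting on \n, since that may not give you the result you expect.
I'm not sure the first code snippet is worth mentioning as a suggestion; the second approach is clearly the way to go.
Thanks, everyone. I fully understand what you mean. However, when I omit "line2 = f.read()" from my original code, I get a large number of 0s in the "exemptions" vector. I have edited my original post with this information. Any suggestions?
If you want to "skip" two lines before the for loop, you can call f.readline() twice before reading the rest, to "pop" those two lines.
If I am following you, that is not what I am doing. This is not about skipping two lines. By removing "line2 = f.read()" from the original code, I get something like ['20120928', '20121128', '0', '0', '0', '0', '0', '0', '0', '0', '0', and thousands more]. It looks like there is a 0 for every line in the "read.txt" file, which is not what I want...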
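The edited code that produced one 0 per line is not shown in this thread, so the cause can only be guessed. One plausible explanation, assuming the string check ended up indented inside the for loop and was applied to each line, is sketched below with made-up sample lines. A check inside the loop appends one flag per line; a check placed after the loop, on the joined text, appends exactly one.

```python
import re

# Illustrative lines; a real report would have thousands of them.
lines = [
    "CONFORMED PERIOD REPORT:\t20120928\n",
    "DATE OF REPORT:\t20121128\n",
    "first body line\n",
    "second body line with string1\n",
]

# Suspected pattern (an assumption): the check runs once per line.
per_line = []
for line in lines:
    per_line.append('1' if re.search("string1", line, re.I) else '0')

# Intended pattern: accumulate the text, then check once after the loop.
text = "".join(lines)
once = ['1' if re.search("string1", text, re.I) else '0']

print(per_line)
print(once)
```

If this is indeed what happened, dedenting the re.findall blocks so that they sit outside the for loop (but still inside the with block) would give a single flag per search string.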