How do I extract text between two strings using Python?

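I want to use Python to extract blocks of text, each defined by a header line, from a file called temp.txt.

temp.txt looks like the sample below. header1 (the year) and header2 (the month) are separated by a tab character (shown as /t in the sample; the script matches it as \t):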

header1="2016"/theader2="Jan"
Lion Animal
Apple Food
.end

header1="2016"/theader2="Feb"
Tiger Animal
Orange Food
.end

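I wrote the script below, and it works well (invoked as python script.py [year] [month], with both values read from argv). However, it only lets me extract the data for one specific (month, year) pair, and it cannot take a wildcard month (or year) to extract all of the matching text. For example, python script.py [year] * with a wildcard month does not work. Is there a better way to do this?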

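import pandas as pd
import re
import sys

# Year and month to extract, taken from the command line.
year = sys.argv[1]
month = sys.argv[2]

with open('./temp.txt') as infile, open('./output', 'w') as outfile:
    copy = False  # True while we are inside the requested block
    for line in infile:
        if line.strip() == 'header1="%s"\theader2="%s"' % (year, month):
            copy = True   # matching header: start copying
        elif line.strip() == '.end':
            copy = False  # end of block: stop copying
        elif copy:
            outfile.write(line)

# Convert the extracted text to an Excel sheet.
# The raw string keeps the same separator value without an invalid-escape warning.
pd.read_csv('./output', encoding='utf8', sep=r'\;', dtype='unicode').to_excel('./output.xlsx', sheet_name='sheet2', index=False)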

You can add wildcard support to your script:

if ((year == '*' or ('header1="%s"' % year) in line.strip()) and
    (month == '*' or ('header2="%s"' % month) in line.strip())
    ):
    copy = True
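
One caveat about the condition above: if it replaces the first `if` in the loop and both arguments are `*`, it is true for every line, including data lines and `.end`, so the `elif` branches never run and nothing is written. A minimal sketch of a stricter check follows. It assumes headers look exactly like the sample (`header1="..."`, a tab, `header2="..."`); `HEADER_RE` and `header_matches` are hypothetical names, not part of the original script.

```python
import re

# Assumption: a header line is exactly header1="<year>"<TAB>header2="<month>".
HEADER_RE = re.compile(r'^header1="([^"]*)"\theader2="([^"]*)"$')


def header_matches(line, year, month):
    """Return True if line is a header matching year and month.

    A value of '*' for year or month matches any header value.
    """
    m = HEADER_RE.match(line.strip())
    if m is None:
        return False  # not a header line at all (data line, '.end', blank)
    return year in ('*', m.group(1)) and month in ('*', m.group(2))
```

In the loop, `if header_matches(line, year, month): copy = True` could then take the place of the condition above.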

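When calling the script from bash, you need to escape or quote the asterisk so that the shell does not expand it into a list of files, for example: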

python script.py [year] \*
python script.py [year] '*'
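
If quoting the asterisk on every call is a nuisance, one possible alternative is to make the arguments optional and treat a missing one as the wildcard. The sketch below uses the standard `argparse` module; `parse_args` is a hypothetical helper and the default of `'*'` is an assumption, not something the original script does.

```python
import argparse


def parse_args(argv=None):
    """Parse [year] [month]; an omitted argument defaults to the '*' wildcard."""
    parser = argparse.ArgumentParser(
        description="Extract header-delimited blocks by year and month."
    )
    parser.add_argument("year", nargs="?", default="*",
                        help="year to match, or omit to match any year")
    parser.add_argument("month", nargs="?", default="*",
                        help="month to match, or omit to match any month")
    return parser.parse_args(argv)
```

With this, `python script.py 2016` would select every month of 2016 without any shell quoting. Because the arguments are positional, selecting a month across all years would still need an explicit, quoted `'*'` for the year.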

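That said, the overall shape of your program is right. At a minimum you need to:

iterate over the file line by line
keep track of whether you are inside a matching block
write to the output file when needed

Your script already does essentially that, so I wouldn't worry too much about optimizing it.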
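
The three steps can be sketched as a small generator that works on any iterable of lines, which makes it easy to try without touching files. This is only an illustration under the assumptions of the sample format: `extract_blocks` and `sample` are hypothetical names, and a header is assumed to be any line starting with `header1="`.

```python
def extract_blocks(lines, year, month):
    """Yield the body lines of every block whose header matches year and month.

    A value of '*' for year or month matches any header value.
    """
    copying = False
    for line in lines:                        # step 1: go line by line
        text = line.strip()
        if text.startswith('header1="'):      # a header opens a new block
            # step 2: remember whether this block is one we want
            copying = (
                (year == '*' or ('header1="%s"' % year) in text) and
                (month == '*' or ('header2="%s"' % month) in text)
            )
        elif text == '.end':
            copying = False                   # the block is over
        elif copying:
            yield line                        # step 3: emit the line


sample = [
    'header1="2016"\theader2="Jan"\n',
    'Lion Animal\n',
    'Apple Food\n',
    '.end\n',
    '\n',
    'header1="2016"\theader2="Feb"\n',
    'Tiger Animal\n',
    'Orange Food\n',
    '.end\n',
]

for body_line in extract_blocks(sample, '2016', '*'):
    print(body_line, end='')
```

To write a file instead of printing, the same generator could be passed to `outfile.writelines(extract_blocks(infile, year, month))`.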
