Find a string in a file and copy until a specific character occurs in Python

python string file


I have multiple .txt files whose contents, after conversion, look like this:

    >  **   ** **|** **     STYLE #        ** **|** **   DESCR  :  Potrero415;Tbl-
Rnd                 ** **\--------** **         ** **\--** **ZONE  1** **\--**
**           ** **\--** **ZONE  2** **\--** **      ** **\----** **      -T1-
-T2-  -T3-


                **

I want to grab everything from DESCR: up to the start of the next *****--*****ZONE 2**

So my string should look like this:

DESCR:Potrero415;Tbl Rnd

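Note that the file contains several lines of text before this section, and the word DESCR appears only at the place I want to copy from; it does not occur anywhere earlier.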

I know split can be used up to the point where ***\ occurs.

All the files have the same format; I only need to copy from DESCR: to ***

I know I am risking downvotes on this post.

Update: I found where the word occurs with the following:

lines = test.readlines()
test.close()
for line in lines:
    line = line.strip()
    if line.find("DESCR") != -1:
        print("FOUND")

where test is the file I opened.

This sounds like a job for regular expressions. Here s is the content of the file:

>>> import re
>>> s = r'''    >  **   ** **|** **     STYLE #        ** **|** **   DESCR  :  Potrero415;Tbl-
... Rnd                 ** **\--------** **         ** **\--** **ZONE  1** **\--**
... **           ** **\--** **ZONE  2** **\--** **      ** **\----** **      -T1-
... -T2-  -T3-
... 
... 
...                 ** '''
>>> 
>>> re.search(r'(DESCR\s*:.*?)\s*\*\* \*\*', s, re.DOTALL).group(1)
'DESCR  :  Potrero415;Tbl-\nRnd'

(Prefixing the regular expression with (?s) has the same effect as passing the re.DOTALL argument.)
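To apply the same pattern to a whole folder of files, something along these lines might work. It is only a sketch: the names extract_descr and extract_from_folder are hypothetical helpers, not part of the answer above. It assumes every file uses the layout shown in the question, and it collapses the line break and padding into single spaces.

```python
import re
from pathlib import Path

# Lazily capture from "DESCR :" up to the first "** **" separator.
# re.DOTALL lets "." cross the line break inside the description.
DESCR_RE = re.compile(r'(DESCR\s*:.*?)\s*\*\* \*\*', re.DOTALL)


def extract_descr(text):
    """Return the DESCR field with whitespace runs collapsed, or None."""
    match = DESCR_RE.search(text)
    if match is None:
        return None
    # Collapse newlines and repeated spaces into single spaces.
    return ' '.join(match.group(1).split())


def extract_from_folder(folder):
    """Map each .txt file name in `folder` to its DESCR field (hypothetical helper)."""
    return {path.name: extract_descr(path.read_text())
            for path in sorted(Path(folder).glob('*.txt'))}
```

Returning None when nothing matches avoids the AttributeError that calling .group(1) on a failed search would raise.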

You can use a regular expression:

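import re

# your_txt holds the file contents as a single string
# re.DOTALL is required because the description wraps onto a second line
match = re.search(r'(?=DESCR).*?(?=\*\*)', your_txt, re.DOTALL)
print(match.group(0))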

This will output:

DESCR : Potrero415;Tbl- Rnd

Where:

Positive Lookahead (?=DESCR)
Assert that the Regex below matches
DESCR matches the characters DESCR literally (case sensitive)
.*? matches any character 
*? Quantifier — Matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
Positive Lookahead (?=\*\*)
Assert that the Regex below matches
\* matches the character * literally (case sensitive)
\* matches the character * literally (case sensitive)
Global pattern flags
s modifier: single line. Dot matches newline characters
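The s modifier listed above matters for this data, because the description wraps onto a second line. The sketch below uses a shortened sample string (an assumption, trimmed from the question's text) to show the lookahead pattern with the inline (?s) flag, followed by a whitespace cleanup step that is not part of the original pattern.

```python
import re

# Shortened sample shaped like the question's data: the description
# contains a line break before the closing "**".
your_txt = "** **|** **   DESCR  :  Potrero415;Tbl-\nRnd                 ** **\\--------**"

# Without the s flag, "." cannot cross the newline, so nothing matches.
no_flag = re.search(r'(?=DESCR).*?(?=\*\*)', your_txt)

# (?s) at the start of the pattern is the inline form of re.DOTALL.
match = re.search(r'(?s)(?=DESCR).*?(?=\*\*)', your_txt)

# The lazy match stops right before "**", so the padding is still attached.
raw = match.group(0) if match else ''
cleaned = ' '.join(raw.split())
print(cleaned)
```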

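What have you tried so far?

Updated now :)

"After conversion": what kind of conversion?

Converted from .htm to .txt using HTML2TEXT.

You mentioned that split could be used, so why not use it?

Thanks! Have a nice day :)

Very clear explanation, thank you for your time, sir!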
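On the split question raised in the comments, a regex-free version is also possible. This is a minimal sketch, assuming (as the question states) that DESCR first appears at the field to copy and that the field ends at the next "**". The name descr_by_split is hypothetical.

```python
def descr_by_split(text, start='DESCR', stop='**'):
    """Return the text from `start` up to the first `stop` after it, or None."""
    # partition() splits on the first occurrence of `start` only.
    _, found, rest = text.partition(start)
    if not found:
        return None
    # Keep everything before the first `stop` marker that follows.
    field = rest.split(stop, 1)[0]
    # Re-attach the marker and collapse newlines and padding.
    return ' '.join((start + field).split())
```

Called on the contents of one of the files, this should give the same field as the regex answers, with the line break and padding reduced to single spaces.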