Python: extracting text between two pieces of text (tags: python, regex, python-3.x, text-extraction)

I am trying to use Python to extract the text between the following headers:

@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext
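The exact text of @HEADER1 and @othertext may change over time, so the solution needs to be dynamic.

Also, HEADER2 is a word that starts with '@'. So could I use the startswith function, or should I use a regular expression?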

Something like this:

For line in file:
    if(line == 'HEADER1'):
        print next line
        continue = TRUE
    if(continue == TRUE):
        print(line)
    elif(line == othertext):
        break
There is no need for re here:

string = """@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext"""
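You can use str.find together with string slicing, like this:

# start just after the header line and stop at the newline before the next "@" line
print(string[string.find("\n") + 1:string.find("\n@")])
Or you can split the string into a list of lines, take the elements you want, and join them back together, like this: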

lines = string.split("\n")   # avoid shadowing the built-in name "list"
lines = lines[1:-1]          # drop the header line and the closing "@" line
print("\n".join(lines))
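Since the header text may change, the slicing idea above can be wrapped in a small helper that takes both markers as arguments. This is only a sketch: the function name extract_between and the sample text are made up for illustration. It also checks for -1, because str.find returns -1 when a marker is missing and the slice would otherwise silently return the wrong text.

```python
def extract_between(text, start_marker, end_marker):
    """Return the text between start_marker and the next end_marker, or None."""
    start = text.find(start_marker)
    if start == -1:
        return None  # start marker is not present
    start += len(start_marker)
    end = text.find(end_marker, start)
    if end == -1:
        return None  # no end marker after the start marker
    # Trim the line breaks that surround the extracted body.
    return text[start:end].strip("\r\n")


sample = "@HEADER1\nExtractMe\nExtractMe\n@othertext"
print(extract_between(sample, "@HEADER1", "@othertext"))
```

Passing the markers as arguments means only the call has to change when the header text changes.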
This will do it:

import re

string = """@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext"""

# print() is a function in Python 3; index 2 is the text between the two captured delimiters
print('"{}"'.format(re.split(r'(@HEADER1[\n\r]|[\n\r]@othertext)', string)[2]))
Output:

"ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe"
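The pattern above hard-codes @HEADER1 and @othertext. If the markers change over time, one option is to build the pattern from variables and pass them through re.escape so that any special characters in a marker are matched literally. The following is a sketch under that assumption; the function name extract_with_regex and the sample text are hypothetical.

```python
import re


def extract_with_regex(text, start_marker, end_marker):
    """Return the text between two marker lines, or None if there is no match."""
    pattern = (
        re.escape(start_marker) + r"\r?\n"   # start marker and its line ending
        + r"(.*?)"                            # the body, as short as possible
        + r"\r?\n" + re.escape(end_marker)    # line ending, then the end marker
    )
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None


sample = "@HEADER1\r\nExtractMe\r\nExtractMe\r\n@othertext"
print(repr(extract_with_regex(sample, "@HEADER1", "@othertext")))
```

The \r?\n parts accept both \n and \r\n line endings around the markers. Line endings inside the extracted body are left as they are.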

Something like this:

import re

string = """@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext
@HEADER2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
@othertext"""

for a in re.findall(r'@\w+(?:\r\n|\r|\n)(.*?)@\w+(?:\r\n|\r|\n)?', string, re.DOTALL):
    print(a)  # print() is a function in Python 3
Output:

ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe

ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2

I use the partition() method in this kind of situation.
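No code accompanies this answer here, so the following is not the answer author's code, only a minimal sketch of how str.partition could be applied to the problem. The function name extract_with_partition and the sample text are assumptions made for illustration.

```python
def extract_with_partition(text, start_marker, end_marker):
    """Return the text between the markers, or None if either one is missing."""
    # partition() returns (before, separator, after); the separator is ""
    # when the marker does not occur in the string.
    _, found_start, rest = text.partition(start_marker)
    if not found_start:
        return None
    body, found_end, _ = rest.partition(end_marker)
    if not found_end:
        return None
    # Trim the line breaks that surround the extracted body.
    return body.strip("\r\n")


sample = "@HEADER1\nExtractMe\nExtractMe\n@othertext"
print(extract_with_partition(sample, "@HEADER1", "@othertext"))
```

Unlike str.find, partition never returns an index, so there is no -1 to slice with by accident. A missing marker shows up as an empty separator instead.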

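There is no HEADER2 in your content.

line.startswith("@") will find words that start with @.

Do not write if line == '@HEADER1'; write if line.startswith('@HEADER1') instead. With the first option you forget the newline character at the end of each line. Do not use continue as a variable name, since it is a reserved Python keyword. Also, for is spelled with a lowercase "f".

Shouldn't you be accounting for the @ and so on?

Does this work if the line ending is \r\n?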
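Putting the comments together, the loop from the question could be sketched roughly as follows. The function name lines_between, the flag name capturing, and the sample text are hypothetical. The sketch assumes the wanted block ends at the next line that starts with "@", and it strips both \n and \r\n before comparing.

```python
def lines_between(lines, start_marker, end_prefix="@"):
    """Collect the lines after start_marker up to the next end_prefix line."""
    collected = []
    capturing = False  # a flag with a legal name; "continue" is a keyword
    for raw in lines:
        line = raw.rstrip("\r\n")  # drop "\n" or "\r\n" before comparing
        if capturing:
            if line.startswith(end_prefix):
                break  # reached the closing "@" line
            collected.append(line)
        elif line.startswith(start_marker):
            capturing = True  # start collecting from the next line on
    return collected


sample = "@HEADER1\r\nExtractMe\r\nExtractMe\r\n@othertext\r\n"
print(lines_between(sample.splitlines(keepends=True), "@HEADER1"))
```

The same function accepts an open file object in place of the list, since iterating over a file also yields one line at a time with its line ending attached.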