Python: extracting text between two pieces of text
I am trying to use Python to extract the text between the following headers:
@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext
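The exact text of @HEADER1 and @othertext may change over time, so I need the solution to be dynamic.
Also, HEADER2 is a word that starts with '@'. So could I use the startswith function? Or a regular expression?
Something like: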
For line in file:
    if(line == 'HEADER1'):
        print next line
        continue = TRUE
    if(continue == TRUE):
        print(line)
    elif(line == othertext):
        break
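The idea in the pseudocode above can be written as runnable Python 3. This is only a minimal sketch: the function name extract_block and the sample data are made up for illustration, and it assumes the block ends at the next line that begins with '@'. Note that startswith("@HEADER1") would also match a line such as "@HEADER10".

```python
import io

def extract_block(lines, start="@HEADER1"):
    """Yield the lines after `start` until the next line beginning with '@'."""
    inside = False
    for raw in lines:
        line = raw.rstrip("\r\n")  # drop the line ending before comparing
        if not inside:
            # Keep skipping lines until the start header is seen.
            inside = line.startswith(start)
            continue
        if line.startswith("@"):
            break  # the next '@' line closes the block
        yield line

# Hypothetical sample standing in for an open file object.
sample = io.StringIO("@HEADER1\nExtractMe\nExtractMe\n@othertext\nignored\n")
print(list(extract_block(sample)))
```

Any iterable of lines works here, so an open file can be passed directly instead of the StringIO stand-in.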
No need for re:
string = """@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext"""
You can use str.find together with string slicing, like this:
print(string[string.find("\n") + 1:string.find("\n@")])
Or you can split the string into a list, take the elements you want, and join them back together like this:
lines = string.split("\n")
lines = lines[1:-1]
print("\n".join(lines))
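One caveat with str.find is that it returns -1 when the marker is absent, which silently produces a wrong slice. A possible guard, assuming both markers are passed in as plain strings (the helper name between is hypothetical):

```python
def between(text, start_marker, end_marker):
    """Return the text between two markers, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    # Search for the end marker only after the start marker.
    end = text.find(end_marker, start)
    if end == -1:
        return None
    # Trim the line breaks that surround the extracted body.
    return text[start:end].strip("\r\n")

string = "@HEADER1\nExtractMe\nExtractMe\n@othertext"
print(between(string, "@HEADER1", "@othertext"))
```

Because the markers are arguments, the header text can change without touching the function.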
This does it:
import re
string = """@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext"""
print('"{}"'.format(re.split(r'(@HEADER1[\n\r]|[\n\r]@othertext)', string)[2]))
Output:
"ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe"
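Since the question says the header text may change, the markers could be passed in and escaped rather than hard-coded in the pattern. A rough sketch using re.search, where extract_between is a hypothetical helper and the body is assumed to be at least one line long:

```python
import re

def extract_between(text, start, end):
    """Return the lines between `start` and `end`, or None if not found."""
    # re.escape keeps characters in the markers from being read as regex syntax.
    pattern = re.escape(start) + r"\r?\n(.*?)\r?\n" + re.escape(end)
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None

string = "@HEADER1\nExtractMe\nExtractMe\n@othertext"
print(extract_between(string, "@HEADER1", "@othertext"))
```

The optional \r in the pattern lets the same call work on text with \r\n line endings.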
Something like this:
import re
string = """@HEADER1
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
@othertext
@HEADER2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
@othertext"""
for a in re.findall(r'@\w+(?:\r\n|\r|\n)(.*?)@\w+(?:\r\n|\r|\n)?', string, re.DOTALL):
    print(a)
Output:
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
ExtractMe2
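If it matters which header each block belongs to, a variant of the same idea can capture the header name as well. This is a sketch, not the answerer's code: the function name sections is made up, and it assumes every block is opened by an '@' header line and closed by another '@' line, as in the sample.

```python
import re

def sections(text):
    """Map each opening header name to the list of lines in its block."""
    # Group 1 is the header name, group 2 the body up to the closing '@' line.
    pattern = re.compile(r"@(\w+)\r?\n(.*?)\r?\n@\w+", re.DOTALL)
    return {m.group(1): m.group(2).splitlines() for m in pattern.finditer(text)}

string = (
    "@HEADER1\nExtractMe\nExtractMe\n@othertext\n"
    "@HEADER2\nExtractMe2\n@othertext"
)
print(sections(string))
```

The closing '@' line is consumed by each match, so it is not mistaken for the start of a new block.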
I would use the partition() method in this case.
Output:
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
ExtractMe
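The code for the partition() approach is not shown here. One way it might look is sketched below; this is a reconstruction under assumptions, not the answerer's original code. The helper name extract_with_partition is hypothetical, and the block is assumed to end at the first line that starts with '@'.

```python
def extract_with_partition(text, start="@HEADER1"):
    """Return the text after `start` up to the next line beginning with '@'."""
    # partition returns (before, separator, after); separator is '' if absent.
    _, found, rest = text.partition(start)
    if not found:
        return None
    # Cut at the first line break that is followed by '@'.
    body, _, _ = rest.partition("\n@")
    return body.strip("\r\n")

string = "@HEADER1\nExtractMe\nExtractMe\n@othertext\n@HEADER2\nExtractMe2\n@othertext"
print(extract_with_partition(string))
```

If no closing '@' line exists, this version returns everything after the header, which may or may not be the desired behavior.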
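There is no HEADER2 in the content you posted.
.startswith("@") can find words that start with @.
Do not write if line == '@HEADER1'; write if line.startswith('@HEADER1') instead. With the first option you forget about the newline character at the end of each line. Do not use Python's built-in continue keyword as a variable name. Also, for is spelled with a lowercase "f".
Shouldn't you take the @ etc. into account?
Does this work if the line break is \r\n?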
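The newline points raised in these comments can be checked directly. A small illustration with made-up sample data that uses \r\n line endings:

```python
text = "@HEADER1\r\nExtractMe\r\n@othertext\r\n"

# Lines read from a file keep their line ending, as simulated here.
raw_lines = text.splitlines(keepends=True)
exact_match = raw_lines[0] == "@HEADER1"             # False: the line ends with \r\n
prefix_match = raw_lines[0].startswith("@HEADER1")   # True: the ending is ignored

# str.splitlines() without arguments strips \n, \r and \r\n alike.
clean_lines = text.splitlines()

print(exact_match, prefix_match, clean_lines)
```

Either stripping the line ending first or comparing with startswith avoids the mismatch, whichever line-ending convention the file uses.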