Python 2.7: How do I extract information from the lines between two headers?
I'm new to Python and am trying to extract the information between two headers in a text file with the code below, which currently doesn't work:
with open('toysystem.txt','r') as f:
    start = '<Keywords>'
    end = '</Keywords>'
    i = 0
    lines = f.readlines()
    for line in lines:
        if line == start:
            keywords = lines[i+1]
        i += 1
For reference, the text file looks like this:
<Keywords>
GTO
</Keywords>
Any ideas about what might be wrong with the code? Or is there another way to approach this problem?
Thanks, everyone!
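Lines read from a file end with a newline character, so we should strip it before comparing them with the headers.
The file object f is itself an iterator over lines, so there is no need to call its readlines method here.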
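Both points can be checked with a small sketch. It assumes an in-memory io.StringIO object as a stand-in for the real toysystem.txt, and the variable names are only illustrative:

```python
import io

# Stand-in for open('toysystem.txt'): an in-memory file with the same content.
# (An assumption for illustration; the u'' prefix keeps it valid on Python 2.7 too.)
sample = io.StringIO(u"<Keywords>\nGTO\n</Keywords>\n")

# File-like objects are iterators over lines, so next() yields the first line.
first_line = next(sample)

has_newline = first_line.endswith(u"\n")                 # the line keeps its newline
matches_raw = first_line == u"<Keywords>"                # comparison from the question
matches_stripped = first_line.rstrip() == u"<Keywords>"  # comparison after stripping

# Iteration resumes where next() stopped, i.e. after the first line.
remaining = [line.rstrip() for line in sample]

print(repr(first_line))
print(matches_raw)
print(matches_stripped)
print(remaining)
```

The raw comparison never succeeds because of the trailing newline, which is why the loop in the question never assigns keywords.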
So we can write something like this:
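with open('toysystem.txt', 'r') as f:
    start = '<Keywords>'
    end = '</Keywords>'
    keywords = []
    # Skip lines up to and including the opening header.
    for line in f:
        if line.rstrip() == start:
            break
    # Collect lines until the closing header is reached.
    for line in f:
        if line.rstrip() == end:
            break
        keywords.append(line)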
If you don't need the newlines at the end of the keywords, strip them as well:
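with open('toysystem.txt', 'r') as f:
    start = '<Keywords>'
    end = '</Keywords>'
    keywords = []
    # Skip lines up to and including the opening header.
    for line in f:
        if line.rstrip() == start:
            break
    # Collect stripped lines until the closing header is reached.
    for line in f:
        if line.rstrip() == end:
            break
        keywords.append(line.rstrip())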
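The first version gives us
>>> keywords
['GTO\n']
and the second gives us
>>> keywords
['GTO']
In this case, though, it is better to strip every line once up front with a generator; that variant is the stripped_lines listing further down.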
Alternatively, use Python's re module and parse the file with a regular expression:
import re
with open('toysystem.txt','r') as f:
    contents = f.read()
    # Find every match in the file and return a list of the values captured
    # inside the (). The tag names are case-sensitive, so they must match the
    # file exactly. You can extend the expression according to your needs.
    keywords = re.findall(r'<Keywords>\s*(.*?)\s*</Keywords>', contents)
    print(keywords)
For more on regular expressions in Python, the documentation for the standard re module is a good starting point.
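If a block can hold several keywords on separate lines, the same idea can be extended. The following is a minimal sketch under that assumption; the two-keyword sample text (including LEO) is made up for illustration and is not from the original file:

```python
import re

# Hypothetical sample with two keywords inside the block.
contents = "<Keywords>\nGTO\nLEO\n</Keywords>\n"

# re.DOTALL lets '.' match newlines, so the lazy group can span several lines.
blocks = re.findall(r"<Keywords>\s*(.*?)\s*</Keywords>", contents, re.DOTALL)

# Split each captured block into individual, stripped keyword lines.
keywords = [line.strip() for block in blocks for line in block.splitlines()]
print(keywords)
```

Without re.DOTALL the pattern would only match blocks whose content fits on a single line.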
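The generator-based variant of the first answer strips each line once and then compares the stripped lines directly:
with open('toysystem.txt', 'r') as f:
    start = '<Keywords>'
    end = '</Keywords>'
    keywords = []
    lines = f.readlines()
    # Generator that yields each line with trailing whitespace removed.
    stripped_lines = (line.rstrip() for line in lines)
    # Skip lines up to and including the opening header.
    for line in stripped_lines:
        if line == start:
            break
    # The same generator resumes right after the opening header.
    for line in stripped_lines:
        if line == end:
            break
        keywords.append(line)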
>>> lines
['<Keywords>\n', 'GTO\n', '</Keywords>\n']
>>> keywords
['GTO']
For the sample file, the regular-expression approach prints:
['GTO']
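The two-loop approach can also be wrapped in a small reusable function. This is only a sketch: the name extract_between and its parameters are hypothetical, and it works on any iterable of lines, such as an open file object:

```python
def extract_between(lines, start, end):
    """Return the stripped lines found between the start and end marker lines."""
    # Strip each line lazily, exactly once.
    stripped = (line.rstrip() for line in lines)
    # Skip everything up to and including the start marker.
    for line in stripped:
        if line == start:
            break
    collected = []
    # The same generator resumes right after the start marker.
    for line in stripped:
        if line == end:
            break
        collected.append(line)
    return collected


# Usage with the lines of the sample file from the question.
sample_lines = ["<Keywords>\n", "GTO\n", "</Keywords>\n"]
result = extract_between(sample_lines, "<Keywords>", "</Keywords>")
print(result)
```

If the start marker never appears, the first loop exhausts the generator and the function returns an empty list.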