Python: print a section of text from a file
I am still learning Python. I have a file like this example:
RDKit 3D
0 0 0 0 0 0 0 0 0 0999 V3000
M V30 BEGIN CTAB
M V30 COUNTS 552 600 0 0 0
M V30 BEGIN ATOM
M V30 1 C 7.3071 41.3785 19.7482 0
M V30 2 C 7.5456 41.3920 21.2703 0
M V30 3 C 8.3653 40.1559 21.6876 0
M V30 4 C 9.7001 40.0714 20.9228 0
M V30 5 C 9.4398 40.0712 19.4042 0
M V30 END ATOM
M V30 BEGIN BOND
M V30 0 1 1 2
M V30 1 1 1 6
M V30 2 1 1 10
M V30 3 1 1 11
M V30 4 1 2 3
M V30 END BOND
M V30 END CTAB
M END
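I only want to print the information between:
M V30 BEGIN ATOM
and:
M V30 END ATOM
Since the number of atoms differs from file to file, I would like a general method. Can anyone help?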
Thank you very much.
Try this:
with open('filename.txt', 'r') as f:
    ok_to_print = False
    for line in f:  # iterate lazily instead of loading everything with readlines()
        line = line.strip()  # remove surrounding whitespace
        if line == 'M V30 BEGIN ATOM':
            ok_to_print = True
        elif line == 'M V30 END ATOM':
            ok_to_print = False
        elif ok_to_print:
            print(line)
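This processes the file line by line as it is read, which is ideal for large files that cannot fit in memory. For small files, you can read the whole content into memory and use a regular expression:
import re

with open('filename.txt', 'r') as f:
    data = f.read()

# re.DOTALL lets "." match newlines; the non-greedy group captures the atom block
pattern = re.compile(r'M V30 BEGIN ATOM(.+?)M V30 END ATOM', re.DOTALL)
for result in pattern.findall(data):
    print(result.strip())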
Note: none of this code has been tested; I wrote it blind.
You can try the following approach:
# Read file contents
with open("file.txt") as file:
    inside = False
    for line in file:
        # Start section of interest
        if line.rstrip() == "M V30 BEGIN ATOM":
            inside = True
        # End section of interest
        elif line.rstrip() == "M V30 END ATOM":
            inside = False
        # Inside section of interest
        elif inside:
            print(line.rstrip())
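The same flag-based loop can be wrapped in a reusable generator so that the markers become parameters instead of hard-coded strings. This is only a sketch: the name `iter_section` and its parameters are made up for illustration, and it assumes each marker line matches exactly once trailing whitespace is removed.

```python
def iter_section(lines, start, end):
    """Yield the lines found strictly between a start and an end marker."""
    inside = False
    for line in lines:
        text = line.rstrip()
        if text == start:
            inside = True
        elif text == end:
            inside = False
        elif inside:
            yield text


# In-memory sample; a file object returned by open() can be passed the same way.
sample = [
    "M V30 BEGIN ATOM\n",
    "M V30 1 C 7.3071 41.3785 19.7482 0\n",
    "M V30 2 C 7.5456 41.3920 21.2703 0\n",
    "M V30 END ATOM\n",
    "M V30 BEGIN BOND\n",
    "M V30 0 1 1 2\n",
    "M V30 END BOND\n",
]
atoms = list(iter_section(sample, "M V30 BEGIN ATOM", "M V30 END ATOM"))
for atom in atoms:
    print(atom)
```

Because the function yields lines one at a time, it should behave the same on a large file as the loop above, and the caller decides whether to print, collect, or parse the lines.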
This is how I would do it (with csv).
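The code for this answer is missing from the page. Purely as a guess at what a csv-based version might look like (this is not the answerer's code), the sketch below splits each line into fields with `csv.reader` and keeps the rows inside the atom block. The function name `read_atom_rows` is hypothetical, and the sketch assumes the fields are separated by spaces and contain no quote characters.

```python
import csv
import io


def read_atom_rows(handle):
    """Return the rows between BEGIN ATOM and END ATOM as lists of fields."""
    rows = []
    inside = False
    reader = csv.reader(handle, delimiter=" ", skipinitialspace=True)
    for row in reader:
        # Drop empty fields produced by repeated or trailing spaces
        row = [field for field in row if field]
        if not row:
            continue
        if row[-2:] == ["BEGIN", "ATOM"]:
            inside = True
        elif row[-2:] == ["END", "ATOM"]:
            inside = False
        elif inside:
            rows.append(row)
    return rows


# Assumed sample data held in memory; an opened file can be passed instead.
sample = io.StringIO(
    "M V30 BEGIN ATOM\n"
    "M V30 1 C 7.3071 41.3785 19.7482 0\n"
    "M V30 END ATOM\n"
    "M V30 BEGIN BOND\n"
    "M V30 0 1 1 2\n"
    "M V30 END BOND\n"
)
atom_rows = read_atom_rows(sample)
for row in atom_rows:
    print(row)
```

Getting each atom as a list of fields makes it straightforward to pick out, say, the element symbol or the coordinates afterwards.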
Keeping the logic short and cleanly separated, and given that you want a portable approach:
def print_atoms_from_file(full_file_path):
    with open(full_file_path, 'r') as f:
        start_printing = False
        for line in f:
            if 'BEGIN ATOM' in line:
                start_printing = True
                continue
            if 'END ATOM' in line:
                start_printing = False
                continue
            if start_printing:
                # print() is a function in Python 3; rstrip() avoids doubled newlines
                print(line.rstrip())

print_atoms_from_file('test_file_name.txt')
You can try the following function:
def extract_lines(filename, start_line, stop_line):
    with open(filename, 'r') as f:
        list_of_lines = [line.rstrip('\n') for line in f]
    # index() raises ValueError if a marker line is missing
    start_point = list_of_lines.index(start_line)
    # Look for the stop marker only after the start marker
    stop_point = list_of_lines.index(stop_line, start_point + 1)
    return "\n".join(list_of_lines[start_point + 1:stop_point])
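`list.index` raises `ValueError` when a marker is missing, and the function above holds every line of the file in a list. One possible variant, sketched here with `itertools.dropwhile` and `itertools.takewhile`, streams the lines instead and returns an empty string when the start marker is absent. The name `extract_between` is hypothetical, and the sample lines are assumed data.

```python
from itertools import dropwhile, takewhile


def extract_between(lines, start_line, stop_line):
    """Return the lines after start_line and before stop_line, joined by newlines."""
    stripped = (line.rstrip("\n") for line in lines)
    # Skip everything before the start marker
    rest = dropwhile(lambda text: text != start_line, stripped)
    # Discard the start marker itself; no error if it was never found
    next(rest, None)
    # Collect lines until the stop marker or the end of the input
    return "\n".join(takewhile(lambda text: text != stop_line, rest))


sample = [
    "M V30 BEGIN ATOM\n",
    "M V30 1 C 7.3071 41.3785 19.7482 0\n",
    "M V30 END ATOM\n",
    "M V30 BEGIN BOND\n",
    "M V30 0 1 1 2\n",
    "M V30 1 1 1 6\n",
    "M V30 END BOND\n",
]
bond_text = extract_between(sample, "M V30 BEGIN BOND", "M V30 END BOND")
print(bond_text)
```

An opened file can be passed as `lines`, since iterating over a file object yields its lines one by one.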
Comments:
What have you tried so far? Using a module could probably make it generic.
Start reading the file -> start capturing data into a list or new string when you find the start string -> stop when you find the string to stop at.
This depends on the second field always being V30, but the general model is good: it looks for a pattern. Maybe start capturing lines when one contains "BEGIN ATOM" and stop when one contains "END ATOM"?
OK, yes, it could be changed to line.endswith('BEGIN BOND').
I was going to upvote as well, but then I saw that you used a regular expression, so no upvote.
@Wychh Great! Could you mark the question as answered? :)
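The suggestion from the comments, matching on how a line ends rather than on the whole line, could be sketched as follows. It no longer depends on the exact spacing or content of the "M V30" prefix. The function name `iter_block` and its `name` parameter are made up for illustration, and the sample lines are assumed data.

```python
def iter_block(lines, name):
    """Yield lines between '... BEGIN <name>' and '... END <name>' markers."""
    begin = "BEGIN " + name
    end = "END " + name
    inside = False
    for line in lines:
        text = line.strip()
        if text.endswith(begin):
            inside = True
        elif text.endswith(end):
            inside = False
        elif inside:
            yield text


# The prefix here uses two spaces on purpose to show that it does not matter.
sample = [
    "M  V30 BEGIN ATOM",
    "M  V30 1 C 7.3071 41.3785 19.7482 0",
    "M  V30 END ATOM",
    "M  V30 BEGIN BOND",
    "M  V30 0 1 1 2",
    "M  V30 END BOND",
]
atom_lines = list(iter_block(sample, "ATOM"))
bond_lines = list(iter_block(sample, "BOND"))
print(atom_lines)
print(bond_lines)
```

The trade-off is looser matching: any line that happens to end with the marker text toggles the flag, so an exact comparison is safer when the prefix is known to be fixed.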