A better way to split a file with separate position markers in Python
I have a file of the following form:
--- part0 ---
some
strings
--- part1 ---
some other
strings
--- part2 ---
...
I want to get any part of the file as a Python list:
x = get_part_of_file(part=0)
print x # => should print ['some', 'strings']
x = get_part_of_file(part=1)
print x # => should print ['some other', 'strings']
So my question is: what is the simplest way to implement the get_part_of_file function used above?
My (ugly) solution is as follows:
import re

def get_part_of_file(part, separate_str="part"):
    def does_match_to_separate(line, part_num):
        # True if the line is a marker such as "--- part0 ---"
        pattern = r"{}{}\b".format(re.escape(separate_str), part_num)
        return re.search(pattern, line) is not None
    def get_first_line_num_appearing_separate_str(lines, part_num):
        # index of the marker line for part_num, or len(lines) if there is none
        for i, line in enumerate(lines):
            if does_match_to_separate(line, part_num):
                return i
        return len(lines)
    with open("my_file.txt") as f:
        lines = f.read().splitlines()
    # get first line number of the required part (the line after its marker)
    first_line_num = get_first_line_num_appearing_separate_str(lines, part) + 1
    # get last line number of the required part (it ends before the next marker)
    last_line_num = get_first_line_num_appearing_separate_str(lines, part + 1)
    return lines[first_line_num:last_line_num]
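You can use a regular expression to parse the string. See the example below. The only problem you may run into is that the regex pattern currently matches only word characters and whitespace, including newlines (\w\s). If the part values contain other characters, you have to extend the pattern to match more characters. Alternatively, you can write something like this: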
>>> import re
>>> input_file = open('input', 'r')
>>> content = input_file.read()
>>> input_file.close()
>>> content_parts = re.split(r'.+?part\d+.+?\n', content)
>>> content_parts
['', 'some\nstrings\n', 'some other\nstrings\n', '']
>>> [ part.split('\n') for part in content_parts if part ]
[['some', 'strings', ''], ['some other', 'strings', '']]
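The empty strings at the end of each list above come from splitting on '\n' when each part ends with a newline. As a minimal sketch of the same re.split idea wrapped into a lookup function, the following assumes that every marker occupies a whole line of the form "--- partN ---" and that parts are numbered consecutively from 0. The names MARKER and get_part_of_text are hypothetical, and the function takes the text directly rather than a file so it can be tried without any input file.

```python
import re

# Assumption: every marker is a full line of the form "--- partN ---".
MARKER = re.compile(r"^--- part\d+ ---[ \t]*\n?", re.MULTILINE)

def get_part_of_text(text, part):
    """Return the lines of section number `part` (hypothetical helper)."""
    # re.split returns whatever precedes the first marker as element 0,
    # so it is dropped and section N ends up at index N.
    sections = MARKER.split(text)[1:]
    # splitlines() does not produce a trailing empty string.
    return sections[part].splitlines()

sample = (
    "--- part0 ---\nsome\nstrings\n"
    "--- part1 ---\nsome other\nstrings\n"
    "--- part2 ---\n"
)
print(get_part_of_text(sample, 0))  # ['some', 'strings']
print(get_part_of_text(sample, 1))  # ['some other', 'strings']
```

Because the lookup is by list position, this only works when no part number is skipped; a file with gaps in the numbering would need the number captured from the marker instead.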
import re
# your_regex_pattern must define the named groups 'part_number' and 'part_value'
parts = re.finditer(your_regex_pattern, text)
for p in parts:
    print("Part %s: %s" % (p.group('part_number'), p.group('part_value')))
    # or return the element with the part number you want.
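To make the finditer approach concrete, here is one possible pattern with the two named groups. It is an assumption, not the pattern from the original answer: it expects marker lines of the form "--- partN ---" and, as described above, accepts only word characters and whitespace in the part value. PART_PATTERN and parse_parts are hypothetical names.

```python
import re

# Hypothetical pattern: a marker line followed by a value made up of
# word characters and whitespace only ([\w\s]), so it stops at the next "---".
PART_PATTERN = re.compile(
    r"--- part(?P<part_number>\d+) ---\n(?P<part_value>[\w\s]*)"
)

def parse_parts(text):
    """Map each part number to the list of lines in that part."""
    return {
        int(m.group("part_number")): m.group("part_value").splitlines()
        for m in PART_PATTERN.finditer(text)
    }

sample = "--- part0 ---\nsome\nstrings\n--- part1 ---\nsome other\nstrings\n"
parts = parse_parts(sample)
print(parts[0])  # ['some', 'strings']
print(parts[1])  # ['some other', 'strings']
```

Looking parts up by the captured number means gaps in the numbering are harmless. A value containing other characters, such as punctuation, would be cut off at the first of them, which is the limitation mentioned above.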