Python: formatting output
For the following binary file (downloadable from): I have the following Python code:
import re

terms = {}
numbers = {}

meshFile = 'd2017.bin'
with open(meshFile, mode='rb') as file:
    mesh = file.readlines()

outputFile = open('mesh.txt', 'w')

for line in mesh:
    meshTerm = re.search(b'MH = (.+)$', line)
    if meshTerm:
        term = meshTerm.group(1)
    meshNumber = re.search(b'MN = (.+)$', line)
    if meshNumber:
        number = meshNumber.group(1)
        numbers[str(number)] = term
        if term in terms:
            terms[term] = terms[term] + ' ' + str(number)
        else:
            terms[term] = str(number)

cumlist = []
keylist = terms.keys()
for key in keylist:
    #print('THE ORIGIN FOR ', key, file=outputFile)
    item_list = terms[key].split(" ")
    for phrase in item_list:
        cumlist.append(phrase)
print(cumlist)
for item in cumlist:
    print(numbers[str(item)], '\n', item, file=outputFile)
The output looks like this:
b'Calcimycin\r'
b'D03.633.100.221.173\r'
b'Temefos\r'
b'D02.705.400.625.800\r'
b'Temefos\r'
b'D02.705.539.345.800\r'
b'Temefos\r'
b'D02.886.300.692.800\r'
How can I reformat the output so that it looks like this:
Calcimycin
D03.633.100.221.173
Temefos
D02.705.400.625.800
D02.705.539.345.800
D02.886.300.692.800
Thanks
UPDATE: I simplified the source a bit
You can try the following regex:
MH\s*=\s*(\w+)\s*|MN\s*= \s*([^\s]*)
Sample code: ()
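The linked sample is not reproduced here. As a rough sketch of how the regex above might be applied, assuming the file is still read in binary mode and that each `MH` or `MN` field starts its own line, something like the following could work. The function name `format_records` and the `sample` lines are made up for illustration; the sample lines only mimic the terms and numbers shown in the question.

```python
import re

# The regex from the answer, compiled as a bytes pattern because the
# file is opened with mode='rb'. Group 1 captures an MH term, group 2
# captures an MN number; only one of them is set per match.
PATTERN = re.compile(rb'MH\s*=\s*(\w+)\s*|MN\s*= \s*([^\s]*)')


def format_records(lines):
    """Return decoded terms and numbers, printing each term only once
    per consecutive run (hypothetical helper, not from the answer)."""
    out = []
    last_term = None
    for line in lines:
        # match() anchors at the start of the line, an added precaution
        # so that "MH =" inside another field is not picked up.
        m = PATTERN.match(line)
        if not m:
            continue
        term, number = m.group(1), m.group(2)
        if term is not None:
            term = term.decode('utf-8')
            if term != last_term:
                out.append(term)
                last_term = term
        elif number is not None:
            out.append(number.decode('utf-8'))
    return out


# Illustrative input only; the real d2017.bin has many more fields.
sample = [
    b'MH = Calcimycin\r\n',
    b'MN = D03.633.100.221.173\r\n',
    b'MH = Temefos\r\n',
    b'MN = D02.705.400.625.800\r\n',
    b'MN = D02.705.539.345.800\r\n',
    b'MN = D02.886.300.692.800\r\n',
]
print('\n'.join(format_records(sample)))
```

Note that `(\w+)` stops at the first character that is not a letter, digit, or underscore, so a term containing spaces or commas would be cut short; widening that group may be necessary for the full file.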
Sample output:
Calcimycin
D03.633.100.221.173
Temefos
D02.705.400.625.800
D02.705.539.345.800
D02.886.300.692.800
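If the dictionary-based structure from the question is preferred, one possible variant is to decode each line first and keep a list of numbers per term instead of a space-joined string. This is only a sketch: it assumes the file is UTF-8 compatible and that every `MN` line belongs to the most recent `MH` line. The names `group_numbers` and `write_groups` are hypothetical, and an in-memory buffer stands in for `mesh.txt`.

```python
import io
import re
from collections import defaultdict


def group_numbers(lines):
    """Map each decoded MH term to the MN numbers that follow it."""
    groups = defaultdict(list)
    term = None
    for line in lines:
        # Decoding and stripping removes the b'...' wrapper and the '\r'.
        text = line.decode('utf-8').strip()
        mh = re.match(r'MH = (.+)$', text)
        if mh:
            term = mh.group(1)
            continue
        mn = re.match(r'MN = (.+)$', text)
        if mn and term is not None:
            groups[term].append(mn.group(1))
    return groups


def write_groups(groups, out):
    """Write each term once, followed by its numbers, one per line."""
    for term, nums in groups.items():
        print(term, file=out)
        for num in nums:
            print(num, file=out)


# Illustrative input only, modelled on the output in the question.
demo_lines = [
    b'MH = Calcimycin\r\n',
    b'MN = D03.633.100.221.173\r\n',
    b'MH = Temefos\r\n',
    b'MN = D02.705.400.625.800\r\n',
    b'MN = D02.705.539.345.800\r\n',
]
buffer = io.StringIO()  # stands in for open('mesh.txt', 'w')
write_groups(group_numbers(demo_lines), buffer)
print(buffer.getvalue(), end='')
```

Because dictionaries keep insertion order in Python 3.7 and later, the terms come out in the order they first appear in the file.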
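Is there a reason you're only using binary strings? str.decode('utf-8').strip()

@TidB If you're referring to the regex here and the use of "b" rather than "r", it's because I'm reading a binary file, which is a mesh file. When I use "r", the regex doesn't work. Did I answer your question?

@Simple The code above gives you everything you want with a single regex... you can decide how to handle the matches based on the output.. Updated a bit.. can you test it now? It's more formatted.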
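To illustrate the point raised in the first comment: calling `str()` on a bytes object returns its printable representation, including the `b'...'` wrapper and the escaped `\r`, whereas `decode()` followed by `strip()` yields the plain text. A minimal demonstration, assuming the data is UTF-8 compatible (the variable names are illustrative):

```python
raw = b'Calcimycin\r'  # what re.search returns for a bytes pattern

# str() on bytes gives the repr, which is why b'...' and \r end up
# in the output file in the question.
as_repr = str(raw)

# Decoding converts bytes to text; strip() drops the trailing '\r'.
as_text = raw.decode('utf-8').strip()

print(as_repr)
print(as_text)
```

A pattern prefixed with `r` is a text (str) pattern, and Python's `re` module raises a `TypeError` when a str pattern is used on bytes, which would explain why the `r` version did not work on lines read in binary mode. A raw bytes pattern can be written with the `rb` prefix.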