Python: formatting output


For the following binary file (downloadable from):

I have the following Python code:

import re

terms = {}
numbers = {}

meshFile = 'd2017.bin'
with open(meshFile, mode='rb') as file:
    mesh = file.readlines()

outputFile = open('mesh.txt', 'w')

for line in mesh:
    meshTerm = re.search(b'MH = (.+)$', line)
    if meshTerm:
        term = meshTerm.group(1)
    meshNumber = re.search(b'MN = (.+)$', line)
    if meshNumber:
        number = meshNumber.group(1)
        numbers[str(number)] = term
        if term in terms:
            terms[term] = terms[term] + ' ' + str(number)
        else:
            terms[term] = str(number)

cumlist = []
keylist = terms.keys()
for key in keylist:
    #print('THE ORIGIN FOR ', key, file=outputFile)

    item_list = terms[key].split(" ")
    for phrase in item_list:
        cumlist.append(phrase)

print(cumlist)

for item in cumlist:
    print(numbers[str(item)], '\n', item, file=outputFile)
The output looks like this:

b'Calcimycin\r' 
 b'D03.633.100.221.173\r'
b'Temefos\r' 
 b'D02.705.400.625.800\r'
b'Temefos\r' 
 b'D02.705.539.345.800\r'
b'Temefos\r' 
 b'D02.886.300.692.800\r'
How can I reformat the output so that it looks like this:

Calcimycin 
D03.633.100.221.173
Temefos 
D02.705.400.625.800
D02.705.539.345.800
D02.886.300.692.800
Thanks.

UPDATE: I simplified the source a bit
You can try the following regular expression:

MH\s*=\s*(\w+)\s*|MN\s*= \s*([^\s]*)

Sample code: ()

Sample output:

Calcimycin
D03.633.100.221.173
Temefos
D02.705.400.625.800
D02.705.539.345.800
D02.886.300.692.800
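
The sample code linked in the answer is not reproduced here, so the following is only a minimal sketch of how the regex above could be applied. It assumes the file content can be decoded as UTF-8; `SAMPLE` and `format_records` are hypothetical names, and the sample bytes merely imitate the `MH = ...` / `MN = ...` lines from the question.

```python
import re

# Hypothetical excerpt in the "MH = ..." / "MN = ..." layout from the question;
# the real d2017.bin is assumed to contain many other fields as well.
SAMPLE = (
    b"MH = Calcimycin\r\n"
    b"MN = D03.633.100.221.173\r\n"
    b"MH = Temefos\r\n"
    b"MN = D02.705.400.625.800\r\n"
    b"MN = D02.705.539.345.800\r\n"
    b"MN = D02.886.300.692.800\r\n"
)

# The regex from the answer: group 1 captures a term, group 2 a tree number.
PATTERN = re.compile(r"MH\s*=\s*(\w+)\s*|MN\s*= \s*([^\s]*)")


def format_records(raw: bytes) -> str:
    """Return each term followed by its numbers, one item per line."""
    text = raw.decode("utf-8")  # assumption: the data is valid UTF-8
    lines = []
    for match in PATTERN.finditer(text):
        term, number = match.groups()
        # Exactly one of the two groups is set for every match.
        lines.append(term if term is not None else number)
    return "\n".join(lines)


print(format_records(SAMPLE))
```

Because `(\w+)` stops at the first non-word character, a term that contains spaces or punctuation would be cut short; capturing with `(.+)` on a stripped line is one possible alternative.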

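Is there a reason you are only using byte strings?

str.decode('utf-8').strip()

@TidB If you are referring to the regex here and the use of "b" instead of "r": that is because I am reading a binary file, which is a MeSH file. When I use "r", the regex does not work. Did I answer your question?

@Simple The code above gives you everything you want with a single regex. You can decide how to handle the matches based on the output. I updated it a bit; you can test it now, and the output is better formatted.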
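
The decoding suggestion from the comments could be combined with the question's line-by-line loop roughly as follows. This is a sketch under assumptions, not the original poster's final code: it assumes the file is UTF-8 text read in binary mode, and `group_tree_numbers` and `sample_lines` are hypothetical names introduced for illustration.

```python
import re


def group_tree_numbers(lines):
    """Map each MH term to the list of MN numbers that follow it."""
    terms = {}
    term = None
    for raw in lines:
        # Decoding turns b'Calcimycin\r' into 'Calcimycin\r';
        # strip() then removes the trailing carriage return.
        line = raw.decode("utf-8").strip()
        mh = re.match(r"MH = (.+)$", line)
        if mh:
            term = mh.group(1)
            terms.setdefault(term, [])
            continue
        mn = re.match(r"MN = (.+)$", line)
        if mn and term is not None:
            terms[term].append(mn.group(1))
    return terms


# Hypothetical lines, shaped like what file.readlines() returns in 'rb' mode.
sample_lines = [
    b"MH = Calcimycin\r\n",
    b"MN = D03.633.100.221.173\r\n",
    b"MH = Temefos\r\n",
    b"MN = D02.705.400.625.800\r\n",
    b"MN = D02.705.539.345.800\r\n",
    b"MN = D02.886.300.692.800\r\n",
]

for term, numbers in group_tree_numbers(sample_lines).items():
    print(term)
    for number in numbers:
        print(number)
```

Storing the numbers in a list per term avoids joining and re-splitting strings, and printing `str` values instead of `bytes` is what removes the `b'...\r'` wrappers from the output.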