Python: convert encoded strings to normal printable characters

I am trying to extract details from an MBOX file and have written the sample program below.

It works, but some headers print as encoded strings, for example:

 =?UTF-8?B?QVJNIE1hY3MgYXJlIGNvbWluZywgdGhyZWUgeWVhcnMgYWZ0ZXIgQXBwbGU=?=
 =?UTF-8?B?4oCZcyBhdHRpdHVkZSBjaGFuZ2U=?=
I assume "=?UTF-8?B?" indicates Base64 encoding, so I guess the conversion takes two steps: decode the Base64, then decode the result as UTF-8.
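The guess is on the right track: these are RFC 2047 "encoded words", shaped like =?charset?encoding?encoded-text?=, where the encoding letter is B for Base64 or Q for a Quoted-Printable variant. Below is a minimal sketch that splits one such word into its three parts. It assumes a single, well-formed word, and both the regex and the helper name split_encoded_word are illustrative, not a library API.

```python
import re

# Illustrative pattern for one RFC 2047 encoded word: =?charset?encoding?encoded-text?=
# The encoding letter is B (Base64) or Q (Quoted-Printable variant), case-insensitive.
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")


def split_encoded_word(word):
    """Return (charset, encoding, payload) for an encoded word, or None if it is not one."""
    m = ENCODED_WORD.fullmatch(word.strip())
    if m is None:
        return None
    charset, encoding, payload = m.groups()
    return charset, encoding.upper(), payload


print(split_encoded_word("=?UTF-8?B?4oCZcyBhdHRpdHVkZSBjaGFuZ2U=?="))
# ('UTF-8', 'B', '4oCZcyBhdHRpdHVkZSBjaGFuZ2U=')
```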

Can anyone show me a way to convert these strings into normal printable characters?

#! /usr/bin/env python3
#import locale
#2020-02-27

"""
Extract Subject from MBOX file
"""

import os, time
import mailbox
from email.header import Header

for message in mailbox.mbox('~/temp/Inbox'):
    subject = message['subject']
    sender = message['from']
    ddate = message['Delivery-date']
    print(subject, sender)
I have made some progress: if I strip off the leading =?UTF-8?B? and the trailing ?= and then call base64.b64decode() on what is left, I get readable text.

The second string above, =?UTF-8?B?4oCZcyBhdHRpdHVkZSBjaGFuZ2U=?=, becomes b'\xe2\x80\x99s attitude change'

=?UTF-8?B?QVJNIE1hY3MgYXJlIGNvbWluZywgdGhyZWUgeWVhcnMgYWZ0ZXIgQXBwbGU=?=
becomes b'ARM Macs are coming, three years after Apple'
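The two steps described above (strip the wrapper, Base64-decode, then decode the bytes as UTF-8) can be sketched as follows. It assumes every word is exactly of the form =?UTF-8?B?...?=; the final .decode("utf-8") is what turns the b'...' bytes into a printable str.

```python
import base64

words = [
    "=?UTF-8?B?QVJNIE1hY3MgYXJlIGNvbWluZywgdGhyZWUgeWVhcnMgYWZ0ZXIgQXBwbGU=?=",
    "=?UTF-8?B?4oCZcyBhdHRpdHVkZSBjaGFuZ2U=?=",
]

decoded = []
for word in words:
    payload = word[len("=?UTF-8?B?"):-len("?=")]  # strip the =?UTF-8?B? prefix and ?= suffix
    raw = base64.b64decode(payload)               # step 1: Base64 text -> bytes
    decoded.append(raw.decode("utf-8"))           # step 2: UTF-8 bytes -> str

print(decoded[0])  # ARM Macs are coming, three years after Apple
print(decoded[1])  # ’s attitude change
```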

Joining these together gives the subject:

ARM Macs are coming, three years after Apple’s attitude change
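The standard library can do the stripping, decoding and joining in one step with email.header.decode_header and email.header.make_header. Here is a minimal sketch, assuming the raw Subject value is available as a str; the variable names are illustrative.

```python
from email.header import decode_header, make_header

raw_subject = (
    "=?UTF-8?B?QVJNIE1hY3MgYXJlIGNvbWluZywgdGhyZWUgeWVhcnMgYWZ0ZXIgQXBwbGU=?= "
    "=?UTF-8?B?4oCZcyBhdHRpdHVkZSBjaGFuZ2U=?="
)

# decode_header() returns (bytes, charset) pairs and drops the whitespace
# between adjacent encoded words; make_header() rebuilds a Header object,
# and str() on that Header gives the decoded Unicode text.
subject = str(make_header(decode_header(raw_subject)))
print(subject)  # ARM Macs are coming, three years after Apple’s attitude change
```

The space between the two encoded words disappears in the output. RFC 2047 requires this, and it is why the result reads "Apple’s" rather than "Apple ’s".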

Does this work?

#! /usr/bin/env python3
"""
Extract Subject from MBOX file
"""

import os, time
import mailbox
from email.header import Header

for message in mailbox.mbox('~/temp/Inbox'):
    subject = message['subject']
    sender = message['from']
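    ddate = message['Delivery-date']
    # Note: in Python 3 these header values are already str, so .decode() raises
    # AttributeError: 'str' object has no attribute 'decode' (see the comments below).
    print(subject.decode('utf-8', 'ignore'), sender.decode('utf-8', 'ignore'))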

I wrote a function to convert UTF-8 Base64 or Quoted-Printable strings, although I am surprised I could not find an existing method.

#! /usr/bin/env python3
#import locale
#2020-02-27

"""
Extract Subject from MBOX file
"""

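import base64
import mailbox
import quopri

def bdecode(s):
    """
    Convert UTF-8 Base64 ("B") or Quoted-Printable ("Q") encoded words to str.
    Whitespace between two adjacent encoded words is dropped (RFC 2047);
    all other words are re-joined with single spaces.
    """
    if s is None:
        return ""
    parts = []
    prev_encoded = False
    for word in s.split():   # splits on spaces and on folded-header line breaks
        upper = word.upper()
        if upper.startswith('=?UTF-8?B?') and word.endswith('?='):
            text = base64.b64decode(word[10:-2]).decode("utf-8")
            encoded = True
        elif upper.startswith('=?UTF-8?Q?') and word.endswith('?='):
            # header=True turns '_' into a space, as the Q encoding requires
            text = quopri.decodestring(word[10:-2], header=True).decode("utf-8")
            encoded = True
        else:
            text = word
            encoded = False
        # keep a separating space unless both neighbours are encoded words
        if parts and not (prev_encoded and encoded):
            parts.append(" ")
        parts.append(text)
        prev_encoded = encoded
    return "".join(parts)

for message in mailbox.mbox('~/temp/Inbox'):
    subject = message['subject']
    print(bdecode(subject))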
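As an alternative to a hand-written decoder, here is a sketch that assumes Python 3.6 or later. If each message is parsed with email.policy.default, header lookups return already-decoded text. The helper decoded_subjects is hypothetical. It passes a factory to mailbox.mbox so that the default (compat32) parsing, which returns the raw =?UTF-8?B?...?= form, is not used.

```python
import email
import email.policy
import mailbox


def decoded_subjects(path):
    """Yield each message's Subject with RFC 2047 encoded words already decoded."""
    # The factory receives a binary file object for each message; parsing it with
    # policy.default makes message["subject"] return decoded text.
    mbox = mailbox.mbox(
        path,
        factory=lambda f: email.message_from_binary_file(f, policy=email.policy.default),
    )
    try:
        for message in mbox:
            subject = message["subject"]
            yield "" if subject is None else str(subject)
    finally:
        mbox.close()
```

Usage would then be: for subject in decoded_subjects('~/temp/Inbox'): print(subject)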

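Would calling str() on the string variable solve your problem? Try subject.decode('utf-8').
AttributeError: 'str' object has no attribute 'decode'
@Milliways That happens when the object is already a string rather than an encoded (bytes) object. See, for example, the online link in the answer.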
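To illustrate the comment above: .decode() exists only on bytes. In Python 3, the header values returned by mailbox are usually already str, so there is nothing to decode at that level, because the =?UTF-8?B?...?= text is plain ASCII inside the str. The following is a minimal sketch of a hypothetical helper, to_text, that accepts either type.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as str, decoding only if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)   # only bytes objects have .decode()
    return value                        # already a str, nothing to decode


print(to_text(b"\xe2\x80\x99s attitude change"))  # ’s attitude change
print(to_text("plain subject"))                   # plain subject
```

This only converts bytes to str. It does not interpret RFC 2047 encoded words, so a str such as "=?UTF-8?B?...?=" passes through unchanged and still needs a decoder like email.header.decode_header.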