Python GMail API: decoding messages from all over the world


python encoding gmail


I'm using the GMail API in Python to retrieve mails written in French, and I'm having a problem with the accents.

I retrieve the messages with:

message = service.users().messages().get(userId="me", id=i, format="raw").execute()

I only want the body of the mail, so I started with:

base64.urlsafe_b64decode(message['raw'].encode('ASCII'))
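As a sketch (not the asker's code), the decoded raw bytes can be handed to the standard-library email parser, so that headers and body parts are available separately instead of working on one big decoded string. This assumes the 'raw' field is base64url-encoded, as the Gmail API uses for format="raw"; parse_raw_message is a hypothetical helper name.

```python
import base64
import email
from email import policy
from email.message import EmailMessage


def parse_raw_message(raw_b64url: str) -> EmailMessage:
    """Parse the 'raw' field of a Gmail API message (hypothetical helper)."""
    # 'raw' is base64url; add back any stripped '=' padding before decoding.
    padded = raw_b64url + "=" * (-len(raw_b64url) % 4)
    raw_bytes = base64.urlsafe_b64decode(padded)
    # Parse bytes (not an already-decoded str) so each part keeps its own charset.
    return email.message_from_bytes(raw_bytes, policy=policy.default)


# Usage with the question's call (needs an authorized `service`):
# message = service.users().messages().get(userId="me", id=i, format="raw").execute()
# mime = parse_raw_message(message['raw'])
# print(mime['Subject'], mime.get_content_type())
```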

For some mails, I get all the mail data, including the French text, for example:

"Cette semaine, vous vous êtes servis du module de révision 0 fois"

For some others, I get quoted-printable encoding like this:

"Salut, =E7a farte?"

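The quoted-printable encoding itself is not a problem, since I built a simple decoding function with the quopri module. The main problem here is that the last sentence is wrong for quoted-printable: the encoded character is ç, which should be encoded like this:

"Salut, =C3=A7a farte?"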

So when the wrongly encoded sentence is used, I get a result like this:

Salut, �a farte?
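For reference, both quoted-printable sequences in the question are valid; they encode ç under different charsets (=E7 is ç in ISO-8859-1, =C3=A7 is ç in UTF-8). The question does not show how its decoding function turns bytes into text, so as an assumption, the result above is what you get when the Latin-1 bytes are decoded as UTF-8 with errors="replace". A short check with the standard-library quopri module:

```python
import quopri

# The two quoted-printable forms quoted in the question.
latin1_bytes = quopri.decodestring(b"Salut, =E7a farte?")   # b'Salut, \xe7a farte?'
utf8_bytes = quopri.decodestring(b"Salut, =C3=A7a farte?")  # b'Salut, \xc3\xa7a farte?'

# quopri only undoes the =XX escapes; each result is raw bytes that must be
# decoded with the charset the sender actually used.
print(latin1_bytes.decode("iso-8859-1"))  # Salut, ça farte?
print(utf8_bytes.decode("utf-8"))         # Salut, ça farte?

# Decoding the Latin-1 bytes as UTF-8 instead: 0xE7 followed by 'a' is not
# valid UTF-8, so errors="replace" substitutes U+FFFD, displayed as '�'.
print(latin1_bytes.decode("utf-8", errors="replace"))  # Salut, �a farte?
```

In other words, quopri is doing its job; the charset applied afterwards is what differs between the two clients.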

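I suspect the cause is the different mail clients: my first example is a message sent from a Gmail client to an Outlook address, and the second is the opposite, an Outlook mail sent to a Gmail address.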

My question is: is there a way to handle the decoding in every possible case?

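Try this:

import base64

message = service.users().messages().get(userId='me', id=i).execute()
# body.data is base64url-encoded, so use the URL-safe decoder
content = message['payload']['body']['data']
# assumes the body is UTF-8; check the part's Content-Type charset otherwise
print(base64.urlsafe_b64decode(content).decode('utf-8'))

This gets the content of the email for a single-part message; for multipart messages, payload['body'] carries no data and the content sits under payload['parts'].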
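With the default format (full), multipart messages nest their content under payload['parts'], which is why an index chain like ['parts'][0]['parts'][1] ends up being needed. A sketch that searches the part tree instead and decodes with the charset from that part's own Content-Type header. Assumptions: the payload uses the Gmail API MessagePart fields (mimeType, headers, body.data, parts), body.data holds the part's bytes base64url-encoded with the transfer encoding already removed, and find_text_part is a hypothetical helper name.

```python
import base64
from email.message import Message
from typing import Optional


def _part_charset(part: dict, default: str = "utf-8") -> str:
    """Read the charset parameter from a MessagePart's own headers."""
    holder = Message()  # used only to parse the Content-Type parameters
    for header in part.get("headers", []):
        if header.get("name", "").lower() == "content-type":
            holder["Content-Type"] = header["value"]
            break
    return holder.get_content_charset() or default


def find_text_part(part: dict, mime_type: str = "text/plain") -> Optional[str]:
    """Depth-first search of payload/parts for the first part with mime_type."""
    body = part.get("body", {})
    if part.get("mimeType") == mime_type and body.get("data"):
        data = body["data"]
        # body.data is base64url; restore padding in case it was stripped.
        raw = base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))
        try:
            return raw.decode(_part_charset(part), errors="replace")
        except LookupError:
            # Unrecognized charset name: Latin-1 maps every byte to a character.
            return raw.decode("iso-8859-1")
    for child in part.get("parts", []):
        found = find_text_part(child, mime_type)
        if found is not None:
            return found
    return None


# Usage (needs an authorized `service`):
# message = service.users().messages().get(userId='me', id=i).execute()
# print(find_text_part(message['payload']))               # plain text
# print(find_text_part(message['payload'], 'text/html'))  # HTML part
```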


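The problem is that while quopri correctly converts the mail body from 7-bit data to 8-bit data, the encoding you then use to convert the byte string to a Unicode string is not the right one. In your example, it appears to be ISO-8859-1: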

In [1]: import quopri
In [2]: quopri.decodestring('Salut, =E7a farte?').decode('iso-8859-1')
Out[2]: 'Salut, ça farte?'

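Normally, you should be able to get the correct encoding from the Content-Type header. This is what a mail encoded as quoted-printable UTF-8 looks like: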

Content-Type: text/plain;charset=UTF-8
Content-Transfer-Encoding: quoted-printable
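Rather than reading these headers by hand, the standard-library email package can apply them for you. A minimal sketch, assuming the message was fetched with format="raw" and its base64url 'raw' field already decoded to bytes; parse_mime_bytes and preferred_body_text are hypothetical helper names, while message_from_bytes, get_body and get_content are Python email APIs used with policy.default:

```python
import email
from email import policy
from email.message import EmailMessage
from typing import Optional, Tuple


def parse_mime_bytes(raw_bytes: bytes) -> EmailMessage:
    # policy.default makes the parser return EmailMessage objects.
    return email.message_from_bytes(raw_bytes, policy=policy.default)


def preferred_body_text(
    msg: EmailMessage, preference: Tuple[str, ...] = ("plain", "html")
) -> Optional[str]:
    """Return the preferred text body, decoded according to its own headers."""
    part = msg.get_body(preferencelist=preference)
    if part is None:
        return None
    # get_content() undoes the Content-Transfer-Encoding (quoted-printable or
    # base64) and then decodes the bytes with the part's charset parameter.
    return part.get_content()
```

Because each part is decoded with its own charset parameter, an ISO-8859-1 part from one client and a UTF-8 part from another come back as the same text, provided the sender's headers are correct.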


I tried this. I found the HTML part of the email with content=message['payload']['parts'][0]['parts'][1]['body']['data']. However, when I try to display it properly, I get an error: 'utf-8' codec can't decode byte 0x9c in position 16: invalid start byte
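One plausible cause of that error, not confirmed in the thread, is decoding Gmail's base64url data with base64.b64decode: with its default validate=False it silently discards '-' and '_' (the URL-safe replacements for '+' and '/'), so the decoded bytes are wrong and a later .decode('utf-8') can fail on an arbitrary byte. The payload below is made up purely to make the difference visible; decode_gmail_data is a hypothetical helper name.

```python
import base64


def decode_gmail_data(data: str) -> bytes:
    """Decode a Gmail API body.data or raw field (base64url, padding optional)."""
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))


# Made-up bytes whose base64url form uses '-' and '_' (standard form: '+//+').
original = b"\xfb\xff\xfe"
data = base64.urlsafe_b64encode(original).decode("ascii")  # '-__-'

# The standard decoder drops the '-' and '_' characters instead of mapping them,
# so the result no longer matches what was encoded.
print(base64.b64decode(data) == original)   # False
print(decode_gmail_data(data) == original)  # True
```

Even with the right base64 decoder, the bytes still need the charset declared in that part's Content-Type header rather than a hard-coded 'utf-8'.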