Parsing e-mail with Python
I'm writing a Python script to process returned e-mails. As suggested in this article, I'm using the following Procmail configuration:
:0:
|$HOME/process_mail.py
My process_mail.py script receives the e-mail on stdin, like this:
From hostname Tue Jun 15 21:43:30 2010
Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
Received: from mail-fx0-f44.google.com (209.85.161.44)
by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
Received: by fxm19 with SMTP id 19so170709fxm.3
for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15
Jun 2010 18:47:33 -0700 (PDT)
Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Date: Tue, 15 Jun 2010 20:47:33 -0500
Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
Subject: TEST 12
From: Full Name <username@sender.com>
To: username@domain.com
Content-Type: text/plain; charset=ISO-8859-1
ONE
TWO
THREE
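I want to get message fields such as "From", "To" and "Subject", but the message object doesn't contain any of them.
What am I doing wrong?
Answering my own question: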
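I found a bug in the code that builds the message: it was inserting newlines between some lines, which kept the parser from working correctly.
It looks like your wrapped lines have no whitespace at the start of the continuation line, which is not allowed according to RFC 2822:
Each header field is logically a single line of characters comprising
the field name, the colon, and the field body. For convenience
however, and to deal with the 998/78 character limitations per line,
the field body portion of a header field can be split into a multiple
line representation; this is called "folding". The general rule is
that wherever this standard allows for folding white space (not
simply WSP characters), a CRLF may be inserted before any WSP. For
example, the header field:
Subject: This is a test
can be represented as:
Subject: This
 is a test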
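The following is a minimal sketch, using the RFC's own example, that shows this folding rule with Python's standard email package. email.message_from_string() defaults to the legacy compat32 policy, which returns the folded value with the CRLF still in it. Passing policy=policy.default unfolds the value instead. The variable names are illustrative only.

```python
import email
from email import policy

# The RFC example: the Subject value is folded with CRLF followed by a space.
RAW = "Subject: This\r\n is a test\r\n\r\nbody\r\n"

# message_from_string() uses the legacy compat32 policy unless told otherwise;
# it returns the header value exactly as folded in the source.
legacy = email.message_from_string(RAW)

# policy.default unfolds the value: the CR/LF is dropped, the space is kept.
modern = email.message_from_string(RAW, policy=policy.default)

print(repr(legacy["Subject"]))
print(repr(str(modern["Subject"])))
```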
It should look like this:
From hostname Tue Jun 15 21:43:30 2010
Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
Received: from mail-fx0-f44.google.com (209.85.161.44)
        by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
Received: by fxm19 with SMTP id 19so170709fxm.3
        for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15
        Jun 2010 18:47:33 -0700 (PDT)
Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Date: Tue, 15 Jun 2010 20:47:33 -0500
Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
Subject: TEST 12
From: Full Name <username@sender.com>
To: username@domain.com
Content-Type: text/plain; charset=ISO-8859-1

ONE
TWO
THREE
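The following is a small sketch that reproduces the difference, assuming Python 3.6+. It uses an abbreviated excerpt of the message above rather than the full text. When a continuation line starts without whitespace, the parser treats that line as the start of the body. Every header after it (Subject, To, ...) then ends up in the payload, and the parser records a MissingHeaderBodySeparatorDefect. The helper name summarize is hypothetical.

```python
import email
from email import policy

# Abbreviated version of the question's message: the Received header's
# continuation line ("by ip-...") has NO leading whitespace.
BROKEN = (
    "Received: from mail-fx0-f44.google.com (209.85.161.44)\n"
    "by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400\n"
    "Subject: TEST 12\n"
    "To: username@domain.com\n"
    "\n"
    "ONE\n"
)

# Same text with the continuation line folded correctly (leading space).
FIXED = BROKEN.replace("\nby ip-", "\n by ip-")


def summarize(raw: str):
    """Return (Subject, To, names of message-level parser defects)."""
    msg = email.message_from_string(raw, policy=policy.default)
    return msg["Subject"], msg["To"], [type(d).__name__ for d in msg.defects]


for label, raw in (("broken", BROKEN), ("fixed", FIXED)):
    print(label, summarize(raw))
```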
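You have to make sure the lines aren't broken by accident (as above, although it's hard to tell whether that is just a copy-paste issue). With the complete message, for example: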
Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
Received: from mail-fx0-f44.google.com (209.85.161.44) by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
Received: by fxm19 with SMTP id 19so170709fxm.3 for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Date: Tue, 15 Jun 2010 20:47:33 -0500
Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
Subject: TEST 12
From: Full Name <username@sender.com>
To: username@domain.com
Content-Type: text/plain; charset=ISO-8859-1

ONE
TWO
THREE
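This prints "TEST 12" as required.
How do I get the e-mail body here?
If you really want the entire RFC 2822 message body, with its original MIME structure and everything, parsing the message in Python is basically redundant: the body is everything after the first empty line. Usually, for modern messages, you want to parse the MIME structure and extract one or more body parts.
So, to clarify: if the raw file says "Subject: This\r\n is a test", then email.message_from_string() should report the Subject as "This is a test" (without the extra whitespace). I found that for one particular e-mail with a wrapped attachment name (Content-Disposition), the funny \r\n was not stripped.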
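For the "how do I get the body?" question, the standard library's newer API (policy.default with EmailMessage.get_body()) can walk the MIME structure for you. Below is a minimal sketch of what a process_mail.py along these lines might look like. It assumes Python 3.6 or later and that only the text/plain body is wanted. The helper names parse_message and plain_text_body are hypothetical, not taken from the original script.

```python
#!/usr/bin/env python3
"""Sketch of a process_mail.py: parse one message piped in on stdin."""
import sys
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from typing import Optional


def parse_message(raw: bytes) -> EmailMessage:
    """Parse raw message bytes; a leading mbox "From " line becomes the unixfrom."""
    # policy.default yields EmailMessage objects and unfolds folded header values.
    return BytesParser(policy=policy.default).parsebytes(raw)


def plain_text_body(msg: EmailMessage) -> Optional[str]:
    """Return the preferred text/plain part as a decoded str, or None if absent."""
    # get_body() walks the MIME tree (multipart/alternative, mixed, ...);
    # for a single-part text/plain message it returns the message itself.
    part = msg.get_body(preferencelist=("plain",))
    return None if part is None else part.get_content()


def main() -> None:
    msg = parse_message(sys.stdin.buffer.read())
    for name in ("From", "To", "Subject"):
        print(f"{name}: {msg[name]}")
    print(plain_text_body(msg))


if __name__ == "__main__" and not sys.stdin.isatty():
    # Only read stdin when something is piped in (as Procmail does), so running
    # the file by hand does not sit waiting for terminal input.
    main()
```

get_content() decodes the part using its declared charset (ISO-8859-1 in the message above), so the caller gets a str rather than raw bytes. Using policy.default here also avoids the leftover \r\n in folded headers that the legacy default policy keeps.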