Parsing email with Python

I'm writing a Python script to process emails handed to it by Procmail. As suggested in a linked post, I'm using the following Procmail configuration:

:0:
|$HOME/process_mail.py
My process_mail.py script receives the email via stdin, like this:

From hostname Tue Jun 15 21:43:30 2010
Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
Received: from mail-fx0-f44.google.com (209.85.161.44)
by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
Received: by fxm19 with SMTP id 19so170709fxm.3
for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15
Jun 2010 18:47:33 -0700 (PDT)
Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Date: Tue, 15 Jun 2010 20:47:33 -0500
Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
Subject: TEST 12
From: Full Name <username@sender.com>
To: username@domain.com
Content-Type: text/plain; charset=ISO-8859-1

ONE
TWO
THREE
I want to get message fields such as "From", "To" and "Subject". However, the message object does not contain any of these fields.

What am I doing wrong?

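Answering my own question:

I found a bug in the code that builds the message. It was appending newlines between some of the lines, which kept the parser from working properly.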

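It looks like your continuation lines have no whitespace in front of them, which is illegal according to RFC 2822 (section 2.2.3):

Each header field is logically a single line of characters comprising
the field name, the colon, and the field body. For convenience
however, and to deal with the 998/78 character limitations per line,
the field body portion of a header field can be split into a multiple
line representation; this is called "folding". The general rule is
that wherever this standard allows for folding white space (not
simply WSP characters), a CRLF may be inserted before any WSP. For
example, the header field: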

    Subject: This is a test
can be represented as:

    Subject: This
     is a test
It should look like this:

From hostname Tue Jun 15 21:43:30 2010
Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
Received: from mail-fx0-f44.google.com (209.85.161.44)
    by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
Received: by fxm19 with SMTP id 19so170709fxm.3
    for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15
    Jun 2010 18:47:33 -0700 (PDT)
Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Date: Tue, 15 Jun 2010 20:47:33 -0500
Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
Subject: TEST 12
From: Full Name <username@sender.com>
To: username@domain.com
Content-Type: text/plain; charset=ISO-8859-1

ONE
TWO
THREE
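
To see why the indentation matters, here is a minimal sketch using Python 3's standard `email` package. The shortened message and the variable names (`broken`, `folded`) are made up for illustration. As far as I can tell, the parser treats a line that is neither a header nor an indented continuation as the start of the body, so every header after it is lost.

```python
import email

# Continuation line starts at column 0, so it is not a valid fold.
broken = (
    "Received: from mail-fx0-f44.google.com (209.85.161.44)\n"
    "by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400\n"
    "Subject: TEST 12\n"
    "To: username@domain.com\n"
    "\n"
    "ONE\n"
)

# Same message, with the continuation line indented as folding requires.
folded = broken.replace("\nby ip-", "\n    by ip-")

broken_msg = email.message_from_string(broken)
folded_msg = email.message_from_string(folded)

print(broken_msg["Subject"])  # None: header parsing stopped at the "by ..." line
print(folded_msg["Subject"])  # TEST 12

# The parser records what it objected to instead of raising.
print([type(d).__name__ for d in broken_msg.defects])
```

In the broken case the `Subject:` and `To:` lines end up inside the message body, which matches the symptom described in the question.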

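You must make sure lines aren't broken accidentally (as they are above, though it's hard to tell whether that's a copy-and-paste issue). Parsing a complete message, for example: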

Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
Received: from mail-fx0-f44.google.com (209.85.161.44) by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
Received: by fxm19 with SMTP id 19so170709fxm.3 for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Date: Tue, 15 Jun 2010 20:47:33 -0500
Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
Subject: TEST 12
From: Full Name <username@sender.com>
To: username@domain.com
Content-Type: text/plain; charset=ISO-8859-1

ONE
TWO
THREE

and printing its Subject gives, as desired:
TEST 12
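
A minimal sketch of such a check, assuming Python 3. The helper name `summarize` and the shortened `sample` message are hypothetical, and `io.StringIO` stands in for the stdin stream that Procmail would provide.

```python
import email
import io
from email.utils import parseaddr


def summarize(stream):
    """Parse one message from a text stream; return (sender, recipient, subject)."""
    msg = email.message_from_file(stream)
    # parseaddr splits "Full Name <addr>" into (display name, address).
    sender = parseaddr(msg["From"])[1]
    return sender, msg["To"], msg["Subject"]


# Shortened copy of the message above, with every header on a single line.
sample = (
    "Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)\n"
    "Date: Tue, 15 Jun 2010 20:47:33 -0500\n"
    "Subject: TEST 12\n"
    "From: Full Name <username@sender.com>\n"
    "To: username@domain.com\n"
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "\n"
    "ONE\nTWO\nTHREE\n"
)

result = summarize(io.StringIO(sample))
print(result)  # ('username@sender.com', 'username@domain.com', 'TEST 12')
```

In a script invoked by Procmail, `summarize(sys.stdin)` would take the place of the `io.StringIO` call.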

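How do I get the email body here?

If you really want the whole RFC2822 email body with the raw MIME structure and all, parsing the message with Python is basically overkill; the body is everything after the first empty line. Usually, for modern messages, you'll want to parse the MIME structure and extract one or more body parts.

So, to clarify: if the raw file says `Subject: This\r\n is a test`, then `email.message_from_string()` should report the Subject as `This is a test` (without the extra whitespace). I've found that for one particular email with a wrapped attachment name (`Content-Disposition`), the funny `\r\n` was not stripped.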
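
Both points can be illustrated with a short sketch, assuming Python 3.6 or later. The message text is made up. To my understanding, the default (legacy `compat32`) policy returns header values with the fold still inside them, which would explain the leftover line break, while `email.policy.default` unfolds them and also offers `get_body()` for picking out a body part.

```python
import email
from email import policy

raw = (
    "Subject: This\n"
    " is a test\n"
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "\n"
    "ONE\nTWO\nTHREE\n"
)

legacy = email.message_from_string(raw)                         # compat32 policy
modern = email.message_from_string(raw, policy=policy.default)  # EmailMessage API

print(repr(legacy["Subject"]))       # 'This\n is a test' (fold kept)
print(repr(str(modern["Subject"])))  # 'This is a test'   (fold removed)

# Pick the plain-text body part; for this single-part message it is the message itself.
body = modern.get_body(preferencelist=("plain",))
text = body.get_content()
print(text)
```

For a multipart message, `get_body()` returns the first part that matches the preference list, or `None` if there is no such part, so real code should check the result before calling `get_content()`.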