Python IMAP: retrieving MIMEMultipart mails and getting the body with HTML tags


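My situation is the following: I am currently retrieving all Outlook (2016) mails from my inbox; more specifically, I am retrieving a table:

¦ Product ¦ Currency ¦ Tenor (months) ¦ Code 1 ¦
¦ MyItem  ¦ USD      ¦ 12             ¦ AAA01  ¦

My goal is to capture the body of these mails and then store it in an MS SQL Server database.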

I had a hard time understanding the term "multipart"; after a few (long) hours it is now clearer.

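So right now my process is:

  • Check all the mails in the inbox
  • Build a list of mail ids
  • For every id in this list, check whether the mail is a multipart one.
    • If yes -> I use
      body = part.get_payload(decode=True)
    • If no -> I retrieve the body with
      body = b.get_payload(decode=True)
So in both cases I use
get_payload(decode=True)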
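
The two branches above can be folded into one, because `walk()` also yields the message itself when it is not multipart. What follows is only a minimal sketch in Python 3 (the post's own code is Python 2); the helper name `get_html_body` and the sample message are hypothetical and not part of the original code.

```python
from email import message_from_bytes
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def get_html_body(msg: Message) -> str:
    """Return the decoded text/html body of msg, or '' if it has none."""
    # walk() yields the message itself first, so a non-multipart
    # message is handled by the same loop as a multipart one.
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        if "attachment" in str(part.get("Content-Disposition", "")):
            continue  # an attached .html file is not the body
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or "utf-8"
        return payload.decode(charset, errors="replace")
    return ""


# Hypothetical sample: a multipart/alternative mail carrying both renderings.
sample = MIMEMultipart("alternative")
sample.attach(MIMEText("Product\nMyItem", "plain", "utf-8"))
sample.attach(MIMEText("<table><tr><td>MyItem</td></tr></table>", "html", "utf-8"))

html_body = get_html_body(message_from_bytes(sample.as_bytes()))
print(html_body)
```

Selecting by content type rather than by `is_multipart()` means the same call works whether the sender included a plain-text alternative or not.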

When my mail is a "multipart" one, it shows up in my debugger as plain text:

Product
Currency
Tenor (months)
Code 1

MyItem  
USD
12
AAA01

When my mail is not a multipart one, it shows up in my debugger with the HTML tags:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>
<body>
    <table>
        <tr>
            <td><b>Product</b></td>
            <td><b>Currency</b></td>
            <td><b>Tenor (months)</b></td>
            <td><b>Code 1</b></td>
        </tr>
        <tr>
            <td>MyItem</td>
            <td>USD</td>
            <td>12</td>
            <td>AAA01</td>
        </tr>
    </table>
</body>
</html>
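
Once the HTML body is available, the table can be turned into rows ready for a database insert. The sketch below uses the standard library's `html.parser` instead of the BeautifulSoup calls used elsewhere in this post, purely so that it has no third-party dependency; the class name `TableRows` is hypothetical, and the sample HTML is a shortened copy of the mail body shown above.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


html_body = """
<table>
  <tr><td><b>Product</b></td><td><b>Currency</b></td><td><b>Tenor (months)</b></td><td><b>Code 1</b></td></tr>
  <tr><td>MyItem</td><td>USD</td><td>12</td><td>AAA01</td></tr>
</table>
"""

parser = TableRows()
parser.feed(html_body)

# First row is the header; every following row becomes one record.
header, *data_rows = parser.rows
records = [dict(zip(header, row)) for row in data_rows]
print(records)
```

Each entry of `records` maps a column name to its cell text, which is a convenient shape for a parameterised INSERT statement.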

My current code is the function ps_rfq_imap(), listed further down.

EDIT: here is my updated code, which captures only the HTML part of the email, in case it helps someone:

# Body of the "for i in ..." loop over the message ids.
# Requires: import email; from bs4 import BeautifulSoup
typ, data = mailbox.fetch(str(i), '(RFC822)')
# parse the raw message once (on Python 3 use email.message_from_bytes)
b = email.message_from_string(data[0][1])
body = ""
tables = []

if b.is_multipart():
    email_from = b['from']
    email_subject = b['subject']
    for part in b.walk():
        ctype = part.get_content_type()
        cdispo = str(part.get('Content-Disposition'))
        # skip the inline text/plain rendering, only the HTML one is wanted
        if ctype == 'text/plain' and 'attachment' not in cdispo:
            continue
        elif ctype == 'text/html':
            print('HTML PART')
            body = part.get_payload(decode=True)  # undo base64 / quoted-printable
            soup = BeautifulSoup(body, "html.parser")
            # find_all returns a list (never None), so test it for emptiness
            if soup.find_all('meta'):
                print('WE HAVE FOUND THE BODY: time to process it with BS to get the values of the table')
                tables = soup.find_all('table')
# not multipart: nothing to extract here, the loop moves on to the next message

Best regards,

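Look at the content type of each part; don't just guess which part to fetch. You want the one whose type is "text/html".

You gave me what I was missing! Thank you!!!

Post the solution as an answer and accept it!

I'm still new here, how do I do that? I can't upvote your first comment :)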
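
To make the advice about content types concrete, here is a small Python 3 sketch that lists the type of every part. The message is built locally and is hypothetical; it assumes the real mail is a multipart/alternative container holding a plain and an HTML rendering of the same table.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical message: one multipart/alternative container holding
# a plain-text and an HTML rendering of the same content.
msg = MIMEMultipart("alternative")
msg.attach(MIMEText("Product Currency\nMyItem USD", "plain"))
msg.attach(MIMEText("<table><tr><td>MyItem</td><td>USD</td></tr></table>", "html"))

# walk() visits the container first, then each leaf part in order.
content_types = [part.get_content_type() for part in msg.walk()]
print(content_types)
# prints ['multipart/alternative', 'text/plain', 'text/html']
```

Under that assumption, code that stops at the first `text/plain` part never reaches the `text/html` one, which matches the plain-text output seen in the debugger.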
import email
import imaplib


def ps_rfq_imap():
    # Connection to IMAP/Outlook
    url = 'outlook.mycompayny.com'
    mailbox = imaplib.IMAP4_SSL(url, 993)
    user, password = ('mymail@mycompany.com', 'mypassword')
    mailbox.login(user, password)

    mailbox.list()  # lists all folders of the account
    mailbox.select('INBOX')  # connected to the inbox

    # search returns message sequence numbers (not Outlook ids, not UIDs)
    typ, data = mailbox.search(None, 'ALL')
    # to get the UIDs of all emails instead:
    # typ, data = mailbox.uid('search', None, 'ALL')
    ids = data[0]
    id_list = ids.split()
    print(id_list)
    # get the most recent email id
    latest_email_id = int(id_list[-1])
    body = ""

    # from the newest message down to message 1
    for i in range(latest_email_id, 0, -1):
        print('EMAIL ID:')
        print(i)
        typ, data = mailbox.fetch(str(i), '(RFC822)')
        # parse the raw message once (on Python 3 use email.message_from_bytes)
        b = email.message_from_string(data[0][1])
        body = ""

        email_from = b['from']
        email_subject = b['subject']
        print('FROM:')
        print(email_from)
        print('SUBJECT')
        print(email_subject)

        if b.is_multipart():
            for part in b.walk():
                ctype = part.get_content_type()
                cdispo = str(part.get('Content-Disposition'))

                # take the inline text/plain part (not a .txt attachment)
                if ctype == 'text/plain' and 'attachment' not in cdispo:
                    body = part.get_payload(decode=True)  # decode
                    print('******************************* MULTIPART body content***********************************')
                    print(body)
                    break
                elif ctype == 'text/html':
                    print('HTML PART')
                    continue
        else:
            # not multipart - i.e. plain text, no attachments, keeping fingers crossed
            body = b.get_payload(decode=True)
            print('******************************* SIMMMMMMPPPPLLLLEEEE***********************************')
            print(body)

    # body of the last message processed
    return body
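
For readers on Python 3, where `imaplib` returns bytes, the fetch result has to go through `email.message_from_bytes` rather than `message_from_string`. The sketch below shows one way to do that; the helper name `message_from_fetch` and the hand-built response are hypothetical, with the response shaped like the list that `fetch(..., '(RFC822)')` returns.

```python
import email
from email.message import Message


def message_from_fetch(data) -> Message:
    """Build a Message from the data list returned by IMAP4.fetch(..., '(RFC822)')."""
    # Tuple items are (envelope, raw_bytes); non-tuple items such as
    # b')' are only closing tokens and carry no message content.
    for item in data:
        if isinstance(item, tuple):
            return email.message_from_bytes(item[1])
    raise ValueError("no message literal in FETCH response")


# Hypothetical response, built by hand instead of talking to a server.
raw = (
    b"From: a@example.com\r\n"
    b"Subject: RFQ\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<p>hello</p>\r\n"
)
fake_data = [(b"1 (RFC822 {%d}" % len(raw), raw), b")"]

msg = message_from_fetch(fake_data)
print(msg["subject"], msg.get_content_type(), msg.is_multipart())
```

Parsing the bytes once also avoids the round trip through `str(...)` followed by a second parse.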