Python imap: retrieve MIMEMultipart emails and get the body with its HTML tags


python html email tags


My situation is as follows: I am currently retrieving all Outlook (2016) mails from my inbox, and more specifically a table:

    | Product | Currency | Tenor (months) | Code 1 |
    | MyItem  | USD      | 12             | AAA01  |

My goal is to capture their bodies and then store them in an MS SQL Server database.
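For the storage step, here is a minimal sketch, assuming the table has already been parsed into tuples. The table name rfq_rows and its columns are hypothetical, and the standard library's sqlite3 stands in for MS SQL Server so the example runs anywhere. With pyodbc against SQL Server the parameterized INSERT with ? placeholders and executemany() carries over, but the CREATE TABLE statement would need T-SQL syntax.

```python
import sqlite3

# Hypothetical rows, shaped like the table in the question.
ROWS = [("MyItem", "USD", 12, "AAA01")]


def store_rows(conn, rows):
    """Insert parsed table rows using a parameterized query.

    sqlite3 is only a stand-in; the DDL below is SQLite syntax."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rfq_rows ("
        "product TEXT, currency TEXT, tenor_months INTEGER, code1 TEXT)"
    )
    # Placeholders keep mail content out of the SQL text itself.
    conn.executemany(
        "INSERT INTO rfq_rows (product, currency, tenor_months, code1) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_rows(conn, ROWS)
print(conn.execute("SELECT * FROM rfq_rows").fetchall())
# [('MyItem', 'USD', 12, 'AAA01')]
```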

I had a hard time understanding the term "multipart"; after several (long) hours it is clearer now.
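To make "multipart" concrete: a multipart message is a container whose subparts each carry their own content type. HTML mail is commonly sent as multipart/alternative, holding a text/plain copy and a text/html copy of the same content. The sketch below (Python 3, standard library only) builds such a message and lists the parts that walk() visits. That the Outlook 2016 mails have exactly this shape is an assumption; real messages may nest further parts, for example multipart/related for inline images.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# A multipart/alternative container with two renderings of the same table.
msg = MIMEMultipart("alternative")
msg.attach(MIMEText("Product  Currency\nMyItem   USD\n", "plain"))
msg.attach(MIMEText("<table><tr><td>Product</td></tr></table>", "html"))

# walk() visits the container itself first, then each subpart in order.
part_types = [part.get_content_type() for part in msg.walk()]
print(msg.is_multipart())  # True
print(part_types)          # ['multipart/alternative', 'text/plain', 'text/html']
```

Because the text/plain copy comes first, any loop that stops at the first text part never sees the HTML.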

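So my current process is:

1. Check all mails in the inbox.
2. Create a list of mail IDs.
3. For every ID in this list, check whether the mail is multipart:
   - If yes, I use: body = part.get_payload(decode=True)
   - If no, I retrieve the body with: body = b.get_payload(decode=True)

So in both cases I use get_payload(decode=True).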
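The process above can be sketched in Python 3 as follows. This is a reconstruction under assumptions, not the author's exact code: body_as_described and collect_bodies are hypothetical names, the loop fetches by UID through mailbox.uid(...) (in the original code the UID search is only a commented-out line), and mailbox is assumed to be a logged-in imaplib.IMAP4_SSL with INBOX selected. For multipart mail it mirrors the original branch by returning the first inline text/plain part, which reproduces the symptom described below.

```python
import email
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def body_as_described(msg: Message) -> bytes:
    """Mirror the question's branching: first inline text/plain part of a
    multipart mail, or the whole decoded payload of a single-part mail."""
    if msg.is_multipart():
        for part in msg.walk():
            disposition = str(part.get("Content-Disposition"))
            if part.get_content_type() == "text/plain" and "attachment" not in disposition:
                # In a multipart/alternative mail this is the plain-text copy,
                # so the HTML version of the table is never returned.
                return part.get_payload(decode=True)
        return b""
    return msg.get_payload(decode=True)


def collect_bodies(mailbox) -> dict:
    """Fetch every message in the selected folder by UID and extract its body.

    Assumes `mailbox` is a logged-in imaplib.IMAP4_SSL with a folder selected."""
    bodies = {}
    typ, data = mailbox.uid("SEARCH", None, "ALL")
    for uid in data[0].split():
        typ, msg_data = mailbox.uid("FETCH", uid, "(RFC822)")
        raw = msg_data[0][1]  # raw RFC 822 bytes of the message
        bodies[uid.decode()] = body_as_described(email.message_from_bytes(raw))
    return bodies


# Offline demo: the multipart mail yields its text/plain copy,
# the single-part HTML mail yields the HTML with its tags.
alt = MIMEMultipart("alternative")
alt.attach(MIMEText("Product", "plain"))
alt.attach(MIMEText("<b>Product</b>", "html"))
single = MIMEText("<table></table>", "html")
print(body_as_described(alt))     # the plain-text copy, no tags
print(body_as_described(single))  # the HTML body, tags included
```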

When my mail is multipart, the body shows up in my debugger as plain text:

Product
Currency
Tenor (months)
Code 1

MyItem  
USD
12
AAA01

When my mail is not multipart, it shows up in my debugger together with the HTML tags:

    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    </head>
    <body>
    <table>
    <tr> <td><b>Product</b></td> <td><b>Currency</b></td> <td><b>Tenor (months)</b></td> <td><b>Code 1</b></td> </tr>
    <tr> <td>MyItem</td> <td>USD</td> <td>12</td> <td>AAA01</td> </tr>
    </table>
    </body>
    </html>
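Since the goal is the table itself, here is a minimal sketch for turning HTML like the above into rows. It swaps BeautifulSoup (used in the question's updated code) for the standard library's html.parser.HTMLParser so it has no dependencies. TableRowParser and table_rows are hypothetical names, and the sketch ignores nested tables, colspan and rowspan.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside <b> etc. still lands here while a cell is open.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html):
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


SAMPLE_HTML = (
    '<html> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head> '
    "<body> <table> <tr> <td><b>Product</b></td> <td><b>Currency</b></td> "
    "<td><b>Tenor (months)</b></td> <td><b>Code 1</b></td> </tr> "
    "<tr> <td>MyItem</td> <td>USD</td> <td>12</td> <td>AAA01</td> </tr> </table> </body> </html>"
)
print(table_rows(SAMPLE_HTML))
# [['Product', 'Currency', 'Tenor (months)', 'Code 1'], ['MyItem', 'USD', '12', 'AAA01']]
```

The first row holds the headers, so the remaining rows map directly onto database columns.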
    import email
    import imaplib


    def ps_rfq_imap():
        # Connection to IMAP/Outlook
        url = 'outlook.mycompayny.com'
        mailbox = imaplib.IMAP4_SSL(url, 993)
        user, password = ('mymail@mycompany.com', 'mypassword')
        mailbox.login(user, password)
        mailbox.list()  # lists all folders of the account
        mailbox.select('INBOX')  # connected to the inbox

        # search() returns message sequence numbers, not UIDs
        typ, data = mailbox.search(None, 'ALL')
        # to get UIDs instead:
        # typ, data = mailbox.uid('search', None, 'ALL')
        ids = data[0]
        id_list = ids.split()
        print id_list

        # walk from the most recent message back to message 1
        latest_email_id = int(id_list[-1])
        for i in range(latest_email_id, 0, -1):
            print 'EMAIL ID:'
            print i
            typ, data = mailbox.fetch(i, '(RFC822)')
            b = email.message_from_string(data[0][1])  # parse the raw message
            body = ""
            if b.is_multipart():
                email_from = b['from']
                email_subject = b['subject']
                print 'FROM:'
                print email_from
                print 'SUBJECT'
                print email_subject
                for part in b.walk():
                    ctype = part.get_content_type()
                    cdispo = str(part.get('Content-Disposition'))
                    # take the first inline text/plain part: in a
                    # multipart/alternative mail this is the plain-text copy
                    if ctype == 'text/plain' and 'attachment' not in cdispo:
                        body = part.get_payload(decode=True)  # decode
                        print '******************************* MULTIPART body content***********************************'
                        print body
                        break
                    elif ctype == 'text/html':
                        print 'HTML PART'
                        continue
            # not multipart: a single-part message, here a lone text/html body
            else:
                email_from = b['from']
                email_subject = b['subject']
                print 'FROM:'
                print email_from
                print 'SUBJECT'
                print email_subject
                body = b.get_payload(decode=True)
                print '******************************* SIMMMMMMPPPPLLLLEEEE***********************************'
                print body
        return body

EDIT: here is my updated code, which captures only the HTML part of the email, in case it helps someone:

    from bs4 import BeautifulSoup

    # same loop over message ids as in ps_rfq_imap() above
    for i in range(latest_email_id, 0, -1):
        typ, data = mailbox.fetch(i, '(RFC822)')
        b = email.message_from_string(data[0][1])
        body = ""
        if b.is_multipart():
            email_from = b['from']
            email_subject = b['subject']
            for part in b.walk():
                ctype = part.get_content_type()
                cdispo = str(part.get('Content-Disposition'))
                # skip the inline text/plain copy; the HTML part carries the table
                if ctype == 'text/plain' and 'attachment' not in cdispo:
                    continue
                elif ctype == 'text/html':
                    print 'HTML PART'
                    body = part.get_payload(decode=True)  # decode
                    soup = BeautifulSoup(body, "html.parser")
                    # find_all() returns a list (never None), so test for emptiness
                    if soup.find_all('meta'):
                        print 'WE HAVE FOUND THE BODY******************** Time to process it with BS for getting the value of the table'
                        tables = soup.find_all('table')
        # not multipart: skipped in this version
        else:
            continue

Best regards,

Comments:

- Look at the content type of each part instead of just guessing which part to fetch. You want the one whose type is "text/html".
- You gave me what I was missing! Thanks!!!
- Post the solution as an answer and accept it!
- I'm still new here, how do I do that? I can't upvote your first comment :)
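Following the comment's advice (choose the part by content type rather than by position), a Python 3 sketch can let the standard library do the selection. Parsing with email.policy.default produces an EmailMessage whose get_body(preferencelist=("html",)) returns the text/html part, whether the message is multipart or a lone text/html body, and None when there is no HTML part. html_body is a hypothetical helper name; it returns decoded text (str), not bytes.

```python
import email
from email import policy
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def html_body(raw: bytes):
    """Return the text/html body of a raw RFC 822 message as str, or None."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    # get_body() searches the MIME tree for the best match in preferencelist,
    # skipping attachments; a single-part text/html message matches itself.
    part = msg.get_body(preferencelist=("html",))
    return part.get_content() if part is not None else None


# Offline demo with the two message shapes from the question.
alt = MIMEMultipart("alternative")
alt.attach(MIMEText("Product", "plain"))
alt.attach(MIMEText("<table><tr><td>Product</td></tr></table>", "html"))
single = MIMEText("<table><tr><td>MyItem</td></tr></table>", "html")

print(html_body(alt.as_bytes()))     # the HTML part, tags included
print(html_body(single.as_bytes()))  # the lone HTML body
```

Inside an IMAP loop, raw would be the bytes in data[0][1] returned by fetch(); the resulting HTML string can then go to a table parser such as BeautifulSoup.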
