Python IMAP: retrieve a MIMEMultipart email and get the body with its HTML tags
My situation is as follows: I am currently retrieving all Outlook (2016) emails from my inbox. More specifically, I am retrieving a table:
¦ Product ¦ Currency ¦ Tenor (months) ¦ Code 1 ¦
¦ MyItem ¦ USD ¦ 12 ¦ AAA01 ¦
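My goal is to capture their bodies and then store them in an MS SQL Server database.
I had a hard time understanding the term "multipart"; after several (long) hours it is clearer now.
So my process is now: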
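To make "multipart" concrete: an HTML email is commonly sent as a multipart/alternative message, a container holding a text/plain part and a text/html part with the same content. The sketch below builds such a message with Python 3's standard email package (the address, subject and body strings are made up for illustration) and walks its parts:

```python
from email.message import EmailMessage

# Build a hypothetical multipart/alternative message like the ones in the inbox.
msg = EmailMessage()
msg['From'] = 'sender@example.com'      # hypothetical address
msg['Subject'] = 'Table'                # hypothetical subject
msg.set_content('Product\nCurrency\n')  # the text/plain alternative
# add_alternative() turns the message into multipart/alternative
# and appends the HTML version as a second part.
msg.add_alternative('<table><tr><td><b>Product</b></td></tr></table>', subtype='html')

print(msg.is_multipart())
for part in msg.walk():
    # walk() yields the container itself first, then each alternative.
    print(part.get_content_type())
```

This prints True, then multipart/alternative, text/plain and text/html. A loop that stops at the first text/plain part therefore gets the tag-free text, never the HTML.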
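- Check all the emails in the inbox
- Build a list of email IDs
- For each ID in this list, check whether the email is a multipart one (b.is_multipart())
- If yes -> I walk its parts and use body = part.get_payload(decode=True)
- If no -> I retrieve the body with body = b.get_payload(decode=True)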
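The fix suggested in the comments is to choose the part by its content type instead of branching on whether the message is multipart. A minimal sketch of that idea as a helper, assuming Python 3 and raw message bytes such as data[0][1] from fetch(); the name get_html_body and the sample message are hypothetical. Because walk() also yields a non-multipart message itself, one loop covers both branches of the process above, and the payload is decoded to str with the part's declared charset, which you will need before storing it:

```python
import email
from email.message import Message
from typing import Optional


def get_html_body(msg: Message) -> Optional[str]:
    """Return the decoded text/html body of msg, or None if it has none."""
    for part in msg.walk():
        if part.get_content_type() != 'text/html':
            continue
        # Skip HTML files that were sent as attachments.
        if 'attachment' in str(part.get('Content-Disposition', '')):
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer encoding removed
        charset = part.get_content_charset() or 'utf-8'  # assumption: UTF-8 if undeclared
        return payload.decode(charset, errors='replace')
    return None


# Hypothetical raw message with a plain and an HTML alternative.
raw = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/alternative; boundary="XYZ"\r\n'
    b'\r\n'
    b'--XYZ\r\n'
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b'\r\n'
    b'Product\r\n'
    b'--XYZ\r\n'
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b'\r\n'
    b'<td><b>Product</b></td>\r\n'
    b'--XYZ--\r\n'
)
msg = email.message_from_bytes(raw)
print(get_html_body(msg))
```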
When my email is multipart, the body shows up in my debugger as plain text:
Product
Currency
Tenor (months)
Code 1
MyItem
USD
12
AAA01
When my email is not multipart, it shows up in my debugger with its HTML tags:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>
<body>
<table>
<tr>
<td><b>Product</b></td>
<td><b>Currency</b></td>
<td><b>Tenor (months)</b></td>
<td><b>Code 1</b></td>
</tr>
<tr>
<td>MyItem</td>
<td>USD</td>
<td>12</td>
<td>AAA01</td>
</tr>
</table>
</body>
</html>
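Once the HTML version is in hand, the table can be read cell by cell. The updated code in the edit uses BeautifulSoup for this; as a dependency-free alternative, here is a minimal sketch with the standard library's html.parser, assuming a single, non-nested table like the one above (the class name TableParser is hypothetical):

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently open, if any
        self._cell = None  # text fragments of the cell currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text inside nested tags such as <b> still belongs to the open cell.
        if self._cell is not None:
            self._cell.append(data)


html = """<table>
<tr><td><b>Product</b></td><td><b>Currency</b></td>
<td><b>Tenor (months)</b></td><td><b>Code 1</b></td></tr>
<tr><td>MyItem</td><td>USD</td><td>12</td><td>AAA01</td></tr>
</table>"""

parser = TableParser()
parser.feed(html)
parser.close()
header, *records = parser.rows
print(header)   # ['Product', 'Currency', 'Tenor (months)', 'Code 1']
print(records)  # [['MyItem', 'USD', '12', 'AAA01']]
```

Each row comes out as a list of strings, which maps directly onto the parameters of an INSERT statement when storing the data.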
My code:

import email
import imaplib


def ps_rfq_imap():
    # Connection to IMAP/Outlook
    url = 'outlook.mycompany.com'
    mailbox = imaplib.IMAP4_SSL(url, 993)
    user, password = ('mymail@mycompany.com', 'mypassword')
    mailbox.login(user, password)
    mailbox.list()  # lists all mailboxes (folders) on the server
    mailbox.select('INBOX')  # connected to the inbox
    # search() returns message sequence numbers, not UIDs
    typ, data = mailbox.search(None, 'ALL')
    # to get the UIDs of all emails instead:
    # typ, data = mailbox.uid('search', None, 'ALL')
    ids = data[0]
    id_list = ids.split()
    print(id_list)
    # get the most recent email id
    latest_email_id = int(id_list[-1])
    for i in range(latest_email_id, 0, -1):
        print('EMAIL ID:')
        print(i)
        typ, data = mailbox.fetch(str(i), '(RFC822)')
        b = email.message_from_bytes(data[0][1])
        body = ""
        if b.is_multipart():
            email_from = b['from']
            email_subject = b['subject']
            print('FROM:')
            print(email_from)
            print('SUBJECT')
            print(email_subject)
            for part in b.walk():
                ctype = part.get_content_type()
                cdispo = str(part.get('Content-Disposition'))
                # take the first text/plain part that is not an attachment
                if ctype == 'text/plain' and 'attachment' not in cdispo:
                    body = part.get_payload(decode=True)  # decode
                    print('******************************* MULTIPART body content***********************************')
                    print(body)
                    break
                elif ctype == 'text/html':
                    print('HTML PART')
                    continue
        # not multipart - i.e. plain text, no attachments, keeping fingers crossed
        else:
            email_from = b['from']
            email_subject = b['subject']
            print('FROM:')
            print(email_from)
            print('SUBJECT')
            print(email_subject)
            body = b.get_payload(decode=True)
            print('******************************* SIMMMMMMPPPPLLLLEEEE***********************************')
            print(body)
    return body

Best regards.

Comments:
- Look at the content type of each part instead of guessing which part to take. You want the one whose type is "text/html".
- You gave me what I was missing! Thanks!!!
- Post the solution as an answer and accept it!
- I'm still new here, how do I do that? I can't upvote your first comment :)

Edit: here is my updated code, which captures only the HTML part of the email, in case it helps someone:
from bs4 import BeautifulSoup

# (connection, search and latest_email_id unchanged from ps_rfq_imap() above)
for i in range(latest_email_id, 0, -1):
    typ, data = mailbox.fetch(str(i), '(RFC822)')
    b = email.message_from_bytes(data[0][1])
    body = ""
    if b.is_multipart():
        email_from = b['from']
        email_subject = b['subject']
        for part in b.walk():
            ctype = part.get_content_type()
            cdispo = str(part.get('Content-Disposition'))
            # skip the plain-text alternative (text/plain that is not an attachment)
            if ctype == 'text/plain' and 'attachment' not in cdispo:
                continue
            elif ctype == 'text/html':
                print('HTML PART')
                body = part.get_payload(decode=True)  # decode the transfer encoding
                soup = BeautifulSoup(body, "html.parser")
                # find() returns None when there is no <meta> tag (find_all() would return an empty list)
                if soup.find('meta') is not None:
                    print('WE HAVE FOUND THE BODY******************** Time to process it with BS for getting the value of the table')
                    tables = soup.find_all('table')
    # not multipart - i.e. plain text, no attachments, keeping fingers crossed
    else:
        continue
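On Python 3.6 and later, the standard library can also make this choice for you: when a message is parsed with policy=email.policy.default, get_body(preferencelist=('html',)) returns the HTML part (or None if there is none, e.g. a plain-text-only email), and get_content() decodes it to str using the part's charset. A minimal sketch with a made-up message in place of data[0][1]:

```python
import email
from email import policy

# Hypothetical raw message; in the loop above this would be data[0][1].
raw = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/alternative; boundary="XYZ"\r\n'
    b'\r\n'
    b'--XYZ\r\n'
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b'\r\n'
    b'Product\r\n'
    b'--XYZ\r\n'
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b'\r\n'
    b'<td><b>Product</b></td>\r\n'
    b'--XYZ--\r\n'
)

# policy.default makes the parser return an EmailMessage, which has get_body().
msg = email.message_from_bytes(raw, policy=policy.default)
html_part = msg.get_body(preferencelist=('html',))
if html_part is not None:
    html = html_part.get_content()  # str, decoded with the part's charset
    print(html)
```

This replaces the manual walk() and content-type checks with one call; the trade-off is that it depends on the newer EmailMessage API rather than the legacy Message methods used in the question.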