Python: getting URLs from emails using IMAP is not working properly

I am trying to find specific URLs in emails; I want to get every URL that contains a particular string. Here is my code:

import imaplib
import regex as re

def find_urls(string):
    regex = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
    url = re.findall(regex,string)
    return([x[0] for x in url])

def save_matching_urls(username, password, sender, url_string):
    print("connecting to email, please wait...")
    con = imaplib.IMAP4_SSL("imap.gmail.com")
    con.login(username, password)
    con.select('INBOX')
    print("connected successfully, scraping email from " + sender)
    (_, data) = con.search(None, '(FROM {0})'.format(sender.strip()))
    ids = data[0].split()
    print(str(len(ids)) +" emails found")

    list_urls = []
    list_good_urls = []
    for mail in ids:
        result, data = con.fetch(mail, '(RFC822)') # fetch the email headers and body (RFC822) for the given ID
        raw_email = data[0][1]
        email = raw_email.decode("utf-8").replace("\r", '').replace("\t", '').replace(" ", "").replace("\n", "")
        list_url = find_urls(email)
        for url in list_url:
            if url_string in url:
                list_good_urls.append(url)

    print(str(len(list_good_urls)) + " urls found, saving...")
    with open("{}_urls.txt".format(sender), mode="a", encoding="utf-8") as file:
        for url in list_good_urls:
            file.write(url + '\n')
    print("urls saved !")
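The URL saved by the code (it actually finds only one URL) is reproduced after the comments below.

I don't know which part of the code is causing this. Any help would be appreciated.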

Variant:

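from imap_tools import MailBox, A
from magic import find_urls  # placeholder import: substitute your own URL-extraction function

# Log in and open INBOX; the mailbox is logged out when the block exits
with MailBox('imap.mail.com').login('test@mail.com', 'pwd', 'INBOX') as mailbox:
    for msg in mailbox.fetch(A(all=True)):
        # msg.text and msg.html are already decoded, so no quoted-printable or base64 residue remains
        body = msg.text or msg.html
        urls = find_urls(body)
*Regards, the author of imap_tools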


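Your email appears to be encoded with quoted-printable encoding. You can decode the text with Python's quopri module before processing it. Or use the email module to parse the message, as described in the answer to a related question.

Alternatively, use the email module to actually decode the message before running the regexp over its parts. If you run the regexp against the raw text, you will get nothing (if the part is base64-encoded) or confusing results (if it is quoted-printable). The raw text cannot be used as-is.

Both options work fine. Thanks.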
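To make the email-module suggestion concrete, here is a minimal sketch of decoding each text part before searching it. The function name `extract_urls` and the simplified URL pattern are assumptions made for illustration; they are not the question's regex, and the function would be called on the `raw_email` bytes returned by `con.fetch` without stripping any whitespace first.

```python
import email
import re
from email import policy

# Deliberately simple pattern for this sketch (an assumption, not the question's regex).
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(raw_email: bytes, url_string: str) -> list:
    """Return URLs containing url_string from the decoded text parts of a raw message."""
    # policy.default yields EmailMessage objects whose get_content() undoes
    # the transfer encoding (base64 / quoted-printable) and the charset.
    msg = email.message_from_bytes(raw_email, policy=policy.default)
    urls = []
    for part in msg.walk():
        # Skip multipart containers and non-text attachments.
        if part.get_content_maintype() != "text":
            continue
        text = part.get_content()
        urls.extend(u for u in URL_RE.findall(text) if url_string in u)
    return urls
```

Because the body is decoded first, spaces are real spaces again and the pattern can stop at the end of each URL instead of swallowing the rest of the message.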
Text saved by the question's code (the =20 sequences are quoted-printable escapes left undecoded):

IsMyEmailWorking.com/Block.aspx=20to=20temporarily=20block==20your=20email=20address=20for=201=20hour.=20This=20solves=20the=20problem==2099%=20of=20the=20time.=20If=20after=20this=20you=20continue=20to=20have==20problems=20please=20contact=20us=20via=20the=20contact=20link=20on=20our==20website=20at=20http://IsMyEmailWorking.com/Contact.aspx
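The `=20` sequences in this text are quoted-printable escapes for spaces, and a `=` at the end of a line marks a soft line break. A small sketch with the standard library's `quopri` module shows how such text decodes when the line breaks are still in place. The helper name `decode_qp` is hypothetical, and the position of the line break in the sample bytes is an assumption reconstructed from the `==20` in the saved text.

```python
import quopri


def decode_qp(raw: bytes, charset: str = "utf-8") -> str:
    """Decode quoted-printable bytes into text."""
    # decodestring turns =XX escapes back into bytes and drops soft
    # line breaks (a "=" immediately followed by a newline).
    return quopri.decodestring(raw).decode(charset, errors="replace")


# Fragment of the saved text, with an assumed soft line break restored.
sample = (
    b"please=20contact=20us=20via=20the=20contact=20link=20on=20our=\r\n"
    b"=20website=20at=20http://IsMyEmailWorking.com/Contact.aspx"
)
print(decode_qp(sample))
```

This only works on the body as received: once newlines have been removed, the soft line breaks can no longer be told apart from literal `=` characters, so decoding has to come before any whitespace stripping.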
