Python 3.x InvalidSchema: No connection adapters were found (Python 3.5.2)


I'm trying to extract email addresses from web pages. Here is my email-grabbing function:

import re

import bs4 as bs
import requests

def emlgrb(x):
    # Collect every email address found on the pages in the URL list x.
    email_set = set()
    for url in x:
        try:
            response = requests.get(url)
            soup = bs.BeautifulSoup(response.text, "lxml")
            emails = set(re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+", soup.text, re.I))
            email_set.update(emails)
        except (requests.exceptions.MissingSchema, requests.exceptions.ConnectionError):
            # Skip URLs that cannot be requested.
            continue
    return email_set
This function is supposed to be fed by another function that creates the list of URLs. The feeder function:

def handle_local_links(url, link):
    if link.startswith("/"):
        return "".join([url, link])
    return link

def get_links(url):
    try:
        response = requests.get(url, timeout=5)
        soup = bs.BeautifulSoup(response.text, "lxml")
        body = soup.body
        links = [link.get("href") for link in body.find_all("a")]
        links = [handle_local_links(url, link) for link in links]
        links = [str(link.encode("ascii")) for link in links]
        return links
It goes on with a number of except clauses that return an empty list if raised (not important). However, the return value of get_links() looks like this:

["b'https://pythonprogramming.net/parsememcparseface//'"]
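Of course, there are many more links in the list (I can't post them because of my reputation). The emlgrb() function cannot process this list (InvalidSchema: No connection adapters were found), but if I manually remove the b and the redundant quotes, so that the list looks like this: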

['https://pythonprogramming.net/parsememcparseface//']
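emlgrb() works. Any suggestions about where the problem is, or about writing a "cleaning function" that turns the first list into the second, are welcome.

Thanks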

The solution is to remove .encode('ascii').

You can also pass the encoding to str(), for example:
str(object=b'', encoding='utf-8', errors='strict')
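Here is a minimal sketch of both options, using a hard-coded href instead of a live request. The sample value and the variable names are only for illustration and are not part of the original code.

```python
# Sample href as BeautifulSoup would return it: already a str in Python 3.
link = "https://pythonprogramming.net/parsememcparseface//"

# What the question's code does: encode to bytes, then str() without an encoding.
broken = str(link.encode("ascii"))
print(broken)  # b'https://pythonprogramming.net/parsememcparseface//'

# Option 1: drop .encode('ascii') and keep the str as it is.
fixed_plain = link

# Option 2: if you really do have bytes, pass the encoding to str().
raw = link.encode("ascii")
fixed_decoded = str(raw, encoding="utf-8", errors="strict")

print(fixed_plain == fixed_decoded)  # True
```

In get_links() this should amount to deleting the line that wraps each link in str(link.encode("ascii")).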


This is because str() calls .__repr__() on the object, so a bytes object comes out as "b'...'". In fact, that is what gets printed when you do print(bytes_obj). Calling .encode() on a str object creates a bytes object.
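If you are stuck with a list that already contains the "b'...'" strings, a "cleaning function" like the one asked for could look roughly like this. It is only a sketch: clean_links is a made-up name, and it assumes every damaged entry is the exact repr of an ASCII bytes object.

```python
import ast

def clean_links(links):
    """Turn strings like "b'https://example.com'" back into plain URLs."""
    cleaned = []
    for link in links:
        if link.startswith(("b'", 'b"')):
            # The string is the repr of a bytes object: parse it back, then decode.
            link = ast.literal_eval(link).decode("ascii")
        cleaned.append(link)
    return cleaned

raw = b"https://pythonprogramming.net/parsememcparseface//"
assert str(raw) == repr(raw)  # str() without an encoding falls back to the repr

print(clean_links([str(raw)]))
```

Removing .encode('ascii') at the source is still the cleaner fix; this only repairs data that has already been mangled.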

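What does the output look like if you remove .encode('ascii')?

Actually, it works great, thanks. I think the encoding can be specified in str() as well?

If you need it ;) I added some explanation to the answer. Does it work well? :)

Sorry for the late response. It works exactly as expected. Thanks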