TypeError: cannot use a string pattern on a bytes-like object in Python

python scrapy

I'm building an email scraper with Scrapy, and I keep getting the following error: TypeError: cannot use a string pattern on a bytes-like object

Here is the Python code I'm using:

import re
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class EmailSpider(CrawlSpider):
    name = 'EmailScraper'
    emailHistory = {}
    custom_settings = {
        'ROBOTSTXT_OBEY': False
        #  ,'DEPTH_LIMIT' : 6
    }

    emailRegex = re.compile((r"([a-zA-Z0-9_{|}~-]+(?:\.[a-zA-Z0-9_"
                             r"{|}~-]+)*(@)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9]){2,}?(\."
                             r"))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"))

    def __init__(self, url=None, *args, **kwargs):
        super(EmailSpider, self).__init__(*args, **kwargs)
        self.start_urls = [url]
        self.allowed_domains = [url.replace(
            "http://", "").replace("www.", "").replace("/", "")]

    rules = (Rule(LinkExtractor(), callback="parse_item", follow=True),)

    def parse_item(self, response):
        # The TypeError is raised here: emailRegex is a str pattern,
        # but response._body is bytes.
        emails = re.findall(EmailSpider.emailRegex, response._body)
        for email in emails:
            if email[0] in EmailSpider.emailHistory:
                continue
            else:
                EmailSpider.emailHistory[email[0]] = True
                yield {
                    'site': response.url,
                    'email': email[0]
                }

I've seen plenty of answers, but I'm very new to Python, so I'm not sure how to apply the suggested code to my own.

If you don't mind, could you also tell me where the code should go?

Thanks, Jude Wilson

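response._body is not a str (string object), so you can't search it with a string pattern from re (regex). If you check its type, you'll see that it is bytes (a bytes object):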

>>> type(response._body)
<class 'bytes'>
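The same error can be reproduced without Scrapy. In this minimal sketch a hard-coded byte string stands in for response._body; the sample text and the simplified email pattern are assumptions made for illustration, not the question's regex.

```python
import re

# Hypothetical stand-in for response._body: Scrapy keeps the raw page as bytes.
body = b"Write to info@example.com or sales@example.org today"

# A pattern compiled from a str literal can only be matched against str.
pattern = re.compile(r"[\w.+-]+@[\w-]+\.[a-z]+")

print(type(body))  # <class 'bytes'>

error_message = None
try:
    pattern.findall(body)
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```

Running it prints the exact message from the question: cannot use a string pattern on a bytes-like object.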

Decoding it to text, for example as UTF-8, should fix the problem:

>>> type(response._body.decode('utf-8'))
<class 'str'>
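A plain-Python sketch of the fix, with a hard-coded byte string standing in for response._body (the sample text and the simplified email pattern are assumptions): decoding first turns the bytes into a str, so the str pattern can match.

```python
import re

# Hypothetical page body, as bytes (what response._body holds).
body = b"Write to info@example.com or sales@example.org today"
pattern = re.compile(r"[\w.+-]+@[\w-]+\.[a-z]+")

# Decode bytes -> str before searching; the str pattern now applies.
text = body.decode("utf-8")
emails = pattern.findall(text)
print(emails)  # ['info@example.com', 'sales@example.org']
```

Alternatively, a pattern compiled from a bytes literal (rb"...") can search bytes directly, but the matches then come back as bytes rather than str.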

The final re call then becomes:

emails = re.findall(EmailSpider.emailRegex, response._body.decode('utf-8'))
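To show where the decoded call sits in the question's parse_item, here is the same loop pulled out into a plain function so it runs without Scrapy. The function name extract_new_emails, its arguments, and the sample page are hypothetical; the regex is the one from the question. Because that regex has three capture groups, re.findall returns 3-tuples, and element 0 (email[0]) is the full address.

```python
import re

# The regex from the question. It has three capture groups, so findall
# returns 3-tuples; element 0 is the whole email address.
EMAIL_REGEX = re.compile(
    r"([a-zA-Z0-9_{|}~-]+(?:\.[a-zA-Z0-9_"
    r"{|}~-]+)*(@)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9]){2,}?(\."
    r"))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"
)


def extract_new_emails(url, body, seen):
    """Hypothetical stand-in for parse_item: yields items for unseen emails.

    url  -- plays the role of response.url
    body -- raw bytes, playing the role of response._body
    seen -- dict shared across calls, like EmailSpider.emailHistory
    """
    # The fix: decode the bytes before handing them to findall.
    for email in EMAIL_REGEX.findall(body.decode("utf-8")):
        if email[0] in seen:
            continue
        seen[email[0]] = True
        yield {"site": url, "email": email[0]}


history = {}
page = b"Write to info@example.com or sales@example.org, again info@example.com"
for item in extract_new_emails("http://example.com/contact", page, history):
    print(item)
```

Inside the spider itself, only the findall line in parse_item needs to change; the deduplication loop stays as it is.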

Which line raises the error? Try using response.text instead of response.body.
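One reason to prefer response.text over a hard-coded .decode('utf-8'): Scrapy decodes response.text using the encoding it determines for the response, whereas .decode('utf-8') fails on pages served in another encoding. This plain-Python sketch illustrates that failure mode; the Latin-1 sample text is an assumption for illustration.

```python
# Hypothetical page body served as Latin-1 rather than UTF-8.
body = "Café contact: info@example.com".encode("latin-1")

utf8_failed = False
try:
    body.decode("utf-8")
except UnicodeDecodeError as exc:
    # The Latin-1 byte for 'é' (0xE9) is not valid UTF-8 in this position.
    utf8_failed = True
    print("UnicodeDecodeError:", exc.reason)

# Decoding with the page's real encoding works; response.text makes this
# choice for you based on the encoding Scrapy determines for the response.
text = body.decode("latin-1")
print(text)  # Café contact: info@example.com

# A lenient fallback if the encoding is unknown: undecodable bytes become U+FFFD.
lenient = body.decode("utf-8", errors="replace")
print(lenient)
```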