Python 2.7: I'm trying to extract data from website links with Scrapy, but my code raises an error


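I'm trying to extract data from links on a website. The path I follow is:

Home page

- link

- link

  • data to extract (basically, I'm trying to extract all the birth details of celebrities)

My Scrapy code is as follows: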

    import scrapy


    class celebritiesItem(scrapy.Item):
        Name = scrapy.Field()
        Profession = scrapy.Field()
        Died_On = scrapy.Field()
        Birth_Place = scrapy.Field()
        Nationality = scrapy.Field()
        Birth_Date = scrapy.Field()


    class celebrities(scrapy.Spider):
        name = "people"
        allowed_domains = ["thefamouspeople.com"]
        start_urls = [
            "http://www.thefamouspeople.com/famous-people-by-birthday.php"
        ]

        def parse(self, response):
            links = response.xpath('//div[@class="pod colorbar editorial"]//@href').extract()
            for link in links:
                abs_url = response.xpath('//div[@class="pod colorbar editorial"]//@href').extract()
            yield scrapy.Request(abs_url, callback=self.parse)

            #items[]
            item = celebritiesItem()
            item["Name"] = response.xpath('//div[@class="section"]//a[2]//text()').extract()
            item["Profession"] = response.xpath('//div[@class="section"]//span//text()').extract()
            item["Died_On"] = response.xpath('//div[@class="section"]//p[1]//text()').extract()
            item["Birth_Place"] = response.xpath('//div[@class="section"]//p[2]//text()').extract()
            item["Nationality"] = response.xpath('//div[@class="section"]//p[3]//text()').extract()
            item["Birth_Date"] = response.xpath('//div[@class="section"]//p[4]//text()').extract()
            yield item

I get the following error:

    raise TypeError('Request url must be str or unicode, got %s:' % type(url).__name__)

This part causes the error:

      for link in links:
           abs_url = response.xpath('//div[@class="pod colorbar editorial"]//@href').extract()
      yield scrapy.Request(abs_url, callback=self.parse)

You need to indent the yield statement so it runs inside the loop, and build a proper URL from each link:

      for link in links:
           abs_url = response.urljoin(link)
           yield scrapy.Request(abs_url, callback=self.parse)
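
The reason for the TypeError can be sketched with the standard library alone. This is an illustration under assumptions, not the site's real data: the `hrefs` values are made up, and the sketch assumes `response.urljoin(link)` behaves roughly like `urljoin` applied to the URL of the page being parsed.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7

# Assumed values for illustration only; not scraped from the real site.
page_url = "http://www.thefamouspeople.com/famous-people-by-birthday.php"
hrefs = ["/january.php", "february.php", "http://www.thefamouspeople.com/march.php"]

# .extract() returns a list of matches. Passing that list to scrapy.Request
# is what triggers "Request url must be str or unicode".
is_valid_url_type = isinstance(hrefs, str)  # False: a list is not a string

# Resolving each href separately yields one absolute URL string per link,
# which is the shape scrapy.Request expects.
abs_urls = [urljoin(page_url, href) for href in hrefs]
for url in abs_urls:
    print(url)
```

Relative and root-relative links are both resolved against the page URL, while a link that is already absolute passes through unchanged, so the loop can treat every extracted `href` the same way.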

It worked. I wasn't getting the data in the right format at first, though, so I used extract_first() instead of extract().
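
The difference in output shape can be sketched without Scrapy. `extract()` returns a list containing every match, while `extract_first()` returns only the first match, or `None` when nothing matched. The helper `first_or_none` and the sample values below are hypothetical stand-ins for illustration, not part of the Scrapy API.

```python
def first_or_none(matches):
    """Mimic the shape of extract_first(): the first match, or None if empty."""
    return matches[0] if matches else None


# Hypothetical list, shaped like what .extract() returns for a text node.
name_matches = ["Example Name"]
no_matches = []

# With extract(), each field holds a list, even when there is a single match.
item_with_lists = {"Name": name_matches}

# With extract_first(), each field holds a plain string instead.
item_with_strings = {"Name": first_or_none(name_matches)}

# When the XPath matches nothing, the field becomes None rather than [].
item_missing = {"Name": first_or_none(no_matches)}
```

Storing plain strings usually gives cleaner exported rows, at the cost of dropping any additional matches, so it fits fields that are expected to occur once per page.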