Python 2.7: I'm trying to extract data from website links with Scrapy, but my code throws an error

python-2.7 web-scraping scrapy

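I'm trying to extract data by following links on a website. The path I follow is:

Home page

-> link

-> link

-> data to extract (basically, I'm trying to extract all the birth details of celebrities)

My Scrapy code is as follows: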

    import scrapy

    class celebritiesItem(scrapy.Item):
        Name = scrapy.Field()
        Profession = scrapy.Field()
        Died_On = scrapy.Field()
        Birth_Place = scrapy.Field()
        Nationality = scrapy.Field()
        Birth_Date = scrapy.Field()

    class celebrities(scrapy.Spider):
        name = "people"
        allowed_domains = ["thefamouspeople.com"]
        start_urls = [
            "http://www.thefamouspeople.com/famous-people-by-birthday.php"
        ]

        def parse(self, response):
            links = response.xpath('//div[@class="pod colorbar editorial"]//@href').extract()
            for link in links:
                abs_url = response.xpath('//div[@class="pod colorbar editorial"]//@href').extract()
            yield scrapy.Request(abs_url, callback=self.parse)

            #items[]
            item = celebritiesItem()
            item["Name"] = response.xpath('//div[@class="section"]//a[2]//text()').extract()
            item["Profession"] = response.xpath('//div[@class="section"]//span//text()').extract()
            item["Died_On"] = response.xpath('//div[@class="section"]//p[1]//text()').extract()
            item["Birth_Place"] = response.xpath('//div[@class="section"]//p[2]//text()').extract()
            item["Nationality"] = response.xpath('//div[@class="section"]//p[3]//text()').extract()
            item["Birth_Date"] = response.xpath('//div[@class="section"]//p[4]//text()').extract()
            yield item

I get the following error:

    raise TypeError('Request url must be str or unicode, got %s:' % type(url).__name__)

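This part causes the error. `.extract()` returns a *list* of every matching `href`, so `abs_url` is a list rather than a string, and the `yield` sits outside the loop body: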

    for link in links:
        abs_url = response.xpath('//div[@class="pod colorbar editorial"]//@href').extract()
    yield scrapy.Request(abs_url, callback=self.parse)
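To see why the error fires, here is a Scrapy-free sketch. `check_url` is a hypothetical, simplified stand-in for the type check Scrapy's `Request` applies to its `url` argument (it is not Scrapy's actual code), and `hrefs` is a made-up list standing in for what `.extract()` returns.

```python
try:
    STRING_TYPES = (str, unicode)  # Python 2.7: extract() yields unicode strings
except NameError:
    STRING_TYPES = (str,)          # Python 3: the unicode type no longer exists


def check_url(url):
    """Hypothetical, simplified stand-in for Scrapy's URL type check."""
    if not isinstance(url, STRING_TYPES):
        raise TypeError("Request url must be str or unicode, got %s:" % type(url).__name__)
    return url


# Hypothetical result of .extract(): always a list, even for a single match.
hrefs = ["/profiles/person-one.php", "/profiles/person-two.php"]

try:
    check_url(hrefs)  # passing the whole list, as the question's code effectively does
except TypeError as exc:
    print(exc)  # Request url must be str or unicode, got list:

for href in hrefs:
    print(check_url(href))  # one string per link passes the check
```

Passing the list fails; passing each element separately succeeds, which is what iterating over `links` and using `link` inside the loop achieves.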

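You need to indent the `yield` statement, so that one request is issued per link inside the loop, and build a proper absolute URL from each link: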

      for link in links:
           abs_url = response.urljoin(link)
           yield scrapy.Request(abs_url, callback=self.parse)
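`response.urljoin(link)` turns each possibly relative `href` into an absolute URL based on the page's URL. The sketch below mimics that step with the standard library's `urljoin`, without Scrapy; the `hrefs` values are hypothetical, `absolute_links` is a made-up helper name, and Scrapy's own method may additionally honour an HTML `<base>` tag.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2.7


def absolute_links(page_url, hrefs):
    """Hypothetical helper: resolve every href against the page URL, one absolute URL string per link."""
    return [urljoin(page_url, href) for href in hrefs]


page = "http://www.thefamouspeople.com/famous-people-by-birthday.php"
# Hypothetical hrefs, shaped like a list returned by .extract().
hrefs = ["/profiles/person-one.php", "january-birthdays.php", "http://example.com/other.php"]

for url in absolute_links(page, hrefs):
    # In the spider, each of these strings would go into its own scrapy.Request.
    print(url)
```

Root-relative and document-relative links are both resolved against the site, while an already absolute link is left unchanged.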

It worked. I wasn't getting the data in the right format at first, but I used `extract_first()` instead of `extract()`.
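The difference the comment relies on: `extract()` returns a list of every match, so each item field ends up as a list, whereas `extract_first()` returns only the first match as a string, or `None` when nothing matches. The sketch below reproduces those semantics with the standard library's `xml.etree.ElementTree` instead of Scrapy selectors. The HTML fragment and the helper names `extract` and `extract_first` are hypothetical stand-ins, and ElementTree only parses well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment standing in for a profile page.
PAGE = """
<html><body>
  <div class="section">
    <p> Birth Place: Example City </p>
    <p> Nationality: Example </p>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())


def extract(nodes):
    """Like .extract(): every match, as a list (also stripped, which Scrapy does not do)."""
    return [node.text.strip() for node in nodes]


def extract_first(nodes, default=None):
    """Like .extract_first(): only the first match, or `default` if nothing matched."""
    values = extract(nodes)
    return values[0] if values else default


paragraphs = root.findall(".//div[@class='section']/p")
print(extract(paragraphs))        # ['Birth Place: Example City', 'Nationality: Example']
print(extract_first(paragraphs))  # Birth Place: Example City
print(extract_first(root.findall(".//div[@class='missing']/p")))  # None
```

In the spider itself this corresponds to writing, for example, `item["Birth_Place"] = response.xpath('//div[@class="section"]//p[2]//text()').extract_first()`.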