Python 2.7: I'm trying to extract data from a website's links with Scrapy, but my code raises an error
I'm trying to extract data from the links on a website. The path I follow is:

- Home page
- Link
- Link
- Data to extract (basically, I'm trying to extract all of the celebrities' birth details)

My Scrapy code is as follows:
import scrapy


class celebritiesItem(scrapy.Item):
    Name = scrapy.Field()
    Profession = scrapy.Field()
    Died_On = scrapy.Field()
    Birth_Place = scrapy.Field()
    Nationality = scrapy.Field()
    Birth_Date = scrapy.Field()


class celebrities(scrapy.Spider):
    name = "people"
    allowed_domains = ["thefamouspeople.com"]
    start_urls = [
        "http://www.thefamouspeople.com/famous-people-by-birthday.php"
    ]

    def parse(self, response):
        links = response.xpath('//div[@class="pod colorbar editorial"]//@href').extract()
        for link in links:
            abs_url = response.xpath('//div[@class="pod colorbar editorial"]//@href').extract()
        yield scrapy.Request(abs_url, callback=self.parse)
        #items[]
        item = celebritiesItem()
        item["Name"] = response.xpath('//div[@class="section"]//a[2]//text()').extract()
        item["Profession"] = response.xpath('//div[@class="section"]//span//text()').extract()
        item["Died_On"] = response.xpath('//div[@class="section"]//p[1]//text()').extract()
        item["Birth_Place"] = response.xpath('//div[@class="section"]//p[2]//text()').extract()
        item["Nationality"] = response.xpath('//div[@class="section"]//p[3]//text()').extract()
        item["Birth_Date"] = response.xpath('//div[@class="section"]//p[4]//text()').extract()
        yield item
I get the following error:
raise TypeError('Request url must be str or unicode, got %s:' % type(url).__name__)

This part is causing the error:
for link in links:
    abs_url = response.xpath('//div[@class="pod colorbar editorial"]//@href').extract()
yield scrapy.Request(abs_url, callback=self.parse)
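To see why this fails, here is a minimal sketch that runs without Scrapy. The function check_request_url is hypothetical: it only mirrors the kind of type check scrapy.Request applies to its url argument (and the message from the traceback above), not Scrapy's actual code. The href values are made up. Because .extract() returns a list of every matching href, abs_url is a list, and a list is rejected while each individual element would pass.

```python
# Hypothetical stand-in for the check scrapy.Request applies to its url argument;
# only the shape of the check and the error message are mirrored here.
try:
    STRING_TYPES = (str, unicode)  # Python 2.7: str and unicode both count
except NameError:
    STRING_TYPES = (str,)  # Python 3: unicode no longer exists


def check_request_url(url):
    """Return url unchanged, or raise TypeError if it is not a single string."""
    if not isinstance(url, STRING_TYPES):
        raise TypeError('Request url must be str or unicode, got %s:' % type(url).__name__)
    return url


# .extract() returns a list of every matching href (hypothetical values here).
hrefs = ['a.php', 'b.php']

try:
    check_request_url(hrefs)  # passing the whole list, as in the code above
except TypeError as exc:
    print(exc)  # Request url must be str or unicode, got list:

for href in hrefs:
    check_request_url(href)  # each element is a single string, so this passes
```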
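You need to indent the yield statement into the loop and build a proper URL for each link. .extract() returns a list of all matching hrefs, but scrapy.Request expects a single URL string, so join each relative link against the current page instead: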
for link in links:
    abs_url = response.urljoin(link)
    yield scrapy.Request(abs_url, callback=self.parse)
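response.urljoin(link) resolves a possibly relative href against the page being parsed. As a rough standard-library approximation (it ignores any <base> tag, which Scrapy's HTML responses take into account), it behaves like urljoin(response.url, link). The sketch below assumes hypothetical href values and a hypothetical helper absolute_urls; it produces one absolute URL string per link, which is what each scrapy.Request needs. It imports urljoin from urllib.parse on Python 3 and from urlparse on Python 2.7.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7

page_url = "http://www.thefamouspeople.com/famous-people-by-birthday.php"

# Hypothetical hrefs standing in for what .extract() might return.
links = ["january.php", "/profiles/example.php", "http://www.thefamouspeople.com/x.php"]


def absolute_urls(base, hrefs):
    """Yield one absolute URL string per href (one request per link)."""
    for href in hrefs:
        # Relative paths are resolved against base; absolute URLs pass through.
        yield urljoin(base, href)


for url in absolute_urls(page_url, links):
    print(url)
```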
It worked. I still wasn't getting the data in the right format, so I used extract_first() instead of extract().
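The difference is that a selector list's .extract() always returns a list of strings (so every item field ends up as a list), while .extract_first() returns only the first match, or its default (None unless given) when nothing matched. The sketch below mimics that behaviour on a plain Python list so it runs without Scrapy; the helper and the sample text nodes are hypothetical. In the spider itself you would call it on the selector, for example response.xpath('//div[@class="section"]//a[2]//text()').extract_first().

```python
def extract_first(values, default=None):
    """Mimic SelectorList.extract_first(): first match, or default when nothing matched."""
    return values[0] if values else default


# Hypothetical result of .extract() on a //text() query: always a list.
name_nodes = ['  Example Name\n', ' (Example Profession) ']

print(extract_first(name_nodes).strip())  # Example Name
print(extract_first([], default=''))      # nothing matched, so the default '' is used
```

Text nodes often carry surrounding whitespace, which is why the first result is passed through .strip() before being stored.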