Python Scrapy spider returns duplicate values
Below is my complete code. I don't know why it returns so many duplicates. Is there a way to fix this? I am trying to request every region from this link “” and extract the agents' information.
# -*- coding: utf-8 -*-
import scrapy


class MainSpider(scrapy.Spider):
    name = 'main'
    start_urls = ["https://www.compass.com/agents"]

    def parse(self, response):
        regions = response.xpath('//ul[@class="geoLinks-list textIntent-caption1--strong"]/li')
        for each in regions:
            region_link = each.xpath('.//a/@href').get()
            region_name = each.xpath('.//a/text()').get()
            yield response.follow(url=region_link, callback=self.parse_data, meta={"region_text": region_name})

    def parse_data(self, response):
        region = response.request.meta["region_text"]
        agents = response.xpath('//div[@class="agentCard-contact"]')
        for agent in agents:
            name = agent.xpath('normalize-space(//div[@class="agentCard-contact"]/a/text())').get()
            profile_link = agent.xpath('//div[@class="agentCard-contact"]/a/@href').get()
            email = agent.xpath('//a[@class="textIntent-body agentCard-email"]/@href').get()
            mobile = agent.xpath('//a[@class="textIntent-body agentCard-phone"]/@href').get()
            yield {
                "Name": name,
                "Profile_link": profile_link,
                "Email": email,
                "Mobile": mobile,
                "Region": region,
            }
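I think the problem is in your XPath expressions. Inside the loop they start with //, which searches the whole page instead of the current agent, so every iteration returns the first match on the page. Change them to relative expressions starting with .// and try again: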
name = agent.xpath('normalize-space(.//a[@class="textIntent-headline1 agentCard-name"]/text())').get()
profile_link = agent.xpath('.//a[@class="textIntent-headline1 agentCard-name"]/@href').get()
email = agent.xpath('.//a[@class="textIntent-body agentCard-email"]/@href').get()
mobile = agent.xpath('.//a[@class="textIntent-body agentCard-phone"]/@href').get()