Python Scrapy parse method not working
I am scraping a site and have written a spider for it with Scrapy. I am able to extract the product prices using the following:
hxs.select('//div[@class="product_list"]//div[@class="product_list_offerprice"]/text()').extract()
through the Scrapy shell.
But when I try to do the same thing in the spider, it returns an empty list.
Here is my spider code:
from eScraper.items import EscraperItem
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.spiders import CrawlSpider

#------------------------------------------------------------------------------

class ESpider(CrawlSpider):
    name = "ashikamallSpider"
    allowed_domains = ["ashikamall.com"]

    URLSList = []
    for n in range(1, 51):
        URLSList.append('http://ashikamall.com/products.aspx?id=222&page=' + str(n))
    start_urls = URLSList

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@class="product_list"]')
        items = []
        for site in sites:
            item = EscraperItem()
            item['productDesc'] = ""
            item['productSite'] = "http://1click1call.com/"
            item['productTitle'] = site.select('div[@class="product_list_name"]/h3/text()').extract()
            item['productPrice'] = site.select('div[@class="product_list_offerprice"]/text()').extract()
            item['productURL'] = "http://ashikamall.com/" + site.select('div[@class="product_list_image"]/a/@href').extract()[0].encode('utf-8')
            item['productImage'] = "http://ashikamall.com/" + site.select('div[@class="product_list_image"]/a/img/@src').extract()[0].encode('utf-8')
            items.append(item)
        return items
Here is my item:
from scrapy.item import Item, Field

#------------------------------------------------------------------------------

class EscraperItem(Item):
    image_urls = Field()
    productURL = Field()
    productDesc = Field()
    image_paths = Field()
    productSite = Field()
    productTitle = Field()
    productPrice = Field()
    productImage = Field()
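Can anyone help me?

The problem is in your XPaths: they should be relative to the current element.

@Medeiros That is the syntax that makes an XPath relative to the current element; basically, it searches inside the current element.
@Medeiros It depends on the site you are parsing, as far as I understand.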
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.spiders import CrawlSpider
from scrapy.item import Item, Field


class EscraperItem(Item):
    image_urls = Field()
    productURL = Field()
    productDesc = Field()
    image_paths = Field()
    productSite = Field()
    productTitle = Field()
    productPrice = Field()
    productImage = Field()


class ESpider(CrawlSpider):
    name = "ashikamallSpider"
    allowed_domains = ["ashikamall.com"]
    # Listing pages 1 through 50
    start_urls = ['http://ashikamall.com/products.aspx?id=222&page=%s' % n for n in range(1, 51)]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # One selector per product block on the page
        sites = hxs.select('//div[@class="product_list"]')
        items = []
        for site in sites:
            item = EscraperItem()
            item['productDesc'] = ""
            item['productSite'] = "http://1click1call.com/"
            # The leading ".//" searches all descendants of the current product block
            item['productTitle'] = site.select('.//div[@class="product_list_name"]/h3/text()').extract()
            item['productPrice'] = site.select('.//div[@class="product_list_offerprice"]/text()').extract()
            item['productURL'] = "http://ashikamall.com/" + site.select('.//div[@class="product_list_image"]/a/@href').extract()[0].encode('utf-8')
            item['productImage'] = "http://ashikamall.com/" + site.select('.//div[@class="product_list_image"]/a/img/@src').extract()[0].encode('utf-8')
            items.append(item)
        return items
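
To see why the `.//` prefix matters, here is a minimal sketch that uses the standard library's `xml.etree.ElementTree` as a stand-in for Scrapy's selector (it is not the Scrapy API). The markup is hypothetical: it assumes the price `div` is nested one level below `product_list`, which may or may not match the real page. A path that starts with `div[...]` only looks at direct children of the current element, while `.//div[...]` looks at all of its descendants.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup (an assumption, not the real page): the price <div>
# sits inside a wrapper <div>, one level below each product_list block.
HTML = """
<html><body>
  <div class="product_list">
    <div class="product_list_details">
      <div class="product_list_offerprice">Rs. 100</div>
    </div>
  </div>
  <div class="product_list">
    <div class="product_list_details">
      <div class="product_list_offerprice">Rs. 250</div>
    </div>
  </div>
</body></html>
"""

root = ET.fromstring(HTML)
sites = root.findall('.//div[@class="product_list"]')

# 'div[...]' matches only direct children of each product_list element.
child_only = [d.text for site in sites
              for d in site.findall('div[@class="product_list_offerprice"]')]

# './/div[...]' matches any descendant of each product_list element.
descendants = [d.text for site in sites
               for d in site.findall('.//div[@class="product_list_offerprice"]')]

print(child_only)   # []
print(descendants)  # ['Rs. 100', 'Rs. 250']
```

In Scrapy itself, note also that a path starting with `//` on a nested selector searches the whole document rather than the current element, so `.//` is the form to use inside the `for site in sites` loop.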