Scrapy and XPath problem with nested XPath
I'm trying to scrape Amazon products with Scrapy. Starting from a random category, I use this XPath:
from scrapy.selector import Selector

def parse(self, response):
    products = Selector(response).xpath('//div[@class="s-item-container"]')
    for product in products:
        item = AmzItem()
        item['title'] = product.xpath('//a[@class="s-access-detail-page"]/@title').extract()[0]
        item['url'] = product.xpath('//a[@class="s-access-detail-page"]/@href').extract()[0]
        yield item
//div[@class="s-item-container"] returns all the product divs on a category page, which is correct.
Now, how do I get the link for that product?
// stands for anywhere in the document,
and @class should select the right class.
But on this line:
item['title'] = product.xpath('//a[@class="s-access-detail-page"]/@title').extract()[0]
I get exceptions.IndexError: list index out of range
So the list matching this XPath must be empty, but I don't understand why.
Edit: the HTML looks like this:
<div class="s-item-container" style="height: 343px;">
<div class="a-row a-spacing-base">
<div class="a-column a-span12 a-text-left">
<div class="a-section a-spacing-none a-inline-block s-position-relative">
<a class="a-link-normal a-text-normal" href="https://rads.stackoverflow.com/amzn/click/com/B0105S434A" rel="nofollow noreferrer"><img alt="Product Details" src="http://ecx.images-amazon.com/images/I/41%2BzrAY74UL._AA160_.jpg" onload="viewCompleteImageLoaded(this, new Date().getTime(), 24, false);" class="s-access-image cfMarker" height="160" width="160"></a>
<div class="a-section a-spacing-none a-text-center">
<div class="a-row a-spacing-top-mini">
<a class="a-size-mini a-link-normal a-text-normal" href="https://rads.stackoverflow.com/amzn/click/com/B0105S434A" rel="nofollow noreferrer">
<div class="a-box">
<div class="a-box-inner a-padding-mini"><span class="a-color-secondary">See more choices</span></div>
</div>
</a>
</div>
</div>
</div>
</div>
</div>
<div class="a-row a-spacing-mini">
<div class="a-row a-spacing-none">
<a class="a-link-normal s-access-detail-page a-text-normal" title="Harry Potter Gryffindor School Fancy Robe Cloak Costume And Tie (Size S)" href="https://rads.stackoverflow.com/amzn/click/com/B0105S434A" rel="nofollow noreferrer">
<h2 class="a-size-base a-color-null s-inline s-access-title a-text-normal">Harry Potter Gryffindor School Fancy Robe Cloak Costume And Tie (Size S)</h2>
</a>
</div>
<div class="a-row a-spacing-mini"><span class="a-size-small a-color-secondary">by </span><span class="a-size-small a-color-secondary">Legend</span></div>
</div>
<div class="a-row a-spacing-mini">
<div class="a-row a-spacing-none"><a class="a-size-small a-link-normal a-text-normal" href="http://www.amazon.com/gp/offer-listing/B0105S434A/ref=sr_1_21_olp?s=pet-supplies&ie=UTF8&qid=1435391788&sr=1-21&keywords=pet+supplies&condition=new"><span class="a-size-base a-color-price a-text-bold">$28.99</span><span class="a-letter-space"></span>new<span class="a-letter-space"></span><span class="a-color-secondary">(1 offer)</span><span class="a-letter-space"></span><span class="a-color-secondary a-text-strike"></span></a></div>
</div>
<div class="a-row a-spacing-none"><span name="B0105S434A">
<span class="a-declarative" data-action="a-popover" data-a-popover="{&quot;max-width&quot;:&quot;700&quot;,&quot;closeButton&quot;:&quot;false&quot;,&quot;position&quot;:&quot;triggerBottom&quot;,&quot;url&quot;:&quot;/review/widgets/average-customer-review/popover/ref=acr_search__popover?ie=UTF8&asin=B0105S434A&contextId=search&ref=acr_search__popover&quot;}"><a href="javascript:void(0)" class="a-popover-trigger a-declarative"><i class="a-icon a-icon-star a-star-4"><span class="a-icon-alt">3.9 out of 5 stars</span></i><i class="a-icon a-icon-popover"></i></a></span></span>
<a class="a-size-small a-link-normal a-text-normal" href="https://rads.stackoverflow.com/amzn/click/com/B0105S434A" rel="nofollow noreferrer">48</a>
</div>
</div>
It should be:
# ------------- The dot makes the query relative to product
product.xpath('.//a[@class="s-access-detail-page"]/@title')
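//a[@class="s-access-detail-page"]
requires the class attribute to be exactly "s-access-detail-page",
because XPath compares strings and knows nothing about what classes mean :) When an element has multiple classes, use the contains function: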
//a[contains(concat(' ', @class, ' '), " s-access-detail-page ")]/@title
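Please post a snippet of the relevant HTML.
No, I still get an empty list with this version. But I've added my HTML, maybe that helps?
OK, let me check. a@class="s-access-detail-page" is not like div@class="s-item-container"... isn't that obvious?
Yes. But I don't understand why it doesn't work. There is only one a with s-access-detail-page, yet I can't select it with // or .//. Does that mean I need to use the whole path, div/div/a@? I thought // and .// were exactly for avoiding writing out the whole path?
You need contains to select the right one.
I was using // on a different area of the HTML. Now it works, thanks. I had to remove the concat part, otherwise I only get an "exceptions.ValueError: Invalid XPath", but now it seems to work. One more problem, not sure whether it comes from XPath or something else; I'll keep digging.
If you are careful with the quotes, this may be a problem with the XPath implementation :(
Now I have it working in several places. Thanks for pointing me in the right direction with contains.
Great! I'm glad it helped.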