Python Scrapy: How to extract content inside nested divs (XPath selector)?
Tags: python, xpath, scrapy, web-crawler
See the HTML markup below. How can I use an XPath selector in Scrapy to extract the content of the div with the class name col-sm-7?
I want to extract the following text: Infortrend EonNAS Pro 850X 8-bay Tower NAS with 10GbE
HTML:
<div class="pricing panel panel-primary">
<div class="panel-heading">Infortrend Products</div>
<div class="body">
<div class="panel-subheading"><strong>EonNAS Pro Models</strong></div>
<div class="row">
<div class="col-sm-7"><strong>Infortrend EonNAS Pro 850X 8-bay Tower NAS with 10GbE</strong><br />
<small>Intel Core i3 Dual-Core 3.3GHz Processor, 8GB DDR3 RAM (Drives Not Included)</small></div>
<div class="col-sm-3">#ENP8502MD-0030<br />
<strong> Our Price: $2,873.00</strong></div>
<div class="col-sm-2">
<form action="/addcart.asp" method="get">
<input type="hidden" name="item" value="ENP8502MD-0030 - Infortrend EonNAS Pro 850X 8-bay Tower NAS with 10GbE (Drives Not Included)">
<input type="hidden" name="price" value="$2873.00">
<input type="hidden" name="custID" value="">
<input type="hidden" name="quantity" value="1">
<button type="submit" class="btn btn-primary center-block"><i class="fa fa-shopping-cart"></i> Add to Cart</button>
</form>
</div>
</div>
</div>
</div>
Try this:
response.xpath('//*[@class="col-sm-7"]//strong//text()').extract()
Hope that helps :)
You can get the text between the elements as follows:
print(response.xpath('//div[@class="col-sm-7"]//text()').extract()[0].strip())
or
elem_text = ' '.join([txt.strip() for txt in response.xpath('//div[@class="col-sm-7"]//text()').extract() if txt.strip()])
print(elem_text)
The two statements above print, respectively:
Infortrend EonNAS Pro 850X 8-bay Tower NAS with 10GbE
Infortrend EonNAS Pro 850X 8-bay Tower NAS with 10GbE Intel Core i3 Dual-Core 3.3GHz Processor, 8GB DDR3 RAM (Drives Not Included)
Here //text() selects every text node inside this div, including the text inside nested tags such as <strong> and <small>. Taking index [0] gives only the first text node, while joining all of them gives the full text of the div.
Try this XPath expression:
//div[@class="col-sm-7"]/strong/text()