How to extract a list of all brands from Amazon using Scrapy
I am trying to crawl amazon.com with Scrapy and collect data for all of the brands listed there. Here is my Scrapy script:
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class StayuncleCrawlerSpider(CrawlSpider):
    name = 'amazon_crawler'
    allowed_domains = ['amazon.com']
    start_urls = ['https://www.amazon.com/gp/search/other/ref=sr_in_a_V?rh=i%3Aelectronics%2Cn%3A172282&pickerToList=brandtextbin&indexField=a&ie=UTF8&qid=1466664617']
    # Per-spider delay between requests, in seconds.
    download_delay = 2
    rules = [Rule(LinkExtractor(allow=(r'/gp/search/other/ref',)), callback='parse_item', follow=True)]
    # Counter used to give each response its own output file name.
    page_count = 0

    def parse_item(self, response):
        body = response.xpath('//body//div[@id="center"]')
        # Note: this returns each <span> as raw HTML, tags included.
        texts = body.xpath('.//span').extract()
        self.page_count += 1
        ptext = "/Users/Nand/crawledData/html/" + response.url.split("/")[-2] + str(self.page_count) + '.txt'
        with open(ptext, 'a', encoding='utf-8') as f:
            for text in texts:
                text = text.rstrip()
                if text:
                    print(text)
                    f.write(text + "\n")
        # Yield the extracted strings as a plain dict item.
        yield {'url': response.url, 'spans': texts}
Here is the start URL:
https://www.amazon.com/gp/search/other/ref=sr_in_a_V?rh=i%3Aelectronics%2Cn%3A172282&pickerToList=brandtextbin&indexField=a&ie=UTF8&qid=1466664617
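The start URL above carries `indexField=a`, which appears to select only the brands filed under "a". A minimal sketch of generating one start URL per letter follows. It assumes, without confirmation from the site, that `indexField` accepts every other letter in the same way; `brand_index_urls` is a hypothetical helper name, and the `ref=` segment in the path is left untouched.

```python
from string import ascii_lowercase
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

START_URL = (
    "https://www.amazon.com/gp/search/other/ref=sr_in_a_V"
    "?rh=i%3Aelectronics%2Cn%3A172282&pickerToList=brandtextbin"
    "&indexField=a&ie=UTF8&qid=1466664617"
)


def brand_index_urls(url, letters=ascii_lowercase):
    """Return one URL per letter by swapping the indexField query value.

    Assumption: indexField accepts any single letter, as it does for 'a'.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    urls = []
    for letter in letters:
        # Replace only indexField; keep every other parameter in place.
        swapped = [(k, letter if k == "indexField" else v) for k, v in query]
        urls.append(urlunsplit(parts._replace(query=urlencode(swapped))))
    return urls


if __name__ == "__main__":
    for index_url in brand_index_urls(START_URL)[:3]:
        print(index_url)
```

The resulting list could be assigned to `start_urls` in the spider, so that every letter page is requested directly instead of relying on link following alone.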
This is the part of the HTML I want to scrape:
<div class="a-row a-spacing-none pagn">
<span class="pagnLead">Viewing:</span>
<span class="pagnLink"><a href="/gp/search/other/ref=sr_in_-2_A?rh=i%3Aelectronics%2Cn%3A172282&pickerToList=brandtextbin&ie=UTF8&qid=1466668789">Top Brands</a>
</span>
Can someone help me modify the script?
The output I am after is the list of brand names, for example:
A & I Products
A & L Engraving
and so on.
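To end up with one clean brand name per entry, the extracted strings usually need some tidying first. The sketch below assumes the input is a list of text strings, such as those returned by `.extract()` on a `text()` XPath; `clean_brand_names` and the sample data are hypothetical and only illustrate the idea.

```python
def clean_brand_names(raw_texts):
    """Strip whitespace, drop empty strings, and remove duplicates in order."""
    seen = set()
    names = []
    for raw in raw_texts:
        # Collapse runs of whitespace left over from the HTML layout.
        name = " ".join(raw.split())
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


# Hypothetical input: the kind of list a text() XPath might yield.
sample = ["  A & I Products\n", "", "A & L Engraving", "A & I Products"]
print(clean_brand_names(sample))
# prints ['A & I Products', 'A & L Engraving']
```

Inside `parse_item`, the cleaned list could be written to the output file or yielded as an item field instead of the raw `<span>` markup.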