Python: speed up my scraping code with multithreading/multiprocessing

Tags: python, web-scraping, concurrency, scrapy

How can I speed up my Scrapy code using multithreading/multiprocessing? I have attached my code below. I'm not familiar with threading in Python and don't know where to start, so I'd appreciate any help with this code.

import scrapy
import logging

domain = 'https://www.spdigital.cl/categories/view/'

categories = [
'334' , '335', '553', '607', '336', '340', '339', '540', '486', '489', '485', '598', '347', '562','348', '349', '353', '351', '352', '532', '350',
'477', '475', '476', '474', '559','355', '356', '580', '337', '357', '358', '360', '374', '363', '362', '361', '338', '344', '593', '359', '604',
'478', '507', '509', '508', '510', '512', '600', '590', '511', '459','564', '376', '375', '558', '341', '377', '378', '484', '554', '567', '563', '379', '342', '343',
'370', '481', '365', '556', '364', '541', '555', '492', '570','579', '576', '574', '575', '572', '578', '577', '588', '573',
'596', '597', '601', '595','387', '468', '536', '391', '390', '589', '389','399', '394', '396', '397', '398', '392', '592', '401', '402', '530', '560',
'407', '406', '408', '404', '403', '405','413', '411', '414', '410', '409', '412','418', '599', '603', '465', '415', '487', '416', '382', '419', '417', '479',
'515', '582', '518', '514', '581', '583', '517', '519', '520','420', '421', '422', '423', '424', '425', '521', '557', '538', '428', '430', '432', '434', '436', '433', '435', '427', '437', '429', '482',
'544', '552', '545', '546', '550', '547', '551', '549', '548','491', '535', '494', '493', '472', '471', '470', '534', '537',
'587', '586', '585','602', '569', '561','438', '446', '488', '439', '496', '440', '566', '445', '447', '565','547', '448', '449', '450', '451', '452', '531', '453', '454', '456', '455',
'501', '505', '506', '504', '502', '498', '500', '503', '369','527', '460', '529', '606', '528', '591', '462', '526', '525', '605', '463', '464',
]
class ProductsSpider(scrapy.Spider):
    name = 'productos'
    allowed_domains = ['www.spdigital.cl']

    def start_requests(self):
        for i in categories:
            yield scrapy.Request(url=domain + i, callback=self.parse, headers={
                'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/78.0.3904.108 Chrome/78.0.3904.108 Safari/537.36'
            })

    def parse(self, response):
        for product in response.xpath('//div[@class="span8 grid-style-mosaic"]/div/div[@class="span2 product-item-mosaic"]'):
            yield {
                'product_name': product.xpath('.//div[@class="name"]/a/text() | .//div[@class="name"]/a/span/@data-original-title').get(),
                'product_brand': product.xpath('.//div[@class="brand"]/text()').get(),
                'product_url': response.urljoin(product.xpath('.//div[@class="name"]/a/@href').get()),
                'product_original': product.xpath('.//div[@class="cash-price"]/text()').get(),
                'product_discount': product.xpath('.//span[@class="cash-previous-price-value"]/text()').get()
            }
        # Check the raw href before joining: response.urljoin(None) returns
        # response.url, so joining first would make this check always pass.
        next_page = response.xpath('//a[@class="next"]/@href').get()
        if next_page:
            yield scrapy.Request(url=response.urljoin(next_page), callback=self.parse, headers={
                'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/78.0.3904.108 Chrome/78.0.3904.108 Safari/537.36'
            })

Scrapy is single-threaded, so it does not support multithreading — it already executes requests asynchronously under the hood. To speed up the crawl, increase the number of concurrent requests in settings.py by raising CONCURRENT_REQUESTS and CONCURRENT_REQUESTS_PER_DOMAIN; their defaults are 16 and 8 respectively. The Scrapy documentation on settings and broad crawls covers this in more detail.
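A minimal settings.py sketch of that change — the values 32 and 16 here are illustrative assumptions, not a recommendation from the answer; tune them for the target site:

```python
# settings.py — raise Scrapy's concurrency limits.
# 32 and 16 are illustrative values; too many parallel requests can
# overload the site or get the crawler blocked, so increase gradually.
CONCURRENT_REQUESTS = 32             # global cap, default: 16
CONCURRENT_REQUESTS_PER_DOMAIN = 16  # per-domain cap, default: 8
```

Scrapy picks these names up from the project's settings.py automatically; they can also be set for a single spider via its custom_settings class attribute.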