Python: how can I combine these two spiders into one?
There are two spiders that use the same resource file and have almost the same structure. spiderA contains:
import scrapy
import pkgutil


class StockSpider(scrapy.Spider):
    name = "spiderA"
    data = pkgutil.get_data("tutorial", "resources/webs.txt")
    data = data.decode()
    urls = data.split("\r\n")
    start_urls = [url + "string1" for url in urls]

    def parse(self, response):
        pass
spiderB contains:
import scrapy
import pkgutil


class StockSpider(scrapy.Spider):
    name = "spiderB"
    data = pkgutil.get_data("tutorial", "resources/webs.txt")
    data = data.decode()
    urls = data.split("\r\n")
    start_urls = [url + "string2" for url in urls]

    def parse(self, response):
        pass
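How can I combine spiderA and spiderB and add a switch variable so that scrapy crawl runs one variant or the other as needed?
Try adding a separate argument for the spider type. You can set it by calling scrapy crawl myspider -a spider_type=second. Check the following code example: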
import pkgutil

import scrapy


class StockSpider(scrapy.Spider):
    name = "myspider"

    def start_requests(self):
        # Values passed with `-a name=value` become attributes of the spider instance
        if not hasattr(self, 'spider_type'):
            self.logger.error('No spider_type specified')
            return
        data = pkgutil.get_data("tutorial", "resources/webs.txt")
        data = data.decode()
        # splitlines() handles both \r\n and \n line endings
        for url in data.splitlines():
            if not url.strip():
                continue  # skip blank lines, e.g. a trailing newline
            if self.spider_type == 'first':
                url += 'first'
            elif self.spider_type == 'second':
                url += 'second'
            yield scrapy.Request(url)

    def parse(self, response):
        pass
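Alternatively, you can always create a base class and inherit from it, overriding only one variable (the string appended to the URL) and the name (so each spider can be run separately).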
Using spider_type causes the error
NameError: name 'spider_type' is not defined.
It is self.spider_type inside the spider class.
To make it stricter and more precise:
scrapy crawl myspider -a spider_type='second'