Getting a TypeError from the Scrapyd API when I try to start a spider
I have a problem with ScrapydAPI. I wrote a simple spider that takes a domain URL as an argument:
    import scrapy

    class QuotesSpider(scrapy.Spider):
        name = 'quotes'

        def __init__(self, domains=None):
            self.allowed_domains = [domains]
            self.start_urls = ['http://{}/'.format(domains)]

        def parse(self, response):
            # time.sleep(int(self.sleep))
            item = {}
            item['title'] = response.xpath('//head/title/text()').extract()
            yield item
If I run it like this, it works perfectly:
scrapy crawl quotes -a domains=quotes.toscrape.com
But when I run it through scrapyd_api, it fails:
from scrapyd_api import ScrapydAPI
scrapyd = ScrapydAPI('http://localhost:6800')
scrapyd.schedule(project='pd', spider='quotes', domains='http://quotes.toscrape.com/')
I get: builtins.TypeError: __init__() got an unexpected keyword argument '_job'
How do I start a Scrapy spider with arguments through the Scrapyd API?

Here is an answer.
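As far as I understand, the problem was the missing call to the parent __init__ via super(). Scrapyd passes an extra _job keyword argument to the spider, and an __init__ that accepts only domains rejects it.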
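The error can be reproduced without Scrapy or Scrapyd, which may make the cause easier to see. The sketch below is only an illustration: BaseSpider is a hypothetical stand-in for scrapy.Spider (assumed here to accept arbitrary keyword arguments), try_build is a hypothetical helper, and the job id is made up.

```python
class BaseSpider:
    """Hypothetical stand-in for scrapy.Spider: keeps extra keyword arguments as attributes."""

    def __init__(self, name=None, **kwargs):
        self.__dict__.update(kwargs)


class BrokenSpider(BaseSpider):
    # Accepts only `domains`, so any other keyword argument raises TypeError.
    def __init__(self, domains=None):
        self.allowed_domains = [domains]
        self.start_urls = ['http://{}/'.format(domains)]


class FixedSpider(BaseSpider):
    # Accepts extra keyword arguments and forwards them to the base class.
    def __init__(self, domains=None, **kwargs):
        super().__init__(**kwargs)
        self.allowed_domains = [domains]
        self.start_urls = ['http://{}/'.format(domains)]


def try_build(spider_cls, **spider_args):
    """Return the spider instance, or the TypeError message if construction fails."""
    try:
        return spider_cls(**spider_args)
    except TypeError as exc:
        return str(exc)


# The job id is invented for this example; Scrapyd generates its own.
spider_args = {'domains': 'quotes.toscrape.com', '_job': 'example-job-id'}

broken = try_build(BrokenSpider, **spider_args)
fixed = try_build(FixedSpider, **spider_args)

print(broken)            # the TypeError message mentioning '_job'
print(fixed.start_urls)  # the start URL built from the bare host
```

The only difference between the two classes is that the second one accepts **kwargs and hands them to the parent constructor, so the extra argument no longer breaks instantiation.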
Now my code looks like this:
    class QuotesSpider(scrapy.Spider):
        name = 'quotes'
        start_urls = []

        def __init__(self, *args, **kwargs):
            super(QuotesSpider, self).__init__(*args, **kwargs)
            self.allowed_domains = [kwargs.get('domains')]
            self.start_urls.append('http://{}/'.format(kwargs.get('domains')))
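One more thing that may be worth checking: the command-line run passes a bare host (quotes.toscrape.com), while the scrapyd_api call passes a full URL (http://quotes.toscrape.com/). Since the spider wraps the value in 'http://{}/', the full URL would produce a start URL of 'http://http://quotes.toscrape.com//'. A minimal sketch of normalizing the value first, using only the standard library; to_hostname is a hypothetical helper name, not part of Scrapy or scrapyd_api.

```python
from urllib.parse import urlparse


def to_hostname(value):
    """Return the bare hostname for a full URL or an already-bare host string."""
    # urlparse only fills in the network location when '//' is present,
    # so add it for inputs such as 'quotes.toscrape.com'.
    parsed = urlparse(value if '//' in value else '//' + value)
    return parsed.hostname


from_url = to_hostname('http://quotes.toscrape.com/')
from_host = to_hostname('quotes.toscrape.com')

print(from_url)   # quotes.toscrape.com
print(from_host)  # quotes.toscrape.com
```

With a helper like this, the call from the question could pass domains=to_hostname('http://quotes.toscrape.com/') to scrapyd.schedule, or the bare host could simply be passed directly.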