Passing arguments to process.crawl in Scrapy python


I would like to get the same result as this command line: scrapy crawl linkedin_anonymous -a first=James -a last=Bond -o output.json

My script is as follows:

import scrapy
from linkedin_anonymous_spider import LinkedInAnonymousSpider
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

spider = LinkedInAnonymousSpider(None, "James", "Bond")
process = CrawlerProcess(get_project_settings())
process.crawl(spider) ## <-------------- (1)
process.start()

Is there a way to pass first and last into process.crawl at (1)?

You can pass the spider arguments in the process.crawl method:

process.crawl(spider, input='inputargument', first='James', last='Bond')
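
For completeness, here is a minimal sketch of the full script with the arguments flowing through process.crawl into the spider's __init__. The spider body below is a placeholder standing in for the real LinkedInAnonymousSpider, and note that recent Scrapy versions expect the spider class here rather than an instance:

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Placeholder for the real LinkedInAnonymousSpider; Scrapy forwards the
# extra keyword arguments given to crawl() into the spider's __init__.
class LinkedInAnonymousSpider(scrapy.Spider):
    name = "linkedin_anonymous"

    def __init__(self, input=None, first=None, last=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.input = input
        self.first = first
        self.last = last

process = CrawlerProcess(get_project_settings())
process.crawl(LinkedInAnonymousSpider, input='inputargument', first='James', last='Bond')
process.start()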

You can do it the simple way:

from scrapy import cmdline

cmdline.execute("scrapy crawl linkedin_anonymous -a first=James -a last=Bond -o output.json".split())
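
Note that cmdline.execute runs the crawl in the current process and, in recent Scrapy versions, ends it with sys.exit once the command finishes, so code placed after the call may never run. If the script needs to continue afterwards, one alternative (an assumption, not part of the original answer) is to shell out instead:

import subprocess

# Run the same command in a child process; check=True raises if it fails.
subprocess.run(
    "scrapy crawl linkedin_anonymous -a first=James -a last=Bond -o output.json".split(),
    check=True,
)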

If you have Scrapyd and want to schedule the spider, do this:


curl http://localhost:6800/schedule.json -d project=projectname -d spider=spidername -d first='James' -d last='Bond'
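
The same request can be sent from Python as well; a minimal sketch using the requests library, where projectname and spidername are placeholders as above and any extra form field is passed through to the spider as an argument:

import requests

# Scrapyd's schedule.json endpoint; project and spider are required,
# any other field becomes a spider argument (like -a on the command line).
response = requests.post(
    "http://localhost:6800/schedule.json",
    data={"project": "projectname", "spider": "spidername",
          "first": "James", "last": "Bond"},
)
print(response.json())  # e.g. {"status": "ok", "jobid": "..."}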

But this way we may not be able to pass -o output.json?

@hAcKnRoCk, here is how to configure the output file:
from scrapy import cmdline

cmdline.execute("scrapy crawl linkedin_anonymous -a first=James -a last=Bond -o output.json".split())
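
When driving the crawl from CrawlerProcess instead of the command line, the output file can also be set through settings. A minimal sketch using the FEEDS setting (available since Scrapy 2.1; older versions used FEED_URI and FEED_FORMAT), assuming the same spider module as in the question:

from linkedin_anonymous_spider import LinkedInAnonymousSpider
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

settings = get_project_settings()
# Equivalent of -o output.json: export scraped items as a JSON feed.
settings.set("FEEDS", {"output.json": {"format": "json"}})

process = CrawlerProcess(settings)
process.crawl(LinkedInAnonymousSpider, first="James", last="Bond")
process.start()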