JSON not working in Scrapy when calling a spider through a Python script?

When I call the spider through a Python script, as shown below:
import os
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'project.settings')

from twisted.internet import reactor
from scrapy import log, signals
from scrapy.crawler import Crawler
from scrapy.settings import CrawlerSettings
from scrapy.xlib.pydispatch import dispatcher
from spiders.image import aqaqspider

def stop_reactor():
    reactor.stop()

dispatcher.connect(stop_reactor, signal=signals.spider_closed)

spider = aqaqspider(domain='aqaq.com')
crawler = Crawler(CrawlerSettings())
crawler.configure()
crawler.crawl(spider)
crawler.start()

log.start()
log.msg('Running reactor...')
reactor.run()  # the script will block here until the spider is closed
log.msg('Reactor stopped.')
my JSON file is not created. My pipelines.py has the following code:
import json
import codecs

class JsonWithEncodingPipeline(object):

    def __init__(self):
        self.file = codecs.open('scraped_data_utf8.json', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        line = json.dumps(dict(item), ensure_ascii=False) + "\n"
        self.file.write(line)
        return item

    def spider_closed(self, spider):
        self.file.close()
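The serialization logic in that pipeline can be checked in isolation (a minimal sketch with a hypothetical item dict and a temporary file path, not part of the original project): if this produces the expected JSON line, the pipeline code itself is fine, and the problem is that it is never enabled when the crawler runs from the script.

```python
import codecs
import json
import os
import tempfile

# Hypothetical item standing in for a scraped Scrapy item.
item = {'name': u'图片', 'url': 'http://aqaq.com/a.jpg'}

# Same write path as the pipeline: one JSON object per line,
# UTF-8 encoded, with ensure_ascii=False so no \u escapes appear.
path = os.path.join(tempfile.gettempdir(), 'scraped_data_utf8.json')
with codecs.open(path, 'w', encoding='utf-8') as f:
    f.write(json.dumps(dict(item), ensure_ascii=False) + "\n")

with codecs.open(path, encoding='utf-8') as f:
    print(f.read())
```

If this writes the file correctly but running the spider from a script does not, the pipeline is simply not registered in the settings object the crawler was given.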
When I invoke the spider from the command line with a plain scrapy crawl, it works fine, i.e. the JSON file is created.
Please help me, I am new to Scrapy.
Thank you all!
I have found the solution myself.

(Comments on the answer: "I ran into the same problem but never got any help. Do you know how to do it?" "No, I suspect the problem lies in Scrapy itself.")

My Python script for Scrapy ended up like this:

from scrapy.conf import settings
from scrapy.crawler import CrawlerProcess
from twisted.internet import reactor
from multiprocessing import Process
from aqaq.spiders.image import aqaqspider

def handleSpider(spider):
    reactor.stop()

mySettings = {'LOG_ENABLED': True,
              'ITEM_PIPELINES': ['aqaq.pipelines.JsonWithEncodingPipeline',
                                 'scrapy.contrib.pipeline.images.ImagesPipeline']}
settings.overrides.update(mySettings)

crawlerProcess = CrawlerProcess(settings)
crawlerProcess.install()
crawlerProcess.configure()
spider = aqaqspider(domain='aqaq.com')  # create a spider