Run spiders in the background every n minutes while the Django server is running

Tags: django, scrapy

I have a Django project. In this project, several spiders crawl data from some websites and store it in a database. The crawled data is then displayed through Django.

This is the project structure:

-prj
   db.sqlite3
   manage.py
   -prj
       __init__.py
       settings.py
       urls.py
       wsgi.py
   -prj_app
       __init__.py
       prj_spider.py
       admin.py
       apps.py
       models.py
       runner.py
       urls.py
       views.py
I want to run all the spiders in the background every 5 minutes while the Django server is running. In views.py I import runner.py, and in runner.py all the spiders start crawling.

views.py:

from . import runner  # importing runner schedules the background tasks
runner.py:

from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from multiprocessing import Process, Queue
from .prj_spider import PrjSpider
from background_task import background

@background()
def run_spider(spider):
    # Run a single crawl in a child process so each run gets its own
    # Twisted reactor, then forward any exception back to the parent.
    def f(q):
        try:
            configure_logging()
            runner = CrawlerRunner()
            deferred = runner.crawl(spider)
            deferred.addBoth(lambda _: reactor.stop())
            reactor.run()
            q.put(None)
        except Exception as e:
            q.put(e)

    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    result = q.get()
    p.join()

    if result is not None:
        raise result

# 'spiders' holds the spider classes (defined elsewhere in the project).
for spider in spiders:
    run_spider(DivarSpider, repeat=60)

@background()
def fetch_data():
    # Second attempt: crawl directly in the current process.
    runner = CrawlerRunner()
    runner.crawl(PrjSpider)
    d = runner.join()
    d.addBoth(lambda _: reactor.stop())
    reactor.run()

fetch_data(repeat=60)
When I run the server, I get the following error:

TypeError: Object of type is not JSON serializable
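
That TypeError is consistent with how django-background-tasks stores its queue: the arguments of a @background task are serialized to JSON when the task is queued, and a spider class cannot be. A minimal workaround sketch is to pass a plain string across the queue instead; the SPIDERS registry and run_spider_by_name below are illustrative names, not part of the original project:

# runner.py (sketch) - only JSON-friendly values cross the task queue.
from multiprocessing import Process

from background_task import background
from scrapy.crawler import CrawlerProcess

from .prj_spider import PrjSpider

# Map JSON-serializable names to spider classes.
SPIDERS = {"prj": PrjSpider}

def _crawl(spider_cls):
    # CrawlerProcess starts and stops its own reactor; running it in a
    # child process leaves the parent process untouched.
    process = CrawlerProcess()
    process.crawl(spider_cls)
    process.start()

@background()
def run_spider_by_name(spider_name):
    p = Process(target=_crawl, args=(SPIDERS[spider_name],))
    p.start()
    p.join()

for name in SPIDERS:
    run_spider_by_name(name, repeat=60)  # repeat is in seconds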

Using this same kind of runner.py, I also get the following error:


    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable
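
This second error follows from Twisted's design: a reactor can be started at most once per process, so the first repeat of fetch_data inside the long-lived worker process hits reactor.run() a second time and raises ReactorNotRestartable. One way around it is to give every crawl a fresh interpreter. A sketch, assuming the spiders are registered in a Scrapy project so the scrapy CLI can find them by name ("prj" is a hypothetical spider name):

# runner.py (sketch) - each crawl gets its own process and reactor.
import subprocess

from background_task import background

@background()
def fetch_data():
    # A new interpreter per crawl means a new, startable reactor.
    subprocess.run(["scrapy", "crawl", "prj"], check=True)

fetch_data(repeat=60)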


Have you started the spiders together with the reactor before? I mean, did this error appear out of nowhere? I ask because I tried running all my spiders with the reactor and it did not go smoothly.

@MuratDemir Yes, I can start all the spiders together in the background once... but I don't know how to schedule them.
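
On scheduling: django-background-tasks does not execute anything on its own; queued tasks only run while its worker management command runs alongside the Django server (this is the library's standard usage):

# Terminal 1: the web server
#   python manage.py runserver
# Terminal 2: the worker that executes queued @background tasks
#   python manage.py process_tasks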