Python Scrapy spider does not execute the close method in a Docker container

I have a Flask application that runs a Scrapy spider. The application works fine on my development machine, but when I run it in a Docker container, the spider's close method is not executed.

Here is the spider's code:

# -*- coding: utf-8 -*-
import scrapy
from bs4 import BeautifulSoup
from scrapy.exceptions import CloseSpider


class ToScrapeCSSSpider(scrapy.Spider):
    name = "toscrape-css"
    start_urls = [
        'http://quotes.toscrape.com/',
    ]

    def parse(self, response):
        page_text = response.text
        # raise CloseSpider("Blocked")

        soup = BeautifulSoup(page_text, "lxml")
        if "xml" in page_text[:20].lower():
            # The response is a sitemap: follow every <loc> entry.
            links = soup.findAll("loc")
            for link in links:
                yield scrapy.Request(url=link.text, callback=self.parse)
        else:
            # A regular HTML page: stop the whole crawl.
            raise CloseSpider("I want to close it")

    def close(spider, reason):
        # Overrides scrapy.Spider.close, which Scrapy connects to the
        # spider_closed signal; "spider" plays the role of "self" here.
        print("Closing spider")
        # self.pbar.clear()
        # self.pbar.write('Closing {} spider'.format(spider.name))
        print("Spider closed")
And here is my Flask application in main.py:

import crochet
crochet.setup()     # initialize crochet

from flask import render_template, jsonify, Flask, redirect, url_for, request, flash
from scrapy.crawler import CrawlerRunner
from app2.articles_finder.spiders.test_spider import ToScrapeCSSSpider
from app2 import app2

# A single runner shared by all requests; this line was missing from the
# original snippet but is required by scrap_docker() below.
crawl_runner = CrawlerRunner()


@app2.route("/test_docker")
def test_docker():
    scrap_docker()
    return "Ok", 200


@crochet.run_in_reactor
def scrap_docker():
    # crawl() returns a Deferred that fires when the spider finishes.
    eventual = crawl_runner.crawl(ToScrapeCSSSpider)
    eventual.addCallback(finished_docker)


def finished_docker(result):
    print("Scraping is over in docker container")
Finally, here is my Dockerfile:

FROM phusion/baseimage:0.9.19

# Use baseimage-docker's init system.
CMD ["/sbin/my_init"]

ENV TERM=xterm-256color
ENV SCRAPPER_HOME=/app/links_finder
ENV PYTHON_VERSION="3.6.5"
# No spaces are allowed around "=" in the ENV key=value form.
ENV FRONT_ADDRESS=blabla

# Set the locale
RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

# Install build tools and the libraries needed to compile Python
RUN apt-get update && apt-get install -y \
    build-essential
RUN apt-get install -y build-essential checkinstall software-properties-common llvm cmake wget git nano nasm yasm zip unzip pkg-config \
    libreadline-gplv2-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev

# Build and install Python 3.6.5 from source
RUN wget https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tar.xz \
    && tar xvf Python-${PYTHON_VERSION}.tar.xz \
    && rm Python-${PYTHON_VERSION}.tar.xz \
    && cd Python-${PYTHON_VERSION} \
    && ./configure \
    && make altinstall \
    && cd / \
    && rm -rf Python-${PYTHON_VERSION}

RUN apt-get install -y python3-pip

WORKDIR ${SCRAPPER_HOME}
COPY . ${SCRAPPER_HOME}

RUN pip3 install -r requirements2.txt

RUN chmod 777 -R *

# Clean up
RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

EXPOSE 3456

ENTRYPOINT python3 run_gunicorn_app_2.py
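
As an aside, Docker's exec form of ENTRYPOINT runs the process without an intermediate /bin/sh, so signals such as SIGTERM reach the Python process directly. The equivalent line would be:

# Exec form: no wrapping shell, signals are delivered straight to python3.
ENTRYPOINT ["python3", "run_gunicorn_app_2.py"]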
And the requirements2.txt file:

tqdm==4.19.4
APScheduler==3.6.1
Flask==1.0.2
Flask-Admin==1.3.0
Flask-Bcrypt==0.7.1
Flask-DebugToolbar==0.10.0
Flask-Login==0.3.2
Flask-Mail==0.9.1
Flask-Script==2.0.5
Flask-SQLAlchemy==2.1
Flask-WTF==0.12
Flask-redis==0.4.0
gunicorn==19.4.5
itsdangerous==0.24
pytz==2016.10
structlog==16.1.0
termcolor==1.1.0
WTForms==2.1
scrapy==1.6.0
grequests==0.4.0
#pandas==0.24
crochet==1.10.0
redis==3.3.8
beautifulsoup4==4.7.1
publicsuffixlist==0.7.1
PyMySQL==0.9.3
When I run the Docker container, the close method apparently does not execute at all: nothing it prints ever shows up in the output.

Any hints? I have been stuck on this for quite a while, so any insight is very welcome. Thanks!

After a lot of debugging, it turned out there was no real problem: the close method was running all along. I just had to add -u after python3 so that its output would reach the container logs:

ENTRYPOINT python3 -u run_gunicorn_app_2.py
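
For the record, -u disables Python's stdout/stderr buffering, which is what was hiding the print output from close() inside the container. Setting the PYTHONUNBUFFERED environment variable in the Dockerfile is an equivalent alternative to passing the flag:

# Equivalent to passing -u: force unbuffered stdout/stderr for every
# Python process started in this image.
ENV PYTHONUNBUFFERED=1
ENTRYPOINT python3 run_gunicorn_app_2.py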