Python Scrapy not running in Docker
I am trying to run my Scrapy script main.py in a Docker container. The script runs 3 spiders in sequence and writes their scraped items to a local database. Here is the source code of main.py:
from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.settings import Settings
from scrapy.utils.project import get_project_settings
from spiders.puntorigenera_spider import PuntorigeneraSpider
from spiders.lamiastampante_spider import LamiastampanteSpider
from spiders.printer_spider import PrinterSpider

configure_logging()
crawler_settings = get_project_settings()
runner = CrawlerRunner(settings=crawler_settings)

@defer.inlineCallbacks
def crawl():
    yield runner.crawl(PrinterSpider)
    yield runner.crawl(LamiastampanteSpider)
    yield runner.crawl(PuntorigeneraSpider)
    reactor.stop()

if __name__ == "__main__":
    crawl()
    reactor.run()
These are the DB settings specified in settings.py:
DB_SETTINGS = {
    'db': "COMPATIBILITA_PRODOTTI_SCHEMA_2",
    'user': 'root',
    'passwd': '',
    'host': 'localhost',
    'port': 3306
}
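As the comment advice below suggests, one way to make the same settings work both on the development machine and inside a container is to read the DB host from an environment variable. A minimal sketch, assuming hypothetical variable names MYSQL_HOST and MYSQL_PORT (they are not part of the original project), with the original values kept as local fallbacks:

```python
import os

# MYSQL_HOST / MYSQL_PORT are assumed names for illustration;
# when unset, the settings fall back to the original local values.
DB_SETTINGS = {
    'db': "COMPATIBILITA_PRODOTTI_SCHEMA_2",
    'user': 'root',
    'passwd': '',
    'host': os.getenv('MYSQL_HOST', 'localhost'),
    'port': int(os.getenv('MYSQL_PORT', '3306')),
}
```

The container could then be pointed at the host's database with something like `docker run -e MYSQL_HOST=host.docker.internal ...` on Docker Desktop (on plain Linux, the bridge gateway IP would be used instead).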
This is my Dockerfile:
# As Scrapy runs on Python, I choose the official Python 3 Docker image.
FROM python:3.7.3-stretch
# Set the working directory to /scraper/src/docker.
WORKDIR /scraper/src/docker
# Copy the file from the local host to the filesystem of the container at the working directory.
COPY requirements.txt ./
# Install Scrapy specified in requirements.txt.
RUN pip3 install --no-cache-dir -r requirements.txt
# Copy the project source code from the local host to the filesystem of the container at the working directory.
COPY . .
# Run the crawler when the container launches.
CMD [ "python3", "./scraper/scraper/main.py" ]
我的项目结构如下:
proj|
|−scraper|
| |−scraper|
| |−spiders|
| | |− ...
| | |− ...
| |− main.py
| |− ...
|− Dockerfile
|− requirements.txt
Problem

When I run python main.py directly, it works perfectly: I can see the scrapers running in the terminal and the DB being populated successfully.
However, when I build the Docker image with docker build -t mycrawler . and run it with docker run --network=host mycrawler, all I see is this output:
2020-11-08 13:13:48 [scrapy.crawler] INFO: Overridden settings:
{}
2020-11-08 13:13:48 [scrapy.extensions.telnet] INFO: Telnet Password: 01b06b3e6f172d1d
2020-11-08 13:13:48 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats']
...and it stays like that forever and, of course, nothing is ever written to the DB.

I am really new to Docker. Am I missing something in my Dockerfile, or in the way I build and run the image?

Comment: 'host': 'localhost', should be made configurable, e.g. 'host': os.getenv('MYSQL_HOST', 'localhost'), because in Docker, much as in a virtual machine, "localhost" means the container itself, not your development machine or the VM your Docker machine runs in (though I can't say offhand why it hangs rather than simply erroring; perhaps a few more print statements in your settings.py would help trace where it hangs).

Reply: I tried that, but it didn't help. Thanks for explaining that localhost is the host inside the VM. I tried printing something in my settings.py, but it never reaches the print statement.
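Since the crawler hangs silently rather than raising a connection error, one way to narrow the problem down is to test raw TCP reachability of the database from inside the container, independent of Scrapy. A minimal sketch, assuming the host and port values from the settings.py shown above:

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Inside a container, 'localhost' is the container's own loopback;
# with --network=host on Linux it is the host's network namespace.
print(can_connect('localhost', 3306))
```

Running this inside the container (e.g. via `docker run --network=host mycrawler python3 -c "..."`) would quickly show whether the hang comes from the DB connection or from somewhere else in the pipeline.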