Python Scrapy not running in Docker

I am trying to run my scrapy script main.py in a Docker container. The script runs 3 spiders sequentially and writes their scraped items to a local database. Here is the source code of main.py:

from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.settings import Settings
from scrapy.utils.project import get_project_settings

from spiders.puntorigenera_spider import PuntorigeneraSpider
from spiders.lamiastampante_spider import LamiastampanteSpider
from spiders.printer_spider import PrinterSpider 

configure_logging()
crawler_settings = get_project_settings()
runner = CrawlerRunner(settings=crawler_settings)

@defer.inlineCallbacks
def crawl():
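    # Chain the crawls: each yield waits for the previous spider to finish
    # before starting the next one.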
    yield runner.crawl(PrinterSpider)
    yield runner.crawl(LamiastampanteSpider)
    yield runner.crawl(PuntorigeneraSpider)
    reactor.stop()

if __name__ == "__main__":
    crawl()
    reactor.run()
These are the DB settings specified in settings.py:

DB_SETTINGS = {
    'db': "COMPATIBILITA_PRODOTTI_SCHEMA_2",
    'user': 'root',
    'passwd': '',
    'host': 'localhost',
    'port': 3306
}
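The pipeline that consumes these settings is not shown in the post, but a minimal sketch of how such a DB_SETTINGS dict is typically read from a Scrapy pipeline might look like this (assuming the pymysql driver; the MySQLPipeline class name and the connect_timeout value are illustrative, not from the original project):

import pymysql

class MySQLPipeline:
    @classmethod
    def from_crawler(cls, crawler):
        # Pull the DB_SETTINGS dict defined in settings.py.
        return cls(crawler.settings.get('DB_SETTINGS'))

    def __init__(self, db_settings):
        self.db_settings = db_settings

    def open_spider(self, spider):
        # A connect_timeout makes an unreachable host fail fast
        # instead of hanging silently.
        self.conn = pymysql.connect(
            host=self.db_settings['host'],
            port=self.db_settings['port'],
            user=self.db_settings['user'],
            password=self.db_settings['passwd'],
            database=self.db_settings['db'],
            connect_timeout=10,
        )

    def close_spider(self, spider):
        self.conn.close()

    def process_item(self, item, spider):
        # ... INSERT the scraped item here ...
        return item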
This is my Dockerfile:

# As Scrapy runs on Python, I choose the official Python 3 Docker image.
FROM python:3.7.3-stretch
 
# Set the working directory inside the container.
WORKDIR /scraper/src/docker
 
# Copy the file from the local host to the filesystem of the container at the working directory.
COPY requirements.txt ./
 
# Install Scrapy specified in requirements.txt.
RUN pip3 install --no-cache-dir -r requirements.txt
 
# Copy the project source code from the local host to the filesystem of the container at the working directory.
COPY . .
 
# Run the crawler when the container launches.
CMD [ "python3", "./scraper/scraper/main.py" ]
My project structure is as follows:

proj
|-- scraper
|   |-- scraper
|       |-- spiders
|       |   |-- ...
|       |   |-- ...
|       |-- main.py
|       |-- ...
|-- Dockerfile
|-- requirements.txt
The problem

When I run python main.py locally, it works fine: I can see the scrapers running in the terminal and the DB being populated successfully. However, when I build the Docker image with docker build -t mycrawler . and run it with docker run --network=host mycrawler, all I can see is this output:

2020-11-08 13:13:48 [scrapy.crawler] INFO: Overridden settings:
{}
2020-11-08 13:13:48 [scrapy.extensions.telnet] INFO: Telnet Password: 01b06b3e6f172d1d
2020-11-08 13:13:48 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
... and it stays like that forever, of course without ever writing anything to the DB.


I am really new to Docker. Am I missing something in the Dockerfile, or in the way I build and run the image?

'host': 'localhost' should be configurable, e.g. 'host': os.getenv('MYSQL_HOST', 'localhost'), because in Docker, much as in a virtual machine, "localhost" means the container itself, not your development machine or the VM the Docker engine runs in (though I can't say offhand why it hangs rather than simply erroring out; maybe a few more print statements in your settings.py would help track down where it hangs).

I tried that, but it didn't help. Thanks for explaining that localhost means the host inside the VM. I tried printing something in my settings.py, but it never reaches the print statement.