Python Scrapy not running in Docker
I am trying to run my Scrapy script main.py in a Docker container. The script runs 3 spiders in sequence and writes their scraped items to a local database. Here is the source code of main.py:
from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.settings import Settings
from scrapy.utils.project import get_project_settings
from spiders.puntorigenera_spider import PuntorigeneraSpider
from spiders.lamiastampante_spider import LamiastampanteSpider
from spiders.printer_spider import PrinterSpider

configure_logging()
crawler_settings = get_project_settings()
runner = CrawlerRunner(settings=crawler_settings)

@defer.inlineCallbacks
def crawl():
    yield runner.crawl(PrinterSpider)
    yield runner.crawl(LamiastampanteSpider)
    yield runner.crawl(PuntorigeneraSpider)
    reactor.stop()

if __name__ == "__main__":
    crawl()
    reactor.run()
These are the DB settings specified in settings.py:
DB_SETTINGS = {
    'db': "COMPATIBILITA_PRODOTTI_SCHEMA_2",
    'user': 'root',
    'passwd': '',
    'host': 'localhost',
    'port': 3306
}
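As the comment advice below suggests, one way to make the same settings work both on the development machine and inside a container is to read the DB host from an environment variable. A minimal sketch, assuming hypothetical variable names MYSQL_HOST and MYSQL_PORT (they are not part of the original project), with the original values kept as local fallbacks:

```python
import os

# MYSQL_HOST / MYSQL_PORT are assumed names for illustration;
# when unset, the settings fall back to the original local values.
DB_SETTINGS = {
    'db': "COMPATIBILITA_PRODOTTI_SCHEMA_2",
    'user': 'root',
    'passwd': '',
    'host': os.getenv('MYSQL_HOST', 'localhost'),
    'port': int(os.getenv('MYSQL_PORT', '3306')),
}
```

The container could then be pointed at the host's database with something like `docker run -e MYSQL_HOST=host.docker.internal ...` on Docker Desktop (on plain Linux, the bridge gateway IP would be used instead).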
This is my Dockerfile:
# As Scrapy runs on Python, I choose the official Python 3 Docker image.
FROM python:3.7.3-stretch
# Set the working directory to /scraper/src/docker.
WORKDIR /scraper/src/docker
# Copy the file from the local host to the filesystem of the container at the working directory.
COPY requirements.txt ./
# Install Scrapy specified in requirements.txt.
RUN pip3 install --no-cache-dir -r requirements.txt
# Copy the project source code from the local host to the filesystem of the container at the working directory.
COPY . .
# Run the crawler when the container launches.
CMD [ "python3", "./scraper/scraper/main.py" ]
我的项目结构如下:
proj|
|−scraper|
| |−scraper|
| |−spiders|
| | |− ...
| | |− ...
| |− main.py
| |− ...
|− Dockerfile
|− requirements.txt
Problem

When I run python main.py directly, it works perfectly: I can see the scrapers running in the terminal and the DB being populated successfully.
However, when I build the Docker image with docker build -t mycrawler . and run it with docker run --network=host mycrawler, all I see is this output:
2020-11-08 13:13:48 [scrapy.crawler] INFO: Overridden settings:
{}
2020-11-08 13:13:48 [scrapy.extensions.telnet] INFO: Telnet Password: 01b06b3e6f172d1d
2020-11-08 13:13:48 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats']
...and it stays like that forever and, of course, nothing is ever written to the DB.

I am really new to Docker. Am I missing something in my Dockerfile, or in the way I build and run the image?

Comment: 'host': 'localhost', should be made configurable, e.g. 'host': os.getenv('MYSQL_HOST', 'localhost'), because in Docker, much as in a virtual machine, "localhost" means the container itself, not your development machine or the VM your Docker machine runs in (though I can't say offhand why it hangs rather than simply erroring; perhaps a few more print statements in your settings.py would help trace where it hangs).

Reply: I tried that, but it didn't help. Thanks for explaining that localhost is the host inside the VM. I tried printing something in my settings.py, but it never reaches the print statement.
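Since the crawler hangs silently rather than raising a connection error, one way to narrow the problem down is to test raw TCP reachability of the database from inside the container, independent of Scrapy. A minimal sketch, assuming the host and port values from the settings.py shown above:

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Inside a container, 'localhost' is the container's own loopback;
# with --network=host on Linux it is the host's network namespace.
print(can_connect('localhost', 3306))
```

Running this inside the container (e.g. via `docker run --network=host mycrawler python3 -c "..."`) would quickly show whether the hang comes from the DB connection or from somewhere else in the pipeline.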