Python: How do I run a Scrapy project in Jupyter?


python scrapy


On my Mac, I have Jupyter installed. When I run

jupyter notebook

from the root folder of my Scrapy project, it opens the notebook, and I can browse all the project files.

How do I execute the project from the notebook?

If I click the "Running" tab, under "Terminals" I see:

There are no terminals running.

There are two main ways to achieve this:

1. Under the "Files" tab, open a new terminal: New > Terminal.
Then simply run your spider:

scrapy crawl [options] <spider>

2. Create a new notebook and use the CrawlerProcess or CrawlerRunner class to run the spider in a cell:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())

process.crawl('your-spider')
process.start() # the script will block here until the crawling is finished
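
get_project_settings() finds the project through its scrapy.cfg file, searching from the current working directory upward. If the notebook kernel was started somewhere else, the spider name will not resolve. The helper below is a minimal sketch of a check you could run first; the function name find_scrapy_project_root is hypothetical and not part of Scrapy.

```python
from pathlib import Path


def find_scrapy_project_root(start="."):
    """Return the nearest directory at or above `start` that holds scrapy.cfg.

    Hypothetical helper (not a Scrapy API). Returns None when no
    scrapy.cfg is found on the way up to the filesystem root.
    """
    current = Path(start).resolve()
    # Check the starting directory first, then each parent in turn.
    for candidate in [current, *current.parents]:
        if (candidate / "scrapy.cfg").is_file():
            return candidate
    return None
```

In a cell, you might call root = find_scrapy_project_root() and, if it is not None, os.chdir(root) before building the CrawlerProcess.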

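Jupyter has a shortcut for running command-line commands from a cell itself. Start the cell with ! and type the rest of the command as you normally would in a console, for example: !scrapy crawl your-spider
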
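
The ! prefix is IPython syntax, so it only works inside a notebook cell. A plain-Python equivalent can be sketched with the standard subprocess module. The function names build_crawl_command and run_crawl are hypothetical, and the sketch assumes Scrapy is installed in the same environment as the kernel. Because every call starts a fresh process, the cell can be re-run, whereas calling process.start() a second time in the same kernel raises Twisted's ReactorNotRestartable.

```python
import subprocess
import sys


def build_crawl_command(spider_name, output_file=None):
    """Return the argument list for running `scrapy crawl` as a child process."""
    # `python -m scrapy` runs Scrapy with the same interpreter as the kernel.
    command = [sys.executable, "-m", "scrapy", "crawl", spider_name]
    if output_file is not None:
        # -o appends the scraped items to the given feed file.
        command += ["-o", output_file]
    return command


def run_crawl(spider_name, project_dir=".", output_file=None):
    """Run the spider from `project_dir` and return the CompletedProcess."""
    return subprocess.run(
        build_crawl_command(spider_name, output_file),
        cwd=project_dir,       # must be inside the Scrapy project
        capture_output=True,   # collect stdout/stderr instead of printing them
        text=True,
    )
```

In a cell, result = run_crawl('your-spider') followed by print(result.stderr) would show the crawl log, since Scrapy writes its log to stderr by default.
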
Running a spider class does not require a terminal. Just add the following code to a cell in your Jupyter notebook:

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    # Your spider definition
    ...

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start() # the script will block here until the crawling is finished

For more information, see