Scrapy Python - I keep crawling 0 pages


I have tried multiple tutorials, but no matter what I try I always get the same result: "Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)".

My code is very simple:

import scrapy

class SpiderSpider(scrapy.Spider):
    name = 'spider'
    allowed_domains = ['books.toscrape.com/']
    start_urls = ['http://books.toscrape.com//']

    def parse(self, response):
        print(response.url)
The output is:

2020-11-03 22:11:52 [scrapy.utils.log] INFO: Scrapy 2.4.0 started (bot: books)
2020-11-03 22:11:52 [scrapy.utils.log] INFO: Versions: lxml 4.5.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.8.3 (default, Jul 2 2020, 11:26:31) - [Clang 10.0.0], pyOpenSSL 19.1.0 (OpenSSL 1.1.1g 21 Apr 2020), cryptography 2.9.2, Platform macOS-10.15.7-x86_64-i386-64bit
2020-11-03 22:11:52 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2020-11-03 22:11:52 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'NEWSPIDER_MODULE': 'books.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['books.spiders']}
2020-11-03 22:11:52 [scrapy.extensions.telnet] INFO: Telnet Password: ae1669f089ac9e66
2020-11-03 22:11:52 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2020-11-03 22:11:52 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2020-11-03 22:11:52 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-11-03 22:11:52 [scrapy.middleware] INFO: Enabled item pipelines: []
2020-11-03 22:11:52 [scrapy.core.engine] INFO: Spider opened
2020-11-03 22:11:52 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-11-03 22:11:52 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-11-03 22:11:53 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://books.toscrape.com/robots.txt>

It looks like the site you are crawling has no robots.txt file.


You can disable robots.txt handling by going to your Scrapy project's settings.py and finding ROBOTSTXT_OBEY. Set it to False.
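In settings.py (assuming the default project layout generated by `scrapy startproject`) that change is a single line:

```python
# settings.py
# Stop Scrapy from fetching and obeying robots.txt before crawling.
# With this set, the 404 request for /robots.txt disappears from the log.
ROBOTSTXT_OBEY = False
```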

Your output shows that you have crawled two pages:

http://books.toscrape.com/robots.txt (HTTP status 404 error)
http://books.toscrape.com// (HTTP status 200)

It looks like everything is working (except that I don't see your print statement in the output).

I tried this, but it didn't solve the problem... I'm still getting the same thing. Thank you; I expected the stats to say I had crawled at least 1 page, so when I saw the output report 0, I just assumed nothing was being crawled.