
Python 3.x Scrapy crawl error


I'm new to Python and Scrapy. I followed a tutorial to build a Scrapy crawler for quotes.toscrape.com.

I typed the code exactly as it appears in the tutorial, but when I run scrapy crawl quotes I keep getting a ValueError: invalid hostname: error. I'm doing this in PyCharm.

I tried both single and double quotes around the URL in the start_urls = [] section, but that didn't fix the error.

This is what the code looks like:

import scrapy

class QuoteSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = [
        'http: // quotes.toscrape.com /'
    ]

    def parse(self, response):
        title = response.css('title').extract()
        yield {'titletext':title}
It should scrape the website for the titles.

This is what the error looks like:

2019-11-08 12:52:42 [scrapy.core.engine] INFO: Spider opened
2019-11-08 12:52:42 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-11-08 12:52:42 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-11-08 12:52:42 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET http:///robots.txt>: invalid hostname: 
Traceback (most recent call last):
  File "/Users/newuser/PycharmProjects/ScrapyTutorial/venv/lib/python2.7/site-packages/scrapy/core/downloader/middleware.py", line 44, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
ValueError: invalid hostname: 
2019-11-08 12:52:42 [scrapy.core.scraper] ERROR: Error downloading <GET http:///%20//%20quotes.toscrape.com%20/>
Traceback (most recent call last):
  File "/Users/newuser/PycharmProjects/ScrapyTutorial/venv/lib/python2.7/site-packages/scrapy/core/downloader/middleware.py", line 44, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
ValueError: invalid hostname: 
2019-11-08 12:52:42 [scrapy.core.engine] INFO: Closing spider (finished)
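The %20 sequences in the failing request hint at what went wrong: the spaces in the URL get percent-encoded, and because the text after "http:" starts with a space rather than "//", nothing is parsed as the network location, so the hostname comes out empty. A small standard-library sketch of the same effect (an illustration, not Scrapy's exact internals):

```python
from urllib.parse import quote, urlparse

# The start URL exactly as written in the question, spaces included
bad_url = 'http: // quotes.toscrape.com /'

# Percent-encoding turns each space into %20, matching the log output
print(quote(bad_url, safe=':/'))  # http:%20//%20quotes.toscrape.com%20/

# ' //' (with a leading space) is not '//', so no netloc is parsed
# and the URL has no hostname at all
print(urlparse(bad_url).hostname)  # None
```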

Don't use spaces in the URL:

start_urls = [
    'http://quotes.toscrape.com/'
]
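A quick way to sanity-check a start URL before running the spider is to confirm it actually parses to a hostname. This is a hypothetical helper using only the standard library, not part of Scrapy's API:

```python
from urllib.parse import urlparse

def has_valid_hostname(url):
    """Return True if the URL parses to a non-empty hostname."""
    return urlparse(url).hostname is not None

print(has_valid_hostname('http://quotes.toscrape.com/'))     # True
print(has_valid_hostname('http: // quotes.toscrape.com /'))  # False
```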

Can you share how you are using the class mentioned above? — Hi, it turns out the spaces in the URL were what was breaking the code, per the comment below. I'm not entirely sure how it works with the class behind the scenes.