Scrapy spider and Scrapy shell produce different output

python scrapy web-crawler

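I'm new to Scrapy, and I'm trying to figure out why I can extract the elements I want from the Scrapy shell but not from a Scrapy spider that I created from the command line.

In the Scrapy shell, I ran the following: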

pipenv run scrapy shell http://quotes.toscrape.com/

Then I ran my selector, which returned the following:

['Albert Einstein', 'J.K. Rowling', 'Albert Einstein', 'Jane Austen', 'Marilyn Monroe', 'Albert Einstein', 'André Gide', 'Thomas A. Edison', 'Eleanor Roosevelt', 'Steve Martin']

This all worked as expected. However, when I created a Scrapy spider and ran it, I started running into problems. My code is as follows:

# -*- coding: utf-8 -*-
import scrapy

class Yolo1Spider(scrapy.Spider):
    name = 'yolo1'
    allowed_domains = ['toscrape.com']
    start_urls = ['http://http://quotes.toscrape.com/']

    def parse(self, response):
        self.log('Just visited' + response.url)
        yield {
            'author': response.css('small.author::text').extract()
            }

I run the spider with the following command:

pipenv run scrapy crawl yolo1

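The error I get is as follows:

2017-12-04 20:03:56 [yolo1] DEBUG: Just visitedhttp://www.dnsrsearch.com/index.php?origURL=
2017-12-04 20:03:56 [scrapy.core.scraper] ERROR: Error processing {'author': []}
Traceback (most recent call last):
  File "c:\users\alice.virtualenvs\all-the-places-c44chfla\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "C:\Users\alice\all-places\locations\pipelines.py", line 16, in process_item
    ref = item['ref']
KeyError: 'ref'

I have a feeling I'm just missing something simple, but for the life of me I can't find it, and I've checked everywhere.

You can see in the spider's crawl output that the debug line I wrote is printed, but right after that I get an error. I expected the spider to give me the same output as the commands I ran in the shell.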

You made a mistake in start_urls: it contains http:// twice. See:

http://http://quotes.toscrape.com/

You have http:// twice in the URL. See:

'http://http://quotes.toscrape.com/'
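A doubled scheme is easy to miss by eye, so a small check can help. The sketch below uses only the standard library. The helper name looks_double_schemed is made up for illustration and is not part of Scrapy. It relies on how urllib.parse splits such a URL: the second "http:" lands in the host position, so the request goes to a host named "http" rather than to quotes.toscrape.com. That appears consistent with the log in the question, where the spider ended up on www.dnsrsearch.com.

```python
from urllib.parse import urlsplit


def looks_double_schemed(url):
    """Return True if the host part of `url` is itself a scheme name.

    Hypothetical helper for illustration; not a Scrapy API.
    """
    host = urlsplit(url).hostname
    return host in ("http", "https")


# The start_urls value from the question, plus the intended URL.
start_urls = [
    "http://http://quotes.toscrape.com/",
    "http://quotes.toscrape.com/",
]

for url in start_urls:
    status = "doubled scheme" if looks_double_schemed(url) else "ok"
    print(url, "->", status)
```

With start_urls = ['http://quotes.toscrape.com/'] in the spider, the request should reach the same page that the shell session loaded.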

Thank you very much. Sometimes you really need a fresh pair of eyes on these things.
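One more thing worth noting from the traceback in the question: the KeyError is raised in the project's pipelines.py at ref = item['ref'], and the spider only yields an 'author' key. So even with the URL fixed, that pipeline would likely still fail on these items. The actual pipelines.py is not shown, so the class below is a hypothetical stand-in, not the project's code. It only illustrates the difference between item['ref'] and item.get('ref') inside a process_item(self, item, spider) method.

```python
class RefTolerantPipeline:
    """Hypothetical pipeline that tolerates items without a 'ref' key."""

    def process_item(self, item, spider):
        # .get returns None instead of raising KeyError when 'ref' is missing
        ref = item.get("ref")
        if ref is not None:
            item["ref"] = str(ref).strip()
        return item


# An item shaped like the one the spider in the question yields.
item = {"author": ["Albert Einstein", "J.K. Rowling"]}

try:
    item["ref"]  # same lookup as line 16 of pipelines.py in the traceback
except KeyError as exc:
    print("KeyError:", exc)

# spider=None is a placeholder; this sketch does not use the spider argument.
print(RefTolerantPipeline().process_item(item, spider=None))
```

Another option, if the pipeline belongs to other spiders in the project, is to override ITEM_PIPELINES for this one spider through its custom_settings attribute, so that the shared pipeline does not run on these items.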