Python: can't get Scrapy to start successfully


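I have started working through the official tutorial, but I can't get it to run successfully. My code is exactly the same as in the official tutorial: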

import scrapy
class QuotesSpider(scrapy.Spider):
    name = 'Quotes';

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
        ]
        for url in urls:
            yield scrapy.Request(url=url,callback = self.parse);

    def parse(self, response):
        page = response.url.split('/')[-2];
        print('--------------------------------->>>>');
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
                'tags': quote.css('div.tags a.tag::text').getall(),
            }

When I run it from CMD with the command (scrapy crawl Quotes), the result is as follows:

2020-12-20 10:00:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
2020-12-20 10:00:26 [scrapy.core.scraper] ERROR: Spider error processing <GET http://quotes.toscrape.com/page/1/> (referer: None)
Traceback (most recent call last):
  File "c:\users\a\appdata\local\programs\python\python38-32\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
StopIteration: <200 http://quotes.toscrape.com/page/1/>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\a\appdata\local\programs\python\python38-32\lib\site-packages\scrapy\utils\defer.py", line 55, in mustbe_deferred
    result = f(*args, **kw)
  File "c:\users\a\appdata\local\programs\python\python38-32\lib\site-packages\scrapy\core\spidermw.py", line 58, in process_spider_input
    return scrape_func(response, request, spider)
  File "c:\users\a\appdata\local\programs\python\python38-32\lib\site-packages\scrapy\core\scraper.py", line 149, in call_spider
    warn_on_generator_with_return_value(spider, callback)
  File "c:\users\a\appdata\local\programs\python\python38-32\lib\site-packages\scrapy\utils\misc.py", line 245, in warn_on_generator_with_return_value
    if is_generator_with_return_value(callable):
  File "c:\users\a\appdata\local\programs\python\python38-32\lib\site-packages\scrapy\utils\misc.py", line 230, in is_generator_with_return_value
    tree = ast.parse(dedent(inspect.getsource(callable)))
  File "c:\users\a\appdata\local\programs\python\python38-32\lib\ast.py", line 47, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 1
    def parse(self, response):
    ^
IndentationError: unexpected indent
2020-12-20 10:00:26 [scrapy.core.engine] INFO: Closing spider (finished)
2020-12-20 10:00:26 [scrapy.statscollectors] INFO: Dumping Scrapy stats:

I have checked it many times, but I still don't know how to fix it!

You may find a solution to your problem here.

There is an IndentationError. You need to fix the indentation of the code; after that it works fine.
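The traceback in the question shows the error being raised from ast.parse(dedent(inspect.getsource(callable))), not from importing the spider file. The following is a minimal sketch of one way that call can fail, assuming the method mixes tabs and spaces (an assumption; the question does not confirm this). The strings mixed_source and clean_source and the helper parses_after_dedent are hypothetical names used only for illustration.

```python
import ast
from textwrap import dedent

# Hypothetical method source, as inspect.getsource() might return it:
# the "def" line is indented with four spaces, the body with a tab.
mixed_source = "    def parse(self, response):\n\tyield {}\n"

# The same method indented consistently with spaces.
clean_source = "    def parse(self, response):\n        yield {}\n"


def parses_after_dedent(source):
    """Return True if dedent() followed by ast.parse() accepts the source."""
    try:
        ast.parse(dedent(source))
    except IndentationError:
        # dedent() found no common leading whitespace, so the "def" line
        # is still indented when it reaches the parser.
        return False
    return True


print(parses_after_dedent(mixed_source))
print(parses_after_dedent(clean_source))
```

Because dedent() only strips whitespace that is common to every line, a single line with different leading whitespace leaves the whole method indented, and the parser then complains about line 1. Re-indenting the method consistently with spaces is the usual remedy if this turns out to be the cause.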

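This has nothing to do with yield. I think either all the semicolons or the trailing comma after getall()

'tags': quote.css('div.tags a.tag::text').getall(),

may make the interpreter expect something else.

Remove the semicolons and the trailing comma. Does it still not work?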
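One way to check this suggestion locally is to parse a reduced version of the method with the semicolons and the trailing comma left in. This is only a sketch: the snippet below is a simplified stand-in for the asker's parse() that needs no Scrapy install, and the helper name compiles is hypothetical.

```python
import ast

# Reduced stand-in for the asker's parse(), keeping the trailing
# semicolons and the trailing comma inside the dict literal.
snippet = '''
def parse(response):
    page = response.url.split('/')[-2];
    print('--->>>>');
    for quote in response:
        yield {
            'tags': quote,
        }
'''


def compiles(source):
    """Return True if Python can parse the source without a SyntaxError."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print(compiles(snippet))
```

If this prints True, the semicolons and the trailing comma are accepted by the parser on their own, which would point back at the indentation of the method rather than at the punctuation. Removing the semicolons is still reasonable style cleanup.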

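The error output shows an indentation error at the following position:

def parse
^

This tells you that something before it caused the error, so I guess it is the first semicolon.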

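When I removed the code for quote in response.css('div.quote'): yield {'text': quote.css('span.text::text').get(), 'author': quote.css('small.author::text').get(), 'tags': quote.css('div.tags a.tag::text').getall(),} I could run it successfully.

The problem is that the error appears when I use yield. I know the error is an IndentationError. When I remove the yield from the parse function, my program runs successfully. But I have checked the indentation many times and it still fails.

There is no need for ; at the end of a line. Remove it.

Can you show the error?

def parse(self, response): page = response.url.split("/")[-2]; print("------------------------------------->>>>>"); for quote in response.css('div.quote'): yield {'text': quote.css('span.text::text').get(), 'author': quote.css('div.tags a.tag::text').getall(),} If I remove the yield, the program runs without problems.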