
Python: How do I correctly use Scrapy spider middleware?


I have a working scrapy project, and now I want to add some custom middleware to it.

I enabled the spider middleware in settings.py by uncommenting the three lines below:

# Enable or disable spider middlewares
# See http://scrapy.readthedocs.org/en/latest/topics/spider-middleware.html
SPIDER_MIDDLEWARES = {
   'sweden.middlewares.SwedenSpiderMiddleware': 543,
}
Despite this, any code I add to middlewares.py seems to be ignored. For example, the input() call I added to the last method below never executes, even though I successfully scrape some pages:

# -*- coding: utf-8 -*-

# Define here the models for your spider middleware
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/spider-middleware.html

from scrapy import signals


class SwedenSpiderMiddleware(object):
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the spider middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_spider_input(self, response, spider):
        # Called for each response that goes through the spider
        # middleware and into the spider.

        input("press any key to continue")

        # Should return None or raise an exception.
        return None

   ...
I have not modified the default folder structure. I can't get this to work, and examples seem to be lacking.

It also does not show up in the startup log:

2017-08-21 16:59:41 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapybot)
2017-08-21 16:59:41 [scrapy.utils.log] INFO: Overridden settings: {'FEED_URI': 'result.jl', 'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'}
2017-08-21 16:59:41 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.logstats.LogStats']
2017-08-21 16:59:41 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-08-21 16:59:41 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-08-21 16:59:41 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-08-21 16:59:41 [scrapy.core.engine] INFO: Spider opened
2017-08-21 16:59:41 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
Here is the file structure:

.
├── venv
├── tutorial
└── sweden
    ├── __pycache__
    ├── scrapy.cfg
    └── sweden
        ├── __init__.py
        ├── __pycache__
        ├── items.py
        ├── middlewares.py
        ├── pipelines.py
        ├── settings.py
        └── spiders
             ├── __init__.py
             ├── __pycache__
             └── sweden_spider.py

Can you share the scrapy startup log? Does your middleware show up under [scrapy.middleware] INFO: Enabled spider middlewares:? And if you comment out crawler.signals.connect(), does anything change?

You're right, it does not show up under the enabled spider middlewares in the startup log. Commenting out crawler.signals.connect() doesn't seem to make a difference.

Can you share the project's file structure? (e.g. the output of tree)

I've added the file structure.

That works! Thanks a lot for your help!! I wasn't looking at it from that angle at all. The settings were just a dict I had defined myself, rather than being pulled from settings.py.