Enabling HttpProxyMiddleware in scrapyd


After reading the Scrapy documentation, I believed that HttpProxyMiddleware is enabled by default. But when I start a spider through scrapyd's webservice interface, HttpProxyMiddleware is not enabled. I get the following output:

2013-02-18 23:51:01+1300 [scrapy] INFO: Scrapy 0.17.0-120-gf293d08 started (bot: pde)
2013-02-18 23:51:02+1300 [scrapy] DEBUG: Enabled extensions: FeedExporter, LogStats, CloseSpider, WebService, CoreStats, SpiderState
2013-02-18 23:51:02+1300 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-02-18 23:51:02+1300 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-02-18 23:51:02+1300 [scrapy] DEBUG: Enabled item pipelines: PdePipeline
2013-02-18 23:51:02+1300 [shotgunsupplements] INFO: Spider opened
Note that HttpProxyMiddleware is not enabled. How can I enable it for scrapyd? Any help would be greatly appreciated.

My scrapy.cfg:

# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# http://doc.scrapy.org/topics/scrapyd.html

[settings]
default = pd.settings

[deploy]
url = http://localhost:6800/
project = pd
I have the following settings.py:

BOT_NAME = 'pd' #this gets replaced with a function
BOT_VERSION = '1.0'

SPIDER_MODULES = ['pd.spiders']
NEWSPIDER_MODULE = 'pd.spiders'
DEFAULT_ITEM_CLASS = 'pd.items.Product'
ITEM_PIPELINES = 'pd.pipelines.PdPipeline' # overridden by the list form below
USER_AGENT = '%s/%s' % (BOT_NAME, BOT_VERSION)

TELNETCONSOLE_HOST = '127.0.0.1' # defaults to 0.0.0.0 set so
TELNETCONSOLE_PORT = '6073'      # only we can see it.
TELNETCONSOLE_ENABLED = False

WEBSERVICE_ENABLED = True

LOG_ENABLED = True


ROBOTSTXT_OBEY = False
ITEM_PIPELINES = [
    'pd.pipelines.PdPipeline',
    ]

DATA_DIR = '/home/pd/scraped_data' #directory to store export files to.

DOWNLOAD_DELAY = 2.0

DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 750,
}
Regards,


Pranshu

After spending a lot of time trying to debug this, it turns out that HttpProxyMiddleware actually expects the http_proxy environment variable to be set. If http_proxy is not set, the middleware is not loaded at all. So I set http_proxy and, Bob's your uncle, everything works!
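The detection can be reproduced with the standard library alone: HttpProxyMiddleware builds its proxy table via urllib's getproxies(), and if that call returns an empty dict at startup the middleware raises NotConfigured and drops out of the enabled-middleware list shown in the log above. A minimal sketch (the proxy address is a made-up example):

```python
import os
import urllib.request

# Hypothetical proxy endpoint, for illustration only.
os.environ["http_proxy"] = "http://localhost:8123"

# getproxies() scans the environment for *_proxy variables the same
# way HttpProxyMiddleware does; an empty result here means the
# middleware would be skipped.
proxies = urllib.request.getproxies()
print(proxies["http"])  # -> http://localhost:8123
```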

Comments:

- Just checking, what does your settings.py file look like?
- @Talvalin please take a look at this question. I have edited the question to include the settings.
- Thanks, that looks fine. What happens if you comment out the DOWNLOADER_MIDDLEWARES section?
- Unfortunately, the same result :(
- What is the 750 in DOWNLOADER_MIDDLEWARES?
- It sounds odd, but you should accept your own answer, since you asked a valid question and answered it. :)
- If the http_proxy environment variable is set, won't HTTP requests be proxied regardless of whether the middleware is used? And how do you set it with scrapyd? Set os.environ from a script, or is there another way?
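On the last comment's question about setting the variable under scrapyd: since scrapyd imports the project's settings module inside its worker process, one approach is to export the variable at the top of settings.py before Scrapy initialises the downloader middlewares. A sketch, again with a made-up proxy address:

```python
# Top of settings.py -- runs when scrapyd (or the scrapy CLI)
# imports the project settings, so spiders launched either way
# see the variable.
import os

# Hypothetical proxy endpoint; setdefault lets an http_proxy
# exported in the launching shell take precedence.
os.environ.setdefault("http_proxy", "http://localhost:8123")
```

Alternatively, export http_proxy in the environment that starts the scrapyd daemon itself. As for the 750 asked about above: it is the middleware's order value, and matches the position Scrapy assigns HttpProxyMiddleware in its default DOWNLOADER_MIDDLEWARES_BASE.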