Python Scrapy, Privoxy and Tor: SocketError: [Errno 61] Connection refused
I am using Scrapy with Privoxy and Tor. This is my previous question, and this is the spider:
from scrapy.contrib.spiders import CrawlSpider
from scrapy.selector import Selector
from scrapy.http import Request


class YourCrawler(CrawlSpider):
    name = "****"
    start_urls = [
        'https://****.com/listviews/titles.php',
    ]
    allowed_domains = ["****.com"]

    def parse(self, response):
        # go to the urls in the list
        s = Selector(response)
        page_list_urls = s.xpath('//*[@id="tab7"]/article/header/h2/a/@href').extract()
        for url in page_list_urls:
            yield Request(response.urljoin(url), callback=self.parse_following_urls, dont_filter=True)

        # Return back and go to next page in div#paginat ul li.next a::attr(href) and begin again
        next_page = response.css('ul.pagin li.presente ~ li a::attr(href)').extract_first()
        if next_page is not None:
            next_page = response.urljoin(next_page)
            yield Request(next_page, callback=self.parse)

    # For the urls in the list, go inside, and in div#main, take the div.ficha > div.caracteristicas > ul > li
    def parse_following_urls(self, response):
        # Parsing rules go here
        for each_book in response.css('main#main'):
            yield {
                'editor': each_book.css('header.datos1 > ul > li > h5 > a::text').extract(),
            }
In settings.py, I have user-agent rotation and Privoxy:
DOWNLOADER_MIDDLEWARES = {
    # user agent
    'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
    '****.comm.rotate_useragent.RotateUserAgentMiddleware': 400,
    # privoxy
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    '****.middlewares.ProxyMiddleware': 100,
}
In middlewares.py, I added:
from stem import Signal
from stem.control import Controller


def _set_new_ip():
    with Controller.from_port(port=9051) as controller:
        controller.authenticate(password='tor_password')
        controller.signal(Signal.NEWNYM)


class ProxyMiddleware(object):
    def process_request(self, request, spider):
        _set_new_ip()
        request.meta['proxy'] = 'http://127.0.0.1:8118'
        spider.log('Proxy : %s' % request.meta['proxy'])
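A side note on the middleware above: it opens a control connection and sends NEWNYM for every single request. Tor may rate-limit NEWNYM signals, so one possible refinement is to renew the circuit only every so often. The following is a minimal sketch of that idea, not a fix for the connection error itself. The class name ThrottledRenewer and the 10-second default interval are assumptions made for illustration, and the renewal function is passed in so the sketch does not depend on stem.

```python
import time


class ThrottledRenewer(object):
    """Call `renew` at most once every `min_interval` seconds."""

    def __init__(self, renew, min_interval=10.0, clock=time.time):
        self._renew = renew                # e.g. the _set_new_ip function
        self._min_interval = min_interval  # assumed value, tune as needed
        self._clock = clock                # injectable so the logic can be tested
        self._last_renewal = None          # time of the last successful renewal

    def maybe_renew(self):
        """Renew if enough time has passed; return True when renew was called."""
        now = self._clock()
        recently_renewed = (
            self._last_renewal is not None
            and now - self._last_renewal < self._min_interval
        )
        if recently_renewed:
            return False
        self._renew()
        self._last_renewal = now
        return True
```

In the middleware, a module-level renewer = ThrottledRenewer(_set_new_ip) could then replace the direct call, with process_request calling renewer.maybe_renew() instead of _set_new_ip().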
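If I remove the _set_new_ip() function from middlewares.py (and the call to it in class ProxyMiddleware(object)), the spider works fine. But I want the spider to use a new IP each time, which is why I added it. The problem is that every time I try to run the spider, it returns the error SocketError: [Errno 61] Connection refused, with this traceback: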
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/lib/python2.7/site-packages/scrapy/core/downloader/middleware.py", line 37, in process_request
    response = yield method(request=request, spider=spider)
  File "/Users/nikita/scrapy/***/***/middlewares.py", line 71, in process_request
    _set_new_ip()
  File "/Users/nikita/scrapy/***/***/middlewares.py", line 65, in _set_new_ip
    with Controller.from_port(port=9051) as controller:
  File "/usr/local/lib/python2.7/site-packages/stem/control.py", line 998, in from_port
    control_port = stem.socket.ControlPort(address, port)
  File "/usr/local/lib/python2.7/site-packages/stem/socket.py", line 372, in __init__
    self.connect()
  File "/usr/local/lib/python2.7/site-packages/stem/socket.py", line 243, in connect
    self._socket = self._make_socket()
  File "/usr/local/lib/python2.7/site-packages/stem/socket.py", line 401, in _make_socket
    raise stem.SocketError(exc)
SocketError: [Errno 61] Connection refused
2017-07-11 15:50:28 [scrapy.core.engine] INFO: Closing spider (finished)
Maybe the problem is the port used in with Controller.from_port(port=9051) as controller:, but I'm not sure. If anyone has an idea, that would be great.
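Whether anything is actually listening on port 9051 can be checked directly, since "Connection refused" normally means that no process accepted the TCP connection. Below is a minimal sketch that uses only the standard library. The helper name port_status is hypothetical, and the port list is an assumption: 8118 and 9051 come from the code above, and 9050 is Tor's usual SOCKS port.

```python
import errno
import socket


def port_status(host, port, timeout=1.0):
    """Return 'open', 'refused', or the symbolic errno name for host:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns an errno value instead of raising on failure
        code = sock.connect_ex((host, port))
    finally:
        sock.close()
    if code == 0:
        return "open"
    if code == errno.ECONNREFUSED:
        return "refused"
    return errno.errorcode.get(code, str(code))


if __name__ == "__main__":
    # Assumed local ports: Privoxy, Tor SOCKS, Tor control
    for label, port in [("Privoxy", 8118), ("Tor SOCKS", 9050), ("Tor control", 9051)]:
        print("%-12s %5d  %s" % (label, port, port_status("127.0.0.1", port)))
```

If the control port reports "refused" while the other ports report "open", the stem call in _set_new_ip() has nothing to connect to, which would match the traceback.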
EDIT---
OK, if I open the proxy address in my browser, it says:
503
This is Privoxy 3.0.26 on localhost (127.0.0.1), port 8118, enabled
Forwarding failure
Privoxy was unable to socks5-forward your request http://127.0.0.1:8118/ through localhost: SOCKS5 request failed
Just try again to see if this is a temporary problem, or check your forwarding settings and make sure that all forwarding servers are working correctly and listening where they are supposed to be listening.
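So maybe this has to do with the SOCKS5 configuration... does anyone know?
My suggestion: use ps (e.g., ps -ax | grep tor) and netstat (e.g., on Mac: netstat -an | grep "your Tor port number"; on Linux, replace -an with -tulnp) to check whether Tor is actually running.
Is the line forward-socks5t / 127.0.0.1:9050 . uncommented?
Take a look at how to use stem to connect to Tor.
OK, on that site they discuss the authenticate() function. In the example given, they first create control_socket = stem.socket.ControlPort(port=9051) and then call stem.connection.authenticate(control_socket). Should I put both of them in the ProxyMiddleware class?
OK, I know I have to call the connect() function somewhere, but where? I tried a few options, but none of them worked...
I found something; see the updated question.
Are you sure that Tor is running, and that Privoxy is set up correctly to use Tor and is working?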
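The question of whether a directive such as forward-socks5t is uncommented can also be checked with a few lines of code. The sketch below scans config text for active (uncommented) lines of a given directive. The function name active_directives and both sample texts are made up for illustration. It assumes that comments start with #, as in the Privoxy config and torrc, and it could be used for ControlPort in torrc as well, since a commented-out ControlPort would also be consistent with the refused connection on 9051 (the thread does not confirm this).

```python
def active_directives(config_text, name):
    """Return the arguments of every uncommented line that starts with `name`."""
    found = []
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if parts[0].lower() == name.lower():
            found.append(parts[1].strip() if len(parts) > 1 else "")
    return found


# Made-up sample texts; real files should be read from wherever they are installed.
SAMPLE_PRIVOXY = """
# forward-socks5t / 127.0.0.1:9050 .
listen-address 127.0.0.1:8118
"""

SAMPLE_TORRC = """
SocksPort 9050
#ControlPort 9051
"""

if __name__ == "__main__":
    print(active_directives(SAMPLE_PRIVOXY, "forward-socks5t"))
    print(active_directives(SAMPLE_TORRC, "ControlPort"))
```

An empty list for either directive means the line is missing or still commented out in that sample.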