Python: why is the connection refused?
I am building a web-scraping script and have split it into four parts. Each part works fine on its own, but when I put them together I get the following error: urlopen error [Errno 111] Connection refused. I have looked at questions similar to mine and tried catching the error with try-except, but even that did not work. My combined code is:
from selenium import webdriver
import re
import urllib2

site = ""

def phone():
    global site
    site = "https://www." + site
    if "spokeo" in site:
        browser = webdriver.Firefox()
        browser.get(site)
        content = browser.page_source
        browser.quit()
        m_obj = re.search(r"(\(\d{3}\)\s\d{3}-\*{4})", content)
        if m_obj:
            print m_obj.group(0)
    elif "addresses" in site:
        usock = urllib2.urlopen(site)
        data = usock.read()
        usock.close()
        m_obj = re.search(r"(\(\d{3}\)\s\d{3}-\d{4})", data)
        if m_obj:
            print m_obj.group(0)
    else:
        usock = urllib2.urlopen(site)
        data = usock.read()
        usock.close()
        m_obj = re.search(r"(\d{3}-\s\d{3}-\d{4})", data)
        if m_obj:
            print m_obj.group(0)

def pipl():
    global site
    url = "https://pipl.com/search/?q=tom+jones&l=Phoenix%2C+AZ%2C+US&sloc=US|AZ|Phoenix&in=6"
    usock = urllib2.urlopen(url)
    data = usock.read()
    usock.close()
    r_list = [#re.compile("spokeo.com/[^\s]+"),
              re.compile("addresses.com/[^\s]+"),
              re.compile("10digits.us/[^\s]+")]
    for r in r_list:
        match = re.findall(r, data)
        for site in match:
            site = site[:-6]
            print site
            phone()

pipl()
Here is my traceback:
Traceback (most recent call last):
  File "/home/lazarov/.spyder2/.temp.py", line 48, in <module>
    pipl()
  File "/home/lazarov/.spyder2/.temp.py", line 46, in pipl
    phone()
  File "/home/lazarov/.spyder2/.temp.py", line 25, in phone
    usock = urllib2.urlopen(site)
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1215, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 111] Connection refused>
I have checked: it is not that a firewall is actively refusing the connection, nor that the service is not running on the other site, nor that it is overloaded. Any help is appreciated.

As usual, the devil is in the details. From your traceback
File "/usr/lib/python2.7/urllib2.py", line 1215, in https_open
  return self.do_open(httplib.HTTPSConnection, req)
and from your source code
site = "https://www." + site
…I would guess that in your code you are trying to reach https://www.10digits.us/n/Tom_Jones/Phoenix_AZ/1fe293a0b7, while in your tests you were connecting to http://www.10digits.us/n/Tom_Jones/Phoenix_AZ/1fe293a0b7.
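The scheme mismatch above can be sketched like this. This is a minimal sketch in Python 3 (the question itself uses Python 2's urllib2), and the force_http helper is a name I introduce for illustration, not part of the original code:

```python
from urllib.parse import urlsplit, urlunsplit

def force_http(url):
    """Rewrite an https:// URL to plain http://, leaving other schemes alone.

    Useful when a site (such as www.10digits.us here) serves a page over
    HTTP only, so an HTTPS request to it is refused.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        # SplitResult is a namedtuple, so _replace returns a modified copy
        parts = parts._replace(scheme="http")
    return urlunsplit(parts)

print(force_http("https://www.10digits.us/n/Tom_Jones/Phoenix_AZ/1fe293a0b7"))
# http://www.10digits.us/n/Tom_Jones/Phoenix_AZ/1fe293a0b7
```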
Try replacing https with http (at least for www.10digits.us): most likely the site you are trying to scrape does not serve that page over HTTPS (you can even check this with your browser).

Side note: when I tried it I got HTTP Error 503: Service Temporarily Unavailable. Yet the same snippet works when run on its own, which makes me suspect the web server cannot keep up with the requests. Is there a way to check whether that is the case, and to handle it correctly?

Yes: if you receive HTTP Error 503 (which you can catch with try..except), pause for a few seconds (import time; time.sleep(5)) and try again.
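The catch-and-sleep idea can be sketched as follows. This is a hedged Python 3 sketch (the question uses Python 2's urllib2); fetch_with_retry and the simulated flaky opener are hypothetical names for illustration — in real code the opener would be urllib.request.urlopen:

```python
import time
from urllib.error import HTTPError

def fetch_with_retry(open_url, url, tries=3, delay=5):
    """Call open_url(url); on HTTP 503 wait `delay` seconds and retry.

    open_url is a parameter so the retry logic can be exercised without
    touching the network; pass urllib.request.urlopen for real fetches.
    """
    for attempt in range(tries):
        try:
            return open_url(url)
        except HTTPError as err:
            if err.code != 503 or attempt == tries - 1:
                raise  # not "Service Unavailable", or out of retries
            time.sleep(delay)

# Simulated server: fails once with 503, then answers normally.
calls = {"n": 0}
def flaky(url):
    calls["n"] += 1
    if calls["n"] == 1:
        raise HTTPError(url, 503, "Service Temporarily Unavailable", None, None)
    return "page body"

print(fetch_with_retry(flaky, "http://www.10digits.us/", delay=0))
# page body
```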