Python: Getting SSLError: HTTPSConnectionPool when scraping a list of links

Tags: python, beautifulsoup

I have a list of links to a review website:

links = [
    'https://www.yelp.com/biz/city-tamale-bronx-3',
    'https://www.yelp.com/biz/the-boogie-down-grind-caf%C3%A9-bronx',
    'https://www.yelp.com/biz/fratillis-pizza-bronx',
    'https://www.yelp.com/biz/randall-restaurant-bronx',
    'https://www.yelp.com/biz/valencia-bakery-bronx-3',
    'https://www.yelp.com/biz/the-point-cafe-and-bascom-catering-new-york',
    'https://www.yelp.com/biz/delfini-restaurant-bronx',
    'https://www.yelp.com/biz/bayside-seafood-company-bronx',
    'https://www.yelp.com/biz/il-forno-bakery-bronx',
    'https://www.yelp.com/biz/allen-restaurant-bronx',
]
I wrote a function that retrieves the reviewers' names:

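import time

import requests
from bs4 import BeautifulSoup


def getReviewerName(restaurantLink, headers, proxies):
    session = requests.Session()
    time.sleep(10)  # pause before the request to reduce the chance of detection
    req = session.get(restaurantLink, headers=headers, proxies=proxies)
    bs = BeautifulSoup(req.text, "html.parser")
    time.sleep(4)
    # Collect every div with class "media-story"
    nameDiv = bs.find_all("div", {"class": "media-story"})
    time.sleep(3)
    # Look for an li with class "user-name" in each div; find() returns None if absent
    name = [div.find("li", {"class": "user-name"}) for div in nameDiv]
    time.sleep(2)
    # Drop the divs that had no user-name element and keep the text of the rest
    name = [n.text for n in name if n is not None]
    print(name)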
I call time.sleep before each request so that my bot is not detected.

I wrote a for loop that applies getReviewerName to each link in the list:

for link in links:
    headers = {'User-Agent': get_User_Agent()}
    proxies = {"http": "http://" + get_proxies(), "https": "http://" + get_proxies()}
    getReviewerName(link, headers, proxies)
In this for loop, I use a function called get_User_Agent() to return a random user agent, and a function called get_proxies() to return a random proxy. All of this is to avoid being detected.
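
The two helpers are not shown in the question. A minimal sketch of what they might look like, assuming each one picks at random from a hard-coded pool: the user-agent strings and proxy addresses below are hypothetical placeholders (the addresses come from the documentation-only range 192.0.2.0/24), and get_proxies() is assumed to return a bare "host:port" string because the loop prepends "http://" itself.

```python
import random

# Hypothetical pools; real values would come from your own sources.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/2.0",
]
PROXY_ADDRESSES = [
    "192.0.2.10:8080",
    "192.0.2.11:3128",
]


def get_User_Agent():
    """Return one user-agent string chosen at random from the pool."""
    return random.choice(USER_AGENTS)


def get_proxies():
    """Return one proxy address ("host:port") chosen at random from the pool."""
    return random.choice(PROXY_ADDRESSES)
```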

I get the expected result only for the first link in the list:

['\nDavid L.\n', '\nKarla G.\n', '\nMickey W.\n', '\nGabrielle  P.\n', '\nOmar M.\n', '\nフェルナンド\n', '\nMichael B.\n', '\nBrittany H.\n', '\nTy C.\n', '\ndouble double u.\n', '\nLizzy N.\n', '\nAlina G.\n', '\nSam W.\n', '\nCristina C.\n', '\nLetticia C.\n', '\nJennifer S.\n', '\nJeremy R.\n', '\nKahliah L.\n', '\nE. M.\n', '\nSaïeda H.\n']
However, when I reach the second link, I get an SSLError:

SSLError: HTTPSConnectionPool(host='www.yelp.com', port=443): Max  retries exceeded with url: /biz/the-boogie-down-grind-caf%C3%A9-bronx (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",),))

Any help on how to solve this would be greatly appreciated. Thanks.

Check your proxies: you may be using an HTTP proxy for HTTPS traffic. Try changing your proxy format to:

{"https": "https://..."}