Python: Overriding HTTP errors with urllib2

python http

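I have this code, but it doesn't work. I want to iterate over a list of URLs with urllib2. After each URL is opened, BeautifulSoup locates a class and extracts its text. If the list contains an invalid URL, the program halts. If any error occurs, I just want 'error' as the text, and for the program to continue to the next URL. Any ideas?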

    for url in url_list:
         page=urllib2.urlopen(url)
         soup = BeautifulSoup(page.read())

         text = soup.find_all(class_='ProfileHeaderCard-locationText u-dir')
         if text is not None:
            for t in text:
                text2 = t.get_text().encode('utf-8')
         else:
            text2 = 'error'

try/except is your friend! Change the code to something like:

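    import urllib2
    from bs4 import BeautifulSoup

    for url in url_list:  # url_list: the list of URLs from the question
        try:
            page = urllib2.urlopen(url)
        except urllib2.URLError:
            # The URL could not be opened: record the placeholder and move on.
            text2 = 'error'
        else:
            # Runs only when urlopen succeeded.
            soup = BeautifulSoup(page.read())
            text = soup.find_all(class_='ProfileHeaderCard-locationText u-dir')
            if text:  # find_all returns an empty list, not None, on no match
                for t in text:
                    text2 = t.get_text().encode('utf-8')
            else:
                text2 = 'error'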
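
Note that the loop above overwrites text2 on every pass, so only the last value survives. A minimal sketch of collecting one result per URL follows. It assumes Python 3, where urllib2's functionality lives in urllib.request and urllib.error. The names collect_texts, extract and opener are hypothetical and introduced only for illustration; extract stands in for the BeautifulSoup lookup.

```python
from urllib.error import URLError
from urllib.request import urlopen


def collect_texts(url_list, extract, opener=urlopen):
    """Return one entry per URL: the extracted text, or 'error'."""
    results = []
    for url in url_list:
        try:
            page = opener(url)
        except URLError:
            # Network or HTTP failure: record the placeholder and move on.
            results.append('error')
        else:
            # extract is a hypothetical callable: HTML bytes -> list of strings.
            texts = extract(page.read())
            # An empty list means the class was not found on the page.
            results.append(' '.join(texts) if texts else 'error')
    return results
```

With BeautifulSoup installed, extract could plausibly be a small function that calls find_all(class_=...) on the parsed page and returns the get_text() of each hit. Keeping it as a parameter means the error handling can be exercised without any network access.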

urllib2.urlopen raises URLError when an error occurs.

Use a try-except block:

    try:
        page = urllib2.urlopen(url)
    except urllib2.URLError as e:
        print e
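
Since the question is about HTTP errors specifically, it may help to tell them apart from other failures. HTTPError is a subclass of URLError, so catching URLError covers both, but catching HTTPError first gives access to the status code. A sketch assuming Python 3 (urllib.error and urllib.request in place of urllib2); describe_failure and opener are hypothetical names used only for illustration.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def describe_failure(url, opener=urlopen):
    """Return None on success, else a short description of the failure."""
    try:
        page = opener(url)
    except HTTPError as e:
        # Must come before URLError, because HTTPError is a subclass of it.
        return 'HTTP error %d' % e.code
    except URLError as e:
        # No HTTP response at all (DNS failure, refused connection, ...).
        return 'URL error: %s' % e.reason
    else:
        page.close()
        return None
```

If the order of the two except clauses were swapped, the URLError branch would catch HTTP errors as well and the HTTPError branch would never run.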

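[] is not None will always be True.

@PadraicCunningham yes, since that's what find_all returns when there are no hits. I'll edit my A to also fix this further problem in the OP's code (which I had originally just copied and pasted from the Q :-).

@AlexMartelli Thanks a lot! That solved the problem. Do I understand correctly that it first checks for URLError and then, finally, checks for any other errors?

@textnet, no: it checks for URLError only, and executes the else: part only if there was no error. Any other exception is unexpected here, so, as best practice, it is allowed to propagate up the call stack.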
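
The control flow described in the last comment can be sketched as follows. This assumes Python 3 (urllib.request and urllib.error instead of urllib2); fetch_or_error and opener are hypothetical names. One case worth knowing: a string with no scheme is rejected with ValueError rather than URLError, so it is one of the "other" exceptions that propagate.

```python
from urllib.error import URLError
from urllib.request import urlopen


def fetch_or_error(url, opener=urlopen):
    """Return the page body, or 'error' if the URL could not be opened."""
    try:
        page = opener(url)
    except URLError:
        # Only URLError (and subclasses such as HTTPError) is handled here.
        return 'error'
    else:
        # Reached only when opener() raised nothing.
        return page.read()


# A string with no scheme is rejected with ValueError before any request
# is made, so it is not caught above and reaches the caller.
try:
    fetch_or_error('not-a-url')
except ValueError as exc:
    print('propagated:', exc)
```

If malformed entries in the list should also yield 'error', ValueError could be added to the except clause, at the cost of hiding that particular kind of bad input.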
