Continuing a Python script after a connection-closed error

python xml ubuntu web-scraping

I'm scraping with rotating proxies using Python 2.7 on Ubuntu 14.04. After a few minutes the script dies with this error:

raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",))


            if keyword1 in text and keyword2 in text and keyword3 in text:
                print("LINK SCRAPED")
                print(text, "link scraped")
                found = True 
                break 

except requests.exceptions.ConnectionError as err:
    print("Encountered ConnectionError, retrying: {}".format(err))

If this isn't the correct way to implement try, I assume only the request goes in the try clause and everything else comes after the except?
Instead of restarting the script, you can use a try/except statement to handle the error.
For example:
try:
    # the call that is failing, e.g. the request from your script
    r = requests.get(sitemap, proxies=proxies, headers=head)
except requests.exceptions.ConnectionError as err:
    print("Encountered ConnectionError, retrying: {}".format(err))

Then retry the original call.
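One way to sketch "catch, then retry" without looping forever is a small helper that gives up after a fixed number of attempts. The name `retry_call` and its parameters are hypothetical, not part of requests. It catches `IOError` by default because `requests.exceptions.RequestException` (and therefore `ConnectionError`) derives from it; pass a narrower tuple if you only want to retry connection errors.

```python
import time


def retry_call(func, max_attempts=5, delay=1.0, exceptions=(IOError,)):
    """Call func() until it succeeds or max_attempts is reached.

    Hypothetical helper, not part of requests. The last error is
    re-raised if every attempt fails, so the script cannot spin forever.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except exceptions as err:
            print("Attempt {} failed: {}".format(attempt, err))
            if attempt == max_attempts:
                raise
            time.sleep(delay)  # brief pause before the next attempt
```

With requests this could be called as `retry_call(lambda: requests.get(url), exceptions=(requests.exceptions.ConnectionError,))`, where `url` stands for whatever you are fetching.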
Update: based on your updated code example, here is what I would do:
from bs4 import BeautifulSoup
import requests
import smtplib
import urllib2
from random import randint
import time
from lxml import etree
from time import sleep
import random


proxies = {'https': '100.00.00.000:00000'}
hdr1 = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'none',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive',
}

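hdrs = [hdr1]  # , hdr2, hdr3, hdr4, hdr5, hdr6, hdr7]
ua = random.choice(hdrs)
head = {
    'Connection': 'close',
    # ua is a whole header dict; header values must be strings,
    # so pass only its User-Agent entry
    'User-Agent': ua['User-Agent'],
}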

#####   REQUEST  1  ####
done = False
while not done:
    try:
        a = requests.get('https://store.fabspy.com/sitemap.xml', proxies=proxies, headers=head)
        done = True
    except requests.exceptions.ConnectionError as err:
        print('Encountered ConnectionError, retrying: {}'.format(err))
        time.sleep(1)

scrape = BeautifulSoup(a.text, 'lxml')
links = scrape.find_all('loc')
for link in links:
    if 'products' in link.text:
        sitemap = str(link.text)
        break

keyword1 = 'not'
keyword2 = 'on'
keyword3 = 'site'

#########    REQUEST 2 #########
done = False
while not done:
    try:
        r = requests.get(sitemap, proxies=proxies, headers=head)
        done = True
    except requests.exceptions.ConnectionError as err:
        print('Encountered ConnectionError, retrying: {}'.format(err))
        sleep(randint(4,6))

soup = BeautifulSoup(r.text, 'lxml')
links = soup.find_all('loc')
for link in links:
    text = link.text
    if keyword1 in text and keyword2 in text and keyword3 in text:
        print(text, 'link scraped')
        break
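Since the question mentions rotating proxies, another option when a proxy drops the connection is to move on to the next proxy instead of retrying the same one. The following is only a sketch under assumptions: `fetch_with_rotation`, the `fetch` callable, and `proxy_list` are hypothetical names, and `fetch(url, proxies)` stands in for something like `requests.get(url, proxies=proxies, headers=head)`.

```python
import itertools


def fetch_with_rotation(fetch, url, proxy_list, max_attempts=6,
                        exceptions=(IOError,)):
    """Try fetch(url, proxies) with each proxy in turn until one works.

    Hypothetical helper. proxy_list must be non-empty; proxies are
    reused in order if max_attempts exceeds the number of proxies.
    """
    proxy_cycle = itertools.cycle(proxy_list)
    last_err = None
    for _ in range(max_attempts):
        proxy = next(proxy_cycle)
        try:
            return fetch(url, {'https': proxy})
        except exceptions as err:
            print('Proxy {} failed: {}'.format(proxy, err))
            last_err = err
    raise RuntimeError('all {} attempts failed; last error: {}'.format(
        max_attempts, last_err))
```

Bounding the attempts means a site that is genuinely down ends the script with an error rather than an endless loop.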

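I'll remove the beautifulsoup tag.

I've tried applying this to a more stripped-down version of the script I'm running, which I've edited in above. Can you verify? Should the try statement contain the whole request loop? Or only the initial request, with the rest of the loop after the except?

@ColeWorld, I just updated my answer to add a rewritten code example for ya.

Thanks, so far this seems to fix the error, but I think it conflicts with the keyword search. If you pass any strings as keywords and just one of the keywords matches a link on the site, it will give you that link.

Sorry, not sure what you mean by the keyword search. I was just trying to handle the error properly; I'm not very familiar with the other logic in your program.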
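The thread never settles what goes wrong with the keyword search, so the following is only a guess at the distinction being discussed: requiring every keyword versus accepting any one of them. Note that `in` does substring matching, so a short keyword such as 'on' also matches inside longer words. The function names and the example.com URLs are hypothetical.

```python
def matches_all(text, keywords):
    """True only if every keyword occurs somewhere in text."""
    return all(keyword in text for keyword in keywords)


def matches_any(text, keywords):
    """True if at least one keyword occurs somewhere in text."""
    return any(keyword in text for keyword in keywords)


keywords = ['not', 'on', 'site']

# 'on' appears inside 'cotton', so a single-keyword match fires here
# even though the link is unrelated.
unrelated = 'https://example.com/products/cotton-shirt'
wanted = 'https://example.com/products/not-on-site'

print(matches_any(unrelated, keywords))
print(matches_all(unrelated, keywords))
print(matches_all(wanted, keywords))
```

The chained `and` test in the answer's loop behaves like `matches_all`; if unrelated links still come through, overly short keywords are one thing to check.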