Python: how do I bypass or catch a socket.timeout error when checking whether a website is working?


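I have been developing a program that checks whether websites are working. I read the URLs from an Excel sheet and write the results (True or False) back into the same Excel sheet. For some URLs, however, I get a socket.timeout error, and the code stops working after that. Here is the code: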

import http.client as httpc
from urllib.parse import urlparse
import pandas as pd
import xlwings as xw
import smtplib
from xlsxwriter import Workbook


import socket


x=[]

df = pd.read_excel (r'xyz.xlsx')
df1=pd.DataFrame(df,columns=['URL'])
print(df1)
url_list=df["URL"].tolist()
print(url_list)
for i in url_list:
    def checkUrl(i):
        if 'http' not in i:
            i= 'https://'+i
        p = urlparse(i)
        conn = httpc.HTTPConnection(p.netloc,timeout=4)
        conn.request('HEAD', p.path)
        try:
            resp = conn.getresponse()
            return resp.status<400
        except requests.exceptions.RequestException:
            return False
    print(checkUrl(i))
    x.append(checkUrl(i))


workbook = Workbook('abc.xlsx')
Report_Sheet = workbook.add_worksheet()
Report_Sheet.write(0, 1, 'Value')
Report_Sheet.write_column(1, 1, x)

workbook.close()
My first guess is that

resp = conn.getresponse()

should be inside the try clause. If that does not help, please add your program's output.
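Wrapping getresponse() alone is not always enough: http.client connects lazily, so the TCP connect (the sock.connect call where a socket.timeout surfaces) runs inside conn.request(). A minimal standard-library sketch of this; closed_local_port and where_does_it_fail are hypothetical helper names, and a loopback port with no listener stands in for an unreachable site. It produces a refused connection rather than a timeout, but the error is raised from the same call.

```python
import http.client
import socket


def closed_local_port() -> int:
    """Hypothetical helper: return a loopback port with (almost certainly) no listener."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]  # the socket is closed on exit, so nothing listens there


def where_does_it_fail(port: int) -> str:
    # Creating the connection object does not touch the network yet.
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=4)
    assert conn.sock is None
    try:
        # The TCP connect (sock.connect) runs inside request(), so a refused
        # or timed-out connection is raised from this line, not from getresponse().
        conn.request("HEAD", "/")
    except OSError as err:
        return f"request() raised {type(err).__name__}"
    finally:
        conn.close()
    return "request() succeeded"


print(where_does_it_fail(closed_local_port()))
```

In practice this means both conn.request() and conn.getresponse() belong inside the try block.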

There are a number of problems in this code:

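  • You unconditionally use HTTP, even when the URL requires HTTPS
  • You send the request (conn.request) before the try: block, so errors raised while connecting are not caught
  • The except clause expects a requests.exceptions.RequestException, which cannot be raised by your code
  • Since you are not using the requests library but the low-level http.client, you should only see errors from the socket and ssl layers, and these are all subclasses of OSError (only malformed HTTP responses raise http.client.HTTPException instead)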
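The claim about the exception hierarchy can be checked directly with issubclass on Python 3.7+ (where ssl.CertificateError is an SSLError subclass). The list below is an illustrative selection, not an exhaustive catalogue of what http.client can raise.

```python
import http.client
import socket
import ssl

# Typical failures while talking to a server through http.client (Python 3.7+)
network_errors = [
    socket.timeout,          # no answer within the timeout
    socket.gaierror,         # hostname could not be resolved
    ConnectionRefusedError,  # nothing is listening on that host/port
    ssl.SSLError,            # TLS handshake problems
    ssl.CertificateError,    # certificate mismatch or verification failure
]

for exc in network_errors:
    print(f"{exc.__name__}: OSError subclass? {issubclass(exc, OSError)}")

# http.client's own protocol errors sit outside the OSError hierarchy
print("HTTPException: OSError subclass?", issubclass(http.client.HTTPException, OSError))
```

So a single except (OSError, http.client.HTTPException): clause covers both groups.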

    Your code could then become (note: untested):

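        def checkUrl(i):
            if 'http' not in i:
                i = 'https://' + i
            p = urlparse(i)
            # Pick the connection class that matches the URL's scheme
            if p.scheme == 'http':
                conn = httpc.HTTPConnection(p.netloc, timeout=4)
            else:
                conn = httpc.HTTPSConnection(p.netloc, timeout=4)
            try:
                # request() opens the connection, so it belongs inside the try
                conn.request('HEAD', p.path)
                resp = conn.getresponse()
                return resp.status < 400
            except OSError:
                # socket and ssl errors (timeouts included) are all OSError subclasses
                return False

In my experience, this error occurs when the hostname still resolves to a valid IP address, but the server at that address is no longer configured to serve that hostname. The server then ignores your connection attempt.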

To handle this, return False when a timeout occurs:

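        import http.client
        import socket

        try:
            # The TCP connect happens inside conn.request(), so it must be in the try too
            conn.request('HEAD', p.path)
            resp = conn.getresponse()
            return resp.status < 400
        except http.client.HTTPException:
            # requests is never imported here, so catch http.client's own error type instead
            return False
        except socket.timeout as err:
            return False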
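The timeout case can be reproduced locally without a misconfigured site: a socket that listens on loopback but never answers makes getresponse() time out. A minimal sketch; silent_server and check are hypothetical names. It simulates a read timeout (the server accepts the TCP connection but never replies) rather than an ignored connection attempt, which would time out earlier, inside conn.request(). Because both calls sit inside the try, the same except clause catches either case.

```python
import http.client
import socket


def silent_server() -> socket.socket:
    """Hypothetical stand-in for an unresponsive site: listens on loopback, never replies."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)  # the OS completes the TCP handshake, but nothing ever answers
    return srv


def check(host: str, port: int, timeout: float = 0.5) -> bool:
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        resp = conn.getresponse()  # blocks until the timeout expires
        return resp.status < 400
    except socket.timeout:
        # No answer within `timeout` seconds: report the site as not working
        return False
    finally:
        conn.close()


srv = silent_server()
result = check("127.0.0.1", srv.getsockname()[1])
srv.close()
print(result)
```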
    

It checks 8 websites, and the 9th one fails with: sock.connect(sa) socket.timeout: timed out

Thank you very much, it worked. Is there a way to identify these kinds of sites as non-working?
    
        import socket
        import ssl
        import http.client
    
        try:
            # conn.request() opens the TCP connection, so it must be inside the try too
            conn.request('HEAD', p.path)
            resp = conn.getresponse()
            return resp.status < 400
        except http.client.HTTPException as err:
            # A connection was established, but the server's response was not valid HTTP
            return False
        except socket.timeout as err:
            # No answer within the timeout, e.g. the server no longer serves this hostname
            return False
        except socket.gaierror as err:
            # Could not resolve the hostname to an IP address
            return False
        except ssl.CertificateError as err:
            # The SSL certificate does not match the hostname, or it cannot be trusted
            return False
        except ssl.SSLError as err:
            # Other SSL errors not covered by ssl.CertificateError
            return False
        except OSError as err:
            # Any other network error, e.g. connection refused or reset
            return False
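To answer the follow-up (identifying which sites are not working, and why), each except clause can return a short reason instead of a bare False. A minimal sketch under these assumptions: check_url and the reason labels ("timeout", "dns", "ssl", "refused") are hypothetical, and the "://" test for a missing scheme replaces the question's 'http' not in i check, which would misfire on hostnames that merely contain "http". The usage part runs against local stand-ins, not real websites.

```python
import http.client
import socket
import ssl
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse


def check_url(url: str, timeout: float = 4.0) -> str:
    """Hypothetical helper: return "ok" or a short reason the site counts as not working."""
    if "://" not in url:
        url = "https://" + url  # same default scheme as the question's code
    p = urlparse(url)
    conn_cls = http.client.HTTPConnection if p.scheme == "http" else http.client.HTTPSConnection
    conn = conn_cls(p.netloc, timeout=timeout)
    try:
        conn.request("HEAD", p.path or "/")
        status = conn.getresponse().status
        return "ok" if status < 400 else f"http {status}"
    except socket.timeout:
        return "timeout"
    except socket.gaierror:
        return "dns"
    except ssl.SSLError:  # includes certificate verification failures
        return "ssl"
    except ConnectionRefusedError:
        return "refused"
    except (OSError, http.client.HTTPException) as err:
        return f"error: {type(err).__name__}"
    finally:
        conn.close()


# Usage sketch: a local server that answers HEAD with 200, and a port with no listener
class _Ok(BaseHTTPRequestHandler):
    def do_HEAD(self):
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):  # keep the console quiet
        pass


server = HTTPServer(("127.0.0.1", 0), _Ok)
threading.Thread(target=server.serve_forever, daemon=True).start()
up = check_url(f"http://127.0.0.1:{server.server_port}/")
server.shutdown()
server.server_close()

with socket.socket() as s:
    s.bind(("127.0.0.1", 0))
    free_port = s.getsockname()[1]
down = check_url(f"http://127.0.0.1:{free_port}/")
print(up, down)
```

The reason strings could be written into the Excel sheet next to each URL; comparing a result with "ok" recovers the original True/False value.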