Python: how to stop the process only when "no internet" or a "network error" occurs while downloading images with requests


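I have written a script that downloads images from the provided URLs and saves them in a directory. It uses requests to access the URLs given in a DataFrame (a CSV file) and Pillow to save the images into my directory. Each image is named after the index of its URL in my CSV. If a bad URL cannot be accessed, the script just moves on to the next index. Every time I run the script, it starts downloading from the highest index already present and continues up to the desired index. My code works fine. It looks like this: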

import pandas as pd

import os
from os import listdir
from os.path import isfile, join
import sys

from PIL import Image

import requests
from io import BytesIO

import argparse


arg_parser = argparse.ArgumentParser(allow_abbrev=True, description='Download images from url in a directory',)

arg_parser.add_argument('-d','--DIR',required=True,
                       help='Directory name where images will be saved')

arg_parser.add_argument('-c','--CSV',required=True,
                       help='CSV file name which contains the URLs')

arg_parser.add_argument('-i','--index',type=int,
                       help='Index number of column which contain the urls')

arg_parser.add_argument('-e','--end',type=int,
                       help='How many images to download')

args = vars(arg_parser.parse_args())


def load_save_image_from_url(url,OUT_DIR,img_name):
    response = requests.get(url)
    img = Image.open(BytesIO(response.content))
    img_format = url.split('.')[-1]
    img_name = img_name+'.'+img_format
    img.save(OUT_DIR+img_name)
    return None


csv = args['CSV']
DIR = args['DIR']

ind = 0
if args.get('index'):
    ind = args['index']

df = pd.read_csv(csv) # read csv
indices = [int(f.split('.')[0]) for f in listdir(DIR) if isfile(join(DIR, f))] # get existing images

total_images_already = len(indices)
print(f'There are already {len(indices)} images present in the directory -{DIR}-\n')
start = 0
if len(indices):
    start = max(indices)+1 # set starting index
    
end = 5000 # next n numbers of images to download
if args.get('end'):
    end = args['end']

print(f'Downloaded a total of {total_images_already} images upto index: {start-1}. Downloading the next {end} images from -{csv}-\n')

count = 0
for i in range(start, start+end):
    if count%250==0:
        print(f"Total {total_images_already+count} images downloaded in directory. {end-count} remaining from the current defined\n")

    url = df.iloc[i,ind]
    try:
        load_save_image_from_url(url,DIR,str(i))
        count+=1

    except (KeyboardInterrupt, SystemExit):
        sys.exit("Forced exit prompted by User: Quitting....")

    except Exception as e:
        print(f"Error at index {i}: {e}\n")
        pass

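I want to add a feature so that when something like "No internet" or "Connection error" occurs, the script pauses for, say, 5 minutes instead of carrying on. After 5 attempts (i.e. 25 minutes), if the problem persists, it should exit the program instead of moving on to the next index. I want this because if the internet drops for, say, 2 minutes and then comes back, the loop keeps running in the meantime and resumes downloading from whatever index it has reached. The next time I run the program, it will assume the skipped URLs were bad, when in fact there was simply no internet connection.

How can I do this?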

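Note: obviously I am thinking of using time.sleep(), but I want to know which error in requests directly reflects "No Internet" or "Connection error". One candidate is from requests.exceptions import ConnectionError. If I have to use it, how do I use it to keep retrying on the same i counter for up to 5 attempts, exit the program if they all fail, and resume the regular loop once the connection is back?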
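
I have worked with the Google API and occasionally ran into no internet, error 423, or something similar.

I kept the whole code in a try block and called .sleep() for X seconds in the except block.

That way, I did not have to look up the error types.

Note that before doing this, you should make sure your code has no other kind of error and runs smoothly unless it hits "No Internet" or "Network error".
This is how I handled the problem:
import libs
basic operations
try:
    block 1
    block 2
    block_where_the_error_occurs
    block 3
except:
    print("Network error")
    time.sleep(X)  # X seconds

I hope this helps you too. Let me know if this approach does not serve your purpose.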
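A minimal sketch of the same idea, narrowed so that only network failures trigger the wait. It assumes that a missing connection surfaces as requests.exceptions.ConnectionError, and as requests.exceptions.Timeout when a timeout is passed to requests.get. The helper name call_with_network_retry and its parameters are made up for illustration.

```python
import sys
import time

import requests


def call_with_network_retry(func, attempts=5, wait_seconds=300, sleep=time.sleep):
    """Call func(); on a network failure wait and retry, up to `attempts` tries."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except (requests.exceptions.ConnectionError,
                requests.exceptions.Timeout) as exc:
            if attempt == attempts:
                # Give up: the network stayed down for every attempt.
                sys.exit(f"No network after {attempts} attempts: {exc}")
            print(f"Network error (attempt {attempt}/{attempts}); "
                  f"retrying in {wait_seconds}s")
            sleep(wait_seconds)
```

In the question's loop this would wrap the download call, e.g. call_with_network_retry(lambda: load_save_image_from_url(url, DIR, str(i))), so the same index i is retried and any other exception (a genuinely bad URL) still falls through to the existing except Exception branch. Note that sys.exit raises SystemExit, which the existing except (KeyboardInterrupt, SystemExit) branch would catch and turn into its own exit message.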

A better approach than sleeping is to use exponential backoff:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry_strategy = Retry(
    total=3,
    status_forcelist=[429, 500, 502, 503, 504],
    # named method_whitelist before urllib3 1.26
    allowed_methods=["HEAD", "GET", "OPTIONS"]
)
adapter = HTTPAdapter(max_retries=retry_strategy)
http = requests.Session()
http.mount("https://", adapter)
http.mount("http://", adapter)

response = http.get(url)

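Here you can configure the parameters as follows:
total=3 - the total number of retry attempts
backoff_factor - lets you change how long the process sleeps between failed requests (it is passed to Retry; the default of 0 disables the sleep)
The formula for the backoff is:
{backoff factor} * (2 ** ({number of total retries} - 1))

So with a backoff factor of 10, the formula gives 10, 20, 40, 80, ... seconds for the 1st, 2nd, 3rd, 4th, ... retry. As far as the urllib3 documentation describes it, the first retry is sent without any delay and each sleep is capped at 120 seconds by default, so the waits actually applied would be 0, 20, 40, 80, 120, 120, ... seconds.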
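To make the arithmetic concrete, here is a small sketch that evaluates the formula quoted above. It does not call urllib3. It assumes the documented behaviour that no delay is applied before the first retry and that each sleep is capped at 120 seconds; the function name backoff_schedule is hypothetical.

```python
def backoff_schedule(backoff_factor, retries, backoff_max=120):
    """Return the sleep (in seconds) before each retry, per the quoted formula."""
    delays = []
    for n in range(1, retries + 1):
        if n == 1:
            # Assumption from the urllib3 docs: the first retry is immediate.
            delays.append(0)
        else:
            delays.append(min(backoff_max, backoff_factor * 2 ** (n - 1)))
    return delays


print(backoff_schedule(10, 5))  # [0, 20, 40, 80, 120]
```

With total=3 as in the answer, only the first three entries would ever be used.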
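
Use time.sleep()

Obviously I can use time.sleep(). I want to know which error reflects "No Internet" or "Connection Error", because requests can raise many different errors. I want to retry the same request for up to 20 minutes and then call sys.exit("No network. Tried for 25 minutes. Exiting now"); otherwise I want the program to carry on smoothly to the next iteration. How can I use something like this? I am using multiple except blocks.

Updated with new code. If the code exceeds the number of retries, it raises an exception. You can wrap it with try/except.

Yes, that is what I wanted to know. I mean, where in my code should I add this? Inside the load_save_image() function, or inside the loop, within my try with multiple except blocks? Also, how do I use ConnectionError with your code?

The code above is an adapter. With it in place, you need to change requests.get to http.get. Since you have already wrapped the call in a try/except block, I don't think any more changes are needed.

I don't know how it works, but I'll definitely do that, haha! Thanks for your help.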
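As a rough sketch of where the pieces from this thread could go: the session is built once, and the download function uses it in place of requests.get. The names make_retrying_session and fetch_image_bytes, the backoff_factor=1 default, and the 30-second timeout are assumptions for illustration, not part of the original answer.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=3, backoff_factor=1):
    """Build a Session whose adapters retry failed requests with backoff."""
    retry_strategy = Retry(
        total=total,
        backoff_factor=backoff_factor,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["HEAD", "GET", "OPTIONS"],  # needs urllib3 >= 1.26
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def fetch_image_bytes(session, url, timeout=30):
    """Download url through the retrying session and return the raw bytes."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into HTTPError
    return response.content


# Created once, before the download loop starts.
http = make_retrying_session()
```

Inside the loop, an except requests.exceptions.ConnectionError branch placed before the generic except Exception could then call sys.exit, so that a dead connection stops the run while a bad URL is still just skipped.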