
Fastest way to download a set of files in Python

Tags: python, python-3.x, python-requests, python-multithreading

The title is self-explanatory: what is the fastest way to download a set of files?

So far I have tried threading, multiprocessing, and similar approaches.

Below is my current code, but I would like to know the absolute fastest way possible.

import requests
import threading

def download(url):
    r = requests.get(url)
    with open(url, "wb") as f:
        f.write(r.content)

urls = [
    "https://google.com/favicon.ico",
    ...
]

for url in urls:
    threading.Thread(target=download, args=[url]).start()

Your current code looks quite good, except that it immediately spawns a thread for every URL. If you have a large number of URLs, that can actually slow you down, because you end up with far too many threads.

I would suggest using multiprocessing.pool.ThreadPool or multiprocessing.pool.Pool to cap the number of active threads/processes. I would probably go with ThreadPool: although threads in Python are limited to a single CPU core, they avoid the overhead of creating new processes, and your hard drive is likely to be the bottleneck anyway.

from multiprocessing.pool import ThreadPool
import requests

MAX_THREADS = 100
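The snippet above is cut off on the original page. Here is a minimal, self-contained sketch of how the ThreadPool approach described in this answer could look; the download helper and the file-naming scheme are illustrative assumptions, not the answerer's original code:

from multiprocessing.pool import ThreadPool
import requests

MAX_THREADS = 100

def download(url):
    # Illustrative assumption: use the last path component as the file name
    fn = url.split("/")[-1] or "download.bin"
    r = requests.get(url)
    with open(fn, "wb") as f:
        f.write(r.content)

urls = [
    "https://google.com/favicon.ico",
]

# map() blocks until every URL has been processed;
# at most MAX_THREADS downloads run concurrently
with ThreadPool(MAX_THREADS) as pool:
    pool.map(download, urls)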
A faster solution! (200x faster downloads)

You can use asyncio: launching each download in a separate executor improves the speed. It is also much faster than starting threads and avoids multiprocessing's pool overhead:

Here is a slightly modified version of your code (since the URLs contain characters that you were using in the file names!):

Your version takes:
5.91 ms ± 7.1 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
while the asyncio-based version only takes:
27.7 µs ± 4.19 µs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


%%timeit
import requests
import threading

def download(url, fn):
  # fn is the index used as the output file name, since the URLs
  # contain characters that are not valid in file names
  r = requests.get(url)
  with open(str(fn), "wb") as f:
    f.write(r.content)

# the same long Google redirect URL repeated 15 times for the benchmark
urls = [
  'https://www.google.com/url?sa=i&url=https%3A%2F%2Fcityschool.org%2Fcampus%2Ffairmount%2Fhello%2F&psig=AOvVaw2gMH6tzY8psCcMab5FfG2u&ust=1605400822303000&source=images&cd=vfe&ved=0CAIQjRxqFwoTCJCrkKDmgO0CFQAAAAAdAAAAABAD',
] * 15

for i,url in enumerate(urls):
    threading.Thread(target = download, args = [url, i]).start()
%%timeit
import requests
import asyncio

def background(f):
    # Decorator: run the wrapped function in the event loop's default
    # ThreadPoolExecutor. Note that run_in_executor only forwards positional
    # arguments, so **kwargs works here only because it stays empty.
    def wrapped(*args, **kwargs):
        return asyncio.get_event_loop().run_in_executor(None, f, *args, **kwargs)
    return wrapped

@background
def download(url, fn):
  r = requests.get(url)
  with open(str(fn), "wb") as f:
    f.write(r.content)

# the same long Google redirect URL repeated 15 times for the benchmark
urls = [
  'https://www.google.com/url?sa=i&url=https%3A%2F%2Fcityschool.org%2Fcampus%2Ffairmount%2Fhello%2F&psig=AOvVaw2gMH6tzY8psCcMab5FfG2u&ust=1605400822303000&source=images&cd=vfe&ved=0CAIQjRxqFwoTCJCrkKDmgO0CFQAAAAAdAAAAABAD',
] * 15

for i,url in enumerate(urls):
    download(url,i)
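The loop above only schedules the downloads. If you need to block until every file has actually been written, one possible sketch, assuming a plain script where no event loop is already running (rather than a notebook cell), is to gather the futures returned by the decorated download:

import asyncio

# download() is the @background-decorated helper above, so each call
# returns the asyncio.Future created by run_in_executor
tasks = [download(url, i) for i, url in enumerate(urls)]

# Assumes no event loop is running yet in this thread
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.gather(*tasks))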