Python: uploading multiple files to S3 in parallel using boto

python amazon-web-services amazon-s3

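I tried the second solution mentioned in the link to upload multiple files to S3. The code in that link does not call `join` on the threads, which means the main program can carry on (and finish) while the threads are still running. With this approach the whole program runs much faster, but there is no guarantee that the files were uploaded correctly. Is that really true? What I care about most is the main program finishing quickly. What side effects can this approach have?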
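One way to see what skipping `join` changes is the small sketch below. It is only an illustration: `fake_upload` is a hypothetical stand-in for a real upload call, and the file names are made up. Without `join`, the code after `start()` runs before the uploads have finished, so any timing or success check taken at that point is premature.

```python
from threading import Thread
from time import sleep

done = []


def fake_upload(name):
    # Hypothetical stand-in for a real upload; it just takes a little while.
    sleep(0.05)
    done.append(name)


threads = [Thread(target=fake_upload, args=(n,)) for n in ["a", "b", "c"]]
for t in threads:
    t.start()

# Checked right after start(): the uploads are almost certainly still running.
finished_before_join = len(done)

# join() blocks until each thread has actually finished.
for t in threads:
    t.join()
finished_after_join = len(done)

print(finished_before_join, finished_after_join)
```

Note that CPython still waits for non-daemon threads before the interpreter exits, so the apparent speed-up mostly means the timing was measured before the work was done.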

Just playing around, I see that `multiprocessing` takes a while to tear down a pool, but otherwise there is not much in it.

The test code is:

```python
from time import time, sleep
from multiprocessing.pool import Pool, ThreadPool
from threading import Thread


N_WORKER_JOBS = 10


def worker(x):
    # Simulate 100 ms of blocking work (e.g. a network call).
    # print("working on", x)
    sleep(0.1)


def mp_proc(fn, n):
    # Process pool: time pool creation and teardown separately.
    start = time()
    with Pool(N_WORKER_JOBS) as pool:
        t1 = time() - start
        pool.map(fn, range(n))
        start = time()
    # Leaving the with block tears the pool down.
    t2 = time() - start
    print(f'Pool creation took {t1*1000:.2f}ms, teardown {t2*1000:.2f}ms')


def mp_threads(fn, n):
    # Same measurement, using a thread pool instead of processes.
    start = time()
    with ThreadPool(N_WORKER_JOBS) as pool:
        t1 = time() - start
        pool.map(fn, range(n))
        start = time()
    t2 = time() - start
    print(f'ThreadPool creation took {t1*1000:.2f}ms, teardown {t2*1000:.2f}ms')


def threads(fn, n):
    # One plain thread per job, joined so that all the work is included.
    running = []
    for i in range(n):
        t = Thread(target=fn, args=(i,))
        running.append(t)
        t.start()
    for t in running:
        t.join()


if __name__ == '__main__':
    # The guard keeps worker processes from re-running the benchmark
    # on platforms that spawn rather than fork.
    for test in [mp_proc, mp_threads, threads]:
        times = []
        for _ in range(7):
            start = time()
            test(worker, 10)
            times.append(time() - start)

        times = ', '.join(f'{t*1000:.2f}' for t in times)
        print(f'{test.__name__} took {times}ms')
```

I get the following timings (Python 3.7.3, Linux 5.0.8):

- `mp_proc`: ~220ms
- `mp_threads`: ~200ms
- `threads`: ~100ms

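However, the teardown times are all ~100ms, which accounts for most of the difference and brings everything roughly into line.

I poked around in the source code with some logging, and this seems to be because the pool only checks its workers every 100ms (it does its status checks, then sleeps for 0.1 seconds).

With this knowledge I can change the code to sleep for 0.095 seconds, and then everything is within 10% of each other. Also, given that this only happens once, at pool teardown, it is easy to arrange for it not to happen inside an inner loop.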
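A minimal sketch of keeping the teardown out of an inner loop: create the pool once and reuse it for every batch. The names `run_batches` and `batches` are hypothetical, and `worker` here is a shortened stand-in that returns a value so the result can be checked.

```python
from multiprocessing.pool import ThreadPool
from time import sleep


def worker(x):
    sleep(0.01)  # stand-in for real work
    return x * 2


def run_batches(batches, n_threads=4):
    # The pool is created and torn down once, not once per batch.
    results = []
    with ThreadPool(n_threads) as pool:
        for batch in batches:
            # map() blocks until the whole batch is done, in input order.
            results.append(pool.map(worker, batch))
    return results


batch_results = run_batches([range(3), range(3, 6)])
print(batch_results)  # [[0, 2, 4], [6, 8, 10]]
```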

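You can use a `ThreadPool` to solve most of the problem, but proper error handling is still needed. The "threaded" version of that code does not do what the text says, which is probably why it finishes faster: it really should wait for the threads to finish by `join`ing them. In addition, `multiprocessing` propagates exceptions nicely, whereas the threaded code will not.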
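The point about error handling can be sketched as follows. This is an illustration under assumptions, not the linked code: `fake_upload` is a hypothetical stand-in for a real upload call (for example a boto3 client's `upload_file`), and `upload_all` is a made-up helper name. `pool.map` waits for every task and re-raises a worker's exception in the caller; catching errors per file instead lets the remaining uploads be reported too.

```python
from multiprocessing.pool import ThreadPool


def fake_upload(name):
    # Hypothetical stand-in for a real upload; names ending in ".bad" fail.
    if name.endswith(".bad"):
        raise OSError(f"upload failed: {name}")
    return name


def upload_all(names, n_threads=4):
    """Try every upload; return {name: exception} for the ones that failed."""
    def attempt(name):
        # Catch per file so one failure does not hide the other results.
        try:
            fake_upload(name)
            return name, None
        except Exception as exc:
            return name, exc

    with ThreadPool(n_threads) as pool:
        results = pool.map(attempt, names)  # blocks until all are done
    return {name: err for name, err in results if err is not None}


# Without per-file handling, pool.map re-raises the worker's exception here.
try:
    with ThreadPool(2) as pool:
        pool.map(fake_upload, ["a.txt", "b.bad"])
    map_error = None
except OSError as exc:
    map_error = exc

failed = upload_all(["a.txt", "b.bad", "c.txt"])
print(sorted(failed))  # ['b.bad']
```

With bare `Thread` objects there is no equivalent return path: a failure inside the target never reaches the code that called `join`, so it has to be recorded explicitly.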