Python multiprocessing with return values

python python-3.x


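I am trying to use multiprocessing to speed up the processing of some files that are stored in S3 and need to be checked. I am new to multiprocessing, so I am not sure what is going wrong here; the same code runs without problems when I use a plain for loop.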

def read_json(file):
  file_key = file["Key"]
  file_key_split = file_key.split("/")
  document = get_json_details(file_key)
  type = file_key_split[2]
  return document, type

document_list = []
document_type_list = []

mgr = mp.Manager()
nodes = mgr.list()
pool_size = mp.cpu_count()
pool = mp.Pool(processes=pool_size)
# mp.freeze_support()

for file in tqdm(get_all_s3_objects(s3, Bucket=docbucket, Prefix=prefix)):
    document_list, document_type_list = zip(*pool.map(read_json, file))

pool.close()
pool.join()

The error I get is as follows:

"""
Traceback (most recent call last):
  File "C:\Users\tobia\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\tobia\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "c:\GIT\BMWJPSI-BI\03_Lambda_Functions\RegoOCRCheck.py", line 118, in read_json
    file_key = file["Key"]
TypeError: string indices must be integers
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:/GIT/BMWJPSI-BI/03_Lambda_Functions/RegoOCRCheck.py", line 151, in <module>
    document_list, document_type_list = zip(pool.map(read_json, file))
  File "C:\Users\tobia\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\tobia\AppData\Local\Programs\Python\Python38\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
TypeError: string indices must be integers

Thanks for your help.


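Sorry for the delayed response. I think the problem is that you are passing a single dictionary object to pool.map. pool.map iterates over whatever it is given, and iterating over a dictionary yields only its keys, so read_json receives the key strings rather than the dictionary itself. I think you should pass the whole get_all_s3_objects(s3, Bucket=docbucket, Prefix=prefix) iterable to pool.map instead. It will then be iterated file by file, and the result is a list of tuples, one (document, type) pair per file, which you can split into (document_list, document_type_list). Let me know if you still run into any issues.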
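The key-iteration behaviour described above can be reproduced without a pool. The following is a minimal sketch using the built-in map, which walks its iterable argument the same way pool.map does. The dictionary contents and the read_key helper are made up for illustration; the real entries come from get_all_s3_objects.

```python
# Hypothetical stand-in for one S3 listing entry (real entries come from get_all_s3_objects).
file = {"Key": "some-prefix/2021/picture/doc1.json", "Size": 123}


def read_key(item):
    # Mirrors the first line of read_json: expects a dict with a "Key" entry.
    return item["Key"]


# Iterating over a dict yields its keys, so each call would receive a string.
print(list(file))

# Mapping over the dict itself therefore calls read_key("Key"), which fails.
error_message = None
try:
    list(map(read_key, file))
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Mapping over a list of dicts hands each whole dict to the function.
keys = list(map(read_key, [file]))
print(keys)
```

The same reasoning applies to pool.map(read_json, file) inside the for loop: file is one dictionary, so the workers are handed its keys one by one.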

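Comments:

It seems that the file argument being passed in is a string, which is why you get the error above, whereas it was a dictionary object when the code ran in the plain for loop. Could you post the original working code? That would make it easier for us to debug.

@user696969 file is a dictionary, that has not changed. I think the problem is related to one of the return values being a list and the other a string. The working code is as follows:

for file in tqdm(get_all_s3_objects(s3, Bucket=docbucket, Prefix=prefix)):
    dl, dtl = read_json(file)
    document_list.append(dl)
    document_type_list.append(dtl)

In my case, an example of what one iteration returns is document = ["blabla", "hello world"] and type = "picture".

I had to remove the tqdm() part to get it working! Thanks for the hint. This part works:

document_list, document_type_list = zip(*pool.map(read_json, get_all_s3_objects(s3, Bucket=docbucket, Prefix=prefix)))

Can you correct that in your answer so that I can mark it as accepted? I cannot edit it myself.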

document_list, document_type_list = zip(*pool.map(read_json, get_all_s3_objects(s3, Bucket=docbucket, Prefix=prefix)))
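The zip(*...) idiom in the line above transposes a list of (document, type) pairs into two tuples. Here is a small sketch of that step on its own, using made-up results in the shape the asker described. The split_pairs helper is hypothetical and not part of the original code; it only adds a guard for an empty listing, where the plain unpacking would raise a ValueError.

```python
# Hypothetical results, shaped like what pool.map(read_json, ...) returns:
# one (document, type) tuple per file.
results = [
    (["blabla", "hello world"], "picture"),
    (["foo"], "invoice"),
]

# zip(*results) groups the first elements together and the second elements together.
document_list, document_type_list = zip(*results)
print(document_list)
print(document_type_list)


def split_pairs(pairs):
    """Split (document, type) pairs into two lists, tolerating an empty input."""
    pairs = list(pairs)
    if not pairs:
        # zip(*[]) yields nothing, so unpacking into two names would fail.
        return [], []
    documents, types = zip(*pairs)
    return list(documents), list(types)


docs, types = split_pairs(results)
empty_docs, empty_types = split_pairs([])
```

Note that zip returns tuples, not lists, so document_list and document_type_list cannot be appended to afterwards unless they are converted with list().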