Asynchronous: parallelizing BeautifulSoup parsing

asynchronous parallel-processing

I have a list of files that I want to parse with BeautifulSoup. Running

soup = BeautifulSoup(file, 'html.parser')

takes about 2 seconds per file, so

soups = []
for f in files:
    soups.append(BeautifulSoup(f, 'html.parser'))

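takes about 40 seconds. I would like to run BeautifulSoup(file, 'html.parser') on all of the files at the same time, so that the whole process finishes in about 2 seconds. Is this possible?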

I tried the following, which does not work:

async def parse_coroutine(F):
    return BeautifulSoup(F, 'html.parser')

async def parse(F):
    p = await parse_coroutine(F)
    return p

lst = [parse(f) for f in files]

async def main():
    await asyncio.gather(*lst)

asyncio.run(main())

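1) BeautifulSoup(F, 'html.parser') runs to completion, and I cannot call any other function while it is running.

2) The code above does not give me what I want: I want the objects returned by BeautifulSoup(F, 'html.parser') to be stored in a list.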

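From what I have read, async does not actually give me parallel processing in the way I had hoped. So what are my options? If possible, I would like a concrete solution, because I am not familiar with multithreading, concurrent programming, and so on.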
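One option worth trying is a process pool, since parsing with 'html.parser' is CPU-bound Python code. The block below is a minimal sketch under assumptions and has not been measured. The names parse_one and parse_all are hypothetical. The worker returns plain data (the title text) rather than the soup itself, because each result has to be pickled back to the parent process, which may be slow or fail for large trees. The speedup is bounded by the number of CPU cores, so 20 files will not necessarily finish in 2 seconds.

```python
from concurrent.futures import ProcessPoolExecutor


def parse_one(markup):
    """Parse one document and return plain data (here: the <title> text)."""
    # Imported inside the function so each worker process loads bs4 itself.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(markup, "html.parser")
    return soup.title.get_text() if soup.title else None


def parse_all(documents, worker=parse_one, executor_cls=ProcessPoolExecutor):
    """Apply `worker` to every document in parallel, keeping input order."""
    with executor_cls() as pool:
        # map() distributes the documents over the workers and yields the
        # results in the same order as the inputs.
        return list(pool.map(worker, documents))


# Usage (on platforms that spawn worker processes, call this from under
# `if __name__ == "__main__":`):
#     titles = parse_all(open(path).read() for path in paths)
```

If the full soup objects are needed in the parent, it is usually simpler to do all the extraction inside the worker and return only the values that are used afterwards.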

Have you tried using gevent or plain threads?
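The plain-threads suggestion can be sketched while keeping the asyncio entry point from the question. The attempt in the question blocks because parse_coroutine never awaits anything, so each call runs to completion on the event loop, and main() discards the list that asyncio.gather returns. The helper name parse_all_in_threads is hypothetical, and asyncio.to_thread needs Python 3.9 or later. Because of the global interpreter lock, threads are unlikely to speed up pure-Python parsing much. They help mainly when the time is spent waiting on I/O.

```python
import asyncio


async def parse_all_in_threads(documents, parse):
    """Run the blocking `parse` callable on each document in a worker thread."""
    # asyncio.to_thread moves each blocking call off the event loop, so the
    # loop stays free to run other coroutines in the meantime.
    jobs = [asyncio.to_thread(parse, doc) for doc in documents]
    # gather returns the results as a list, in the same order as `documents`.
    return await asyncio.gather(*jobs)


# Usage, assuming `files` and BeautifulSoup as in the question:
#     soups = asyncio.run(
#         parse_all_in_threads(files, lambda f: BeautifulSoup(f, "html.parser"))
#     )
```

Unlike a process pool, this returns the soup objects directly, since threads share memory and nothing has to be pickled.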