python - How to use asyncio correctly to read CSV files with pandas


python


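I have many CSV files in a directory that I want to read with pandas read_csv, and then merge all of the returned DataFrames with pandas.concat.

However, I don't think I'm using asyncio correctly, because the total time isn't any shorter: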

import asyncio
import os
import time

import glob2
import pandas as pd

path = r'C:\LRM_STGY_REPO\IB_IN'

async def read_csv(filename):
    # pd.read_csv blocks and nothing here is awaited, so each task runs
    # to completion before the event loop can start the next one.
    return pd.read_csv(filename, header=None)

# Version 1: asyncio tasks
start = time.time()
loop = asyncio.new_event_loop()
tasks = [loop.create_task(read_csv(f))
         for f in glob2.iglob(os.path.join(path, "*.txt"))]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
df = pd.concat([t.result() for t in tasks], ignore_index=True)
# print(df)
print('%.4f' % (time.time() - start))

# Version 2: plain synchronous reads
def read_csv2(filename):
    return pd.read_csv(filename, header=None)

start = time.time()
df = pd.concat(map(read_csv2, glob2.iglob(os.path.join(path, "*.txt"))),
               ignore_index=True)
# print(df)
print('%.4f' % (time.time() - start))

read_csv and read_csv2 take about the same amount of time.

Alternatively, is there some other way to cut down the concat time?
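The coroutine in the question never awaits anything, so the single-threaded event loop runs each task from start to finish before it can switch to the next one; wrapping a blocking call in async def does not make it non-blocking. One way to overlap blocking calls from asyncio is asyncio.to_thread (Python 3.9+), which runs the call in a worker thread and returns an awaitable. The sketch below is a standard-library illustration, not the asker's code: blocking_read, DELAY and N_FILES are hypothetical names, and time.sleep stands in for pd.read_csv(filename, header=None).

```python
import asyncio
import time

DELAY = 0.2   # hypothetical per-file "read" time in seconds
N_FILES = 4   # hypothetical number of files


def blocking_read(name: str) -> str:
    """Stand-in for pd.read_csv: blocks the calling thread for DELAY seconds."""
    time.sleep(DELAY)
    return name


async def read_no_await(name: str) -> str:
    # Same shape as the question's coroutine: no await inside, so the
    # event loop cannot run another task until this call returns.
    return blocking_read(name)


async def read_in_thread(name: str) -> str:
    # Hand the blocking call to the loop's default thread pool and await it,
    # which lets the event loop start the other reads in the meantime.
    return await asyncio.to_thread(blocking_read, name)


async def run_all(reader) -> list[str]:
    names = [f"file{i}" for i in range(N_FILES)]
    # gather returns the results in the same order as `names`.
    return await asyncio.gather(*(reader(n) for n in names))


def timed(reader) -> tuple[list[str], float]:
    start = time.perf_counter()
    results = asyncio.run(run_all(reader))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    for reader in (read_no_await, read_in_thread):
        _, elapsed = timed(reader)
        print(f"{reader.__name__}: {elapsed:.2f} s")
```

Run as a script, the first variant should take roughly N_FILES × DELAY and the second roughly DELAY. Whether threads give a similar gain for real pd.read_csv calls depends on how much of the work releases the GIL: waiting on disk does, but much of the CSV parsing may not, so a process pool is the more dependable option when parsing dominates.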

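Just out of curiosity, what happens if you first build results = [t.result() for t in tasks] and then call df = pd.concat(results, ignore_index=True)?

It won't get any shorter at all, because pd (pandas) does not support asynchronous reading.

@IgnacioVergaraKausel I'm not sure what you mean?

@Sraw Is there any other way?

@RelaxZeroC I mean explicitly building a results list and then passing it to pd.concat(). If that doesn't help, then maybe you can use the multiprocessing module, as described in this answer.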
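A minimal sketch of the multiprocessing route, assuming concurrent.futures.ProcessPoolExecutor is an acceptable pool (the answer linked in the comment is not reproduced here, so this is not necessarily its approach). read_all and its parameters are hypothetical names, and the standard-library glob module replaces glob2 for listing the files. Each worker runs pd.read_csv on one file, and the frames are concatenated once at the end.

```python
import glob
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from functools import partial

import pandas as pd


def read_all(paths, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Read every file in `paths` concurrently, then concatenate once.

    `read_all` is a hypothetical helper, not a pandas API.
    """
    # partial(pd.read_csv, ...) is pickled by reference to pandas, so worker
    # processes can receive it without importing anything from this script.
    reader = partial(pd.read_csv, header=None)
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so rows keep file order.
        frames = list(pool.map(reader, paths))
    # One concat over the complete list; raises ValueError if `paths` is empty.
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    # Required with the "spawn" start method (the default on Windows): each
    # worker re-imports this module and must not start a pool of its own.
    path = r"C:\LRM_STGY_REPO\IB_IN"
    files = sorted(glob.glob(os.path.join(path, "*.txt")))
    df = read_all(files)
    print(df.shape)
```

Passing executor_cls=ThreadPoolExecutor runs the same code with threads, which avoids pickling each returned DataFrame back from a worker process. Which pool is faster depends on file sizes and on how much of the parse holds the GIL, so it is worth timing both on the real data.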