Multiprocessing with CSV + pandas + Python


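I have written code that iterates over every CSV in a folder, reads each one into a DataFrame, and appends it to a master df (which the user will work with later):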

import glob
import os
import pandas as pd
import time
import multiprocessing as mp
from multiprocessing.dummy import Pool


constituent_df= pd.DataFrame()

def process(file):
    '''
    This Function reads csv and appends it to a global data-frame
    Parameters:
        file-> csv file
    '''
    fields= ('REGION', 'CURR')
    print("Pandas Reading:", file)
    csv_df= pd.read_csv(file, skipinitialspace=True, usecols=fields)

    constituent_df= constituent_df.append(csv_df, ignore_index=True)

def main():
    '''
    This module reads files present in the directory
    And 
    '''
    pool = mp.Pool(processes=4)

    start= time.time()
    constituent_df= pd.DataFrame()
    for file in glob.glob(os.path.join(os.getcwd(),'2653AM\\*.csv')):
        pool.apply_async(process,[file])

    pool.close()
    pool.join()   
    end= time.time()
    print("It took:", end-start)

    print(constituent_df)
    constituent_df.to_excel(excel_writer="Constituent_Data_MP.xlsx", index=False)

if __name__=='__main__':
    main()
    #print(constituent_df)

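I am not able to save constituent_df: it is still empty after the pool finishes. Can anyone guide me on how to store the constituents? Is there another way to do this?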

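I changed how the files are processed in the pool, so that each worker returns its DataFrame instead of appending to a global one, and got a working solution: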

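import glob
import os
import time
from multiprocessing.dummy import Pool  # thread pool, as imported in the question

import pandas as pd


def process_csv(file):
    '''Read one csv and return it (assumed: the question's process(), without the global).'''
    fields = ('REGION', 'CURR')
    print("Pandas Reading:", file)
    return pd.read_csv(file, skipinitialspace=True, usecols=fields)


def main():
    '''Read every csv in the 2653AM directory and concatenate them into one DataFrame.'''
    start = time.time()
    file_list = glob.glob(os.path.join(os.getcwd(), '2653AM', '*.csv'))

    with Pool(processes=4) as pool:
        # each worker returns a DataFrame; map() collects them in input order
        df_list = pool.map(process_csv, file_list)
    constituent_df = pd.concat(df_list, ignore_index=True)

    end = time.time()
    print("It took:", end - start)
    return constituent_df


if __name__ == '__main__':
    main()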
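To see the map-and-concat pattern run end to end without the 2653AM folder, here is a self-contained sketch, assuming pandas is installed. It writes three small CSVs with made-up sample data to a temporary directory and loads them through multiprocessing.dummy.Pool, the thread-backed pool the question already imports; read_one, load_all and demo are hypothetical names. Workers only return DataFrames and the parent concatenates them, so no shared state is involved. pd.concat is also the replacement for DataFrame.append, which was deprecated in pandas 1.4 and removed in 2.0.

```python
import os
import tempfile
from multiprocessing.dummy import Pool  # thread-backed Pool, same API as multiprocessing.Pool

import pandas as pd

FIELDS = ['REGION', 'CURR']


def read_one(path):
    # Return the frame instead of mutating any shared/global DataFrame
    return pd.read_csv(path, skipinitialspace=True, usecols=FIELDS)


def load_all(paths, workers=4):
    with Pool(processes=workers) as pool:
        frames = pool.map(read_one, paths)  # results come back in input order
    return pd.concat(frames, ignore_index=True)


def demo():
    # Write three throwaway CSVs (made-up sample data) into a temp directory
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i, region in enumerate(['EU', 'US', 'APAC']):
            path = os.path.join(tmp, f'part{i}.csv')
            with open(path, 'w') as fh:
                # leading spaces in the values are removed by skipinitialspace
                fh.write(f'REGION,CURR,EXTRA\n{region}, CUR{i}, {i}\n')
            paths.append(path)
        return load_all(paths)


if __name__ == '__main__':
    print(demo())
```

Swapping in multiprocessing.Pool gives real processes instead of threads; then the worker function must be defined at module level, the pool must be created under if __name__ == '__main__':, and every returned DataFrame is pickled back to the parent, which can eat into the speed-up for large files.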


Rather than doing the multiprocessing etc. yourself, you might want to consider leveraging … (see similar questions).
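The tool recommended above is not named in the text, so as a neutral point of comparison, here is a plain sequential version with no pool at all; it is worth timing this against the parallel version before keeping the extra machinery. This is a sketch assuming pandas is installed; load_sequential and csv_paths are hypothetical helpers, and the in-memory sample data is made up.

```python
import glob
import io
import os

import pandas as pd


def load_sequential(sources, fields=('REGION', 'CURR')):
    # Read each source (a path or a file-like object) in turn and stack the results
    frames = [pd.read_csv(src, skipinitialspace=True, usecols=list(fields)) for src in sources]
    return pd.concat(frames, ignore_index=True)


def csv_paths(folder):
    # Same file set as the question: every *.csv in `folder`, in a stable order
    return sorted(glob.glob(os.path.join(folder, '*.csv')))


if __name__ == '__main__':
    # In-memory stand-ins for two CSV files (made-up sample data)
    demo = [io.StringIO('REGION,CURR,EXTRA\nEU, EUR, 1\n'),
            io.StringIO('REGION,CURR,EXTRA\nUS, USD, 2\n')]
    print(load_sequential(demo))
    # Real use would be: load_sequential(csv_paths(os.path.join(os.getcwd(), '2653AM')))
```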