Multiprocessing with CSV + pandas + Python
I have written code that iterates over every CSV in a folder, reads each one into a DataFrame, and appends it to a master DataFrame (which will be used later by the user):
```python
import glob
import os
import pandas as pd
import time
import multiprocessing as mp
from multiprocessing.dummy import Pool

constituent_df = pd.DataFrame()

def process(file):
    '''
    This Function reads csv and appends it to a global data-frame
    Parameters:
    file-> csv file
    '''
    fields = ('REGION', 'CURR')
    print("Pandas Reading:", file)
    csv_df = pd.read_csv(file, skipinitialspace=True, usecols=fields)
    constituent_df = constituent_df.append(csv_df, ignore_index=True)

def main():
    '''
    This module reads files present in the directory
    And
    '''
    pool = mp.Pool(processes=4)
    start = time.time()
    constituent_df = pd.DataFrame()
    for file in glob.glob(os.path.join(os.getcwd(), '2653AM\\*.csv')):
        pool.apply_async(process, [file])
    pool.close()
    pool.join()
    end = time.time()
    print("It took:", end - start)
    print(constituent_df)
    constituent_df.to_excel(excel_writer="Constituent_Data_MP.xlsx", index=False)

if __name__ == '__main__':
    main()
    #print(constituent_df)
```
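A likely reason nothing gets stored: because `process()` assigns to `constituent_df`, Python treats that name as local to the function, so reading it on the right-hand side raises `UnboundLocalError` before `append` is ever called. With `apply_async` the exception is kept in the returned `AsyncResult` and only re-raised by `.get()`, so the script finishes without reporting it. Even with a `global` statement, each `mp.Pool` worker is a separate process with its own copy of the module globals, so the parent's `constituent_df` would still stay empty. A minimal, pandas-free sketch of the scoping problem (the names `counter`, `broken_update` and `caught_error` are hypothetical):

```python
counter = 0  # module-level name, playing the role of constituent_df


def broken_update():
    # The assignment below makes `counter` local to this whole function,
    # so reading it on the right-hand side fails: the global is never used.
    counter = counter + 1
    return counter


def caught_error():
    """Call broken_update() and report which exception it raised, if any."""
    try:
        broken_update()
    except UnboundLocalError as exc:
        return type(exc).__name__
    return None


print(caught_error())  # UnboundLocalError
print(counter)         # 0: the global was never modified
```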
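I am not able to save the results. Can anyone guide me on how to collect the constituent data into `constituent_df`? Is there another way to do this?

I changed the way I process the files in the pool and got a working solution: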
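```python
import glob
import os
import time
from multiprocessing.dummy import Pool  # as in the question; multiprocessing.Pool has the same map() API

import pandas as pd

def process_csv(file):
    '''Read one csv and *return* it instead of appending to a global (assumed body).'''
    fields = ('REGION', 'CURR')
    return pd.read_csv(file, skipinitialspace=True, usecols=fields)

def main():
    '''
    This module reads files present in the directory
    And
    '''
    start = time.time()
    file_list = glob.glob(os.path.join(os.getcwd(), '2653AM', '*.csv'))
    with Pool(processes=4) as pool:
        # map() collects every worker's return value, in file_list order
        df_list = pool.map(process_csv, file_list)
    constituent_df = pd.concat(df_list, ignore_index=True)
    end = time.time()
    print("It took:", end - start)

if __name__ == '__main__':
    main()
```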
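The map-then-concat pattern can be tried without any files on disk. This sketch assumes in-memory CSV text in place of the files in the `2653AM` folder (the names `CSV_TEXTS`, `read_one` and `read_all` are hypothetical) and uses the same thread-based `multiprocessing.dummy.Pool` that the question imports:

```python
import io
from multiprocessing.dummy import Pool  # thread pool exposing the multiprocessing.Pool API

import pandas as pd

# Hypothetical in-memory "files" standing in for the *.csv files on disk
CSV_TEXTS = [
    "REGION,CURR,OTHER\nEU, EUR, 1\nUS, USD, 2\n",
    "REGION,CURR,OTHER\nJP, JPY, 3\n",
]


def read_one(text):
    # Each worker returns its own DataFrame; no global state is mutated.
    return pd.read_csv(io.StringIO(text), skipinitialspace=True,
                       usecols=['REGION', 'CURR'])


def read_all(texts, workers=4):
    with Pool(processes=workers) as pool:
        # map() blocks until every result is ready and keeps input order
        frames = pool.map(read_one, texts)
    # One concat at the end instead of growing a frame row block by row block
    return pd.concat(frames, ignore_index=True)


result = read_all(CSV_TEXTS)
print(result)
```

Because each worker returns its frame, nothing has to be shared between workers: `pool.map` gathers the results in input order and a single `pd.concat` builds the final frame, which also avoids copying a growing DataFrame on every `append`.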
Instead of doing the multiprocessing etc. yourself, you might want to consider leveraging it (see the similar question).