In Python parallel processing, how do I find the first process that calls a function?

I have the following code snippet, which reads a list of CSV files and merges them into a single CSV:

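import multiprocessing
from glob import glob

max_threads = 4  # assumed value; max_threads is not defined in the original snippet

def do():
    pool = multiprocessing.Pool(max_threads)
    list_of_csvs = []
    outputdir = 'output/'
    for csvFile in glob(outputdir + '*.csv'):
        list_of_csvs.append(csvFile)
    pool.map(writeToSingleCSV, list_of_csvs)
    pool.close()
    pool.join()

def writeToSingleCSV(csvFile):
    # Every worker appends the whole file, header included, to the same output file
    with open('singleDataFile.csv', 'a') as singleFile:
        with open(csvFile, 'r') as inFile:
            for line in inFile:
                singleFile.write(line)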

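The code above works, but I want to skip the header of every CSV file after the first one (all the CSV files share the same header). How can I skip the header from the second file onward?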

Why not write the header separately? Something like this:

import multiprocessing
from glob import glob

max_threads = 4  # assumed value; not defined in the original snippet

def do():
    pool = multiprocessing.Pool(max_threads)
    list_of_csvs = []
    outputdir = 'output/'
    for csvFile in glob(outputdir + '*.csv'):
        list_of_csvs.append(csvFile)
    # Write the header once, before any worker starts appending rows
    writeToHEADERCSV(list_of_csvs[0])
    pool.map(writeToSingleCSV, list_of_csvs)
    pool.close()
    pool.join()

def writeToHEADERCSV(csvFile):
    with open('singleDataFile.csv', 'a') as singleFile:
        with open(csvFile, 'r') as inFile:
            # Get the first line and write it to the file
            singleFile.write(inFile.readline())

def writeToSingleCSV(csvFile):
    with open('singleDataFile.csv', 'a') as singleFile:
        with open(csvFile, 'r') as inFile:
            # Skip the first line, which is the header
            next(inFile, None)
            for line in inFile:
                singleFile.write(line)
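Because several worker processes append to singleDataFile.csv at the same time, rows from different files may end up interleaved in the output. A variant of the same idea, sketched here under assumptions (the helper names read_without_header and merge_csvs are hypothetical, and the worker count 4 is an assumed value), lets the workers only read each file and return its header and body separately. Pool.map returns results in the same order as its input, so a single process can write the first file's header and then every body.

```python
import multiprocessing
from glob import glob


def read_without_header(path):
    """Return (header, body) for one CSV file; body is everything after the first line."""
    with open(path, 'r', newline='') as f:
        header = f.readline()
        body = f.read()
    if body and not body.endswith('\n'):
        body += '\n'  # keep the next file's rows on their own lines
    return header, body


def merge_csvs(paths, out_path, map_func=map):
    """Merge CSV files that share a header, writing that header only once.

    map_func may be the built-in map or a Pool's map method (hypothetical
    design choice); both return results in the same order as paths.
    """
    results = list(map_func(read_without_header, paths))
    with open(out_path, 'w', newline='') as out:
        if results:
            out.write(results[0][0])  # header of the first file only
        for _header, body in results:
            out.write(body)


if __name__ == '__main__':
    paths = sorted(glob('output/*.csv'))  # same folder as in the question
    if paths:
        with multiprocessing.Pool(4) as pool:  # assumed worker count
            merge_csvs(paths, 'singleDataFile.csv', pool.map)
```

Only the parent process opens the output file, so no locking is needed, at the cost of holding every file's contents in memory until they are written.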

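Another approach: Pandas may help here. Each file's header row is parsed into the column names, so it appears only once in the merged output, and ignore_index=True renumbers the rows continuously instead of repeating each file's own index: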

import glob
import pandas as pd

# Read every .xlsx file in the folder; each file's header becomes the column names
frames = [pd.read_excel(f) for f in glob.glob("*.xlsx")]
# DataFrame.append was removed in pandas 2.0, so concatenate all frames in one call
all_data = pd.concat(frames, ignore_index=True)
print(all_data.describe())
all_data.to_excel('SingleFile.xlsx')
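Since the question is about CSV files rather than Excel workbooks, the same idea can be sketched with pd.read_csv and pd.concat (DataFrame.append no longer exists in pandas 2.x). The function name merge_csv_folder is hypothetical, and this version reads the files sequentially rather than in parallel.

```python
import glob

import pandas as pd


def merge_csv_folder(pattern, out_path):
    """Concatenate all CSV files matching pattern into one CSV with a single header row.

    read_csv turns each file's header into column names, so to_csv writes the
    header only once; ignore_index=True renumbers the rows 0..n-1.
    """
    frames = [pd.read_csv(path) for path in sorted(glob.glob(pattern))]
    merged = pd.concat(frames, ignore_index=True)  # raises ValueError if no files match
    merged.to_csv(out_path, index=False)  # omit the row-index column
    return merged
```

For the question's layout this would be called as merge_csv_folder('output/*.csv', 'singleDataFile.csv'). Writing the result outside the matched pattern avoids re-reading it on a later run.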

Before mapping writeToSingleCSV over the files, I would write one file including its header, and make writeToSingleCSV skip the header by default:

import multiprocessing
from glob import glob

max_threads = 4  # assumed value; not defined in the original snippet

def do():
    pool = multiprocessing.Pool(max_threads)
    list_of_csvs = []
    outputdir = 'output/'
    for csvFile in glob(outputdir + '*.csv'):
        list_of_csvs.append(csvFile)
    # Write one CSV file including its header
    csv_with_header = list_of_csvs.pop()
    writeToSingleCSV(csv_with_header, ignoreHeader=False)
    # Write the remaining CSV files without their header
    pool.map(writeToSingleCSV, list_of_csvs)
    pool.close()
    pool.join()

def writeToSingleCSV(csvFile, ignoreHeader=True):
    with open('singleDataFile.csv', 'a') as singleFile:
        with open(csvFile, 'r') as inFile:
            if ignoreHeader:
                # Skip the header line of inFile
                next(inFile, None)
            for line in inFile:
                singleFile.write(line)

This keeps it simple and explicit, and it should be easy to implement.
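The title asks how a worker can tell that it is handling the first file. Instead of popping a file off the list, one option, sketched here under assumptions (read_rows and merge_with_index are hypothetical names, and 4 workers is an assumed value), is to pass each file's position along with its path via enumerate and Pool.starmap; the worker keeps the header only when its index is 0. Because starmap returns results in input order and only the parent writes, the header is guaranteed to come first.

```python
import itertools
import multiprocessing
from glob import glob


def read_rows(index, path):
    """Return the file's text, dropping the header unless this is the first file (index 0)."""
    with open(path, 'r', newline='') as f:
        if index > 0:
            f.readline()  # skip the header line
        text = f.read()
    if text and not text.endswith('\n'):
        text += '\n'  # keep the next file's rows on their own lines
    return text


def merge_with_index(paths, out_path, starmap_func=itertools.starmap):
    """Write each file's rows in order; starmap_func may be itertools.starmap or Pool.starmap."""
    chunks = starmap_func(read_rows, enumerate(paths))
    with open(out_path, 'w', newline='') as out:
        for chunk in chunks:
            out.write(chunk)


if __name__ == '__main__':
    paths = sorted(glob('output/*.csv'))  # same folder as in the question
    if paths:
        with multiprocessing.Pool(4) as pool:  # assumed worker count
            merge_with_index(paths, 'singleDataFile.csv', pool.starmap)
```

Passing itertools.starmap runs everything in one process, which is convenient for checking the logic before switching to pool.starmap.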

Nice approach :)