Python: How do I spawn multiprocessing jobs from a list without creating duplicates?


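How can I get multiprocessing to spawn new jobs from a list? I keep getting:

assert self._popen is None, 'cannot start a process twice' AttributeError: 'Worker' object has no attribute '_popen'

That makes sense, because I'm basically creating multiple instances of the same job... so how do I fix this? Do I need to set up a multiprocessing pool?

Let me know if I need to clarify anything further.

Here is my multiprocessing class: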

class Worker(multiprocessing.Process):

    def __init__(self, output_path, source, file_name):
        self.output_path = output_path
        self.source = source
        self.file_name = file_name

    def run(self):

        t = HTML(self.source)
        output = open(self.output_path+self.file_name+'.html','w')
        word_out = open(self.output_path+self.file_name+'.txt','w')  
        try:
            output.write(t.tokenized)

            for w in word_list:
                if w:
                    word_out.write(w+'\n')

            word_out.close()
            output.close()
            word_list = []

        except IndexError: 
            output.write(s[1])
            output.close()
            word_out.close()

        except UnboundLocalError:
            output.write(s[1])
            output.close()
            word_out.close()

Here is the class that drives the whole process:

class implement(HTML):

    def __init__(self, input_path, output_path):
        self.input_path = input_path
        self.output_path = output_path

    def ensure_dir(self, directory):
        if not os.path.exists(directory):
            os.makedirs(directory)
        return directory    

    def prosses_epubs(self):
        for root, dirs, files in os.walk(self.input_path+"\\"):
            epubs = [root+file for file in files if file.endswith('.epub')]
            output_file = [self.ensure_dir(self.output_path+"\\"+os.path.splitext(os.path.basename(e))[0]+'_output\\') for e in epubs]

        count = 0 
        for e in epubs:
            epub = epubLoader(e)

            jobs = []

            # this is what's breaking everything right here. I'm not sure how to fix it. 
            for output_epub in epub.get_html_from_epub():                
                worker = Worker(output_file[count], output_epub[1], output_epub[0])
                jobs.append(worker)
                worker.start()

            for j in jobs:
                j.join()

            count += 1
        print "done!"


if __name__ == '__main__':
    test = implement('some local directory', 'some local directory')    
    test.prosses_epubs()

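Any help with this would be greatly appreciated. Also let me know if anything in my code could be done better... I'm always trying to learn the best way of doing things.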

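Avoid classes when a plain function will do. In this case, each of your classes has essentially one meaty method, and the `__init__` method simply stores the arguments used by that meaty method. You can streamline your code by turning the meaty method into a function and passing the arguments to it directly.

Separate the idea of a "job" (i.e. a task) from the idea of a "worker" (i.e. a process). Your machine has a limited number of processors, but the number of jobs could be far greater. You don't want to start a new process for every job, since that could swamp your CPUs; you would essentially be overloading your own machine.
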
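Use the `with` statement to guarantee that your file handles get closed. I see `output.close()` and `word_out.close()` each being called in three different places. You can eliminate all of those lines by using the `with` statement, which automatically closes the files once Python leaves the `with` suite.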
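
As a self-contained illustration of the `with` advice, here is a minimal sketch that writes an `.html` file and a `.txt` word list and relies on the `with` suite to close both. The function name `write_outputs` and its parameters are hypothetical stand-ins, not part of the original code; `html_text` and `words` take the place of `t.tokenized` and `word_list`.

```python
import os
import tempfile


def write_outputs(output_dir, file_name, html_text, words):
    """Write html_text to <file_name>.html and the non-empty words to <file_name>.txt."""
    base = os.path.join(output_dir, file_name)
    # Both files are closed when the with-suite exits, even if a write raises.
    with open(base + '.html', 'w') as html_out, open(base + '.txt', 'w') as word_out:
        html_out.write(html_text)
        for w in words:
            if w:  # skip empty strings, as the original loop does
                word_out.write(w + '\n')
    # Outside the suite, both handles report that they are closed.
    return html_out.closed and word_out.closed


if __name__ == '__main__':
    demo_dir = tempfile.mkdtemp()
    print(write_outputs(demo_dir, 'chapter1', '<p>hi</p>', ['hi', '', 'there']))
```

No explicit `close()` call appears anywhere, so there is nothing to forget in an `except` branch.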

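I think a multiprocessing `Pool` would work well with your code. Jobs can be sent to the workers in the pool using `pool.apply_async`. Each call queues a job, which waits until a worker in the pool is free to handle it. `pool.join()` makes the main process wait until all the jobs are done.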
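
The pool workflow just described can be tried in isolation with a stand-in job. This is only a sketch: `count_words` and `run_jobs` are hypothetical names chosen for illustration. It also keeps the `AsyncResult` objects returned by `apply_async` and calls `get()` on them, because an exception raised inside a worker is otherwise not reported to the parent process.

```python
import multiprocessing as mp


def count_words(text):
    """Stand-in job: return the number of whitespace-separated words."""
    return len(text.split())


def run_jobs(func, items, processes=2):
    """Queue one job per item on a fixed-size pool and collect the results in order."""
    pool = mp.Pool(processes=processes)
    # apply_async returns immediately; each job waits in a queue until a worker is free.
    pending = [pool.apply_async(func, args=(item,)) for item in items]
    pool.close()  # no more jobs will be submitted
    pool.join()   # block until every queued job has finished
    # get() returns the job's result, or re-raises an exception raised in the worker.
    return [p.get() for p in pending]
```

Calling `run_jobs(count_words, ['one two', 'three'])` from under an `if __name__ == '__main__':` guard runs both jobs on two worker processes, however many jobs are queued. The function passed in must be defined at module level so that it can be pickled and sent to the workers.
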
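
Use `os.path.join` instead of joining directories with `'\\'`. This will make your code compatible with non-Windows machines.

Use `enumerate` instead of manually implementing and incrementing counters. It is less typing and makes your code a bit more readable.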
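
Both of these points fit in a few lines. The sketch below builds the `_output` directory path for each EPUB; the helper name `output_dirs` is hypothetical, and it only computes paths without creating anything on disk.

```python
import os


def output_dirs(output_root, epub_paths):
    """Pair each EPUB's index with an '<name>_output' directory path under output_root."""
    pairs = []
    # enumerate supplies the counter, so no manual `count += 1` is needed.
    for count, path in enumerate(epub_paths):
        stem = os.path.splitext(os.path.basename(path))[0]
        # os.path.join inserts the correct separator for the current platform.
        pairs.append((count, os.path.join(output_root, stem + '_output')))
    return pairs
```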

Since `epubLoader`, `HTML`, and `word_list` are not defined, the following code will not run, but it may give you a clearer idea of what I am suggesting above:

import multiprocessing as mp
import os

def worker(output_path, source, filename):
    t = HTML(source)
    # Join with a separator; the directory path no longer ends in one.
    output_path = os.path.join(output_path, filename)
    output = open(output_path+'.html', 'w')
    word_out = open(output_path+'.txt','w')
    with output, word_out:
        try:
            output.write(t.tokenized)

            for w in word_list:
                if w:
                    word_out.write(w+'\n')

            word_list = []

        except IndexError: 
            output.write(s[1])

        except UnboundLocalError:
            output.write(s[1])


def ensure_dir(directory):
    if not os.path.exists(directory):
        os.makedirs(directory)
    return directory    


def process_epubs(input_path, output_path):
    pool = mp.Pool()

    for root, dirs, files in os.walk(input_path):
        epubs = [os.path.join(root, file) for file in files
                 if file.endswith('.epub')]
        # Call ensure_dir once per EPUB rather than passing it a generator.
        output_file = [
            ensure_dir(
                os.path.join(
                    output_path,
                    os.path.splitext(os.path.basename(e))[0] + '_output'))
            for e in epubs]

    for count, e in enumerate(epubs):
        epub = epubLoader(e)
        for filename, source in epub.get_html_from_epub():
            pool.apply_async(
                worker,
                args=(output_file[count], source, filename))
    pool.close()
    pool.join()

    print("done!")


if __name__ == '__main__':
    process_epubs('some local directory', 'some local directory')
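
A note on the traceback in the question: `Worker` overrides `__init__` without calling the base-class initializer, so the internal state that `multiprocessing.Process.start()` checks (such as `_popen`) is never created. That, rather than duplicated jobs, is most likely what produces the `AttributeError`. If the subclass approach is kept, a minimal sketch of the fix looks like this; `BrokenWorker` and `FixedWorker` are illustrative names, and the `run` body is a placeholder rather than the original tokenizing logic.

```python
import multiprocessing


class BrokenWorker(multiprocessing.Process):
    """Mirrors the question: __init__ never calls the base-class initializer."""

    def __init__(self, file_name):
        self.file_name = file_name


class FixedWorker(multiprocessing.Process):
    """Same shape, but Process.__init__ runs first and sets up internal state."""

    def __init__(self, file_name):
        super(FixedWorker, self).__init__()
        self.file_name = file_name

    def run(self):
        # Placeholder for the real per-file work.
        pass


if __name__ == '__main__':
    try:
        BrokenWorker('a').start()
    except AttributeError as exc:
        print('broken: %s' % exc)

    worker = FixedWorker('a')
    worker.start()
    worker.join()
    print('fixed exit code: %s' % worker.exitcode)
```

This fixes the immediate error, but it still starts one process per job, so the pool-based version above remains the better fit when there are many files.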

Thank you very much for the detailed answer! It was very helpful.