Python/Multiprocessing: processes do not seem to start

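I have a function that reads a binary file and converts each byte into a corresponding sequence of characters. For example, 0x05 becomes "AACC", 0x2A becomes "AGGG", and so on. The function that reads the file and converts the bytes currently runs sequentially, and since the files to convert are between 25 KB and 2 MB, this can take quite a while.

So I tried to use multiprocessing to split up the work and, hopefully, speed it up. However, I just cannot get it to work. Below is the sequential function, which works, albeit slowly: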

def fileToRNAString(_file):

    if (_file and os.path.isfile(_file)):
        rnaSequences = []
        blockCount = 0
        blockSize = 2048
        printAndLog("!", "Converting %s into RNA string (%d bytes/block)" % (_file, blockSize))
        with open(_file, "rb") as hFile:
            buf = hFile.read(blockSize)
            while buf:
                decSequenceToRNA(blockCount, buf, rnaSequences)
                blockCount = blockCount + 1
                buf = hFile.read(blockSize)
    else:
        printAndLog("-", "Could not find the specified file. Please verify that the file exists:" + _file)
    return rnaSequences

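Note: the function 'decSequenceToRNA' reads the buffer and converts each byte into the required string. When it runs, it produces a tuple containing the block number and the string, e.g. (1, 'AccgTagata…'), so at the end I have an array of these tuples available.

I tried to convert the function to use Python's multiprocessing: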

def fileToRNAString(_file):
    rnaSequences = []
    if (_file and os.path.isfile(_file)):
        blockCount = 0
        blockSize = 2048
        printAndLog("!", "Converting %s into RNA string (%d bytes/block)" % (_file, blockSize))
        workers = []
        with open(_file, "rb") as hFile:
            buf = hFile.read(blockSize)
            while buf:
                p = Process(target=decSequenceToRNA, args=(blockCount, buf, rnaSequences))
                p.start()
                workers.append(p)
                blockCount = blockCount + 1
                buf = hFile.read(blockSize)
        for p in workers:
            p.join()
    else:
        printAndLog("-", "Could not find the specified file. Please verify that the file exists:" + _file)
    return rnaSequences

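However, no processes seem to start: when this function runs, an empty array is returned. None of the messages that 'decSequenceToRNA' prints to the console are displayed.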

>>>fileToRNAString(testfile)
[!] Converting /root/src/amino56/M1H2.bin into RNA string (2048 bytes/block).

As a side note, I am running Linux shiva 3.14-kali1-amd64 #1 SMP Debian 3.14.5-1kali1 (2014-06-07) x86_64 GNU/Linux and testing the function with PyCastle on Python 2.7.3. I am using the following packages:

import os
import re
import sys
import urllib2
import requests
import logging
import hashlib
import argparse
import tempfile
import shutil
import feedparser
from multiprocessing import Process

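I would appreciate help figuring out why my code does not work, or whether I have missed something elsewhere that would make the processes work. Suggestions for improving the code are also welcome. Here is 'decSequenceToRNA' for reference: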

def decSequenceToRNA(_idxSeq, _byteSequence, _rnaSequences):
    rnaSequence = ''
    printAndLog("!", "Processing block %d (%d bytes)" % (_idxSeq, len(_byteSequence)))
    for b in _byteSequence:
        rnaSequence = rnaSequence + base10ToRNA(ord(b))
    printAndLog("+", "Block %d completed. RNA of %d nucleotides generated." % (_idxSeq, len(rnaSequence)))
    _rnaSequences.append((_idxSeq, rnaSequence))

Try writing it like this (with a comma at the end of the argument list)
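For context, a trailing comma only changes the meaning of a one-element tuple, which is why it makes no difference for the three-element args tuple in the question. A quick check, using made-up values purely for illustration:

```python
# Illustrative values only; none of these names come from the question.
single_without_comma = ("buf")       # just a parenthesised string
single_with_comma = ("buf",)         # a one-element tuple
triple = (0, "buf", [])
triple_with_comma = (0, "buf", [],)  # trailing comma is optional here

print(type(single_without_comma).__name__)  # str
print(type(single_with_comma).__name__)     # tuple
print(triple == triple_with_comma)          # True
```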

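decSequenceToRNA runs in its own process, which means it gets its own separate copy of every data structure in the main process. So when you append to _rnaSequences inside decSequenceToRNA, it has no effect on rnaSequences in the parent process. That explains why an empty list is returned.

You have two options to fix this. The first is to use multiprocessing.Manager to create a list that can be shared between processes. For example: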

import multiprocessing

def f(shared_list):
    shared_list.append(1)

if __name__ == "__main__":
    normal_list = []
    p = multiprocessing.Process(target=f, args=(normal_list,))
    p.start()
    p.join()
    print(normal_list)

    m = multiprocessing.Manager()
    shared_list = m.list()
    p = multiprocessing.Process(target=f, args=(shared_list,))
    p.start()
    p.join()
    print(shared_list)

Output:

[]   # Normal list didn't work, the appended '1' didn't make it to the main process
[1]  # multiprocessing.Manager() list works fine
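The same shared-list approach extends to several workers, as in the block-wise loop from the question. The sketch below is only an illustration: convert_block and convert_all are hypothetical names, and the hex conversion is a placeholder for the real RNA conversion. Since workers can finish in any order, the parent sorts the results by block index.

```python
import multiprocessing


def convert_block(index, block, shared_results):
    """Hypothetical worker: stores (index, converted block) in the shared list."""
    # Placeholder conversion; the real code would build the RNA string here.
    converted = "".join("%02x" % b for b in bytearray(block))
    shared_results.append((index, converted))


def convert_all(blocks):
    """Run one process per block and return the results sorted by index."""
    manager = multiprocessing.Manager()
    shared_results = manager.list()
    workers = []
    for index, block in enumerate(blocks):
        p = multiprocessing.Process(target=convert_block,
                                    args=(index, block, shared_results))
        p.start()
        workers.append(p)
    for p in workers:
        p.join()
    # Copy the proxy into a plain list, then sort by the block index,
    # because the append order depends on process scheduling.
    return sorted(list(shared_results))


if __name__ == "__main__":
    print(convert_all([b"\x05", b"\x2a", b"\xff"]))
```

One process per block is kept here only to mirror the question; it becomes expensive when there are many blocks.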

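Applying this to your code only requires replacing

rnaSequences = []

with

m = multiprocessing.Manager()
rnaSequences = m.list()

Alternatively, you could (and probably should) use a multiprocessing.Pool rather than creating a separate Process for each chunk. I am not sure how large hFile is or how big the chunks you are reading are, but if there are many chunks, you will hurt performance by spawning a process for each one. With a Pool, you can keep the process count constant and easily build the list of sequences:
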
def decSequenceToRNA(_idxSeq, _byteSequence):
    rnaSequence = ''
    printAndLog("!", "Processing block %d (%d bytes)" % (_idxSeq, len(_byteSequence)))
    for b in _byteSequence:
        rnaSequence = rnaSequence + base10ToRNA(ord(b))
    printAndLog("+", "Block %d completed. RNA of %d nucleotides generated." % (_idxSeq, len(rnaSequence)))
    return _idxSeq, rnaSequence

def fileToRNAString(_file):
    rnaSequences = []
    if (_file and os.path.isfile(_file)):
        blockCount = 0
        blockSize = 2048
        printAndLog("!", "Converting %s into RNA string (%d bytes/block)" % (_file, blockSize))
        results = []
        pool = multiprocessing.Pool()  # Creates a pool of cpu_count() processes
        with open(_file, "rb") as hFile:
            buf = hFile.read(blockSize)
            while buf:
                # apply_async takes the positional arguments as a single tuple
                result = pool.apply_async(decSequenceToRNA, (blockCount, buf))
                results.append(result)
                blockCount = blockCount + 1
                buf = hFile.read(blockSize)
        rnaSequences = [r.get() for r in results]
        pool.close()
        pool.join()
    else:
        printAndLog("-", "Could not find the specified file. Please verify that the file exists:" + _file)
    return rnaSequences

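Note that we no longer pass the rnaSequences list to the children. Instead, we simply return the result that we would have appended back to the parent (something we could not do with Process) and build the list there.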
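As a further sketch of the same idea, Pool.map can replace the apply_async loop, since it returns results in the order of its inputs. The example below is self-contained and rests on assumptions that are not in the question: base10ToRNA is not shown there, so byte_to_rna is a hypothetical stand-in that writes each byte as four base-4 digits over the alphabet "ACGU" (this matches the two examples given, 0x05 -> "AACC" and 0x2A -> "AGGG"). read_blocks and file_to_rna_blocks are likewise illustrative names.

```python
import multiprocessing
import os
import tempfile

ALPHABET = "ACGU"  # assumed digit order; the question only shows A, C and G


def byte_to_rna(value):
    """Hypothetical stand-in for base10ToRNA: one byte -> four base-4 digits."""
    return "".join(ALPHABET[(value >> shift) & 3] for shift in (6, 4, 2, 0))


def encode_block(block):
    """Convert one block of raw bytes into its RNA string."""
    # bytearray yields integers on both Python 2 and Python 3.
    return "".join(byte_to_rna(b) for b in bytearray(block))


def read_blocks(path, block_size=2048):
    """Yield successive blocks of at most block_size bytes from the file."""
    with open(path, "rb") as handle:
        while True:
            block = handle.read(block_size)
            if not block:
                break
            yield block


def file_to_rna_blocks(path, block_size=2048):
    """Return a list of (block index, RNA string) tuples, in file order."""
    pool = multiprocessing.Pool()  # defaults to cpu_count() workers
    try:
        # map() returns results in input order, so no sorting is needed.
        encoded = pool.map(encode_block, list(read_blocks(path, block_size)))
    finally:
        pool.close()
        pool.join()
    return list(enumerate(encoded))


if __name__ == "__main__":
    # Write a small sample file so the sketch runs without external data.
    with tempfile.NamedTemporaryFile(delete=False) as sample:
        sample.write(b"\x05\x2a" * 3000)
    try:
        blocks = file_to_rna_blocks(sample.name)
        print("%d blocks, first starts with %s" % (len(blocks), blocks[0][1][:8]))
    finally:
        os.remove(sample.name)
```

Joining the per-byte strings with "".join also avoids the repeated string concatenation of the original loop. Run it as a script from the command line, not from an interactive shell.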
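How are you executing the script? Is it just from a bash prompt?

That is only necessary for a one-element tuple.

Yes, I tried this, but it did not change the result. Update: when I run the script from the command line, it seems to work; I can see the processes running and handling the buffers. So there appears to be a problem with the PyCastle/IDLE interface.

@BlackCr0w Yes, IDLE does not work properly with multiprocessing. You have to run the script directly from the CLI.

Thanks for the update. I noticed that my list was empty every time a process returned, so you are absolutely right. Thanks for the information, I will try it tonight. I tried using Queue(), but ran into some problems. I suspect the queue was full (I am creating a lot of strings), which raised an exception and caused p.join() to hang. I will let you know. Thanks again.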
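On the Queue problem mentioned in the last comment: the multiprocessing documentation warns that a process which has put items on a queue does not exit until those items have been flushed to the underlying pipe, so calling join() before the parent has read the results can deadlock. A minimal sketch of a safe ordering, with a hypothetical worker function and dummy payloads:

```python
import multiprocessing


def worker(index, block, queue):
    """Hypothetical worker that sends a large result back through a queue."""
    # Dummy payload: four characters per input byte, like the RNA strings.
    queue.put((index, "A" * (len(block) * 4)))


def collect(blocks):
    """Start one worker per block, drain the queue, then join the workers."""
    queue = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=worker, args=(i, blk, queue))
               for i, blk in enumerate(blocks)]
    for p in workers:
        p.start()
    # Drain the queue first: exactly one get() per worker...
    results = [queue.get() for _ in workers]
    # ...and only then join; joining first can block on large results.
    for p in workers:
        p.join()
    return sorted(results)


if __name__ == "__main__":
    out = collect([b"\x00" * 100000] * 4)
    print([(i, len(s)) for i, s in out])
```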