Python successive multiprocessing

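I am filtering huge text files using multiprocessing.py. The code basically opens a text file, processes it, then closes it.

The problem is that I would like to launch it successively on multiple text files. I tried adding a loop, but for some reason it does not work (while the code works on each file individually). I believe this is an issue with: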

    if __name__ == '__main__':

However, I was looking for something else. I tried to create a launcher file and a LauncherCount file like this:

    # LauncherCount.py

    def setLauncherCount(n):
        global LauncherCount
        LauncherCount = n

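I then import LauncherCount.py and use LauncherCount.LauncherCount as my loop index.

Of course, this does not work either, because it edits the variable LauncherCount.LauncherCount locally, so the value is not changed in the imported version of LauncherCount.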

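Is there a way to globally edit a variable in an imported file? Or is there some other way to do this? What I need is to run the code multiple times, changing one value each time, apparently without using any loop.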
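As a minimal illustration of how a module-level variable behaves across imports, consider the sketch below. It is not the asker's code: the module name launcher_count is hypothetical and is built in memory only so that the example is self-contained. It suggests that assigning through the module object (module.name = value) is seen by every importer in the same process, whereas a name bound with "from module import name" keeps its old value.

```python
import sys
import types

# Hypothetical stand-in for LauncherCount.py, created in memory for this sketch.
launcher_count = types.ModuleType("launcher_count")
launcher_count.LauncherCount = 0
sys.modules["launcher_count"] = launcher_count

import launcher_count as lc                          # same module object
from launcher_count import LauncherCount as copied   # separate local binding

# Rebind the attribute on the module itself.
lc.LauncherCount = 3

print(launcher_count.LauncherCount)  # value seen through the module
print(copied)                        # value captured at import time
```

Note that this sharing only holds inside one interpreter process. A script started through os.system, or a child process that re-imports its modules, gets its own fresh copy of the module and will not see the new value.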

Thanks

EDIT: here is my main code, if needed. Sorry for the bad style.

import multiprocessing
import config
import time
import LauncherCount

class Filter:

    """ Filtering methods """
    def __init__(self):
        print("launching methods")

    #   Return the list: [Latitude,Longitude]  (elements are floating point numbers)
    def LatLong(self,line):

        comaCount = []
        comaCount.append(line.find(','))
        comaCount.append(line.find(',',comaCount[0] + 1))
        comaCount.append(line.find(',',comaCount[1] + 1))
        Lat = line[comaCount[0] + 1 : comaCount[1]]
        Long = line[comaCount[1] + 1 : comaCount[2]]

        try:
            return [float(Lat) , float(Long)]
        except ValueError:
            return [0,0]

    #   Return a boolean:
    #   - True if the Lat/Long is within the Lat/Long rectangle defined by:
    #           tupleFilter = (minLat,maxLat,minLong,maxLong)
    #   - False if not
    def LatLongFilter(self,LatLongList , tupleFilter) :
        if (tupleFilter[0] <= LatLongList[0] <= tupleFilter[1] and
                tupleFilter[2] <= LatLongList[1] <= tupleFilter[3]):
            return True
        else:
            return False

    def writeLine(self,key,line):
        filterDico[key][1].write(line)


def filteringProcess(dico):

    myFilter = Filter()

    while True:
        try:
            currentLine = readFile.readline()
        except ValueError:
            break
        if len(currentLine) ==0:                    # Breaks at the end of the file
            break
        if len(currentLine) < 35:                    # Deletes wrong lines (too short)
            continue
        LatLongList = myFilter.LatLong(currentLine)
        for key in dico:
            if myFilter.LatLongFilter(LatLongList,dico[key][0]):
                myFilter.writeLine(key,currentLine)


###########################################################################
                # Main
###########################################################################

# Open read files:
readFile = open(config.readFileList[LauncherCount.LauncherCount][1], 'r')

# Generate writing files:
pathDico = {}
filterDico = config.filterDico

# Create outputs
for key in filterDico:
    output_Name = (config.readFileList[LauncherCount.LauncherCount][0][:-4]
                   + '_' + key +'.log')
    pathDico[output_Name] = config.writingFolder + output_Name
    filterDico[key] = [filterDico[key],open(pathDico[output_Name],'w')]


p = []
CPUCount = multiprocessing.cpu_count()
CPURange = range(CPUCount)

startingTime = time.localtime()

if __name__ == '__main__':
    ### Create and start processes:
    for i in CPURange:
        p.append(multiprocessing.Process(target = filteringProcess , 
                                            args = (filterDico,)))
        p[i].start()

    ### Kill processes:
    while True:
        if [p[i].is_alive() for i in CPURange] == [False for i in CPURange]:
            readFile.close()
            for key in config.filterDico:
                config.filterDico[key][1].close()
                print(key,"is Done!")
                endTime = time.localtime()
            break

    print("Process started at:",startingTime)
    print("And ended at:",endTime)
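
The loop described in the question could be sketched as one batch of processes per input file, with a join() on the whole batch before the next file is opened. This is only a sketch under assumptions, not the asker's program: the function names filter_region and process_successively, the sample data, and the region bounds are all hypothetical. Unlike the code above, each worker opens its own input and output handles instead of sharing module-level ones, and malformed lines are skipped rather than mapped to [0, 0].

```python
import multiprocessing
import os
import tempfile


def filter_region(in_path, out_path, bounds):
    """Copy lines whose latitude/longitude (2nd and 3rd fields) lie inside bounds."""
    min_lat, max_lat, min_long, max_long = bounds
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split(",")
            try:
                lat, lon = float(fields[1]), float(fields[2])
            except (IndexError, ValueError):
                continue  # skip malformed lines (an assumption of this sketch)
            if min_lat <= lat <= max_lat and min_long <= lon <= max_long:
                dst.write(line)


def process_successively(in_paths, regions, out_dir):
    """Handle input files one after another; regions run in parallel per file."""
    for in_path in in_paths:
        base = os.path.splitext(os.path.basename(in_path))[0]
        workers = []
        for key, bounds in regions.items():
            out_path = os.path.join(out_dir, base + "_" + key + ".log")
            worker = multiprocessing.Process(
                target=filter_region, args=(in_path, out_path, bounds))
            worker.start()
            workers.append(worker)
        for worker in workers:
            worker.join()  # wait for this file's batch before starting the next


if __name__ == "__main__":
    # Hypothetical sample data and regions, only to make the sketch runnable.
    regions = {"north": (40.0, 60.0, -10.0, 10.0),
               "south": (0.0, 39.9, -10.0, 10.0)}
    with tempfile.TemporaryDirectory() as tmp:
        inputs = []
        for name in ("day1.txt", "day2.txt"):
            path = os.path.join(tmp, name)
            with open(path, "w") as f:
                f.write("id1,45.0,2.0,x\nid2,10.0,3.0,y\nbad line\n")
            inputs.append(path)
        process_successively(inputs, regions, tmp)
        print(sorted(os.listdir(tmp)))
```

Keeping all process creation inside functions and under the __main__ guard means that importing the module in a child process does not reopen files or start new workers.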

To process groups of files in sequence, while the files within each group are processed in parallel:

#!/usr/bin/env python
from multiprocessing import Pool

def work_on(args):
    """Process a single file."""
    i, filename = args
    print("working on %s" % (filename,))
    return i

def files():
    """Generate input filenames to work on."""
    #NOTE: you could read the file list from a file, get it using glob.glob, etc
    yield "inputfile1"
    yield "inputfile2"

def process_files(pool, filenames):
    """Process filenames using pool of processes.

    Wait for results.
    """
    for result in pool.imap_unordered(work_on, enumerate(filenames)):
        #NOTE: in general the files won't be processed in the original order
        print(result) 

def main():
    p = Pool()

    # to do "successive" multiprocessing
    for filenames in [files(), ['other', 'bunch', 'of', 'files']]:
        process_files(p, filenames)

if __name__ == "__main__":
    main()

Each process_files() call runs only after the previous one has completed, i.e., files from different calls to process_files() are not processed in parallel.
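
To make the idea more concrete, here is a hedged variant in which work_on actually reads its file and writes a result. It assumes that each output name is derived from the input name by appending ".output", and it uses a placeholder test (keep non-empty lines) where the real filtering would go. The helper name process_groups and the sample files are hypothetical.

```python
import os
import tempfile
from multiprocessing import Pool


def work_on(filename):
    """Copy the non-empty lines of filename to filename + '.output'."""
    output_filename = filename + ".output"  # assumed naming scheme
    with open(filename) as src, open(output_filename, "w") as dst:
        for line in src:
            if line.strip():  # placeholder for the real filtering test
                dst.write(line)
    return output_filename


def process_groups(groups):
    """Process each group in turn; files inside one group run in parallel."""
    done = []
    with Pool() as pool:
        for filenames in groups:
            # sorted() consumes the iterator, so the whole group finishes
            # before the next group is submitted
            done.append(sorted(pool.imap_unordered(work_on, filenames)))
    return done


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name in ("a.txt", "b.txt", "c.txt"):
            path = os.path.join(tmp, name)
            with open(path, "w") as f:
                f.write("first\n\nsecond\n")
            paths.append(path)
        print(process_groups([paths[:2], paths[2:]]))
```

Because every worker opens and closes its own files, nothing has to be closed or recreated by the parent between groups.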
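
"The problem is that I would like to launch it successively on multiple text files." That seems to be what a queue is for. Why not use a queue?

If I understand correctly, queues are for exchanging values and information between processes? What I am trying to do is not to extend the processes so that they can work on successive files, but to wait for the processes to finish, and then create a new set of processes using the same methods on a new input file.

That seems backwards. Why not have a bunch of processes all waiting for file names on a read queue? When one process is done, it puts the file name on the queue for the next process. That way synchronization is easy. Read a name from a queue; do the work; write the name to another queue. Why aren't you doing that?

I will not use the same output files for each input file, so basically every time I finish a file, I close all my output files, create a new set of files, and launch a new set of processes on the new input and the new outputs. How can I do this with your solution?

@user1154967: generate the output filename from the input filename, e.g., output_filename = filename + ".output"

This will also create parallel multiprocessing, while I am looking for a way to do successive multiprocessing, to keep the files and the cache safe. The input files are around 35GB and I read them from a database server.

@user1154967: I've updated the answer to show how to do "successive" multiprocessing.

I will need time to understand this code, but I think that's it! I will implement it and test it when cores are available. For the moment, I hardcode the "successivity" by creating multiple versions of my main code with different read file indexes, and I launch them one after another using os.system("link"). Besides helping me with this problem, I think you also gave me the solution to another issue I was looking into: a way not to keep too much data in cache. From what I have read about "yield", that is exactly its purpose, since I only need to go through a file once? Thank you very much.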
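
On the last point about "yield", a generator can hand out one line at a time so that only the current line needs to be held in memory. The sketch below is an illustration under assumptions: the function names valid_lines and count_valid are hypothetical, and the 35-character minimum simply mirrors the "too short" test in the question's code.

```python
import os
import tempfile


def valid_lines(path, min_length=35):
    """Yield the lines of path one at a time, skipping lines that are too short."""
    with open(path) as handle:
        for line in handle:  # file objects are read lazily, line by line
            if len(line) >= min_length:
                yield line


def count_valid(path, min_length=35):
    """Count the valid lines without building a list of them."""
    return sum(1 for _ in valid_lines(path, min_length))


if __name__ == "__main__":
    # Hypothetical sample file, only to make the sketch runnable.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w") as f:
            f.write("short\n")
            f.write("a line that is comfortably longer than the minimum\n")
        print(count_valid(path))
```

A generator like this can be consumed only once, which matches a single pass over each input file.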