
Python: spawning multiple processes to write different files


The idea is to use N processes to write N files.

The data for the files to be written comes from multiple input files, which are stored in a dictionary that has a list as its value, like this:

dic = {'file1':['data11.txt', 'data12.txt', ..., 'data1M.txt'],
       'file2':['data21.txt', 'data22.txt', ..., 'data2M.txt'], 
        ...
       'fileN':['dataN1.txt', 'dataN2.txt', ..., 'dataNM.txt']}
So file1 should be data11 + data12 + … + data1M, and similarly for the other files.

So my code looks like this:

jobs = []
for d in dic:
    outfile = str(d)+"_merged.txt"
    with open(outfile, 'w') as out:
        p = multiprocessing.Process(target = merger.merger, args=(dic[d], name, out))
        jobs.append(p)
        p.start()
        out.close()
and merge.py looks like this:

def merger(files, name, outfile):
    time.sleep(2)
    sys.stdout.write("Merging %n...\n" % name)

    # the reason for this step is that all the different files have a header
    # but I only need the header from the first file.
    with open(files[0], 'r') as infile:
        for line in infile:
            print "writing to outfile: ", name, line
            outfile.write(line) 
    for f in files[1:]:
        with open(f, 'r') as infile:
            next(infile) # skip first line
            for line in infile:
                outfile.write(line)
    sys.stdout.write("Done with: %s\n" % name)
I do see the file created in the folder where it should go, but it is empty. No header, nothing. I put prints inside to check whether everything was right, but nothing works.


Help!

Since the worker processes run in parallel with the main process that created them, the files named out get closed before the workers can write to them. This happens even if you remove out.close(), because of the with statement. Instead, pass the filename to each process and let the process open and close the file.
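
A minimal sketch of that approach, assuming the question's dic is in scope and keeping the same merge logic; the function and variable names here are illustrative, not the poster's exact code:

import multiprocessing
import sys

def merger(files, name, outfilename):
    sys.stdout.write("Merging %s...\n" % name)
    # The child opens its own output file; the with block closes it when done.
    with open(outfilename, 'w') as out:
        # Keep the header from the first input file only.
        with open(files[0], 'r') as infile:
            for line in infile:
                out.write(line)
        # Skip the header line of every remaining input file.
        for f in files[1:]:
            with open(f, 'r') as infile:
                next(infile)
                for line in infile:
                    out.write(line)
    sys.stdout.write("Done with: %s\n" % name)

jobs = []
for d in dic:
    # Pass only the output *filename*; the child opens and closes the file itself.
    p = multiprocessing.Process(target=merger, args=(dic[d], d, str(d) + "_merged.txt"))
    jobs.append(p)
    p.start()
for p in jobs:
    p.join()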

The problem is that you are not closing the file in the child process, so the internally buffered data is lost. You could either move the open into the child, or wrap the whole thing in a try/finally block to make sure the file gets closed. A potential advantage of opening the file in the parent is that you can handle file errors there. I'm not saying it's compelling, just that it's an option.

def merger(files, name, outfile):
    try:
        time.sleep(2)
        sys.stdout.write("Merging %n...\n" % name)

        # the reason for this step is that all the different files have a header
        # but I only need the header from the first file.
        with open(files[0], 'r') as infile:
            for line in infile:
                print "writing to outfile: ", name, line
                outfile.write(line) 
        for f in files[1:]:
            with open(f, 'r') as infile:
                next(infile) # skip first line
                for line in infile:
                    outfile.write(line)
        sys.stdout.write("Done with: %s\n" % name)
    finally:
        outfile.close()
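
For completeness, a sketch of how the parent side might drive this try/finally variant. It assumes a Unix fork start method (as in the example below), where closing the parent's copy of the descriptor does not affect the child's copy, that the question's dic is in scope, and that the function above lives in a hypothetical merger module:

import multiprocessing

import merger  # hypothetical module containing the try/finally merger above

jobs = []
for d in dic:
    # Open in the parent (no with block), so file errors can be handled here.
    out = open(str(d) + "_merged.txt", 'w')
    p = multiprocessing.Process(target=merger.merger, args=(dic[d], d, out))
    jobs.append(p)
    p.start()
    # Closes only the parent's copy; the forked child still holds its own descriptor.
    out.close()
for p in jobs:
    p.join()
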
Update

There was some confusion about parent/child file descriptors and what happens to the file in the child. If the file is still open when the program exits, the underlying C library does not flush the data to disk. The theory is that a properly running program closes things before it exits. Here is an example where the child loses data because it does not close the file:

import multiprocessing as mp
import os
import time

if os.path.exists('mytestfile.txt'):
    os.remove('mytestfile.txt')

def worker(f, do_close=False):
    time.sleep(2)
    print('writing')
    f.write("this is data")
    if do_close:
        print("closing")
        f.close()


print('without close')
f = open('mytestfile.txt', 'w')
p = mp.Process(target=worker, args=(f, False))
p.start()
f.close()
p.join()
print('file data:', open('mytestfile.txt').read())

print('with close')
os.remove('mytestfile.txt')
f = open('mytestfile.txt', 'w')
p = mp.Process(target=worker, args=(f, True))
p.start()
f.close()
p.join()
print('file data:', open('mytestfile.txt').read())
I ran it on linux and got:

without close
writing
file data: 
with close
writing
closing
file data: this is data

out.close() is called right after p.start(). I doubt the merge task gets time to run before the file is closed from under it.

@Blorgbeard good point, but still nothing...

This is on a linux-like OS, right?

@Blorgbeard closing a read-only file in the parent won't affect the file in the child. It would be an issue if there were written data to flush, but that's not the case here.

@tdelaney please note that the file opened/closed in the parent is a write-access file. I am talking about open(outfile, 'w') and out.close().

@Pavlos no, keep the same number of processes, but pass only the filename instead of the file object. Closing the file in the parent shouldn't be a problem for the child, though.

I don't see how that solved it @tdelaney, since the parent closes the file before the child gets a chance to write to it, and once a file is closed you cannot write to it.

No, that's not how it works. The child is spawned with an independent copy of the file descriptor. The parent can close its copy, but that has no effect on the child. What is really going on here is that the OP did not close the file in the child, so its unwritten data was discarded. When the OP changed to opening the file in the child, he also changed to closing it in the child. That is what actually fixed the problem.

@tdelaney I think you are right, I forgot that the processes get separate copies of the descriptor. Here is what I get on Windows (python 2 and 3): -

tl;dr: an error. That is not unexpected. Windows tries to reopen the file, but it was not opened for sharing. True... quite different.