Python script to concatenate all the files in a directory into one file

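I wrote the following script to concatenate all the files in a directory into a single file.

Can this be optimized in terms of

  • idiomatic Python

  • time

Here is the snippet: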

    import time, glob
    
    outfilename = 'all_' + str((int(time.time()))) + ".txt"
    
    filenames = glob.glob('*.txt')
    
    with open(outfilename, 'wb') as outfile:
        for fname in filenames:
            with open(fname, 'r') as readfile:
                infile = readfile.read()
                for line in infile:
                    outfile.write(line)
                outfile.write("\n\n")
    

You can iterate over the lines of a file object directly, without reading the whole thing into memory:

    with open(fname, 'r') as readfile:
        for line in readfile:
            outfile.write(line)
    
Use shutil.copyfileobj() to copy the data:

    import glob
    import shutil
    
    with open(outfilename, 'wb') as outfile:
        for filename in glob.glob('*.txt'):
            if filename == outfilename:
                # don't want to copy the output into the output
                continue
            with open(filename, 'rb') as readfile:
                shutil.copyfileobj(readfile, outfile)
    
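shutil reads from the readfile object in chunks and writes them directly to the outfile file object. Do not use readline() or an iteration buffer, since you do not need the overhead of finding line endings.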
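
To make the chunked copy concrete, here is a rough sketch of the kind of loop it amounts to. This is not shutil's actual source; copy_in_chunks and the chunk_size default are names and values made up for this illustration.

```python
import io


def copy_in_chunks(src, dst, chunk_size=64 * 1024):
    """Copy everything from src to dst in fixed-size chunks.

    Hypothetical helper for illustration; chunk_size is an arbitrary choice.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read means end of file
            break
        dst.write(chunk)


# In-memory binary files stand in for real ones here.
source = io.BytesIO(b"line one\nline two\n" * 1000)
target = io.BytesIO()
copy_in_chunks(source, target, chunk_size=4096)
```

No line boundaries are ever searched for, which is the overhead the answer suggests avoiding.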


Use the same mode for both reading and writing; this is especially important when using Python 3. I have used binary mode for both here.
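
Why the modes have to match in Python 3 can be seen in a small sketch. The file name out.txt and the temporary directory are assumptions made for the demonstration only.

```python
import os
import tempfile

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "out.txt")  # hypothetical scratch file

errors = []

# Writing text (str) to a file opened in binary mode fails in Python 3.
with open(path, "wb") as binary_out:
    try:
        binary_out.write("some text\n")
    except TypeError as exc:
        errors.append(("str -> binary file", type(exc).__name__))

# Writing bytes to a file opened in text mode fails as well.
with open(path, "w") as text_out:
    try:
        text_out.write(b"some bytes\n")
    except TypeError as exc:
        errors.append(("bytes -> text file", type(exc).__name__))

print(errors)
```

Since shutil.copyfileobj() simply passes whatever it reads from one file object to the other, opening both in the same mode avoids this mismatch.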

There is no need to use that many variables.

    with open(outfilename, 'w') as outfile:
        for fname in filenames:
            with open(fname, 'r') as readfile:
                outfile.write(readfile.read() + "\n\n")
    
The fileinput module provides a natural way to iterate over multiple files:

    for line in fileinput.input(glob.glob("*.txt")):
        outfile.write(line)
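
A fuller version of the same idea might look like the sketch below. The function name concat_with_fileinput and the demo file names are assumptions for illustration. One detail worth guarding against: when fileinput.input() is given an empty list it falls back to reading standard input, so the sketch returns early in that case.

```python
import fileinput
import glob
import os
import tempfile


def concat_with_fileinput(pattern, outfilename):
    """Concatenate all files matching pattern into outfilename, line by line."""
    out_abs = os.path.abspath(outfilename)
    # Skip the output file itself in case it matches the pattern.
    filenames = sorted(
        name for name in glob.glob(pattern)
        if os.path.abspath(name) != out_abs
    )
    with open(outfilename, "w") as outfile:
        if not filenames:
            return  # an empty list would make fileinput read stdin
        with fileinput.input(files=filenames) as lines:
            for line in lines:
                outfile.write(line)


# Small demo in a temporary directory (hypothetical file names).
demo_dir = tempfile.mkdtemp()
for name, text in (("a.txt", "first\n"), ("b.txt", "second\n")):
    with open(os.path.join(demo_dir, name), "w") as f:
        f.write(text)

demo_out = os.path.join(demo_dir, "all.txt")
concat_with_fileinput(os.path.join(demo_dir, "*.txt"), demo_out)
```

This still reads one line at a time, so it is a convenience rather than a speed-up.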
    

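Using Python 2.7, I did some "benchmark" testing of

    outfile.write(infile.read())

vs

    shutil.copyfileobj(readfile, outfile)

I iterated over 20 .txt files ranging in size from 63 MB to 313 MB, with a combined file size of about 2.6 GB. With both methods, normal read mode performed better than binary read mode, and shutil.copyfileobj was generally faster than outfile.write.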

When comparing the worst combination (outfile.write, binary mode) with the best combination (shutil.copyfileobj, normal read mode), the difference was quite significant:

    outfile.write, binary mode: 43 seconds, on average.
    
    shutil.copyfileobj, normal mode: 27 seconds, on average.
    

The final size of the output file was 2620 MB in normal read mode and 2578 MB in binary read mode.

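I was curious to check the performance further, so I used the answers from Martijn Pieters and Stephen Miller.

I tried binary and text modes, both with shutil and without shutil. I was trying to merge 270 files.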

Text mode (code and running times) -

    import glob
    import shutil

    def using_shutil_text(outfilename):
        with open(outfilename, 'w') as outfile:
            for filename in glob.glob('*.txt'):
                if filename == outfilename:
                    # don't want to copy the output into the output
                    continue
                with open(filename, 'r') as readfile:
                    shutil.copyfileobj(readfile, outfile)
    
    def without_shutil_text(outfilename):
        with open(outfilename, 'w') as outfile:
            for filename in glob.glob('*.txt'):
                if filename == outfilename:
                    # don't want to copy the output into the output
                    continue
                with open(filename, 'r') as readfile:
                    outfile.write(readfile.read())
    
    Shutil - 20.47757601737976
    Normal - 13.718038082122803
    
Binary mode (code and running times) -

    def using_shutil_binary(outfilename):
        with open(outfilename, 'wb') as outfile:
            for filename in glob.glob('*.txt'):
                if filename == outfilename:
                    # don't want to copy the output into the output
                    continue
                with open(filename, 'rb') as readfile:
                    shutil.copyfileobj(readfile, outfile)
    
    def without_shutil_binary(outfilename):
        with open(outfilename, 'wb') as outfile:
            for filename in glob.glob('*.txt'):
                if filename == outfilename:
                    # don't want to copy the output into the output
                    continue
                with open(filename, 'rb') as readfile:
                    outfile.write(readfile.read())
    
    Shutil - 20.161773920059204
    Normal - 17.327500820159912
    
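Judging by these running times, shutil performs about the same in both modes, while the plain read()/write() approach is faster in text mode than in binary mode.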
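
For anyone who wants to repeat this kind of comparison, a minimal timing harness could look like the sketch below. It is not the code used for the measurements above; the helper names, the sample files, and their tiny sizes are all assumptions, so the absolute numbers it prints will not be comparable.

```python
import os
import shutil
import tempfile
import time


def concat_copyfileobj(filenames, outfilename, read_mode, write_mode):
    """Concatenate using shutil.copyfileobj()."""
    with open(outfilename, write_mode) as outfile:
        for filename in filenames:
            with open(filename, read_mode) as readfile:
                shutil.copyfileobj(readfile, outfile)


def concat_read_write(filenames, outfilename, read_mode, write_mode):
    """Concatenate by reading each file whole and writing it out."""
    with open(outfilename, write_mode) as outfile:
        for filename in filenames:
            with open(filename, read_mode) as readfile:
                outfile.write(readfile.read())


def time_it(func, *args):
    """Return the wall-clock seconds taken by one call of func."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


# Create a few small sample inputs (hypothetical data).
workdir = tempfile.mkdtemp()
inputs = []
for i in range(5):
    sample_path = os.path.join(workdir, "part%d.txt" % i)
    with open(sample_path, "w") as f:
        f.write("sample line\n" * 10000)
    inputs.append(sample_path)

results = {}
outputs = {}
methods = (("shutil", concat_copyfileobj), ("read_write", concat_read_write))
modes = (("text", "r", "w"), ("binary", "rb", "wb"))
for label, func in methods:
    for mode_name, read_mode, write_mode in modes:
        out = os.path.join(workdir, "all_%s_%s.out" % (label, mode_name))
        results[(label, mode_name)] = time_it(
            func, inputs, out, read_mode, write_mode
        )
        outputs[(label, mode_name)] = out

for key, seconds in sorted(results.items()):
    print(key, "%.4f s" % seconds)
```

Timings from a single run on small files are noisy; repeating each measurement several times and using inputs of realistic size would give more trustworthy figures.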


OS: macOS 10.14 Mojave, on a 2017 MacBook Air.

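Time optimized? Use "cat *.txt > all.txt" :)

This would be better if it weren't limited to reading one line at a time.

@Marcin, that's correct. I used to think this was a cool solution, until I saw Martijn Pieters's shutil.copyfileobj humdinger.

Interesting. What platform was that?

I work on roughly two platforms: Linux Fedora 16, on various nodes, or Windows 7 Enterprise SP1 with an Intel Core(TM)2 Quad CPU Q9550 at 2.83 GHz. I think it was the latter.

Why is it important to use the same mode for writing and reading?

@JuanDavid: because shutil will use .read() calls on one file object and .write() calls on the other, passing the data read from one to the other. If one is opened in binary mode and the other in text mode, you are passing incompatible data (binary data to a text file, or text data to a binary file).

The code here didn't work for CSV files, dang. But it did give me some good inspiration on how to do this with CSV. I'm relatively new to Python.

@bretts: the contents of the files shouldn't matter; perhaps your CSV files lack a final newline separator, or use different delimiter formats?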
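
If a missing final newline is indeed the problem, one possible workaround is to check the last byte of each input after copying it and add a newline when needed. The sketch below assumes "\n" line endings; concat_with_newlines and the sample CSV files are hypothetical.

```python
import os
import shutil
import tempfile


def concat_with_newlines(filenames, outfilename):
    """Concatenate files in binary mode, making sure each one ends in a newline."""
    with open(outfilename, "wb") as outfile:
        for filename in filenames:
            with open(filename, "rb") as readfile:
                shutil.copyfileobj(readfile, outfile)
                # After the copy the read position is at end of file;
                # a position of 0 means the input was empty.
                if readfile.tell() > 0:
                    readfile.seek(-1, os.SEEK_END)
                    if readfile.read(1) != b"\n":
                        outfile.write(b"\n")


# Demo: the first file lacks a trailing newline (hypothetical data).
csv_dir = tempfile.mkdtemp()
first = os.path.join(csv_dir, "a.csv")
second = os.path.join(csv_dir, "b.csv")
with open(first, "wb") as f:
    f.write(b"x,y\n1,2")
with open(second, "wb") as f:
    f.write(b"3,4\n")

merged = os.path.join(csv_dir, "merged.out")
concat_with_newlines([first, second], merged)
```

Without the check, the last row of a.csv and the first row of b.csv would end up on the same line.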