How to programmatically count the number of files in an archive using Python


python python-2.7


In a program I maintain, this is done as follows:

import StringIO
from subprocess import Popen, PIPE

# count the files in the archive
length = 0
command = ur'"%s" l -slt "%s"' % (u'path/to/7z.exe', srcFile)
# startupinfo (Windows-only) is created elsewhere in the program
ins, err = Popen(command, stdout=PIPE, stdin=PIPE,
                 startupinfo=startupinfo).communicate()
ins = StringIO.StringIO(ins)
for line in ins: length += 1
ins.close()
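Note that the loop above counts every line `7z l -slt` prints, not archive members. If you want to stay with `-slt`, one option is to count the `Path = ` lines that follow the `----------` separator. The listing layout assumed below (an archive header that itself has a `Path = <archive>` line, then a separator, then one block per item) is an assumption about 7-Zip's technical listing, and the sample text is made up; check both against your 7z version. The sketch is written for Python 3 and counts directories as well as files:

```python
def count_slt_items(listing):
    """Count item blocks in the text of a `7z l -slt` listing.

    Assumed layout: an archive header that contains its own
    "Path = <archive>" line, a line of dashes, then one
    "Path = <item>" line per archive item (files and directories).
    """
    count = 0
    past_separator = False
    for line in listing.splitlines():
        if not past_separator:
            # skip the archive header until the "----------" separator
            past_separator = line.startswith('----------')
        elif line.startswith('Path = '):
            count += 1
    return count


# hypothetical sample in the assumed layout
sample = """\
--
Path = example.7z
Type = 7z

----------
Path = docs
Attributes = D....

Path = docs/readme.txt
Size = 12
Attributes = ....A
"""
print(count_slt_items(sample))  # 2 (one directory, one file)
```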

Is this really the only way? I couldn't find another one, but it seems a bit odd that I can't simply ask for the number of files.

And what about error checking? Would it be enough to modify it to:

proc = Popen(command, stdout=PIPE, stdin=PIPE,
             startupinfo=startupinfo)
out = proc.stdout
# ... count
returncode = proc.wait()
if returncode:
    raise Exception(u'Failed reading number of files from ' + srcFile)
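For the error-checking part, the standard library already bundles "run the command, collect stdout, raise on a non-zero exit status" into `subprocess.check_output`, which raises `CalledProcessError` and keeps the collected output in its `output` attribute. A minimal Python 3 sketch; the child commands are the running Python interpreter standing in for 7z, purely for illustration:

```python
import sys
from subprocess import CalledProcessError, check_output


def count_output_lines(command):
    """Run command and return the number of lines it writes to stdout.

    check_output() raises CalledProcessError on a non-zero exit status,
    so no explicit returncode check is needed here.
    """
    out = check_output(command)
    return len(out.splitlines())


# stand-in child processes (hypothetical; substitute the real 7z command)
ok = [sys.executable, "-c", "print('a'); print('b'); print('c')"]
bad = [sys.executable, "-c", "import sys; print('partial'); sys.exit(2)"]

print(count_output_lines(ok))  # 3
try:
    count_output_lines(bad)
except CalledProcessError as e:
    print("exit status:", e.returncode, "output:", e.output)
```

The trade-off is that the whole output is held in memory at once, which is fine for modest listings.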

Or should I actually parse Popen's output?

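EDIT: I'm interested in 7z, rar, and zip archives (all supported by 7z.exe), but 7z and zip would be enough for starters.

To count the number of archive members in a zip archive in Python: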

#!/usr/bin/env python
import sys
from contextlib import closing
from zipfile import ZipFile

with closing(ZipFile(sys.argv[1])) as archive:
    count = len(archive.infolist())
print(count)
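`infolist()` also includes directory entries. To count only regular files, a sketch for Python 3.6+ (where `ZipInfo.is_dir()` is available) that builds a small archive in memory so it runs without a file on disk:

```python
import io
import zipfile


def count_zip_files(fileobj):
    """Return (files, directories) for the zip archive in fileobj."""
    with zipfile.ZipFile(fileobj) as archive:
        infos = archive.infolist()
    # is_dir() is true for entries whose name ends with "/"
    dirs = sum(1 for info in infos if info.is_dir())
    return len(infos) - dirs, dirs


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("docs/", b"")            # explicit directory entry
    archive.writestr("docs/readme.txt", b"hello")
    archive.writestr("setup.py", b"print('hi')")
buf.seek(0)
print(count_zip_files(buf))  # (2, 1)
```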

It may use the zlib, bz2, or lzma modules (if available) to decompress the archive.
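Counting members only reads the archive's central directory, so the decompression modules matter once you read member data rather than for the count itself. A sketch (Python 3.3+, where `zipfile.ZIP_BZIP2` and `zipfile.ZIP_LZMA` exist) that writes the same members with each compression method whose backing module is importable, then counts them:

```python
import io
import zipfile

METHODS = {
    "stored": zipfile.ZIP_STORED,
    "deflated": zipfile.ZIP_DEFLATED,  # needs zlib
    "bzip2": zipfile.ZIP_BZIP2,        # needs bz2
    "lzma": zipfile.ZIP_LZMA,          # needs lzma
}


def member_count(data):
    """Count members of a zip archive given as bytes, without decompressing."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return len(archive.infolist())


for name, method in METHODS.items():
    buf = io.BytesIO()
    try:
        with zipfile.ZipFile(buf, "w", compression=method) as archive:
            for i in range(3):
                archive.writestr("file%d.txt" % i, b"x" * 100)
    except RuntimeError:
        # ZipFile raises RuntimeError when the module for this method is missing
        print(name, "unavailable")
        continue
    print(name, member_count(buf.getvalue()))
```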

To count the number of regular files in a tar archive:

#!/usr/bin/env python
import sys
import tarfile

with tarfile.open(sys.argv[1]) as archive:
    count = sum(1 for member in archive if member.isreg())
print(count)
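`member.isreg()` skips directories, symlinks, and other special members. A self-contained Python 3 sketch that builds a tar archive in memory with one directory and two files, then counts the regular files:

```python
import io
import tarfile


def count_tar_regular_files(fileobj):
    """Count regular-file members (not directories, links, ...) in a tar archive."""
    with tarfile.open(fileobj=fileobj) as archive:
        return sum(1 for member in archive if member.isreg())


def add_file(archive, name, data):
    """Add an in-memory regular file to an open tar archive."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    archive.addfile(info, io.BytesIO(data))


buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as archive:
    folder = tarfile.TarInfo("docs")
    folder.type = tarfile.DIRTYPE
    archive.addfile(folder)
    add_file(archive, "docs/readme.txt", b"hello")
    add_file(archive, "setup.py", b"print('hi')")
buf.seek(0)
print(count_tar_regular_files(buf))  # 2
```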

It may support gzip, bz2, and lzma compression, depending on the Python version.
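With the default read mode (`"r"`, equivalent to `"r:*"`), `tarfile.open()` detects the compression by itself, so the counting code does not need to know it. A Python 3 sketch that writes uncompressed, gzip, bz2, and xz tars in memory (skipping any whose module is unavailable) and counts each one the same way:

```python
import io
import tarfile


def count_regular(data):
    """Count regular files in tar data; mode "r" (= "r:*") auto-detects compression."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as archive:
        return sum(1 for member in archive if member.isreg())


for mode in ("w", "w:gz", "w:bz2", "w:xz"):
    buf = io.BytesIO()
    try:
        with tarfile.open(fileobj=buf, mode=mode) as archive:
            for i in range(3):
                info = tarfile.TarInfo("file%d.txt" % i)  # empty regular file
                archive.addfile(info, io.BytesIO(b""))
    except tarfile.CompressionError:
        # raised when the gzip/bz2/lzma module for this mode is missing
        print(mode, "unavailable")
        continue
    print(mode, count_regular(buf.getvalue()))
```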

You could look for a third-party module that provides similar functionality for 7z archives.

To get the number of files in an archive using the 7z utility:

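import os
import re
import subprocess

def count_files_7z(archive):
    # LC_ALL=C keeps 7z's messages in English so the regex below can match them
    s = subprocess.check_output(["7z", "l", archive], env=dict(os.environ, LC_ALL="C"))
    # the totals line "... N files, M folders" ends the output; \s*$ tolerates \r\n
    return int(re.search(br'(\d+)\s+files,\s+\d+\s+folders\s*$', s).group(1))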
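The totals regex can be checked without running 7z at all. The sample below reuses the totals line quoted in the comments of the last code block (`3534900       325332  75 files, 29 folders`); whether every 7z version prints exactly this wording is not established here, so treat the pattern as an assumption to verify against your own 7z output:

```python
import re

# totals line at the very end of `7z l` output; \s*$ tolerates "\n" or "\r\n"
TOTALS = re.compile(br'(\d+)\s+files,\s+\d+\s+folders\s*$')


def files_from_listing(output):
    """Extract the file count from the totals line at the end of `7z l` output."""
    match = TOTALS.search(output)
    if match is None:
        raise ValueError("no 'N files, M folders' totals line found")
    return int(match.group(1))


sample = (b"------------------- ----- ------------ ------------  ----------------\n"
          b"                               3534900       325332  75 files, 29 folders\r\n")
print(files_from_listing(sample))  # 75
```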

If the archive contains many files, the following version may use less memory:

import os
import re
from subprocess import Popen, PIPE, CalledProcessError

def count_files_7z(archive):
    command = ["7z", "l", archive]
    p = Popen(command, stdout=PIPE, bufsize=1, env=dict(os.environ, LC_ALL="C"))
    line = b''  # stays empty if 7z prints nothing
    with p.stdout:
        for line in p.stdout:
            if line.startswith(b'Error:'): # found error
                error = line + b"".join(p.stdout)
                raise CalledProcessError(p.wait(), command, error)
    returncode = p.wait()
    if returncode != 0:  # raise instead of assert (asserts vanish under python -O)
        raise CalledProcessError(returncode, command, b'')
    # the totals line "... N files, M folders" is the last line read
    return int(re.search(br'(\d+)\s+files,\s+\d+\s+folders', line).group(1))
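The streaming version only needs the last line of the listing. One compact way to hold just that line while iterating is `collections.deque` with `maxlen=1`. A Python 3 sketch, again with the Python interpreter as a stand-in child process:

```python
import sys
from collections import deque
from subprocess import PIPE, CalledProcessError, Popen


def last_output_line(command):
    """Return the last line command writes to stdout, keeping at most one line in memory."""
    p = Popen(command, stdout=PIPE)
    with p.stdout:
        # older lines are dropped as new ones arrive
        last = deque(p.stdout, maxlen=1)
    if p.wait() != 0:
        raise CalledProcessError(p.returncode, command)
    return last[0] if last else b""


child = [sys.executable, "-c", "for i in range(100000): print(i)"]
print(last_output_line(child))  # b'99999\n' (b'99999\r\n' on Windows)
```

The returned line could then be passed to the totals regex.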

For example:

import sys

try:
    print(count_files_7z(sys.argv[1]))
except CalledProcessError as e:
    getattr(sys.stderr, 'buffer', sys.stderr).write(e.output)
    sys.exit(e.returncode)

To count the number of lines in the output of an arbitrary subprocess:

from functools import partial
from subprocess import Popen, PIPE, CalledProcessError

p = Popen(command, stdout=PIPE, bufsize=-1)
with p.stdout:
    read_chunk = partial(p.stdout.read, 1 << 15)
    count = sum(chunk.count(b'\n') for chunk in iter(read_chunk, b''))
if p.wait() != 0:
    raise CalledProcessError(p.returncode, command)
print(count)
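Wrapped in a function, the chunked count can be checked against a child process with a known number of lines (a stand-in Python command, for illustration only). Reading fixed-size chunks of `1 << 15` = 32768 bytes keeps memory use independent of line length, and `iter(read_chunk, b'')` keeps calling `read_chunk()` until it returns `b''` at end of file:

```python
import sys
from functools import partial
from subprocess import PIPE, CalledProcessError, Popen


def count_lines_chunked(command, chunk_size=1 << 15):
    """Count newline bytes in command's stdout, reading chunk_size bytes at a time."""
    p = Popen(command, stdout=PIPE, bufsize=-1)
    with p.stdout:
        read_chunk = partial(p.stdout.read, chunk_size)
        # iter(callable, sentinel) stops when read_chunk() returns b'' (EOF)
        count = sum(chunk.count(b"\n") for chunk in iter(read_chunk, b""))
    if p.wait() != 0:
        raise CalledProcessError(p.returncode, command)
    return count


child = [sys.executable, "-c", "for i in range(1000): print(i)"]
print(count_lines_chunked(child))  # 1000
```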

Since I already bundle 7z.exe with the application, I'd rather avoid third-party libraries, but I do need to parse rar and 7z archives, so I think I'll go with:
regErrMatch = re.compile(u'Error:', re.U).match # needs more testing
r"""7z list command output is of the form:
   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2015-06-29 21:14:04 ....A       <size>               <filename>
where ....A is the attribute value for normal files, ....D for directories
"""
reFileMatch = re.compile(ur'(\d|:|-|\s)*\.\.\.\.A', re.U).match

def countFilesInArchive(srcArch, listFilePath=None):
    """Count all regular files in srcArch (or only the subset in
    listFilePath)."""
    # https://stackoverflow.com/q/31124670/281545
    command = ur'"%s" l -scsUTF-8 -sccUTF-8 "%s"' % ('compiled/7z.exe', srcArch)
    if listFilePath: command += u' @"%s"' % listFilePath
    proc = Popen(command, stdout=PIPE, startupinfo=startupinfo, bufsize=-1)
    length, errorLine = 0, []
    with proc.stdout as out:
        for line in iter(out.readline, b''):
            line = unicode(line, 'utf8')
            if errorLine or regErrMatch(line):
                errorLine.append(line)
            elif reFileMatch(line):
                length += 1
    returncode = proc.wait()
    # StateError is an exception class defined elsewhere in the program
    if returncode or errorLine: raise StateError(u'%s: Listing failed\n' % srcArch +
        u'7z.exe return value: ' + str(returncode) +
        u'\n' + u'\n'.join([x.strip() for x in errorLine if x.strip()]))
    return length
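The line-classifying regex above can be tried on a sample listing in the layout its docstring describes. The sample lines are made up for illustration, and the sketch uses Python 3 syntax (the original relies on Python 2's `ur''` literals):

```python
import re

# Same pattern as reFileMatch above: a date/time/space prefix followed by "....A"
file_match = re.compile(r'(\d|:|-|\s)*\.\.\.\.A').match


def count_file_lines(listing):
    """Count listing lines whose attribute column marks a normal file."""
    return sum(1 for line in listing.splitlines() if file_match(line))


# hypothetical listing in the layout from the docstring
sample = """\
   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2015-06-29 21:14:04 ....A          512               docs/readme.txt
2015-06-29 21:14:04 ....D            0            0  docs
2015-06-29 21:14:04 ....A         1024               setup.py
------------------- ----- ------------ ------------  ------------------------
"""
print(count_file_lines(sample))  # 2
```

The header, separator, and directory lines all fail the match, so only the two `....A` lines are counted.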

I'll edit this with my findings.
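What kinds of archives do you need to support? For zip and tar, check … and …

@LoïcFaure Lacroix: Thanks, edited. I definitely need 7z…

Maybe take a look at this? py7zlib should be able to read the archive. After that, you can use something similar to zipfile or tarfile to extract the names in it (py7zlib.Archive7z.getnames).

Hey, thanks! Could you explain why bufsize=-1 (as opposed to bufsize=1 in the previous answer :) and what read_chunk = partial(p.stdout.read, 1 …

@Mr_and_Mrs_D: You should probably ask about error handling in 7z.exe as a separate question. Include things like: does 7z provide a set of exit codes to indicate various errors? Does 7z print its error messages to stderr, or does it mix them with the archive member list on stdout?

Will do when I have time, and I'll be sure to mention you, thanks :) Exit codes: …

@Mr_and_Mrs_D: All the code should work as is, i.e., -scsUTF-8 -sccUTF-8 are not required. Note: the check_output()-based version may use more memory than the Popen-based count_files_7z(), but the error handling is the same: you can run the example with either count_files_7z() implementation, although the second variant does not store the output unless it encounters an error (that is why it uses less memory).

@Mr_and_Mrs_D: Otherwise (without LC_ALL="C") you may get messages in another language, depending on your locale, and a regex that uses the English words "files" and "folders" may fail.

You could use for line in … here, or better, for line in io.TextIOWrapper(out, encoding='utf-8'): (it decodes bytes to Unicode and enables universal newlines mode). Don't use if len(container); use if container instead (empty containers are false in Python). line.startswith('Error:') could be used instead of the regErrMatch regex. Are you sure 7z prints its errors to stdout (that is unfortunate)? Please, …

Yes, 7z prints its output to stdout (…). I'll have a look at TextIOWrapper. regErrMatch: I may need to refine the error regex. PEP 8: it's legacy code that I'm slowly PEP 8-ing (see also: …; on the 79 characters, though, I fully agree).
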
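The io.TextIOWrapper suggestion from the comments could look like this in Python 3: wrapping the byte pipe decodes UTF-8 and, with the default newline=None, turns "\r\n" into "\n", while line.startswith('Error:') replaces the regex. The child process is a hypothetical stand-in that writes a Windows-style error line; whether 7z reports its errors on stdout is not something this sketch verifies:

```python
import io
import sys
from subprocess import PIPE, Popen


def first_error(command):
    """Return (returncode, error) where error is the first stdout line starting
    with 'Error:' plus all output after it, or '' if there is no such line."""
    p = Popen(command, stdout=PIPE)
    error = ""
    with io.TextIOWrapper(p.stdout, encoding="utf-8") as out:
        for line in out:  # str lines; "\r\n" arrives as "\n" (universal newlines)
            if line.startswith("Error:"):
                error = line + out.read()  # keep the rest of the output too
                break
    return p.wait(), error


child = [sys.executable, "-c",
         r"import sys; sys.stdout.buffer.write(b'line one\r\nError: disk full\r\nmore\r\n')"]
print(first_error(child))  # (0, 'Error: disk full\nmore\n')
```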
def countFilesInArchive(srcArch, listFilePath=None):
    """Count all regular files in srcArch (or only the subset in
    listFilePath)."""
    # exe7z, startupinfo, regErrMatch and StateError are defined elsewhere in the program
    command = [exe7z, u'l', u'-scsUTF-8', u'-sccUTF-8', srcArch]
    if listFilePath: command += [u'@%s' % listFilePath]
    proc = Popen(command, stdout=PIPE, stdin=PIPE, # stdin needed if listFilePath
                 startupinfo=startupinfo, bufsize=1)
    errorLine = line = u''
    with proc.stdout as out:
        for line in iter(out.readline, b''): # consider io.TextIOWrapper
            line = unicode(line, 'utf8')
            if regErrMatch(line):
                errorLine = line + unicode(b''.join(out), 'utf8')  # decode the rest too
                break
    returncode = proc.wait()
    msg = u'%s: Listing failed\n' % srcArch
    if returncode or errorLine:
        msg += u'7z.exe return value: ' + str(returncode) + u'\n' + errorLine
    elif not line: # should not happen
        msg += u'Empty output'
    else: msg = u''
    if msg: raise StateError(msg) # consider using CalledProcessError
    # number of files is reported in the last line - example:
    #                                3534900       325332  75 files, 29 folders
    return int(re.search(ur'(\d+)\s+files,\s+\d+\s+folders', line).group(1))