Python: Why is os.scandir() as slow as os.listdir()?

python windows filesystems

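I tried to optimize a file-browsing function written in Python on Windows by using os.scandir() instead of os.listdir(). However, the run time stays the same, about two and a half minutes, and I don't know why. Here are the original and the modified functions:

os.listdir() version:

os.scandir() version:

In addition, here are the helper functions they use: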

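import os
import subprocess

def git_ignore(self, filepath):
    # Paths containing these substrings are skipped without calling git
    if '.git' in filepath:
        return True
    if '.ci' in filepath:
        return True
    if '.delivery' in filepath:
        return True
    # One git process is spawned per path; exit status 0 means "ignored"
    child = subprocess.Popen(['git', 'check-ignore', str(filepath)],
                             stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE)
    child.communicate()  # drain both pipes and wait for git to exit
    return child.returncode == 0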

============================================================

class Folder(dict):
    def __init__(self, path):
        self.path = path
        self.categories = {}

============================================================

class File(object):
    def __init__(self, path):
        self.path = path
        self.filename, self.extension = os.path.splitext(self.path)

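Can anyone tell me how to make the function run faster? My assumption is that extracting the name and path at the start makes it slower than it should be. Is that right?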

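Regarding your question: os.walk() seems to call stat more times than necessary. That appears to be why it is slower than os.scandir().

In this case, I think the best way to improve speed is parallel processing, which can speed up some loops dramatically. There are many posts about this. Here is one:

That said, I would like to share some thoughts. I have also long wondered what the best use of these three options (scandir, listdir, walk) is. There is not much documentation comparing their performance. Perhaps the best approach is to test it yourself, as you did. Here are my conclusions: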

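Usage of os.listdir(): it seems to have no advantage over os.scandir() other than being easier to understand. I still use it when I only need to list the files in a directory.

Pros:

Fast and simple

Cons:

Too simple: it only lists the names of the files and directories in a directory, so you may need to combine it with other calls to get file metadata. If so, os.scandir() is the better choice.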
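To make the trade-off concrete, here is a minimal sketch that collects file sizes both ways. The function names `sizes_with_listdir` and `sizes_with_scandir` are made up for this illustration. The point is only that os.listdir() returns bare names, so every extra fact needs another call, while os.scandir() hands back DirEntry objects that can often answer from data gathered during the scan.

```python
import os

def sizes_with_listdir(directory):
    """Map file name -> size using os.listdir() plus explicit per-entry calls."""
    sizes = {}
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        # listdir() returns bare names, so type and size each need another call
        if os.path.isfile(full_path):
            sizes[name] = os.path.getsize(full_path)
    return sizes

def sizes_with_scandir(directory):
    """Same result using os.scandir(), whose DirEntry objects carry metadata."""
    sizes = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            # is_file() can usually be answered from data cached during the scan
            if entry.is_file():
                sizes[entry.name] = entry.stat().st_size
    return sizes
```

Both functions should return the same dictionary for the same directory; any difference would be in how many system calls it takes to build it.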

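Usage of os.walk(): this is the most commonly used function when we need every item in a directory (and its subdirectories).

Pros:

Probably the easiest way to walk through all items, with their paths and names

Cons:

It seems to call stat more times than necessary, which appears to be why it is slower than os.scandir()
Although it gives you the root part of each file's path, it does not provide the extra meta-information that os.scandir() does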
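For reference, a typical os.walk() loop might look like the sketch below. The function name `list_all_files` and the choice to prune `.git` are assumptions for illustration, loosely mirroring the question's wish to skip that directory.

```python
import os

def list_all_files(root):
    """Return the path of every file under root, found via os.walk()."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk() from descending into them
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for filename in filenames:
            found.append(os.path.join(dirpath, filename))
    return found
```

The in-place slice assignment matters: rebinding `dirnames` to a new list would not affect which directories os.walk() visits.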

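Usage of os.scandir(): it seems to offer (almost) the best of both worlds. It gives you the speed of the simple os.listdir(), plus extra features that let you simplify your loops, because you can avoid exiftool or other metadata tools when you need extra information about a file.

Pros:

Fast. Same speed as os.listdir()
Very nice extra features

Cons:

If you want to descend into subfolders, you need to write another function to scan each subfolder. That function is quite simple, but in this case using os.walk() may be more Pythonic (I just mean more elegant syntax).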
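The extra function mentioned in the last point can be quite short. Below is one possible sketch as a recursive generator; the name `scan_tree` is hypothetical, and not following symlinked directories is a design choice made here to avoid loops, not something taken from the question.

```python
import os

def scan_tree(path):
    """Yield a DirEntry for every non-directory entry below path."""
    with os.scandir(path) as entries:
        for entry in entries:
            # Recurse into real directories only; symlinked ones are yielded as-is
            if entry.is_dir(follow_symlinks=False):
                yield from scan_tree(entry.path)
            else:
                yield entry
```

A caller can then write `for entry in scan_tree(root): ...` and still use `entry.name`, `entry.path`, and `entry.stat()` on each result.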

That is my view after reading about and using them. I'm happy to be corrected so I can learn more.
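As a rough idea of what the parallel-processing suggestion earlier in this answer could look like, here is a small sketch built on the standard library's thread pool. The function name `check_paths_in_parallel`, the `is_ignored` callback, and the default of 8 workers are all assumptions for illustration. Threads are used because the expensive part in the question is waiting on an external process rather than running Python code.

```python
from concurrent.futures import ThreadPoolExecutor

def check_paths_in_parallel(paths, is_ignored, max_workers=8):
    """Run is_ignored(path) for every path on a thread pool; return {path: bool}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with paths
        results = list(pool.map(is_ignored, paths))
    return dict(zip(paths, results))
```

Whether this helps in practice depends on how many processes the machine can start at once, so it is worth timing against the sequential version.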

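For every path that does not contain ".git", ".ci", or ".delivery", you spawn a git subprocess. That is very expensive, and if you have many such paths, the cumulative time spent spawning and waiting for git processes will be the bottleneck.

Thanks for the comprehensive and well-organized answer. My understanding is much clearer now.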
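One way to avoid a git process per path is to collect the paths first and ask git about all of them in a single call, using the `--stdin` option of `git check-ignore`. The sketch below is only an illustration: the function names and the `repo_dir` parameter are hypothetical, and it assumes git echoes each ignored path on its own line in the same form it was given. Paths containing unusual characters may be quoted in the output, which this sketch does not handle.

```python
import subprocess

def parse_check_ignore_output(stdout_text):
    """Turn the stdout of `git check-ignore --stdin` into a set of ignored paths."""
    return {line for line in stdout_text.splitlines() if line}

def ignored_paths(paths, repo_dir="."):
    """Ask git about many paths with a single subprocess instead of one per path."""
    paths = list(paths)
    if not paths:
        return set()
    result = subprocess.run(
        ["git", "check-ignore", "--stdin"],
        input="\n".join(paths) + "\n",  # one path per line
        capture_output=True,
        text=True,
        cwd=repo_dir,
    )
    # Exit status 0: some paths ignored; 1: none ignored; anything else is an error
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return parse_check_ignore_output(result.stdout)
```

With this in place, a traversal could gather every candidate path, call `ignored_paths` once, and then test membership with `path in ignored` while building the tree.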

The os.scandir() version from the question:

def browse(self, path, tree):
    # for each entry in the path
    for dirEntry in os.scandir(path):
        entry_path = dirEntry.name
        entity_path = dirEntry.path
        # check if support by git or not
        if self.git_ignore(entity_path) is False:
            # if is a dir create a new level in the tree
            if dirEntry.is_dir(follow_symlinks=True):
                tree[entry_path] = Folder(entity_path)
                self.browse(entity_path, tree[entry_path])
            # if is a file add it to the tree
            if dirEntry.is_file(follow_symlinks=True):
                tree[entry_path] = File(entity_path)
