Python: best way to search a folder and delete the files that appear in a list?


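I have created a list that contains the file paths of files to delete. What is the best way to search a folder and its subfolders for those files and then delete them?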

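Currently I loop over the list of file paths, then walk a directory and compare each file in the directory against the entry from the list. There has to be a better way.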

for x in features_to_delete:

    name_checker = str(x) + '.jpg'
    print 'this is name checker {}'.format(name_checker)

    for root, dir2, files in os.walk(folder):
        print 'This is the root directory at the moment:{} The following are files inside of it'.format(root)

        for b in files:
            if b.endswith('.jpg'):
                local_folder = os.path.join(folder, root)
                print 'Here is name of file {}'.format(b)
                print 'Here is name of name checker {}'.format(name_checker)

                if b == name_checker:
                    counter += 1
                    print '{} needs to be deleted..'.format(b)
                    #os.remove(os.path.join(local_folder, b))
                    print 'Removed {} \n'.format(os.path.join(day_folder, b))

                else:
                    print 'This file can stay {} \n'.format(b)
            else:
                pass

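To clarify: what I do now is loop over the entire features_to_delete list, and on every iteration I also walk every file in the directory and all of its subdirectories, comparing each file to the list entry currently being processed. This takes a long time and seems like a terrible way to do it.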

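You should visit each directory only once. You can use sets to compare the list of file names in a given directory with your delete list. Working out which files to delete and which to keep then becomes a simple one-step operation. If you don't care about printing the file names, it is quite compact: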

delete_set = set(str(x) + '.jpg' for x in features_to_delete)
for root, dirs, files in os.walk(folder):
    for delete_name in delete_set.intersection(files):
        os.remove(os.path.join(root, delete_name))

But if you want to print as you go, you have to add a few intermediate variables:

delete_set = set(str(x) + '.jpg' for x in features_to_delete)
for root, dirs, files in os.walk(folder):
    files = set(files)
    delete_these = delete_set & files
    keep_these = files - delete_set
    print 'This is the root directory at the moment:{} The following are files inside of it'.format(root)
    print 'delete these: {}'.format('\n '.join(delete_these))
    print 'keep these: {}'.format('\n '.join(keep_these))
    for delete_name in delete_these:
        os.remove(os.path.join(root, delete_name))
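
The single-pass idea above can also be wrapped in a small function with a dry-run switch, so the matches can be inspected before anything is removed. This is only a sketch: `delete_listed_files`, `extension`, and `dry_run` are names made up for illustration and are not part of the answer. It is written to run on both Python 2.7 and Python 3.

```python
import os


def delete_listed_files(folder, features_to_delete, extension='.jpg', dry_run=True):
    """Walk `folder` once and delete files whose names are in the list.

    Hypothetical helper: the name and parameters are illustrative only.
    Returns the sorted list of matching paths.
    """
    # Build the set of target file names once, before walking.
    delete_set = set(str(x) + extension for x in features_to_delete)
    matched = []
    for root, dirs, files in os.walk(folder):
        # One set intersection per directory replaces the nested loops.
        for name in delete_set.intersection(files):
            full_path = os.path.join(root, name)
            matched.append(full_path)
            if not dry_run:
                os.remove(full_path)
    return sorted(matched)
```

Calling it with `dry_run=True` first only reports what would be deleted; passing `dry_run=False` performs the removal.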

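Create a function that separates the recursive glob-like functionality from your own deletion logic. Then just iterate over its results and delete anything that matches your blacklist.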

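You can use a set to improve the performance of matching the file names. The larger the list, the greater the improvement, but for smaller lists it may be negligible.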

from fnmatch import fnmatch
import os
from os import path

def globber(rootpath, wildcard):
    for root, dirs, files in os.walk(rootpath):
        for file in files:
            if fnmatch(file, wildcard):
                yield path.join(root, file)

features_to_delete = ['blah', 'oh', 'xyz']

todelete = {'%s.jpg' % x for x in features_to_delete}

print(todelete)
for f in globber('/home/prooney', "*.jpg"):
    # globber yields full paths, so compare only the file name with the set.
    if path.basename(f) in todelete:
        print('deleting file: %s' % f)
        os.remove(f)
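
The remark about sets can be checked with a rough measurement. The sketch below compares membership tests on a list and on a set holding the same names; the 10,000 generated names and the probe value are invented purely for illustration, and the timings will vary by machine.

```python
import timeit

# Hypothetical data: 10,000 made-up file names.
names = ['%d.jpg' % i for i in range(10000)]
as_list = list(names)
as_set = set(names)
probe = '9999.jpg'  # last element, the worst case for a list scan

# A list lookup scans elements one by one; a set lookup hashes the name.
list_time = timeit.timeit(lambda: probe in as_list, number=1000)
set_time = timeit.timeit(lambda: probe in as_set, number=1000)
print('list: %.6f s, set: %.6f s' % (list_time, set_time))
```

With only a handful of names, as in the three-item example above, the two are likely to be indistinguishable in practice.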

See whether this code helps you. I included a timer to compare the running time of the two approaches.

import os
from timeit import default_timer as timer

features_to_delete = ['a','b','c']
start = timer()
for x in features_to_delete:

    name_checker = str(x) + '.jpg'
    print 'this is name checker {}'.format(name_checker)
    folder = '.'
    for root, dir2, files in os.walk(folder):
        print 'This is the root directory at the moment:{} The following are files inside of it'.format(root)

        for b in files:
            if b.endswith('.jpg'):
                local_folder = os.path.join(folder, root)
                print 'Here is name of file {}'.format(b)
                print 'Here is name of name checker {}'.format(name_checker)
                counter = 0
                if b == name_checker:
                    counter += 1
                    print '{} needs to be deleted..'.format(b)
                    os.remove(os.path.join(local_folder, b))
                    print 'Removed {} \n'.format(os.path.join(local_folder, b))

                else:
                    print 'This file can stay {} \n'.format(b)
            else:
                pass

end = timer()
print(end - start)

start = timer()
features_to_delete = ['d','e','f']
folder = '.'
# Build the target file names once, as a set, so each lookup is O(1).
delete_set = set(e + '.jpg' for e in features_to_delete)
print 'features' + str(sorted(delete_set))
for root, dirnames, filenames in os.walk(folder):
    # root already starts with folder, so it can be joined with the name directly.
    for filename in delete_set.intersection(filenames):
        os.remove(os.path.join(root, filename))
        print 'Removed {} \n'.format(os.path.join(root, filename))
end = timer()
print(end - start)

Unfortunately, I am actually using 2.7. I use it for some GIS functions that only support 2.7.

Isn't his link for Python 2? I don't see the problem.

Since Python 3.5, glob has gained recursive support, which would simplify this code. For Python 2, it would never be fundamentally different from what the OP already posted.

I am confused about what a "feature" is. Is it a path to a directory like "C:\home\"? How do you get the file name to delete? Is it something like "C:\home*.jpg"? And since "folder" is not set in the code you show, what is it?
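
The comment about recursive glob support can be illustrated with a short sketch for Python 3.5 or later (it does not apply to the 2.7 setup described in the question). The function name `delete_with_glob` and the `dry_run` parameter are assumptions made for this example.

```python
import glob
import os


def delete_with_glob(folder, features_to_delete, dry_run=True):
    """Find listed .jpg files under `folder` using a recursive glob.

    Hypothetical helper, requires Python 3.5+ for recursive=True.
    Returns the sorted list of matching paths.
    """
    delete_set = {str(x) + '.jpg' for x in features_to_delete}
    # '**' with recursive=True matches the folder itself and all subfolders.
    pattern = os.path.join(folder, '**', '*.jpg')
    matched = [p for p in glob.glob(pattern, recursive=True)
               if os.path.basename(p) in delete_set]
    if not dry_run:
        for p in matched:
            os.remove(p)
    return sorted(matched)
```

Note that with this pattern glob skips files and directories whose names start with a dot, so the result can differ from an `os.walk` based search when hidden entries are present.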