Python: compare directories on file/folder names only and print any differences?
How can I recursively compare two directories (the comparison should be based on file names only) and print only the files/folders that exist in one directory or the other? I am using Python 3.3. I have looked at the filecmp module, but it does not seem to do quite what I need. Most importantly, it does not compare files on file names alone.
Here is what I have so far:
import filecmp
dcmp = filecmp.dircmp('./dir1', './dir2')
dcmp.report_full_closure()
dir1 looks like this:
dir1
  - atextfile.txt
  - anotherfile.xml
  - afolder
    - testscript.py
  - anotherfolder
    - file.txt
  - athirdfolder
and dir2 looks like this:
dir2
  - atextfile.txt
  - afolder
    - testscript.py
  - anotherfolder
    - file.txt
    - file2.txt
I would like the result to look like this:
files/folders only in dir1
* anotherfile.xml
* athirdfolder
files/folders only in dir2
* anotherfolder/file2.txt
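I need a simple, Pythonic way to compare two directories based only on file/folder names and print the differences.
I also need a way to check whether the directories are identical.
Note: I have searched Stack Overflow and Google for something similar. I found plenty of examples of comparing files by content, but nothing about comparing by file name alone.
The basic idea: use os.walk to fill a dictionary of file names for each tree, then compare the dictionaries.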
import os

# Collect every file name found under the first tree.
# Keys are bare file names, so the folder a file sits in is ignored.
fpa = {}
for root, dirs, files in os.walk('/your/path'):
    for name in files:
        fpa[name] = 1

# Same for the second tree.
fpb = {}
for root, dirs, files in os.walk('/your/path2'):
    for name in files:
        fpb[name] = 1

print("files only in a")
for name in fpa:
    if name not in fpb:
        print(name)

print("files only in b")
for name in fpb:
    if name not in fpa:
        print(name)
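I have not tested this, so you may need to fix it up. It could also easily be refactored to avoid the repetition.
My solution uses the set() type to store relative paths. The comparison is then just a matter of set subtraction.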
import os
import re

def build_files_set(rootdir):
    # Strip everything up to and including rootdir from each path.
    # re.escape keeps characters such as '.' or '\' in rootdir literal.
    root_to_subtract = re.compile(r'^.*?' + re.escape(rootdir) + r'[\\/]?')
    files_set = set()
    for (dirpath, dirnames, filenames) in os.walk(rootdir):
        for filename in filenames + dirnames:
            full_path = os.path.join(dirpath, filename)
            relative_path = root_to_subtract.sub('', full_path, count=1)
            files_set.add(relative_path)
    return files_set

def compare_directories(dir1, dir2):
    files_set1 = build_files_set(dir1)
    files_set2 = build_files_set(dir2)
    # (names only in dir1, names only in dir2)
    return (files_set1 - files_set2, files_set2 - files_set1)

if __name__ == '__main__':
    dir1 = 'old'
    dir2 = 'new'
    in_dir1, in_dir2 = compare_directories(dir1, dir2)
    print('\nFiles only in {}:'.format(dir1))
    for relative_path in in_dir1:
        print('* {0}'.format(relative_path))
    print('\nFiles only in {}:'.format(dir2))
    for relative_path in in_dir2:
        print('* {0}'.format(relative_path))
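Discussion
- The workhorse is the function build_files_set(). It walks a directory and builds a set of relative file/directory names.
- The function compare_directories() builds the two sets of files and returns their differences, which is straightforward.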
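The question also asks how to check whether the two directories are identical. With relative names stored in sets, that is a plain equality test. Here is a minimal sketch of that idea; the helper names relative_names() and same_names() are made up for illustration, and it assumes Python 3.5 or newer because it uses pathlib (the asker's 3.3 would need the os.walk version instead):

```python
import tempfile
from pathlib import Path


def relative_names(rootdir):
    """Return the relative paths (POSIX-style strings) of everything under rootdir."""
    root = Path(rootdir)
    return {p.relative_to(root).as_posix() for p in root.rglob('*')}


def same_names(dir1, dir2):
    """True when both trees contain exactly the same relative file/folder names."""
    return relative_names(dir1) == relative_names(dir2)


if __name__ == '__main__':
    # Build two tiny throwaway trees so the example runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        a, b = Path(tmp, 'a'), Path(tmp, 'b')
        (a / 'sub').mkdir(parents=True)
        (b / 'sub').mkdir(parents=True)
        (a / 'sub' / 'file.txt').write_text('one')
        (b / 'sub' / 'file.txt').write_text('two')  # different content, same name
        print(same_names(a, b))  # True: contents are ignored
        (b / 'extra.txt').write_text('')
        print(same_names(a, b))  # False: b has a name that a lacks
```

Only names are compared, so two files with the same relative path but different contents still count as a match.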
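filecmp can and should be used for this, but you have to do a little coding:
- You give filecmp.dircmp() two directories, which it calls left and right.
- filecmp.dircmp.left_only is a list of the files and directories found only in the left directory.
- filecmp.dircmp.right_only is a list of the files and directories found only in the right directory.
- filecmp.dircmp.common_dirs is a list of the directories found in both.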
from os.path import join
from filecmp import dircmp

def find_uncommon(L_dir, R_dir):
    dcmp = dircmp(L_dir, R_dir)
    # Names present on one side only, turned into full paths.
    L_only = [join(L_dir, f) for f in dcmp.left_only]
    R_only = [join(R_dir, f) for f in dcmp.right_only]
    # Recurse into the directories that exist on both sides.
    for sub_dir in dcmp.common_dirs:
        new_L, new_R = find_uncommon(join(L_dir, sub_dir), join(R_dir, sub_dir))
        L_only.extend(new_L)
        R_only.extend(new_R)
    return L_only, R_only
Test case:
C:/
    L_dir/
        file_in_both_trees.txt
        file_in_L_tree.txt
        dir_in_L_tree/
        dir_in_both_trees/
            file_in_both_trees.txt
            file_in_L_tree.txt
            dir_in_L_tree/
                file_inside_dir_only_in_L_tree.txt
    R_dir/
        file_in_both_trees.txt
        file_in_R_tree.txt
        dir_in_R_tree/
        dir_in_both_trees/
            file_in_both_trees.txt
            file_in_R_tree.txt
            dir_in_R_tree/
                file_inside_dir_only_in_R_tree.txt
Demo:
L_only, R_only = find_uncommon('C:\\L_dir', 'C:\\R_dir')
print('Left only:\n\t' + '\n\t'.join(L_only))
print('Right only:\n\t' + '\n\t'.join(R_only))
Results:
Left only:
    C:\L_dir\file_in_L_tree.txt
    C:\L_dir\dir_in_L_tree
    C:\L_dir\dir_in_both_trees\file_in_L_tree.txt
    C:\L_dir\dir_in_both_trees\dir_in_L_tree
Right only:
    C:\R_dir\file_in_R_tree.txt
    C:\R_dir\dir_in_R_tree
    C:\R_dir\dir_in_both_trees\file_in_R_tree.txt
    C:\R_dir\dir_in_both_trees\dir_in_R_tree
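Note that if you want to look inside the uncommon directories, you will have to modify the code above slightly. In the example above, that means these two files: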
file_inside_dir_only_in_L_tree.txt
file_inside_dir_only_in_R_tree.txt
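One possible way to make that modification, sketched here rather than taken from the answer: whenever a name exists on one side only and turns out to be a directory, walk it with os.walk and report everything below it as well. The names expand() and find_uncommon_deep() are hypothetical.

```python
import os
from filecmp import dircmp
from os.path import join


def expand(path):
    """Return path itself plus, if it is a directory, everything below it."""
    found = [path]
    if os.path.isdir(path):
        for dirpath, dirnames, filenames in os.walk(path):
            found.extend(join(dirpath, name) for name in dirnames + filenames)
    return found


def find_uncommon_deep(L_dir, R_dir):
    """Like find_uncommon(), but also lists the contents of one-sided directories."""
    dcmp = dircmp(L_dir, R_dir)
    L_only = [p for f in dcmp.left_only for p in expand(join(L_dir, f))]
    R_only = [p for f in dcmp.right_only for p in expand(join(R_dir, f))]
    # Directories present on both sides are compared recursively, as before.
    for sub_dir in dcmp.common_dirs:
        new_L, new_R = find_uncommon_deep(join(L_dir, sub_dir), join(R_dir, sub_dir))
        L_only.extend(new_L)
        R_only.extend(new_R)
    return L_only, R_only
```

It is called the same way as find_uncommon(). Keep in mind that dircmp skips a few names by default (for example CVS and RCS directories), so those never show up in either list.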
Python 2:
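import os

# Only the top level of each folder is listed; subfolders are not walked.
folder1 = os.listdir('/path1')
folder2 = os.listdir('/path2')
# Note: `>` compares the two lists element by element, not by length.
folder_diff = set(folder1) - set(folder2) if folder1 > folder2 else set(folder2) - set(folder1)
print folder_diff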
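It seems a set would be more efficient, since the values in the dict are basically unused. You could then use all the handy difference/intersection/union operations.
Instead of using relative_path = root_to_subtract.sub('', full_path, count=1), why not use relative_path = os.path.relpath(full_path, rootdir)? That is OS-independent and avoids the regex magic :)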
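Putting those two comments together, a sketch of the set-based answer with os.path.relpath in place of the regular expression might look like this. The report() helper is an illustrative addition that prints in the format the question asks for; it is not part of any answer above.

```python
import os


def build_files_set(rootdir):
    """Collect the relative path of every file and folder below rootdir."""
    files_set = set()
    for dirpath, dirnames, filenames in os.walk(rootdir):
        for name in filenames + dirnames:
            full_path = os.path.join(dirpath, name)
            files_set.add(os.path.relpath(full_path, rootdir))
    return files_set


def report(dir1, dir2):
    """Print the names found in only one of the two trees, in sorted order."""
    set1, set2 = build_files_set(dir1), build_files_set(dir2)
    for label, only in ((dir1, set1 - set2), (dir2, set2 - set1)):
        print('files/folders only in {}'.format(label))
        for relative_path in sorted(only):
            print('* {}'.format(relative_path))
```

Calling report('dir1', 'dir2') prints one section per directory. Because the sets hold full relative paths, a folder that exists on one side only is listed together with everything inside it.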