Recursively finding the MD5 of files in a directory in Python
I want to find the MD5 sums of files whose names start with "10" (they may be exe, doc, pdf, and so on), so I don't check the file extension, only the first two digits. So far I have a script that walks the directory and prints all such files, but I can't get it to print the checksum for each one:
import hashlib
import os
import sys

def print_files(file_directory, file_extensions=['10']):
    ''' Print files in file_directory with extensions in file_extensions, recursively. '''
    # Get the absolute path of the file_directory parameter
    file_directory = os.path.abspath(file_directory)
    # Get a list of files in file_directory
    file_directory_files = os.listdir(file_directory)
    # Traverse through all files
    for filename in file_directory_files:
        filepath = os.path.join(file_directory, filename)
        # Check if it's a normal file or directory
        if os.path.isfile(filepath):
            # Check if the file name starts with one of the wanted prefixes
            for file_extension in file_extensions:
                # Not a reqd file, ignore
                #if not filepath.endswith(file_extension):
                if not filename.startswith(file_extension) or len(filename) != 19:
                    continue
                # We have got a '10' file!
                print_files.counter += 1
                ## TRYING TO READ AND PRINT MD5 USING HASHLIB/ DOESNT WORK###
                hasher = hashlib.md5()
                with open(filename, 'rb') as afile:
                    buf = afile.read(65536)
                    while len(buf) > 0:
                        hasher.update(buf)
                        buf = afile.read(65536)
                # Print its name
                print('{0}'.format(filepath))
                print hasher('{0}.format(filepath)').hexdigest()
                print '\n'
        elif os.path.isdir(filepath):
            # We got a directory, enter into it for further processing
            print_files(filepath)

if __name__ == '__main__':
    # Directory argument supplied
    if len(sys.argv) == 2:
        if os.path.isdir(sys.argv[1]):
            file_directory = sys.argv[1]
        else:
            print('ERROR: "{0}" is not a directory.'.format(sys.argv[1]))
            exit(1)
    else:
        # Set file directory to CWD
        file_directory = os.getcwd()
    print('\n -- Looking for Required Files in "{0}" -- \n'.format(file_directory))
    # Set the number of processed files equal to zero
    print_files.counter = 0
    # Start Processing
    print_files(file_directory)
    # We are done. Exit now.

The line that prints the hasher should be:
print('{0}'.format(hasher.hexdigest()))
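As a rough sketch of the same chunked read with this fix applied, the hashing step could be pulled into a small helper. The name md5_of_file and its chunk_size parameter are made up for illustration and do not appear in the question. The helper expects the full path (the joined filepath, not the bare name returned by os.listdir, which is only valid relative to the directory being listed).

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`.

    Hypothetical helper, not from the original post. `path` should be the
    full path to the file, e.g. os.path.join(file_directory, filename).
    """
    hasher = hashlib.md5()
    with open(path, 'rb') as afile:
        # iter() with a sentinel keeps calling read() until it returns b''
        for buf in iter(lambda: afile.read(chunk_size), b''):
            hasher.update(buf)
    return hasher.hexdigest()
```

Inside print_files this would be called as print(md5_of_file(filepath)). Reading in chunks keeps memory use flat even for large files.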
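I'd suggest not writing the recursion yourself and instead using os.walk() to traverse the directory structure. The following code could be the body of the print_files function: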
file_directory = os.path.abspath(file_directory)
paths_to_hash = []
# os.walk() yields (root, dirs, filenames) for every directory in the tree
for root, dirs, filenames in os.walk(file_directory, topdown=False):
    for filename in filenames:
        # Keep only files whose name starts with '10'
        if filename[:2] == '10':
            paths_to_hash.append(os.path.join(root, filename))
for path in paths_to_hash:
    # Read in binary mode; hexdigest() gives a printable checksum
    with open(path, 'rb') as f:
        digest = hashlib.md5(f.read()).hexdigest()
    print('hash: {0} for path: {1}'.format(digest, path))
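One possible way to package the os.walk() approach is as a generator that yields each matching path with its digest, so the caller decides what to print. This is only a sketch: the function name iter_md5 and the prefix and chunk_size parameters are assumptions for illustration, and the 19-character name check from the question is left out.

```python
import hashlib
import os


def iter_md5(file_directory, prefix='10', chunk_size=65536):
    """Yield (path, md5 hex digest) for every file under file_directory
    whose name starts with `prefix`. Hypothetical helper for illustration."""
    for root, _dirs, filenames in os.walk(file_directory):
        for filename in filenames:
            if not filename.startswith(prefix):
                continue
            # os.walk gives bare names, so join with root to get a usable path
            path = os.path.join(root, filename)
            hasher = hashlib.md5()
            with open(path, 'rb') as afile:
                # Read in chunks so large files are not loaded whole
                for buf in iter(lambda: afile.read(chunk_size), b''):
                    hasher.update(buf)
            yield path, hasher.hexdigest()
```

A caller could then loop with "for path, digest in iter_md5(file_directory):" and print each pair.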
I fixed it with this line:
print hashlib.md5(open('{0}'.format(filepath)).read()).hexdigest()
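I wasn't reading the file, I was just passing it to hashlib.md5. Thanks to Matt for the insight.

What doesn't work? Does it raise an exception? Give wrong results? What?

What doesn't work: do you get an error? Can you tell us what happens?

It says:

 -- Looking for Required Files in "/home/Downloads/10/" --

Traceback (most recent call last):
  File "list Files.py", line 82, in <module>
    print_files(file_directory)
  File "list Files.py", line 59, in print_files
    print_files(filepath)
  File "list Files.py", line 46, in print_files
    with open(filename, 'rb') as afile:
IOError: [Errno 2] No such file or directory: '101632829791839266'

If I also remove the part below "## TRYING TO READ AND PRINT MD5 USING HASHLIB/ DOESNT WORK###", the code prints every file.

In "with open(filename, 'rb') as afile", replace filename with filepath.

If you found my answer useful, please leave this comment under the answer and upvote it. Thanks.

So I am planning to extend this approach to pass in a text file containing MD5s and return only the names of files whose MD5 appears in that text file. Any pointers?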
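For the follow-up idea in the last comment, one possible sketch is to load the known digests into a set and keep only the files whose MD5 is in it. The thread does not describe the text file, so this assumes one hex digest per line. The names load_known_md5s and files_matching_md5s and the prefix parameter are hypothetical.

```python
import hashlib
import os


def load_known_md5s(list_path):
    """Read a text file of MD5 digests into a set.

    Assumes one hex digest per line; blank lines are skipped and case is
    normalised to lowercase. The file format is an assumption.
    """
    with open(list_path) as f:
        return {line.strip().lower() for line in f if line.strip()}


def files_matching_md5s(file_directory, known_md5s, prefix='10'):
    """Return paths under file_directory whose name starts with `prefix`
    and whose MD5 hex digest is in `known_md5s`."""
    matches = []
    for root, _dirs, filenames in os.walk(file_directory):
        for filename in filenames:
            if not filename.startswith(prefix):
                continue
            path = os.path.join(root, filename)
            with open(path, 'rb') as afile:
                # Whole-file read for brevity; chunked reads suit large files
                digest = hashlib.md5(afile.read()).hexdigest()
            if digest in known_md5s:
                matches.append(path)
    return matches
```

A set makes each membership test constant time on average, so the cost is dominated by hashing the files rather than by the size of the digest list.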