Creating a unique hash for a directory in Python

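I want to create a unique hash for a given directory in Python. Thanks to zmo, the code below generates a hash for each file in the directory, but how do I aggregate these to produce a single hash that represents the folder?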

import os
import hashlib

def sha1OfFile(filepath):
    sha = hashlib.sha1()
    with open(filepath, 'rb') as f:
        while True:
            block = f.read(2**10) # Read in 1 KiB blocks (2**10 bytes).
            if not block: break
            sha.update(block)
        return sha.hexdigest()

for (path, dirs, files) in os.walk('.'):
  for file in files:
    print('{}: {}'.format(os.path.join(path, file),
                          sha1OfFile(os.path.join(path, file))))

Just keep feeding the data into the same sha object.

import os
import hashlib

def update_sha(filepath, sha):
    with open(filepath, 'rb') as f:
        while True:
            block = f.read(2**10) # Read in 1 KiB blocks (2**10 bytes).
            if not block:
                break
            sha.update(block)

for (path, dirs, files) in os.walk('.'):
    sha = hashlib.sha1()
    for file in files:
        fullpath = os.path.join(path, file)
        update_sha(fullpath, sha)

    print(sha.hexdigest())

Or hash the concatenation of the files' hashes.
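A minimal sketch of that second idea, hashing the concatenated per-file digests. The names `sha1_of_file` and `hash_of_file_hashes` and the 64 KiB block size are assumptions made for illustration, and sorting the traversal is an addition so that repeated runs visit files in the same order.

```python
import hashlib
import os


def sha1_of_file(filepath, block_size=2**16):
    """Return the SHA-1 hex digest of one file, read in fixed-size blocks."""
    sha = hashlib.sha1()
    with open(filepath, 'rb') as f:
        for block in iter(lambda: f.read(block_size), b''):
            sha.update(block)
    return sha.hexdigest()


def hash_of_file_hashes(top):
    """Hash the concatenation of the per-file digests found under `top`."""
    digests = []
    for path, dirs, files in os.walk(top):
        dirs.sort()  # sorting in place fixes the order os.walk descends in
        for name in sorted(files):
            digests.append(sha1_of_file(os.path.join(path, name)))
    # Hex digests are plain ASCII, so they can be encoded and hashed again.
    return hashlib.sha1(''.join(digests).encode('ascii')).hexdigest()
```

Calling `hash_of_file_hashes('.')` would then return one 40-character hex string for the whole tree.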

The right thing to do is (probably) to compute the hash of each directory recursively, like this:

import os
import hashlib

def sha1OfFile(filepath):
    sha = hashlib.sha1()
    with open(filepath, 'rb') as f:
        while True:
            block = f.read(2**10) # Read in 1 KiB blocks (2**10 bytes).
            if not block: break
            sha.update(block)
        return sha.hexdigest()

def hash_dir(dir_path):
    hashes = []
    for path, dirs, files in os.walk(dir_path):
        for file in sorted(files): # we sort to guarantee that files will always go in the same order
            hashes.append(sha1OfFile(os.path.join(path, file)))
        for dir in sorted(dirs): # we sort to guarantee that dirs will always go in the same order
            hashes.append(hash_dir(os.path.join(path, dir)))
        break # we only need one iteration - to get files and dirs in current directory
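    # Built-in hash() of a str is salted per process in Python 3, so use SHA-1 for a stable result.
    return hashlib.sha1(''.join(hashes).encode('ascii')).hexdigest()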

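The problem with just using the files in the order given by os.walk (as Markus did) is that different directory structures containing the same files can produce the same hash. For example, the hash of this directory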

main_dir_1:
    dir_1:
        file_1
        file_2
    dir_2:
        file_3

and the hash of this one

main_dir_2:
    dir_1:
        file_1
    dir_2:
        file_2
        file_3

are the same.
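The collision can be reproduced with a small experiment. This is only a sketch: `flat_hash`, `nested_hash` and `make_tree` are hypothetical helpers, the file contents are made up, and `nested_hash` is a variation on the recursive idea (it uses `os.listdir` and SHA-1 throughout) rather than the exact code from the answer.

```python
import hashlib
import os
import tempfile


def flat_hash(top):
    """Feed every file's bytes into one SHA-1, ignoring where each file lives."""
    sha = hashlib.sha1()
    for path, dirs, files in os.walk(top):
        dirs.sort()
        for name in sorted(files):
            with open(os.path.join(path, name), 'rb') as f:
                sha.update(f.read())
    return sha.hexdigest()


def nested_hash(top):
    """Hash each directory from the digests of its children, recursively."""
    digests = []
    for name in sorted(os.listdir(top)):
        full = os.path.join(top, name)
        if os.path.isdir(full):
            digests.append(nested_hash(full))
        else:
            with open(full, 'rb') as f:
                digests.append(hashlib.sha1(f.read()).hexdigest())
    return hashlib.sha1(''.join(digests).encode('ascii')).hexdigest()


def make_tree(root, layout):
    """Create files from a {relative_path: bytes} mapping."""
    for rel, data in layout.items():
        full = os.path.join(root, *rel.split('/'))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, 'wb') as f:
            f.write(data)


with tempfile.TemporaryDirectory() as tmp:
    main_1 = os.path.join(tmp, 'main_dir_1')
    main_2 = os.path.join(tmp, 'main_dir_2')
    make_tree(main_1, {'dir_1/file_1': b'one', 'dir_1/file_2': b'two',
                       'dir_2/file_3': b'three'})
    make_tree(main_2, {'dir_1/file_1': b'one', 'dir_2/file_2': b'two',
                       'dir_2/file_3': b'three'})
    same_flat = flat_hash(main_1) == flat_hash(main_2)
    same_nested = nested_hash(main_1) == nested_hash(main_2)

print(same_flat)    # True: both trees yield the same byte stream
print(same_nested)  # False: the per-directory digests differ
```

Note that neither helper mixes file or directory names into the hash, so renaming a file without changing the sort order would go unnoticed by both.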

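Another problem is that you need to guarantee that the files are always processed in the same order: if you compute the per-file hashes in two different orders and then hash the resulting string, you will get different results for the same directory structure.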
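The ordering point can be illustrated without touching the file system. In this sketch the two byte strings stand in for file contents and `combine` is a hypothetical helper; sorting the digests here plays the role that sorting file names plays in the answer's code.

```python
import hashlib

# Stand-ins for two per-file digests (the byte strings are made up).
digest_a = hashlib.sha1(b'contents of file_1').hexdigest()
digest_b = hashlib.sha1(b'contents of file_2').hexdigest()


def combine(digests):
    """Hash the concatenation of hex digests, in the order given."""
    return hashlib.sha1(''.join(digests).encode('ascii')).hexdigest()


# Same inputs, different order: the combined hash changes.
order_matters = combine([digest_a, digest_b]) != combine([digest_b, digest_a])

# Sorting first makes the result independent of the input order.
sorted_is_stable = (combine(sorted([digest_a, digest_b]))
                    == combine(sorted([digest_b, digest_a])))

print(order_matters)     # True
print(sorted_is_stable)  # True
```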

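Why not str.join the hashes and hash the resulting string? Or concatenate the file contents and hash the combined contents.

If you do the latter (hashing the combined contents of all files), you should avoid reading the data twice. You can update two different hash objects with each block you read (in case you also need the per-file hashes).

Does this answer your question?

Glad I could help! By the way, if my answer solved your problem, consider accepting it.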
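The suggestion about updating two hash objects with each block might look roughly like the following. It is a sketch under assumptions: `hash_tree_single_pass` is a hypothetical name, the 64 KiB block size is arbitrary, and the directory digest here covers file contents only, in sorted traversal order.

```python
import hashlib
import os


def hash_tree_single_pass(top, block_size=2**16):
    """Return (directory_digest, {relative_path: file_digest}), reading each file once."""
    dir_sha = hashlib.sha1()
    file_digests = {}
    for path, dirs, files in os.walk(top):
        dirs.sort()  # keep the traversal order stable between runs
        for name in sorted(files):
            full = os.path.join(path, name)
            file_sha = hashlib.sha1()
            with open(full, 'rb') as f:
                for block in iter(lambda: f.read(block_size), b''):
                    file_sha.update(block)  # hash of this file alone
                    dir_sha.update(block)   # running hash of all contents
            file_digests[os.path.relpath(full, top)] = file_sha.hexdigest()
    return dir_sha.hexdigest(), file_digests
```

Each block is read from disk once and fed to both objects, so the per-file digests come at no extra I/O cost.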