Listing metadata (MD5, modification time, size, path) for large directories/files in Python


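I am putting together a script that walks a directory tree of up to 8 TB containing more than a million files (including some files of ~50 GB) and exports the results to a .csv file with the columns "md5", "LastWriteTime", "file size", "fullpath\file.ext":

This is the code I have so far; the output .csv is empty: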

def md5(fname):
    hash_md5 = hashlib.md5()
    with open(fname, "rb") as f:
        for chunk in iter(lambda: f.read(2 ** 20), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()

def getSize(filename):
    st = os.stat(filename)
    return st.st_size()

with open('md5_filelist.csv', 'w') as md5_filelist:
    file.write('hash_md5.hexdigest','timestamp','st.st_size','os.path.abspath')
What am I doing wrong (I am new to Python)? Thanks.

Try this:

import hashlib
import os
import time

your_target_folder = "."


def get_size(filename):
    st = os.stat(filename)
    return str(st.st_size)


def get_last_write_time(filename):
    st = os.stat(filename)
    convert_time_to_human_readable = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime))
    return convert_time_to_human_readable


def get_md5(fname):
    hash_md5 = hashlib.md5()
    with open(fname, "rb") as f:
        for chunk in iter(lambda: f.read(2 ** 20), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()


for dirpath, _, filenames in os.walk(your_target_folder):

    for items in filenames:

        file_full_path = os.path.abspath(os.path.join(dirpath, items))

        try:

            my_last_data = get_md5(file_full_path) + ", " + get_last_write_time(file_full_path) + ", " + get_size(
                file_full_path) + ", " + file_full_path + "\n"

            with open("md5_filelist.csv", "a") as my_save_file:
                my_save_file.write(my_last_data)

            print(str(file_full_path) + "  ||| Done")

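        except Exception as err:
            # Catch Exception rather than using a bare except, so Ctrl+C can still stop a long run.
            print("Error On " + str(file_full_path) + ": " + str(err))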
I changed the way the full path is built and added time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime)) to convert the modification time into a human-readable format.


Good luck...
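
A variation on the script above, offered as a minimal sketch rather than a drop-in replacement: it opens the output file once instead of once per file, and uses the standard `csv` module so that paths containing commas are quoted correctly. The function names `file_md5` and `write_metadata_csv`, the header labels, and the choice to skip the report file itself are assumptions made for illustration.

```python
import csv
import hashlib
import os
import time


def file_md5(path, chunk_size=2 ** 20):
    """Hash a file in 1 MiB chunks so large files never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_metadata_csv(target_folder, csv_path):
    """Walk target_folder and write one CSV row per readable file.

    Hypothetical helper (not from the answer); returns the number of rows written.
    """
    rows = 0
    report = os.path.normcase(os.path.abspath(csv_path))
    # newline="" is what the csv module expects; it avoids blank lines on Windows.
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["md5", "LastWriteTime", "size", "path"])
        for dirpath, _, filenames in os.walk(target_folder):
            for name in filenames:
                full_path = os.path.abspath(os.path.join(dirpath, name))
                if os.path.normcase(full_path) == report:
                    continue  # do not hash the report while it is being written
                try:
                    st = os.stat(full_path)
                    mtime = time.strftime(
                        "%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime)
                    )
                    writer.writerow([file_md5(full_path), mtime, st.st_size, full_path])
                    rows += 1
                except OSError as err:
                    # Unreadable or vanished files are reported and skipped.
                    print("Error on " + full_path + ": " + str(err))
    return rows
```

It could be called as `write_metadata_csv(".", "md5_filelist.csv")`. Opening the output once matters mostly for the file count mentioned in the question: a million separate open/append/close cycles add overhead that a single handle avoids.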

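What error are you getting?

`File "C:\SCRIPTS\md5.py", line 7, with open('md5_filelist.csv', 'w') as md5_filelist: ^ TabError: inconsistent use of tabs and spaces in indentation`

You can only indent with one or the other (tabs or spaces), otherwise you get this error.

You never write to md5_filelist. To write to a file opened in Python, call write on the opened file object: file.write('string to write')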
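
To illustrate the point about writing through the handle: in the question's snippet the file is opened as `md5_filelist`, but the write call goes to a different name and passes four arguments, whereas `write()` accepts a single string. A minimal sketch, assuming a hypothetical helper name `write_header` and shortened header labels:

```python
def write_header(csv_path):
    """Create csv_path and write a single header line through the opened handle."""
    header = ",".join(["md5", "LastWriteTime", "size", "path"])
    # The name after "as" is the handle; write() takes exactly one string.
    with open(csv_path, "w", encoding="utf-8") as md5_filelist:
        md5_filelist.write(header + "\n")
    return header
```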
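Thanks, this works well! I left line 5 as it was, because I run the script from inside the target directory (which usually changes with the actual task). Two small issues: (i) the timestamp (on Windows) looks like this: 1424316324.6933541 instead of the actual LastWriteTime (last modified date, YYYY-MM-DD HH:MM:SS): 2015-02-19 03:25:24, and (ii) it does not print the full path of files in subdirectories. (I also tried putting the target directory into line 5, but then no output file was produced, only error messages in the command window.)

@user3026965: Done! Try again :) and then please accept my answer.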
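
The raw value quoted in the comment above is what `os.stat(...).st_mtime` returns: seconds since the Unix epoch, as a float. A small sketch of the conversion follows. It defaults to `time.gmtime` so that the result does not depend on the machine's time zone, whereas the answer uses `time.localtime`, which yields local time instead. The helper name `format_timestamp` and its `use_utc` parameter are assumptions for illustration.

```python
import time


def format_timestamp(seconds, use_utc=True):
    """Format epoch seconds as YYYY-MM-DD HH:MM:SS (fractional part is dropped)."""
    to_struct = time.gmtime if use_utc else time.localtime
    return time.strftime("%Y-%m-%d %H:%M:%S", to_struct(seconds))


print(format_timestamp(1424316324.6933541))  # 2015-02-19 03:25:24 (UTC)
```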