Python: how to skip unhashable (corrupted) files while md5 fingerprinting?

Tags: python, python-3.x

The code below generates md5/metadata fingerprints, but crashes on files with unknown corruption (e.g. files that can be copied, and mostly even opened, but that cannot be hashed or zipped up [to disguise their corruption]).

Question: How can I make this code skip or ignore any and all problem files and just carry on with the rest? Imagine one million files on 8 TB. Otherwise I leave it running with no real-time progress monitoring, only to find out two days later that nothing got hashed because a few problem files made the code hang.

Partial code (see the full code below):

Error:

FileName : T:\problemtest\problemfile.doc is of size 27136 and was modified on2010-10-10 13:58:32
Traceback (most recent call last):
  File "t:\scripts\test.py", line 196, in <module>
    createBasicInfoListFromDisk()
  File "t:\scripts\test.py", line 76, in createBasicInfoListFromDisk
    mod_on =  get_last_write_time(file_path)
  File "t:\scripts\test.py", line 61, in get_last_write_time
    convert_time_to_human_readable = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime))
OSError: [Errno 22] Invalid argument
import os
import sys
import time
import datetime
import difflib
import decimal
import hashlib
from pip._vendor.distlib.compat import raw_input

csvListDetails = list()
csvCompareListDetails = list()
diskCompareListDetails = list()
onlyFileNameOnDisk = list()
addedFiles = list()
removedFiles = list()
driveLetter =""
finalFilesToChange=list()
finalFilesToDelete=list()
changedFiles=list()
csvfilewithPath="md5.csv"
import shutil
walk_dir=""

def findAndReadCSVFile(fileName):

    global csvListDetails 
    global csvCompareListDetails
    haveIgnoredLine = 0
    foundFile=0

    try :
        inputFileHandler = open(fileName,"rt",encoding='utf-8')
        update_time = get_last_write_time(fileName)
        print("\n   Found md5.csv, last updated on: %s" % update_time)
        foundFile=1

    except (OSError, IOError, FileNotFoundError):
        print("\n   md5.csv not found. Will create a new one.")
        return foundFile

    for line in inputFileHandler:
        if (haveIgnoredLine==0):
            haveIgnoredLine=1
            continue

        rowItem = line.replace("\n","").split('","')
        csvCompareListDetails.append('"' + rowItem[3]+',"'+rowItem[2]+'","' +rowItem[1]+'"')
        lineDetails = list()

        for detailNum in range (0,len(rowItem)):
            lineDetails.append('"' + (rowItem[detailNum].replace('"','')) + '"')

        csvListDetails.append(lineDetails)

    inputFileHandler.close()

    return foundFile

def get_last_write_time(filename):
    st = os.stat(filename)
    convert_time_to_human_readable = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime))
    return convert_time_to_human_readable

def createBasicInfoListFromDisk():

    global diskCompareListDetails, onlyFileNameOnDisk, driveLetter,walk_dir

    walk_dir = os.path.abspath(walk_dir)
    for root, subdirs, files in os.walk(walk_dir, topdown=True, onerror=None, followlinks=True ):
        for filename in files:
            file_path = os.path.join(root, filename)
            temp = file_path.split(":")
            driveLetter = temp[0]
            filePathWithoutDriveLetter = temp[1]
            fileSize = os.path.getsize(file_path)
            mod_on =  get_last_write_time(file_path)
            print('\t- file %s (full path: %s)' % (filename, file_path))
            print('FileName : {filename} is of size {size} and was modified on{mdt}'.format(filename=file_path,size=fileSize,mdt=mod_on ))

            diskCompareListDetails.append("\"" + filePathWithoutDriveLetter+"\",\""+str(fileSize) + "\",\"" + mod_on +'"')
            onlyFileNameOnDisk.append("\""+filePathWithoutDriveLetter+"\"")

    return

def compareLogAndDiskLists():
    global addedFiles, removedFiles

    diff = difflib.unified_diff(csvCompareListDetails, diskCompareListDetails, fromfile='file1', tofile='file2', lineterm='', n=0)
    lines = list(diff)[2:]
    addedFiles = [line[1:] for line in lines if line[0] == '+']
    removedFiles = [line[1:] for line in lines if line[0] == '-']

    return

def displayInfoForUserInput():
    global finalFilesToChange, finalFilesToDelete

    changedOrNewFileCount = 0
    noLongerExistingFilesCount = 0
    totalSizeOfChange = 0

    for line in addedFiles:
        if line not in removedFiles:

            changedOrNewFileCount = changedOrNewFileCount +1

            elements =  line.replace("\n","").split('","')
            sizeOfFile= int(elements[1].replace('"',''))
            totalSizeOfChange = totalSizeOfChange + sizeOfFile
            finalFilesToChange.append(elements[0] +'"')

    for line in removedFiles:

        elements = line.split('","')
        if elements[0]+'"' not in onlyFileNameOnDisk:
            noLongerExistingFilesCount = noLongerExistingFilesCount + 1
            finalFilesToDelete.append(elements[0]+'"')

    GBModSz= decimal.Decimal(totalSizeOfChange) / decimal.Decimal('1073741824')
    print("\n   New or modified files on drive: {} (need to hash)".format(changedOrNewFileCount))
    print ("   Obsolete lines in md5.csv (files modified or not on drive): {} (lines to delete)".format(noLongerExistingFilesCount))
    print ("   {} files ({:.2f} GB) needs to be hashed.".format(changedOrNewFileCount,GBModSz))

    userInput = raw_input("\n   Proceed with hash? (Y/N, Yes/No) ")

    if (userInput.strip().upper() == "Y" or userInput.strip().upper() == "YES"):
        print("Continuing Processing...")
    else:
        print("You opted not to continue, Exiting")
        sys.exit()

    return

def processFiles(foundFile):

    if (foundFile==1):
        oldFileName = walk_dir+"/md5.csv"
        shutil.copy( oldFileName, getTargetFileName(oldFileName))

    BLOCKSIZE = 1048576*4
    global changedFiles
    for fileToHash in finalFilesToChange:
        hasher = hashlib.new('md5')
        fileToUse=driveLetter+":"+fileToHash.replace('"','')
        with open(fileToUse, 'rb') as afile:
            buf = afile.read(BLOCKSIZE)
            while len(buf) > 0:
                hasher.update(buf)
                buf = afile.read(BLOCKSIZE)

        fileDetails = list()
        fileDetails.append(hasher.hexdigest())
        fileDetails.append(get_last_write_time(fileToUse))
        fileDetails.append(os.path.getsize(fileToUse))
        fileDetails.append(fileToHash)
        changedFiles.append(fileDetails)

    return 

def getTargetFileName(oldFileName):
    targetFileName= walk_dir+"/generated_on_" + get_last_write_time(oldFileName).replace(" ","_").replace("-","").replace(":","")
    targetFileName = targetFileName + "__archived_on_" + datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    targetFileName = targetFileName + "__md5.csv"
    return targetFileName


def writeCSVFile(fileName):
    try :
        outputFileHandler=open(fileName,"wt",encoding='utf-8')
        outputFileHandler.write("\"md5Hash\",\"LastWriteTime\",\"Length\",\"FullName\"\n")
        for details in csvListDetails:
            if details[3] in finalFilesToDelete:
                continue
            if details[3] in finalFilesToChange:
                continue
            outputFileHandler.write("{},{},{},{}\n".format(details[0],details[1],details[2],details[3]))

        for details in changedFiles:
            outputFileHandler.write("\"{}\",\"{}\",\"{}\",{}\n".format(details[0],details[1],details[2],details[3]))
        outputFileHandler.close()

    except (OSError, IOError, FileNotFoundError) as e:
        print("ERROR :")
        print("File {} is either not writable or some other error: {}".format(fileName,e))

    return

if __name__ == '__main__':

    walk_dir = raw_input("\n   Enter drive or directory to scan: ")
    csvfilewithPath=walk_dir+"/md5.csv"
    print("\n   Drive to scan: " + walk_dir)   

    foundFile = 0
    foundFile=findAndReadCSVFile(csvfilewithPath)
    createBasicInfoListFromDisk()
    compareLogAndDiskLists()
    displayInfoForUserInput()
    processFiles(foundFile)
    writeCSVFile(csvfilewithPath)

Tried this fix, no luck:

def get_last_write_time(filename):
    try:
        st = os.stat(filename)
        convert_time_to_human_readable = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime))
    return convert_time_to_human_readable
    except OSError:
        pass
    return "ERROR"

def createBasicInfoListFromDisk():

Updated answer, updated post.

As said, an except statement with no exception type specified catches everything. So, in order to do what you want... I'm afraid the possible answers are:

  • Create a method that recognizes corrupted files and handles them properly

  • Wrap every part of your code where an error could occur in a try/except statement (see the sketch after the next paragraph)

I would warn you about the second solution, though, because sometimes there are system errors you do not want to silence. I think you should print the exception you catch, so you can identify further problems you may run into.
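
To make the second option concrete, here is a minimal sketch (not the poster's exact code; the snake_case names and the skipped list are illustrative additions) of wrapping the per-file work inside the directory walk in a try/except that prints the exception and moves on:

import os
import time

def get_last_write_time(filename):
    st = os.stat(filename)
    return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime))

def create_basic_info_list_from_disk(walk_dir):
    disk_details = []   # mirrors diskCompareListDetails in the question's code
    skipped = []        # hypothetical: problem files we could not stat/convert
    for root, subdirs, files in os.walk(walk_dir, topdown=True, followlinks=True):
        for filename in files:
            file_path = os.path.join(root, filename)
            try:
                file_size = os.path.getsize(file_path)
                mod_on = get_last_write_time(file_path)
            except OSError as e:
                # Print the exception so new kinds of failures stay visible,
                # then skip this file and carry on with the rest of the walk.
                print("Skipping {}: {}".format(file_path, e))
                skipped.append(file_path)
                continue
            disk_details.append('"{}","{}","{}"'.format(file_path, file_size, mod_on))
    return disk_details, skipped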

As you may not know: your error is not inside the try/except statement. Your error is (if I copied and pasted into my editor correctly) at line 196, createBasicInfoListFromDisk(), and then at line 76, mod_on = get_last_write_time(file_path).

As you also mentioned that you are using Python 3.x, I suggest you look into the suppress function from contextlib.
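
For reference, contextlib.suppress (available since Python 3.4) silently swallows the listed exception types inside a with block; a minimal sketch, where safe_last_write_time is a hypothetical wrapper rather than anything from the question's code:

import os
import time
from contextlib import suppress  # Python 3.4+

def safe_last_write_time(filename, fallback="ERROR"):
    # If os.stat() or the timestamp conversion raises OSError,
    # the with-block is abandoned and the fallback string is returned.
    result = fallback
    with suppress(OSError):
        st = os.stat(filename)
        result = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime))
    return result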


I hope it helps.

I agree with IMCoins, and I am curious why the except did not catch the error.

So the first thing I would do is find the code that raises the OSError and try to catch it explicitly:

def get_last_write_time(filename):
   try:
      st = os.stat(filename)
      convert_time_to_human_readable = time.strftime("%Y-%m-%d %H:%M:%S",
                                                     time.localtime(st.st_mtime)
   return convert_time_to_human_readable
   except OSError:
      pass
   return "ERROR" #or whatever string you want add

The OP code works fine when there are no corrupted files. I'm open to suggestions from anyone who would write it differently, but I need it changed so that it skips and ignores all problem files, whatever the cause, and carries on through the whole 8 TB drive. Thanks.

Could you tell me the exact changes you are proposing to the code in the OP? I want to try them, but I don't want to mess up code that runs perfectly on one million uncorrupted files.

I cannot find createBasicFolistFromDisk() in the code shown, so I suspect this is not the real code that raises the error. Please give the exact error trace that corresponds to the given code.

Updated as requested.

OK, so the error happens in the createBasicInfoListFromDisk() function, which does not contain a try/except block. Simply adding one should solve the problem.

I really appreciate the comments, but I'm not sure how to add the try/except block correctly. Thanks. Could you please post here the exact changes to the code you are proposing?

Thanks! The only change is in the function get_last_write_time(): replace it with mine and try again.

return convert_time_to_human_readable ^ SyntaxError: invalid syntax — see the line I changed at the very bottom of the OP; it gives that error.
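
For completeness: the SyntaxError reported above comes from the return statement sitting between the try block and the except clause (and, in the answer as posted, the strftime call is also missing a closing parenthesis). A minimal corrected sketch of that function:

import os
import time

def get_last_write_time(filename):
    try:
        st = os.stat(filename)
        return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime))
    except OSError:
        # e.g. OSError: [Errno 22] Invalid argument on a corrupted timestamp
        return "ERROR"  # or whatever marker string you prefer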