Python script throws a MemoryError

python list memory directory

I wrote a script that maps a directory and gives me statistics about it. Here is the script:

import os 
import hashlib
import platform
import sys
import argparse
import HTML

class Map(object):

    def __init__(self,param):
        self.param_list = param
        self.slash = self.slash_by_os()
        self.result_list = []
        self.os = ""


    def calc_md5(self,file_path):
        with open(file_path) as file_to_check:
            data = file_to_check.read()    
            md5_returned = hashlib.md5(data).hexdigest()

        return md5_returned

    def slash_by_os(self):
        general_id = platform.system()
        actual_os = ""

        if general_id == "Darwin" or general_id == "darwin":
            actual_os = "UNIX"
        elif general_id == "Linux" or general_id == "linux":
            actual_os = "UNIX"
        elif general_id  == "SunOS":
            actual_os = "UNIX"
        elif general_id == "Windows" or general_id == "windows":
            actual_os = "WIN"
        else:
            actual_os = general_id

        if actual_os == "UNIX":
            return '/'
        elif actual_os == "WIN":
            return '\\'
        else:
            return '/'

        self.os = actual_os

    def what_to_do(self,new_dir):
        act = []
        act.append(new_dir[:-1])
        for param in self.param_list:
            if param == "md5":
                x = self.calc_md5(new_dir[:-1])
                act.append(x)
            elif param == "size":
                x = os.stat(new_dir[:-1]).st_size
                act.append(x)
            elif param == "access":
                x = os.stat(new_dir[:-1]).st_atime
                act.append(x)
            elif param == "modify":
                x = os.stat(new_dir[:-1]).st_mtime
                act.append(x)
            elif param == "creation":
                    x = os.stat(new_dir[:-1]).st_ctime
                    act.append(x)   

        return act

    def list_of_files(self ,dir_name ,traversed = [], results = []): 

        dirs = os.listdir(dir_name)
        if dirs:
            for f in dirs:
                new_dir = dir_name + f + self.slash
                if os.path.isdir(new_dir) and new_dir not in traversed:
                    traversed.append(new_dir)
                    self.list_of_files(new_dir, traversed, results)
                else:
                    act = self.what_to_do(new_dir)
                    results.append(act)
        self.result_list = results  
        return results


def parse_args():
    desc = "Welcom To dirmap.py 1.0"
    parser = argparse.ArgumentParser(description=desc)
    parser.add_argument('-p','--path', help='Path To Original Directory', required=True)
    parser.add_argument('-md','--md5', action = 'store_true',help='Show md5 hash of file', required=False)
    parser.add_argument('-s','--size', action = 'store_true', help='Show size of file', required=False)
    parser.add_argument('-a','--access', action = 'store_true',  help='Show access time of file', required=False)
    parser.add_argument('-m','--modify', action = 'store_true', help='Show modification time of file', required=False)
    parser.add_argument('-c','--creation', action = 'store_true', help='Show creation of file', required=False)

    args = vars(parser.parse_args())

    params = []
    for key,value in args.iteritems():
        if value == True:
            params.append(key)

    return args,params



def main():
    args , params = parse_args() 
    dir_path = args['path']
    map = Map(params)
    dir_list = map.list_of_files(dir_path)

    params.insert(0,"path")


    htmlcode_dir = HTML.table(dir_list,header_row=params)
    print htmlcode_dir

main()

When I try to run it on a medium-to-large directory, it throws a MemoryError exception, as you can see here:

python(2374) malloc: *** mmap(size=140514183884800) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
   File "dirmap.py", line 132, in <module>
     main()
   File "dirmap.py", line 124, in main
     dir_list = map.list_of_files(dir_path)
   File "dirmap.py", line 86, in list_of_files
     self.list_of_files(new_dir, traversed, results)
   File "dirmap.py", line 86, in list_of_files
     self.list_of_files(new_dir, traversed, results)
   File "dirmap.py", line 86, in list_of_files
     self.list_of_files(new_dir, traversed, results)
   File "dirmap.py", line 88, in list_of_files
     act = self.what_to_do(new_dir)
   File "dirmap.py", line 60, in what_to_do
     x = self.calc_md5(new_dir[:-1])
   File "dirmap.py", line 25, in calc_md5
     data = file_to_check.read()
MemoryError

Any ideas?

You have probably hit a large file that cannot be read into memory all at once in calc_md5(). Use a buffered approach.

You are reading a large file into memory in one go. Don't do that; read it in chunks and update the hash as you go:

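def calc_md5(self, file_path):
    md5 = hashlib.md5()
    # Binary mode: hash the raw bytes, with no newline translation
    with open(file_path, 'rb') as file_to_check:
        # Read 4096-byte chunks until read() returns an empty result at EOF
        for chunk in iter(lambda: file_to_check.read(4096), b''):
            md5.update(chunk)

    return md5.hexdigest()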

This opens the file in binary mode, which avoids interpreting different line-ending conventions (doing so would change the hash).
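To illustrate why line-ending translation matters, here is a minimal sketch that hashes the same two lines of text stored with Unix and with Windows line endings. The sample byte strings and the helper name `md5_hex` are made up for this illustration and are not part of the original script.

```python
import hashlib

# Hypothetical sample data: identical text, different line-ending conventions
unix_bytes = b"first line\nsecond line\n"
windows_bytes = b"first line\r\nsecond line\r\n"


def md5_hex(data):
    """Return the hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


unix_digest = md5_hex(unix_bytes)
windows_digest = md5_hex(windows_bytes)

# The digests differ because the underlying bytes differ
print(unix_digest == windows_digest)  # False

# If something rewrote "\r\n" to "\n" before hashing, the two inputs
# would become indistinguishable to the hash
translated_digest = md5_hex(windows_bytes.replace(b"\r\n", b"\n"))
print(translated_digest == unix_digest)  # True
```

Hashing the raw bytes, as binary mode does, keeps the digest tied to what is actually stored on disk.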

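The code above uses the two-argument form of iter(), where the second argument is a sentinel value; iteration stops when the first argument (a callable) returns the sentinel. At EOF, a Python file object's read() returns an empty string (an empty bytes object for a binary-mode file in Python 3).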
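The sentinel behaviour can be seen in isolation with an in-memory stream. This is a small Python 3 sketch; `md5_of_stream` and `payload` are names invented for the illustration, and `io.BytesIO` stands in for a file opened with 'rb'.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=4096):
    """Hash a binary stream chunk by chunk, never holding it all in memory."""
    md5 = hashlib.md5()
    # iter(callable, sentinel) keeps calling stream.read(chunk_size)
    # and stops as soon as it returns b"" (end of stream)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
    return md5.hexdigest()


payload = b"x" * 10000  # hypothetical 10,000-byte "file"

# Show the chunk sizes that the two-argument iter() produces
stream = io.BytesIO(payload)
chunk_sizes = [len(chunk) for chunk in iter(lambda: stream.read(4096), b"")]
print(chunk_sizes)  # [4096, 4096, 1808]

# The chunked digest matches hashing everything at once
chunked_digest = md5_of_stream(io.BytesIO(payload))
print(chunked_digest == hashlib.md5(payload).hexdigest())  # True
```

The chunk size only affects how much is held in memory at a time, not the resulting digest.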

Can you paste the traceback? I'd also suggest not doing this (as you did in list_of_files).

I don't really know what a traceback is... But if not a list, what should I use?

@beetea: The traceback is there, it just wasn't formatted well. I've fixed it.

So which approach should I use? Chunks or lines?

@FernandoRetimo: On reflection, go with chunked, binary reading. Opening a file in text mode can change how line endings are interpreted, for example.

I'll look into it, try it out, and report back with the results.

Works perfectly, my friend, thank you very much!

My friend... another problem has come up. On Mac OS X (and I assume all Linux), when I run this on the root directory it throws IOError: [Errno 102] Operation not supported on socket... It fails when trying to open the file with 'rb'. Any suggestions?
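The thread does not record a reply to that last question. One possible approach, offered here only as an assumption and not as the thread's answer, is to hash regular files only and skip sockets, FIFOs and device nodes, which a walk from the root directory will encounter. The helper name `is_regular_file` and the demo paths below are hypothetical.

```python
import os
import stat
import tempfile


def is_regular_file(path):
    """Return True only for regular files.

    Uses lstat(), so symbolic links are not followed and count as
    non-regular; sockets, FIFOs, devices and directories are rejected too.
    """
    try:
        mode = os.lstat(path).st_mode
    except OSError:
        # Path vanished or is not accessible: treat it as not hashable
        return False
    return stat.S_ISREG(mode)


# Small demonstration in a throwaway directory
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "sample.txt")
with open(demo_file, "wb") as handle:
    handle.write(b"data")

print(is_regular_file(demo_file))  # True
print(is_regular_file(demo_dir))   # False
```

In the script above, such a check could guard the call to calc_md5() so that special files are reported without being opened.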