Memoize to disk in Python: persistent memoization

Is there a way to memoize the output of a function to disk?

I have a function:

def getHtmlOfUrl(url):
    ... # expensive computation
and I would like to do something like:

getHtmlMemoized = memoizeToFile(getHtmlOfUrl, "file.dat")

and then call getHtmlMemoized(url), so that the expensive computation is done only once for each URL.

Something like this should work:

import json

class Memoize(object):
    def __init__(self, func):
        self.func = func
        self.memo = {}

    def load_memo(self, filename):
        # Merge previously saved results into the in-memory cache.
        with open(filename) as f:
            self.memo.update(json.load(f))

    def save_memo(self, filename):
        with open(filename, 'w') as f:
            json.dump(self.memo, f)

    def __call__(self, *args):
        # JSON object keys must be strings, so serialize the argument tuple.
        key = json.dumps(args)
        if key not in self.memo:
            self.memo[key] = self.func(*args)
        return self.memo[key]
Basic usage:

your_mem_func = Memoize(your_func)
your_mem_func.load_memo('yourdata.json')
#  do your stuff with your_mem_func
If you want to write the cache to a file after using it, so that it can be loaded again in the future:

your_mem_func.save_memo('yournewdata.json')

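Python offers a very elegant way to do this: decorators. Basically, a decorator is a function that wraps another function to provide additional functionality without changing the function's source code. Your decorator can be written like this: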

import json

def persist_to_file(file_name):

    def decorator(original_func):

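        # Load the cache once, when the function is decorated.
        try:
            with open(file_name, 'r') as f:
                cache = json.load(f)
        except (IOError, ValueError):
            cache = {}

        def new_func(param):
            if param not in cache:
                cache[param] = original_func(param)
                # Rewrite the whole cache file after every new result.
                with open(file_name, 'w') as f:
                    json.dump(cache, f)
            return cache[param]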

        return new_func

    return decorator
Once you have that, "decorate" your function using the @ syntax and you are ready to go:

@persist_to_file('cache.dat')
def html_of_url(url):
    your function code...
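Note that this decorator is intentionally simplified and may not work in every situation, for example when the source function accepts or returns data that cannot be JSON-serialized.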
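
For the non-JSON-serializable case mentioned above, one possible workaround is to store the cache with pickle instead. The following is only a sketch under assumptions: the name persist_to_pickle is hypothetical, the arguments are assumed to be hashable, and both arguments and return values are assumed to be picklable.

```python
import functools
import os
import pickle


def persist_to_pickle(file_name):
    """Hypothetical variant of persist_to_file that stores the cache with pickle."""

    def decorator(original_func):
        # Load an existing cache if there is one; otherwise start empty.
        if os.path.exists(file_name):
            with open(file_name, 'rb') as f:
                cache = pickle.load(f)
        else:
            cache = {}

        @functools.wraps(original_func)
        def new_func(*args, **kwargs):
            # Build a hashable key from positional and keyword arguments.
            key = (args, tuple(sorted(kwargs.items())))
            if key not in cache:
                cache[key] = original_func(*args, **kwargs)
                # Rewrite the whole cache file after every new result.
                with open(file_name, 'wb') as f:
                    pickle.dump(cache, f)
            return cache[key]

        return new_func

    return decorator
```

Because unpickling can execute arbitrary code, this approach is only reasonable for cache files you created yourself.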


Here is how to make the decorator save the cache only once, at exit:

import json, atexit

def persist_to_file(file_name):

    try:
        cache = json.load(open(file_name, 'r'))
    except (IOError, ValueError):
        cache = {}

    atexit.register(lambda: json.dump(cache, open(file_name, 'w')))

    def decorator(func):
        def new_func(param):
            if param not in cache:
                cache[param] = func(param)
            return cache[param]
        return new_func

    return decorator
There is also a library that does exactly this.

The artemis library has a module for this (you need to pip install artemis-ml).

You can decorate your function:

from artemis.fileman.disk_memoize import memoize_to_disk

@memoize_to_disk
def fcn(a, b, c = None):
    results = ...
    return results

Internally, it hashes the input arguments and saves the memo files keyed by this hash.
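
The hashing idea described above can be sketched roughly as follows. This is not the artemis implementation; the name memoize_by_hash and the one-file-per-hash layout are assumptions made for illustration, and the arguments are assumed to pickle to the same bytes on every run.

```python
import functools
import hashlib
import os
import pickle


def memoize_by_hash(cache_dir):
    """Hypothetical decorator: one pickle file per distinct argument hash."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Hash a pickled description of the call to get a file name.
            payload = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            digest = hashlib.sha256(payload).hexdigest()
            path = os.path.join(cache_dir, digest + '.pkl')
            if os.path.exists(path):
                with open(path, 'rb') as f:
                    return pickle.load(f)
            result = func(*args, **kwargs)
            if not os.path.isdir(cache_dir):
                os.makedirs(cache_dir)
            with open(path, 'wb') as f:
                pickle.dump(result, f)
            return result

        return wrapper

    return decorator
```

Keeping one file per argument set means a new result never requires rewriting the results already on disk.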

Assuming your data is JSON-serializable, this code should work:

import os, json

def json_file(fname):
    def decorator(function):
        def wrapper(*args, **kwargs):
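            if os.path.isfile(fname):
                # The cache is keyed by file name only, not by the arguments.
                with open(fname, 'r') as f:
                    ret = json.load(f)
            else:
                # Compute first, so a failing call leaves no empty cache file.
                ret = function(*args, **kwargs)
                with open(fname, 'w') as f:
                    json.dump(ret, f)
            return ret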
        return wrapper
    return decorator
Decorate getHtmlOfUrl and then simply call it; if it has been run before, you will get the cached data.


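Checked with Python 2.x and Python 3.x. This is a cleaner solution powered by Python's shelve module. The advantage is that the cache is updated in real time through the familiar dict syntax, and it is also exception-proof (no need to handle the annoying KeyError).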

This ensures the function is computed only once. Subsequent calls return the stored result.
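
A minimal sketch of what a shelve-backed memoizer might look like is given below. It is not the original answer's code: the name shelve_memoize is hypothetical, and it assumes the arguments have a stable repr and the return value is picklable.

```python
import functools
import shelve


def shelve_memoize(file_name):
    """Hypothetical decorator that keeps results in a shelve database."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            # shelve keys must be strings, so use the repr of the arguments.
            key = repr(args)
            # try/finally works on both Python 2 and Python 3.
            db = shelve.open(file_name)
            try:
                if key not in db:
                    db[key] = func(*args)
                return db[key]
            finally:
                db.close()

        return wrapper

    return decorator
```

Each assignment to `db[key]` goes to the database file, so the cache on disk stays current without a separate save step.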

You can use the cache_to_disk package:

    from cache_to_disk import cache_to_disk

    @cache_to_disk(3)
    def my_func(a, b, c, d=None):
        results = ...
        return results
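This will cache the results for 3 days, specific to the arguments a, b, c and d. The results are stored in a pickle file on your machine, and are unpickled and returned the next time the function is called. After 3 days, the pickle file is deleted until the function is re-run. The function is also re-run whenever it is called with new arguments.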
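
The expiry behaviour described above can be approximated with file modification times. The sketch below is not how cache_to_disk is implemented; cache_for_days and its cache_dir parameter are hypothetical, the directory is assumed to exist already, and the arguments are assumed to have a stable repr.

```python
import functools
import hashlib
import os
import pickle
import time


def cache_for_days(days, cache_dir):
    """Hypothetical decorator: pickle results per argument set, expire by file age."""
    max_age = days * 24 * 60 * 60  # seconds

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = repr((func.__name__, args, sorted(kwargs.items())))
            name = hashlib.md5(key.encode('utf-8')).hexdigest() + '.pkl'
            path = os.path.join(cache_dir, name)
            # Reuse the file only if it exists and is younger than max_age.
            if os.path.exists(path) and time.time() - os.path.getmtime(path) < max_age:
                with open(path, 'rb') as f:
                    return pickle.load(f)
            # Missing or stale: recompute and overwrite the file.
            result = func(*args, **kwargs)
            with open(path, 'wb') as f:
                pickle.dump(result, f)
            return result

        return wrapper

    return decorator
```

In this sketch a stale file is simply overwritten on the next call rather than deleted in the background.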

There is also diskcache:

from diskcache import Cache

cache = Cache("cachedir")

@cache.memoize()
def f(x, y):
    print('Running f({}, {})'.format(x, y))
    return x, y

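Most of the answers are in decorator style. But maybe I don't want the result to be cached every time the function is called.

I came up with a solution using a context manager, so the function can be called as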

with DiskCacher('cache_id', myfunc) as myfunc2:
    res=myfunc2(...)
whenever you need the caching functionality.

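The 'cache_id' string is used to distinguish the data files, which are named [calling_script]_[cache_id].nc. So if you do this in a loop, you need to incorporate the loop variable into the cache_id, otherwise the data will be overwritten.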
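
To illustrate the point about loops, here is a small sketch of how the id maps to a file name. The helper cache_file_name and the script name 'myscript' are hypothetical; the pattern imitates the one used by formFilename in the full code of this answer.

```python
def cache_file_name(script, cache_id):
    # Imitates the '[script]_[id].nc' pattern used by formFilename.
    return '[%s]_[%s].nc' % (script, cache_id)


# Same id on every iteration: all iterations share one cache file.
same = set(cache_file_name('myscript', 'result') for i in range(3))

# Loop variable folded into the id: one cache file per iteration.
distinct = set(cache_file_name('myscript', 'result_%d' % i) for i in range(3))
```

With the shared id, every iteration after the first would read back the first iteration's data, or overwrite it when overwrite=True is used.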

Alternatively:

myfunc2=DiskCacher('cache_id')(myfunc)
res=myfunc2(...)
Or (this is probably not very useful, since the same id is always used):

@DiskCacher('cache_id')
def myfunc(*args):
    ...

Full code with examples (I use pickle for saving/loading, but this can be changed to any save/read method. Note that this also assumes the function in question returns only 1 return value):

from __future__ import print_function
import sys, os
import functools

def formFilename(folder, varid):
    '''Compose abspath for cache file

    Args:
        folder (str): cache folder path.
        varid (str): variable id to form file name and used as variable id.
    Returns:
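        abpath (str): abspath for cache file, which is using the <folder>
            as folder. The file name is the format:
                [script_file]_[varid].nc
    '''
    # Use only the script's base name, so that a path in sys.argv[0] cannot
    # put directory separators into the file name.
    script_file=os.path.splitext(os.path.basename(sys.argv[0]))[0]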
    name='[%s]_[%s].nc' %(script_file, varid)
    abpath=os.path.join(folder, name)

    return abpath


def readCache(folder, varid, verbose=True):
    '''Read cached data

    Args:
        folder (str): cache folder path.
        varid (str): variable id.
    Keyword Args:
        verbose (bool): whether to print some text info.
    Returns:
        results (tuple): a tuple containing data read in from cached file(s).
    '''
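    import pickle
    abpath_in=formFilename(folder, varid)
    if not os.path.exists(abpath_in):
        # Fail with a clear error instead of returning an unbound variable.
        raise IOError('Cache file not found: %s' % abpath_in)
    if verbose:
        print('\n# <readCache>: Read in variable', varid,
                'from disk cache:\n', abpath_in)
    with open(abpath_in, 'rb') as fin:
        results=pickle.load(fin)

    return results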


def writeCache(results, folder, varid, verbose=True):
    '''Write data to disk cache

    Args:
        results (tuple): a tuple containing data read to cache.
        folder (str): cache folder path.
        varid (str): variable id.
    Keyword Args:
        verbose (bool): whether to print some text info.
    '''
    import pickle
    abpath_out=formFilename(folder, varid)
    if verbose:
        print('\n# <writeCache>: Saving output to:\n',abpath_out)
    with open(abpath_out, 'wb') as fout:
        pickle.dump(results, fout)

    return


class DiskCacher(object):
    def __init__(self, varid, func=None, folder=None, overwrite=False,
            verbose=True):
        '''Disk cache context manager

        Args:
            varid (str): string id used to save cache.
                function <func> is assumed to return only 1 return value.
        Keyword Args:
            func (callable): function object whose return values are to be
                cached.
            folder (str or None): cache folder path. If None, use a default.
            overwrite (bool): whether to force a new computation or not.
            verbose (bool): whether to print some text info.
        '''

        if folder is None:
            self.folder='/tmp/cache/'
        else:
            self.folder=folder

        self.func=func
        self.varid=varid
        self.overwrite=overwrite
        self.verbose=verbose

    def __enter__(self):
        if self.func is None:
            raise Exception("Need to provide a callable function to __init__() when used as context manager.")

        return _Cache2Disk(self.func, self.varid, self.folder,
                self.overwrite, self.verbose)

    def __exit__(self, type, value, traceback):
        return

    def __call__(self, func=None):
        _func=func or self.func
        return _Cache2Disk(_func, self.varid, self.folder, self.overwrite,
                self.verbose)



def _Cache2Disk(func, varid, folder, overwrite, verbose):
    '''Inner decorator function

    Args:
        func (callable): function object whose return values are to be
            cached.
        varid (str): variable id.
        folder (str): cache folder path.
        overwrite (bool): whether to force a new computation or not.
        verbose (bool): whether to print some text info.
    Returns:
        decorated function: if cache exists, the function is <readCache>
            which will read cached data from disk. If needs to recompute,
            the function is wrapped that the return values are saved to disk
            before returning.
    '''

    def decorator_func(func):
        abpath_in=formFilename(folder, varid)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if os.path.exists(abpath_in) and not overwrite:
                results=readCache(folder, varid, verbose)
            else:
                results=func(*args, **kwargs)
                if not os.path.exists(folder):
                    os.makedirs(folder)
                writeCache(results, folder, varid, verbose)
            return results
        return wrapper

    return decorator_func(func)



if __name__=='__main__':

    data=range(10)  # dummy data

    #--------------Use as context manager--------------
    def func1(data, n):
        '''dummy function'''
        results=[i*n for i in data]
        return results

    print('\n### Context manager, 1st time call')
    with DiskCacher('context_mananger', func1) as func1b:
        res=func1b(data, 10)
        print('res =', res)

    print('\n### Context manager, 2nd time call')
    with DiskCacher('context_mananger', func1) as func1b:
        res=func1b(data, 10)
        print('res =', res)

    print('\n### Context manager, 3rd time call with overwrite=True')
    with DiskCacher('context_mananger', func1, overwrite=True) as func1b:
        res=func1b(data, 10)
        print('res =', res)

    #--------------Return a new function--------------
    def func2(data, n):
        results=[i*n for i in data]
        return results

    print('\n### Wrap a new function, 1st time call')
    func2b=DiskCacher('new_func')(func2)
    res=func2b(data, 10)
    print('res =', res)

    print('\n### Wrap a new function, 2nd time call')
    res=func2b(data, 10)
    print('res =', res)

    #----Decorate a function using the syntax sugar----
    @DiskCacher('pie_dec')
    def func3(data, n):
        results=[i*n for i in data]
        return results

    print('\n### pie decorator, 1st time call')
    res=func3(data, 10)
    print('res =', res)

    print('\n### pie decorator, 2nd time call.')
    res=func3(data, 10)
    print('res =', res)

Output:

### Context manager, 1st time call

# <writeCache>: Saving output to:
 /tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

### Context manager, 2nd time call

# <readCache>: Read in variable context_mananger from disk cache:
 /tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

### Context manager, 3rd time call with overwrite=True

# <writeCache>: Saving output to:
 /tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

### Wrap a new function, 1st time call

# <writeCache>: Saving output to:
 /tmp/cache/[diskcache]_[new_func].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

### Wrap a new function, 2nd time call

# <readCache>: Read in variable new_func from disk cache:
 /tmp/cache/[diskcache]_[new_func].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

### pie decorator, 1st time call

# <writeCache>: Saving output to:
 /tmp/cache/[diskcache]_[pie_dec].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

### pie decorator, 2nd time call.

# <readCache>: Read in variable pie_dec from disk cache:
 /tmp/cache/[diskcache]_[pie_dec].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]