Memoize to disk - Python - persistent memoization
Is there a way to memoize the output of a function to disk? I have a function
def getHtmlOfUrl(url):
    ... # expensive computation
and would like to do something like:
getHtmlMemoized = memoizeToFile(getHtmlOfUrl, "file.dat")
and then call getHtmlMemoized(url), so that the expensive computation is done only once for each url.
Something like this should do:
import json
class Memoize(object):
    def __init__(self, func):
        self.func = func
        self.memo = {}
    def load_memo(self, filename):
        with open(filename) as f:
            self.memo.update(json.load(f))
    def save_memo(self, filename):
        with open(filename, 'w') as f:
            json.dump(self.memo, f)
    def __call__(self, *args):
        # JSON object keys must be strings, so key the cache on the serialized arguments
        key = json.dumps(args)
        if key not in self.memo:
            self.memo[key] = self.func(*args)
        return self.memo[key]
Basic usage:
your_mem_func = Memoize(your_func)
your_mem_func.load_memo('yourdata.json')
# do your stuff with your_mem_func
If you want to write your "cache" to a file after using it, so that it can be loaded again in the future:
your_mem_func.save_memo('yournewdata.json')
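Python offers a very elegant way to do this: decorators. Basically, a decorator is a function that wraps another function to provide additional functionality without changing the function's source code. Your decorator can be written like this: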
import json
def persist_to_file(file_name):
    def decorator(original_func):
        try:
            with open(file_name, 'r') as f:
                cache = json.load(f)
        except (IOError, ValueError):
            cache = {}
        def new_func(param):
            if param not in cache:
                cache[param] = original_func(param)
                # rewrite the whole cache file after every new result
                with open(file_name, 'w') as f:
                    json.dump(cache, f)
            return cache[param]
        return new_func
    return decorator
Once you've got that, "decorate" the function using the @ syntax and you're ready:
@persist_to_file('cache.dat')
def html_of_url(url):
    ...  # your function code
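Note that this decorator is intentionally simplified and may not work for every situation, for example when the source function accepts or returns data that cannot be JSON-serialized.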
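For functions that take several arguments or return values JSON cannot represent, one option is to swap JSON for pickle. The following is only a minimal sketch under that assumption and is not part of the answer above: `persist_to_pickle`, `make_pair` and the temporary cache path are hypothetical names, and the arguments are assumed to be hashable and picklable.

```python
import functools
import os
import pickle
import tempfile

def persist_to_pickle(file_name):
    """Hypothetical variant of persist_to_file that stores the cache with pickle."""
    def decorator(func):
        try:
            with open(file_name, 'rb') as f:
                cache = pickle.load(f)
        except (IOError, EOFError, pickle.UnpicklingError):
            cache = {}  # no usable cache file yet

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Build a hashable key from positional and keyword arguments.
            key = (args, tuple(sorted(kwargs.items())))
            if key not in cache:
                cache[key] = func(*args, **kwargs)
                with open(file_name, 'wb') as f:
                    pickle.dump(cache, f)
            return cache[key]
        return wrapper
    return decorator

# Usage: a throwaway cache file in a fresh temporary directory.
cache_path = os.path.join(tempfile.mkdtemp(), 'cache.pkl')
calls = []

@persist_to_pickle(cache_path)
def make_pair(a, b=0):
    calls.append((a, b))
    return (a, {b})  # a tuple holding a set: not JSON-serializable

first = make_pair(1, b=2)
second = make_pair(1, b=2)  # served from the cache; make_pair is not run again
```

Keep in mind that unpickling runs arbitrary code, so a pickle cache should only be loaded from files you trust.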
Here is how to make the persist_to_file decorator save the cache only once, at exit time:
import json, atexit
def persist_to_file(file_name):
    try:
        with open(file_name, 'r') as f:
            cache = json.load(f)
    except (IOError, ValueError):
        cache = {}
    def save_cache():
        with open(file_name, 'w') as f:
            json.dump(cache, f)
    atexit.register(save_cache)  # write the cache once, when the interpreter exits
    def decorator(func):
        def new_func(param):
            if param not in cache:
                cache[param] = func(param)
            return cache[param]
        return new_func
    return decorator
Check out Artemis. It is a library that does exactly this, and it has a module for it. (You need to pip install artemis-ml.)
You can decorate your function:
from artemis.fileman.disk_memoize import memoize_to_disk
@memoize_to_disk
def fcn(a, b, c = None):
    results = ...
    return results
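Internally, it computes a hash of the input arguments and saves the memo files under that hash.
Assuming that your data is JSON-serializable, this code should work: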
import os, json
def json_file(fname):
    def decorator(function):
        def wrapper(*args, **kwargs):
            if os.path.isfile(fname):
                with open(fname, 'r') as f:
                    ret = json.load(f)
            else:
                # compute first, so a failing call does not leave an empty cache file behind
                ret = function(*args, **kwargs)
                with open(fname, 'w') as f:
                    json.dump(ret, f)
            return ret
        return wrapper
    return decorator
Decorate getHtmlOfUrl and then simply call it; if it has been run before, you will get your cached data.
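Note that json_file stores a single result per file, whatever arguments the function is called with. A minimal sketch of keying the file name on a hash of the arguments follows. It uses the hashing idea described for Artemis above but is not Artemis's implementation; `json_file_per_call`, `double` and the temporary folder are hypothetical names, and both arguments and return values are assumed to be JSON-serializable.

```python
import functools
import hashlib
import json
import os
import tempfile

def json_file_per_call(folder):
    """Hypothetical decorator: one JSON file per distinct set of arguments."""
    def decorator(function):
        @functools.wraps(function)
        def wrapper(*args, **kwargs):
            # Hash a canonical JSON form of the arguments to get a file name.
            payload = json.dumps([args, kwargs], sort_keys=True)
            digest = hashlib.sha256(payload.encode('utf-8')).hexdigest()
            path = os.path.join(folder, '%s_%s.json' % (function.__name__, digest))
            if os.path.isfile(path):
                with open(path, 'r') as f:
                    return json.load(f)
            ret = function(*args, **kwargs)
            with open(path, 'w') as f:
                json.dump(ret, f)
            return ret
        return wrapper
    return decorator

cache_dir = tempfile.mkdtemp()  # fresh folder for the demo
calls = []

@json_file_per_call(cache_dir)
def double(x):
    calls.append(x)
    return [x, x * 2]

a = double(2)  # computed and written to disk
b = double(3)  # different argument: computed separately
c = double(2)  # read back from the file written by the first call
```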
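Tested with Python 2.x and Python 3.x, this is a cleaner solution powered by Python's shelve module. The advantage is that the cache is updated in real time through the familiar dict syntax, and it is also exception-proof (no need to handle the annoying KeyError).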
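The code for this answer did not survive in this copy, so the following is only a sketch of what a shelve-backed decorator might look like, not the answerer's original. `shelve_it`, `expensive` and the shelf path are hypothetical names. The sketch uses shelve as a context manager, which requires Python 3.4 or later, and converts the parameter to a string because shelve keys must be strings.

```python
import functools
import os
import shelve
import tempfile

def shelve_it(file_name):
    """Hypothetical decorator backed by a shelve database."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(param):
            key = str(param)  # shelve keys must be strings
            # Open the shelf per call so it is always closed and flushed.
            with shelve.open(file_name) as db:
                if key not in db:  # membership test, so no KeyError to handle
                    db[key] = func(param)
                return db[key]
        return wrapper
    return decorator

shelf_path = os.path.join(tempfile.mkdtemp(), 'cache.shelve')
calls = []

@shelve_it(shelf_path)
def expensive(n):
    calls.append(n)
    return n * n

r1 = expensive(4)  # computed and stored
r2 = expensive(4)  # read back from the shelf
```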
This ensures the function is computed only once; subsequent calls return the stored result.
You can use the cache_to_disk package:
from cache_to_disk import cache_to_disk
@cache_to_disk(3)
def my_func(a, b, c, d=None):
    results = ...
    return results
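This caches the results for 3 days, specific to the arguments a, b, c and d. The results are stored in a pickle file on your machine, then unpickled and returned the next time the function is called. After 3 days, the pickle file is deleted until the function is re-run. The function is also re-run whenever it is called with new arguments.
There is also diskcache: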
from diskcache import Cache
cache = Cache("cachedir")
@cache.memoize()
def f(x, y):
    print('Running f({}, {})'.format(x, y))
    return x, y
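The expiry behaviour described for cache_to_disk can be approximated with the standard library alone. The sketch below is not how cache_to_disk or diskcache are implemented. `cache_for`, its `max_age_seconds` and `folder` parameters, and the one-pickle-file-per-call layout are all assumptions made for illustration.

```python
import functools
import hashlib
import os
import pickle
import tempfile
import time

def cache_for(max_age_seconds, folder):
    """Hypothetical decorator: cache results on disk and drop them once stale."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # One pickle file per distinct call, named after a hash of the arguments.
            key = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            path = os.path.join(folder, hashlib.sha256(key).hexdigest() + '.pkl')
            if os.path.exists(path):
                age = time.time() - os.path.getmtime(path)
                if age <= max_age_seconds:
                    with open(path, 'rb') as f:
                        return pickle.load(f)
                os.remove(path)  # entry is too old: discard and recompute
            result = func(*args, **kwargs)
            with open(path, 'wb') as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator

cache_dir = tempfile.mkdtemp()  # fresh folder for the demo
calls = []

@cache_for(max_age_seconds=3 * 24 * 3600, folder=cache_dir)
def add(a, b, c, d=None):
    calls.append((a, b, c, d))
    return a + b + c

x = add(1, 2, 3)  # computed
y = add(1, 2, 3)  # fresh cache entry: not recomputed
z = add(1, 2, 4)  # new arguments: computed again
```

Using the file modification time as the timestamp keeps the sketch short, but it also means that touching a cache file extends its life.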
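Most of the answers are in decorator style. But maybe I don't want to cache the result every time the function is called.
I came up with a solution using a context manager, so the function can be called as
with DiskCacher('cache_id', myfunc) as myfunc2:
    res=myfunc2(...)
when you need the caching functionality.
The 'cache_id' string is used to distinguish the data files, which are named [calling_script]_[cache_id].nc. So if you do this in a loop, you need to incorporate the loop variable into this cache_id; otherwise all iterations share one cache file, and later iterations either read back the first result or, with overwrite=True, overwrite it.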
Alternatively:
myfunc2=DiskCacher('cache_id')(myfunc)
res=myfunc2(...)
Or (this is probably not very useful, as the same id is used all the time):
@DiskCacher('cache_id')
def myfunc(*args):
    ...
The complete code with examples (I'm using pickle to save/load, but this can be changed to any save/read method. NOTE that this also assumes the function in question returns only 1 return value):
from __future__ import print_function
import sys, os
import functools
def formFilename(folder, varid):
    '''Compose abspath for cache file
    Args:
        folder (str): cache folder path.
        varid (str): variable id to form file name and used as variable id.
    Returns:
        abpath (str): abspath for cache file, which is using the <folder>
            as folder. The file name is the format:
                [script_file]_[varid].nc
    '''
    # basename keeps directory parts of sys.argv[0] out of the file name
    script_file=os.path.splitext(os.path.basename(sys.argv[0]))[0]
    name='[%s]_[%s].nc' %(script_file, varid)
    abpath=os.path.join(folder, name)
    return abpath
def readCache(folder, varid, verbose=True):
    '''Read cached data
    Args:
        folder (str): cache folder path.
        varid (str): variable id.
    Keyword Args:
        verbose (bool): whether to print some text info.
    Returns:
        results (tuple): a tuple containing data read in from cached file(s).
    '''
    import pickle
    abpath_in=formFilename(folder, varid)
    results=None  # returned as-is if no cache file exists
    if os.path.exists(abpath_in):
        if verbose:
            print('\n# <readCache>: Read in variable', varid,
                    'from disk cache:\n', abpath_in)
        with open(abpath_in, 'rb') as fin:
            results=pickle.load(fin)
    return results
def writeCache(results, folder, varid, verbose=True):
    '''Write data to disk cache
    Args:
        results (tuple): a tuple containing data to write to cache.
        folder (str): cache folder path.
        varid (str): variable id.
    Keyword Args:
        verbose (bool): whether to print some text info.
    '''
    import pickle
    abpath_out=formFilename(folder, varid)
    if verbose:
        print('\n# <writeCache>: Saving output to:\n',abpath_out)
    with open(abpath_out, 'wb') as fout:
        pickle.dump(results, fout)
    return
class DiskCacher(object):
    def __init__(self, varid, func=None, folder=None, overwrite=False,
            verbose=True):
        '''Disk cache context manager
        Args:
            varid (str): string id used to save cache.
                function <func> is assumed to return only 1 return value.
        Keyword Args:
            func (callable): function object whose return values are to be
                cached.
            folder (str or None): cache folder path. If None, use a default.
            overwrite (bool): whether to force a new computation or not.
            verbose (bool): whether to print some text info.
        '''
        if folder is None:
            self.folder='/tmp/cache/'
        else:
            self.folder=folder
        self.func=func
        self.varid=varid
        self.overwrite=overwrite
        self.verbose=verbose
    def __enter__(self):
        if self.func is None:
            raise Exception("Need to provide a callable function to __init__() when used as context manager.")
        return _Cache2Disk(self.func, self.varid, self.folder,
                self.overwrite, self.verbose)
    def __exit__(self, type, value, traceback):
        return
    def __call__(self, func=None):
        _func=func or self.func
        return _Cache2Disk(_func, self.varid, self.folder, self.overwrite,
                self.verbose)
def _Cache2Disk(func, varid, folder, overwrite, verbose):
    '''Inner decorator function
    Args:
        func (callable): function object whose return values are to be
            cached.
        varid (str): variable id.
        folder (str): cache folder path.
        overwrite (bool): whether to force a new computation or not.
        verbose (bool): whether to print some text info.
    Returns:
        decorated function: if cache exists, the function is <readCache>
            which will read cached data from disk. If needs to recompute,
            the function is wrapped that the return values are saved to disk
            before returning.
    '''
    def decorator_func(func):
        abpath_in=formFilename(folder, varid)
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if os.path.exists(abpath_in) and not overwrite:
                results=readCache(folder, varid, verbose)
            else:
                results=func(*args, **kwargs)
                if not os.path.exists(folder):
                    os.makedirs(folder)
                writeCache(results, folder, varid, verbose)
            return results
        return wrapper
    return decorator_func(func)
if __name__=='__main__':
    data=range(10) # dummy data
    #--------------Use as context manager--------------
    def func1(data, n):
        '''dummy function'''
        results=[i*n for i in data]
        return results
    print('\n### Context manager, 1st time call')
    with DiskCacher('context_mananger', func1) as func1b:
        res=func1b(data, 10)
        print('res =', res)
    print('\n### Context manager, 2nd time call')
    with DiskCacher('context_mananger', func1) as func1b:
        res=func1b(data, 10)
        print('res =', res)
    print('\n### Context manager, 3rd time call with overwrite=True')
    with DiskCacher('context_mananger', func1, overwrite=True) as func1b:
        res=func1b(data, 10)
        print('res =', res)
    #--------------Return a new function--------------
    def func2(data, n):
        results=[i*n for i in data]
        return results
    print('\n### Wrap a new function, 1st time call')
    func2b=DiskCacher('new_func')(func2)
    res=func2b(data, 10)
    print('res =', res)
    print('\n### Wrap a new function, 2nd time call')
    res=func2b(data, 10)
    print('res =', res)
    #----Decorate a function using the syntax sugar----
    @DiskCacher('pie_dec')
    def func3(data, n):
        results=[i*n for i in data]
        return results
    print('\n### pie decorator, 1st time call')
    res=func3(data, 10)
    print('res =', res)
    print('\n### pie decorator, 2nd time call.')
    res=func3(data, 10)
    print('res =', res)
The output:
### Context manager, 1st time call
# <writeCache>: Saving output to:
/tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### Context manager, 2nd time call
# <readCache>: Read in variable context_mananger from disk cache:
/tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### Context manager, 3rd time call with overwrite=True
# <writeCache>: Saving output to:
/tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### Wrap a new function, 1st time call
# <writeCache>: Saving output to:
/tmp/cache/[diskcache]_[new_func].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### Wrap a new function, 2nd time call
# <readCache>: Read in variable new_func from disk cache:
/tmp/cache/[diskcache]_[new_func].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### pie decorator, 1st time call
# <writeCache>: Saving output to:
/tmp/cache/[diskcache]_[pie_dec].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### pie decorator, 2nd time call.
# <readCache>: Read in variable pie_dec from disk cache:
/tmp/cache/[diskcache]_[pie_dec].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
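To illustrate the earlier remark about loops, here is a minimal, self-contained sketch of why the loop variable should be part of the cache id. `run_cached` is a hypothetical stand-in for DiskCacher (one pickle file per id, read back if present), not the class above, and `square` is a dummy function.

```python
import os
import pickle
import tempfile

CACHE_DIR = tempfile.mkdtemp()  # fresh folder so the demo starts with no cache

def run_cached(cache_id, func, *args):
    """Hypothetical stand-in for DiskCacher: one pickle file per cache_id."""
    path = os.path.join(CACHE_DIR, '%s.pkl' % cache_id)
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f)
    result = func(*args)
    with open(path, 'wb') as f:
        pickle.dump(result, f)
    return result

def square(n):
    return n * n

# One id for the whole loop: every iteration after the first reads the
# result cached by the first iteration.
shared = [run_cached('square', square, n) for n in range(3)]

# Loop variable folded into the id: one cache file per iteration.
per_item = [run_cached('square_%d' % n, square, n) for n in range(3)]

print(shared)    # [0, 0, 0]
print(per_item)  # [0, 1, 4]
```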