Python 可下标惰性映射

Python 可下标惰性映射,python,python-3.x,mapping,lazy-evaluation,Python,Python 3.x,Mapping,Lazy Evaluation,在python中,我知道有两个懒惰的“容器”:生成器和 两者都不可下标。因此map(f,data)[1]和(f(x)表示数据中的x)[1]将失败 python中是否有支持下标的惰性映射类? 如果没有,最接近的匹配是什么? 我一直在搜索functools,但没有结果(或者我没有找到它) 基本上,我正在寻找类似的东西(但重新发明车轮应该是最后的选择): 由于各种原因,这是一件很难实现的事情。它可以很容易地实现,只需对输入数据(映射和访问模式的时间复杂性)不做任何假设,但是首先拥有生成器的一个或多个优

在python中,我知道有两个懒惰的“容器”:生成器和

两者都不可下标。因此
map(f,data)[1]
(f(x)表示数据中的x)[1]
将失败

python中是否有支持下标的惰性映射类?

如果没有,最接近的匹配是什么?

我一直在搜索
functools
,但没有结果(或者我没有找到它)

基本上,我正在寻找类似的东西(但重新发明车轮应该是最后的选择):


由于各种原因,这是一件很难实现的事情。它可以很容易地实现,只需对输入数据(映射和访问模式的时间复杂性)不做任何假设,但是首先拥有生成器的一个或多个优势就消失了。在问题中给出的示例代码中,不必跟踪所有值的优势消失了

如果我们允许纯粹的随机访问模式,那么至少所有映射值都必须被缓存,再次失去生成器的内存优势

根据以下两个假设:

  • 映射函数很昂贵,而且只被稀疏地调用
  • 访问模式是随机的,因此必须缓存所有值
问题中的示例代码应该很好。这里有一些稍微不同的代码,它们也可以将生成器作为传入数据处理。它的优点是不需要在对象构造上完全建立传入数据的副本

因此,如果传入的数据有一个
\uuu getitem\uuu
方法(索引),那么使用
self.elements
dict实现一个精简的缓存包装器。如果访问稀疏,字典比列表更有效。如果传入数据没有索引,我们只需要使用和存储挂起的传入数据,以便以后进行映射

守则:

class LazilyMapped:
    def __init__(self, fn, iterable):
        """LazilyMapped lazily evaluates a mapping/transformation function on incoming values

        Assumes mapping is expensive, and [idx] access is random and possibly sparse.
        Still, this may defeat having a memory efficient generator on the incoming data,
        because we must remember all read data for potential future access.

        Lots of different optimizations could be done if there's more information on the
        access pattern. For example, memory could be saved if we knew that the access idx
        is monotonic increasing (then a list storage would be more efficient, because
        forgetting data is then trivial), or if we'd knew that any index is only accessed
        once (we could get rid of the infite cache), or if the access is not sparse, but
        random we should be using a list instead of a dict for self.elements.

        fn is a filter function, getting one element of iterable and returning a bool,
        iterable may be a generator
        """
        self.fn = fn
        self.sparse_in = hasattr(iterable, '__getitem__')
        if self.sparse_in:
            self.original_values = iterable
        else:
            self.iter = iter(iterable)
            self.original_idx = 0      # keep track of which index to do next in incoming data
            self.original_values = []  # keep track of all incoming values
        self.elements = {}      # forever remember mapped data
    def proceed_to(self, idx):
        """Consume incoming values and store for later mapping"""
        if idx >= self.original_idx:
            for _ in range(self.original_idx, idx + 1):
                self.original_values.append(next(self.iter))
            self.original_idx = idx + 1
    def __getitem__(self, idx):
        if idx not in self.elements:
            if not self.sparse_in:
                self.proceed_to(idx)
            self.elements[idx] = mapped = self.fn(self.original_values[idx])
        else:
            mapped = self.elements[idx]
        return mapped

if __name__ == '__main__':
    test_list = [1,2,3,4,5]
    dut = LazilyMapped(lambda v: v**2, test_list)
    assert dut[0] == 1 
    assert dut[2] == 9
    assert dut[1] == 4

    dut = LazilyMapped(lambda v: v**2, (num for num in range(1, 7)))
    assert dut[0] == 1 
    assert dut[2] == 9
    assert dut[1] == 4

我不知道标准python库中有任何这样的容器,因此我自己在一个类似itertools的可索引对象中实现了它:

导入工具
def do(x):
打印(“->立即计算”)
返回x+2
a=[1,2,3,4]
m=seqtools.smap(do,a)
m=seqtools.add_缓存(m,len(m))
#由于评估延迟,未打印任何内容
m[0]
->现在计算

三,


你看了吗?@jornsharpe刚刚看了,但是
list(islice(map(f,data),6,7))
调用了所需索引之前的所有元素
f
。因此没有收益。或者您将如何在这里使用
islice
?所以您希望随机访问,但也希望延迟访问?我不知道有什么工具能做到这一点。请注意,您的示例仅适用于作为列表的
lst
,而不是任意的iterable。您的需求组合似乎有些不同寻常,因此您可能必须自己实现它。@BrenBarn谢谢您的评论。我知道这段代码只适用于列表,因此得名。我不知道这有点不寻常,因为在其他语言中也有这种情况(这些语言都是基于惰性评估构建的,所以这可能是一个没有实际意义的问题)。感谢您提供了非常详细的答案。给我时间消化。
class LazilyMapped:
    def __init__(self, fn, iterable):
        """LazilyMapped lazily evaluates a mapping/transformation function on incoming values

        Assumes mapping is expensive, and [idx] access is random and possibly sparse.
        Still, this may defeat having a memory efficient generator on the incoming data,
        because we must remember all read data for potential future access.

        Lots of different optimizations could be done if there's more information on the
        access pattern. For example, memory could be saved if we knew that the access idx
        is monotonic increasing (then a list storage would be more efficient, because
        forgetting data is then trivial), or if we'd knew that any index is only accessed
        once (we could get rid of the infite cache), or if the access is not sparse, but
        random we should be using a list instead of a dict for self.elements.

        fn is a filter function, getting one element of iterable and returning a bool,
        iterable may be a generator
        """
        self.fn = fn
        self.sparse_in = hasattr(iterable, '__getitem__')
        if self.sparse_in:
            self.original_values = iterable
        else:
            self.iter = iter(iterable)
            self.original_idx = 0      # keep track of which index to do next in incoming data
            self.original_values = []  # keep track of all incoming values
        self.elements = {}      # forever remember mapped data
    def proceed_to(self, idx):
        """Consume incoming values and store for later mapping"""
        if idx >= self.original_idx:
            for _ in range(self.original_idx, idx + 1):
                self.original_values.append(next(self.iter))
            self.original_idx = idx + 1
    def __getitem__(self, idx):
        if idx not in self.elements:
            if not self.sparse_in:
                self.proceed_to(idx)
            self.elements[idx] = mapped = self.fn(self.original_values[idx])
        else:
            mapped = self.elements[idx]
        return mapped

if __name__ == '__main__':
    test_list = [1,2,3,4,5]
    dut = LazilyMapped(lambda v: v**2, test_list)
    assert dut[0] == 1 
    assert dut[2] == 9
    assert dut[1] == 4

    dut = LazilyMapped(lambda v: v**2, (num for num in range(1, 7)))
    assert dut[0] == 1 
    assert dut[2] == 9
    assert dut[1] == 4