Python class method returning an iterator


I implemented an iterator class as follows:

import numpy as np
import time


class Data:

    def __init__(self, filepath):
        # Computationally expensive
        print("Computationally expensive")
        time.sleep(10)
        print("Done!")

    def __iter__(self):
        return self

    def __next__(self):
        return np.zeros((2,2)), np.zeros((2,2))


count = 0
for batch_x, batch_y in Data("hello.csv"):
    print(batch_x, batch_y)
    count = count + 1

    if count > 5:
        break


count = 0
for batch_x, batch_y in Data("hello.csv"):
    print(batch_x, batch_y)
    count = count + 1

    if count > 5:
        break

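However, the constructor is computationally expensive, and the for loop may be run several times. In the code above, for example, the constructor is called twice, because each for loop creates a new Data object.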

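How can I separate the constructor from the iterator? I would like code like the following, where the constructor is called only once: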

data = Data(filepath)

for batch_x, batch_y in data.get_iterator():
    print(batch_x, batch_y)

for batch_x, batch_y in data.get_iterator():
    print(batch_x, batch_y)

You can iterate over an iterable object directly with for..in; nothing else is needed:

data = Data(filepath)

for batch_x, batch_y in data:
    print(batch_x, batch_y)

for batch_x, batch_y in data:
    print(batch_x, batch_y)

That said, this can be flawed, depending on how you implement __iter__(). For example:

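Bad: the object is its own iterator and keeps the index on itself. This way you cannot iterate over the same object twice, because self._i still points to the end of the data after the first loop.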
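The code for this variant is not shown here, so the following is only a minimal sketch of what such a class might look like. It assumes the data is a plain in-memory list passed straight to the constructor (the `items` parameter is a stand-in for the expensive loading step and is not from the original post).

```python
class Data:
    """Hypothetical 'bad' variant: the object acts as its own iterator."""

    def __init__(self, items):
        # Stand-in for the expensive loading step (assumption).
        self._items = list(items)
        self._i = 0

    def __iter__(self):
        # Returns self without resetting the index.
        return self

    def __next__(self):
        if self._i >= len(self._items):
            raise StopIteration
        result = self._items[self._i]
        self._i += 1
        return result


data = Data([1, 2, 3])
first_pass = list(data)   # consumes every item
second_pass = list(data)  # self._i is still at the end, so nothing is left
print(first_pass, second_pass)
```

The second pass comes back empty, which is the failure described above.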

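Good: reset the index in __iter__(). This resets the index every time a new iteration starts, which fixes the problem above. It still does not work if you nest iterations over the same object.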
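The code for this variant is not shown here either. As a rough sketch under the same assumption (an in-memory list passed as a hypothetical `items` argument), resetting the index in `__iter__` might look like this, together with a nested loop that shows where it breaks down:

```python
class Data:
    """Hypothetical 'good' variant: the index is reset in __iter__."""

    def __init__(self, items):
        # Stand-in for the expensive loading step (assumption).
        self._items = list(items)
        self._i = 0

    def __iter__(self):
        self._i = 0  # start over whenever a new loop begins
        return self

    def __next__(self):
        if self._i >= len(self._items):
            raise StopIteration
        result = self._items[self._i]
        self._i += 1
        return result


data = Data([1, 2])

# Sequential loops now both see every item.
sequential = [list(data), list(data)]

# Nested loops share the single index, so the outer loop ends early.
nested = [(a, b) for a in data for b in data]
print(sequential, nested)
```

The nested loop yields only two pairs instead of four, because the inner loop leaves the shared index at the end of the data.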

Better: to solve this, keep the iteration state in a separate iterator object:

class Data:
    class Iter:
        def __init__(self, data):
            self._data = data
            self._i = 0

        def __iter__(self):
            # An iterator should itself be iterable and return itself.
            return self

        def __next__(self):
            if self._i >= len(self._data._items):  # check for available data
                raise StopIteration
            result = self._data._items[self._i]
            self._i = self._i + 1
            return result

    def __init__(self, filepath):
        self._items = load_items(filepath)

    def __iter__(self):
        # Each loop gets its own Iter, so iteration state is never shared.
        return self.Iter(self)

This is the most flexible approach, but it is needlessly verbose if you can use either of the following.

Simple, using yield: if you use Python's generators, the language takes care of tracking the iteration state, and it should do so correctly even in nested loops:

class Data:
    def __init__(self, filepath):
        self._items = load_items(filepath)
    def __iter__(self): 
        for it in self._items: # Or whatever is appropriate
            yield it
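To check the claim about nested loops, here is a small self-contained sketch of the generator variant. It again assumes an in-memory list passed as a hypothetical `items` argument in place of `load_items(filepath)`.

```python
class Data:
    def __init__(self, items):
        # Stand-in for the expensive loading step (assumption).
        self._items = list(items)

    def __iter__(self):
        # Each call creates a fresh generator with its own position.
        for it in self._items:
            yield it


data = Data([1, 2])

# The inner and outer loops each get an independent generator.
pairs = [(a, b) for a in data for b in data]
print(pairs)
```

All four pairs are produced, and the constructor runs only once for both loops.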

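Simple, delegating to the underlying iterable: if the "computationally expensive" part is loading all the data into memory, you can use the cached data directly.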

class Data:
    def __init__(self, filepath):
        self._items = load_items(filepath)
    def __iter__(self): 
        return iter(self._items)

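Instead of creating a new instance of Data, create a second class, IterData, whose __init__ method runs a process that is less computationally expensive than instantiating Data. Then create a classmethod in Data that serves as an alternative constructor for IterData: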

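class IterData:
    def __init__(self, filepath):
        # Only store the data that iteration needs (cheap).
        self.filepath = filepath

    def __iter__(self):
        # Implement iteration here; this placeholder yields nothing.
        return iter(())


class Data:
    def __init__(self, filepath):
        # Computationally expensive
        self.filepath = filepath

    @classmethod
    def new_iter(cls, filepath):
        # Alternative constructor: does not run the expensive __init__.
        return IterData(filepath)


results = Data.new_iter('path')
for batch_x, batch_y in results:
    pass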

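I mean... what is wrong with the code above? Construction and iteration are already separate. Just reuse the object created by calling Data() instead of creating a new one. Do you have an example of your current code where the constructor is called twice but you do not want it to be?

I updated the question. In the first version, the constructor is called twice.

I don't understand why you need data.get_iterator(). What is wrong with simply writing for batch_x, batch_y in data?

@TomášCerha: It calls the constructor __init__, which is computationally expensive.

I added an answer with a number of variants. Which one is "best" really depends on the specifics of your case: which part is "computationally expensive", and whether the data is afterwards fully available in a list or another in-memory iterable object, or whether you are still fetching data on demand.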