Python: How to implement a repository for lazy data loading with Neuraxle?

An example shows how to use a repository to lazily load data within a pipeline; see the following code:

from neuraxle.pipeline import Pipeline, MiniBatchSequentialPipeline
from neuraxle.base import ExecutionContext
from neuraxle.steps.column_transformer import ColumnTransformer
from neuraxle.steps.flow import TrainOnlyWrapper

training_data_ids = training_data_repository.get_all_ids()
context = ExecutionContext('caching_folder').set_service_locator({
    BaseRepository: training_data_repository
})

pipeline = Pipeline([
    ConvertIDsToLoadedData().assert_has_services(BaseRepository),
    ColumnTransformer([
        (range(0, 2), DateToCosineEncoder()),
        (3, CategoricalEnum(categories_count=5, starts_at_zero=True)),
    ]),
    Normalizer(),
    TrainOnlyWrapper(DataShuffler()),
    MiniBatchSequentialPipeline([
        Model()
    ], batch_size=128)
]).with_context(context)

However, it is not shown how to implement the BaseRepository and ConvertIDsToLoadedData classes. What is the best way to implement them? Could someone give an example?

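I have not checked that the following runs correctly, but it should look something like this. If you try to run it and find something that needs changing, please edit this answer: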

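from abc import ABC, abstractmethod
from typing import Dict, List

from neuraxle.base import BaseStep, ExecutionContext
from neuraxle.data_container import DataContainer


class BaseDataRepository(ABC):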

    @abstractmethod
    def get_all_ids(self) -> List[int]: 
        pass

    @abstractmethod
    def get_data_from_id(self, _id: int) -> object: 
        pass

class InMemoryDataRepository(BaseDataRepository): 
    def __init__(self, ids, data): 
        self.ids: List[int] = ids
        self.data: Dict[int, object] = data

    def get_all_ids(self) -> List[int]: 
        return list(self.ids)

    def get_data_from_id(self, _id: int) -> object: 
        return self.data[_id]

class ConvertIDsToLoadedData(BaseStep): 
    def _handle_transform(self, data_container: DataContainer, context: ExecutionContext): 
        repo: BaseDataRepository = context.get_service(BaseDataRepository)
        ids = data_container.data_inputs

        # Replace data ids by their loaded object counterpart: 
        data_container.data_inputs = [repo.get_data_from_id(_id) for _id in ids]

        return data_container, context

# `ids` and `data` stand for your own list of ids and your id-to-object mapping.
# InMemoryDataRepository is a cheap in-memory stub: replace it with any other class
# that inherits from `BaseDataRepository` once you move to a real database (e.g. SQL).
context = ExecutionContext('caching_folder').set_service_locator({
    BaseDataRepository: InMemoryDataRepository(ids, data)
})
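
As a rough illustration of the "real database" replacement mentioned in the comment above, here is a minimal sketch of a repository that only fetches a record when its id is requested. It uses Python's standard `sqlite3` module and does not depend on Neuraxle. The class name `SQLiteDataRepository`, the `samples` table with its `id` and `payload` columns, and the helpers `load_batch` and `make_demo_repository` are all assumptions made up for this example, not part of any library. The interface is repeated so the sketch runs on its own.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import List


class BaseDataRepository(ABC):
    """Same interface as in the answer, repeated so this sketch is self-contained."""

    @abstractmethod
    def get_all_ids(self) -> List[int]:
        pass

    @abstractmethod
    def get_data_from_id(self, _id: int) -> object:
        pass


class SQLiteDataRepository(BaseDataRepository):
    """Hypothetical repository: payloads are read one row at a time, on demand."""

    def __init__(self, connection: sqlite3.Connection, table: str = "samples"):
        # The table name and its (id, payload) columns are assumptions of this sketch.
        self.connection = connection
        self.table = table

    def get_all_ids(self) -> List[int]:
        # Only the ids are read here, never the payloads.
        rows = self.connection.execute(f"SELECT id FROM {self.table} ORDER BY id")
        return [row[0] for row in rows]

    def get_data_from_id(self, _id: int) -> object:
        row = self.connection.execute(
            f"SELECT payload FROM {self.table} WHERE id = ?", (_id,)
        ).fetchone()
        if row is None:
            raise KeyError(_id)
        return row[0]


def load_batch(repo: BaseDataRepository, ids: List[int]) -> List[object]:
    # Mirrors what ConvertIDsToLoadedData does with data_container.data_inputs.
    return [repo.get_data_from_id(_id) for _id in ids]


def make_demo_repository() -> SQLiteDataRepository:
    # Builds a tiny in-memory database so the sketch can be tried without any files.
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, payload TEXT)")
    connection.executemany(
        "INSERT INTO samples (id, payload) VALUES (?, ?)",
        [(1, "a"), (2, "b"), (3, "c")],
    )
    return SQLiteDataRepository(connection)
```

Because it inherits from `BaseDataRepository`, an instance of such a class could be registered in the service locator in place of `InMemoryDataRepository(ids, data)`, and the pipeline step would then load only the ids it is given.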

For updates, see the issue I opened for this question here:
