Python: How to implement a repository for lazy data loading with Neuraxle?

python machine-learning

An example is shown that uses a repository to lazily load data within a pipeline; see the following code:

from neuraxle.pipeline import Pipeline, MiniBatchSequentialPipeline
from neuraxle.base import ExecutionContext
from neuraxle.steps.column_transformer import ColumnTransformer
from neuraxle.steps.flow import TrainOnlyWrapper

training_data_ids = training_data_repository.get_all_ids()
context = ExecutionContext('caching_folder').set_service_locator({
    BaseRepository: training_data_repository
})

pipeline = Pipeline([
    ConvertIDsToLoadedData().assert_has_services(BaseRepository),
    ColumnTransformer([
        (range(0, 2), DateToCosineEncoder()),
        (3, CategoricalEnum(categories_count=5, starts_at_zero=True)),
    ]),
    Normalizer(),
    TrainOnlyWrapper(DataShuffler()),
    MiniBatchSequentialPipeline([
        Model()
    ], batch_size=128)
]).with_context(context)

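However, it does not show how to implement the BaseRepository and ConvertIDsToLoadedData classes. What is the best way to implement these classes? Could someone give an example?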

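I haven't checked that the following runs correctly, but it should look something like this. If you try to run it and find something that needs changing, please edit this answer: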

from abc import ABC, abstractmethod
from typing import Dict, List

from neuraxle.base import BaseStep, ExecutionContext
from neuraxle.data_container import DataContainer


class BaseDataRepository(ABC):
    """Interface that every data source (in-memory, SQL, ...) implements."""

    @abstractmethod
    def get_all_ids(self) -> List[int]:
        pass

    @abstractmethod
    def get_data_from_id(self, _id: int) -> object:
        pass


class InMemoryDataRepository(BaseDataRepository):
    def __init__(self, ids, data):
        self.ids: List[int] = ids
        self.data: Dict[int, object] = data

    def get_all_ids(self) -> List[int]:
        return list(self.ids)

    def get_data_from_id(self, _id: int) -> object:
        return self.data[_id]


class ConvertIDsToLoadedData(BaseStep):
    def _handle_transform(self, data_container: DataContainer, context: ExecutionContext):
        # Fetch the repository that was registered in the service locator.
        repo: BaseDataRepository = context.get_service(BaseDataRepository)
        ids = data_container.data_inputs

        # Replace data ids by their loaded object counterpart:
        data_container.data_inputs = [repo.get_data_from_id(_id) for _id in ids]

        return data_container, context


# `ids` and `data` are your own list of ids and your id-to-object mapping.
# Swap InMemoryDataRepository for any other class that inherits from
# `BaseDataRepository` when you change the database to a real one (e.g. SQL)
# rather than a cheap "InMemory" stub.
context = ExecutionContext('caching_folder').set_service_locator({
    BaseDataRepository: InMemoryDataRepository(ids, data)
})
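
The comment above mentions replacing the in-memory stub with a real database. As a rough sketch of what that swap could look like, here is a plain-Python version with no Neuraxle dependency, so it can be tried on its own. The `samples` table, its `payload` column, `SQLiteDataRepository` and `iter_loaded_batches` are names made up for this illustration and are not part of Neuraxle; the repository interface is repeated only to keep the snippet self-contained.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Iterator, List


class BaseDataRepository(ABC):
    """Same interface as in the answer, repeated so this runs standalone."""

    @abstractmethod
    def get_all_ids(self) -> List[int]:
        ...

    @abstractmethod
    def get_data_from_id(self, _id: int) -> object:
        ...


class SQLiteDataRepository(BaseDataRepository):
    """Hypothetical repository: one row per id in an assumed `samples` table."""

    def __init__(self, connection: sqlite3.Connection):
        self.connection = connection

    def get_all_ids(self) -> List[int]:
        # Only the ids are read here; the payloads stay in the database.
        rows = self.connection.execute(
            "SELECT id FROM samples ORDER BY id"
        ).fetchall()
        return [row[0] for row in rows]

    def get_data_from_id(self, _id: int) -> object:
        row = self.connection.execute(
            "SELECT payload FROM samples WHERE id = ?", (_id,)
        ).fetchone()
        if row is None:
            raise KeyError(_id)
        return row[0]


def iter_loaded_batches(
    repo: BaseDataRepository, ids: List[int], batch_size: int
) -> Iterator[List[object]]:
    """Yield batches of loaded objects; a batch is loaded only when requested."""
    for start in range(0, len(ids), batch_size):
        batch_ids = ids[start:start + batch_size]
        yield [repo.get_data_from_id(_id) for _id in batch_ids]


# Throwaway in-memory database, used only to demonstrate the classes above.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, payload TEXT)")
connection.executemany(
    "INSERT INTO samples (id, payload) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)

repo = SQLiteDataRepository(connection)
ids = repo.get_all_ids()
batches = list(iter_loaded_batches(repo, ids, batch_size=2))
```

Because the pipeline step only depends on `BaseDataRepository`, an instance like `repo` could presumably be registered in the service locator in place of `InMemoryDataRepository(ids, data)` without touching `ConvertIDsToLoadedData`; I have not tested that against Neuraxle itself.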

For updates, see the issue I opened for this question here:
