Python: How do I implement a repository for lazy data loading with Neuraxle?
An example is shown in which a repository is used to lazily load data inside a pipeline; see the following code:
from neuraxle.pipeline import Pipeline, MiniBatchSequentialPipeline
from neuraxle.base import ExecutionContext
from neuraxle.steps.column_transformer import ColumnTransformer
from neuraxle.steps.flow import TrainOnlyWrapper

training_data_ids = training_data_repository.get_all_ids()
context = ExecutionContext('caching_folder').set_service_locator({
    BaseRepository: training_data_repository
})

pipeline = Pipeline([
    ConvertIDsToLoadedData().assert_has_services(BaseRepository),
    ColumnTransformer([
        (range(0, 2), DateToCosineEncoder()),
        (3, CategoricalEnum(categories_count=5, starts_at_zero=True)),
    ]),
    Normalizer(),
    TrainOnlyWrapper(DataShuffler()),
    MiniBatchSequentialPipeline([
        Model()
    ], batch_size=128)
]).with_context(context)
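However, it does not show how to implement the BaseRepository and ConvertIDsToLoadedData classes. What is the best way to implement these classes? Can someone give an example?

I haven't checked whether the following runs, but it should look something like this. If you try to run it and find something that needs changing, please edit this answer: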
from abc import ABC, abstractmethod
from typing import Dict, List

from neuraxle.base import BaseStep, ExecutionContext
from neuraxle.data_container import DataContainer


class BaseDataRepository(ABC):
    @abstractmethod
    def get_all_ids(self) -> List[int]:
        pass

    @abstractmethod
    def get_data_from_id(self, _id: int) -> object:
        pass


class InMemoryDataRepository(BaseDataRepository):
    def __init__(self, ids, data):
        self.ids: List[int] = ids
        self.data: Dict[int, object] = data

    def get_all_ids(self) -> List[int]:
        return list(self.ids)

    def get_data_from_id(self, _id: int) -> object:
        return self.data[_id]


class ConvertIDsToLoadedData(BaseStep):
    # Untested: the exact hook name and return type may differ between Neuraxle versions.
    def _handle_transform(self, data_container: DataContainer, context: ExecutionContext):
        repo: BaseDataRepository = context.get_service(BaseDataRepository)
        ids = data_container.data_inputs
        # Replace data ids by their loaded object counterpart:
        data_container.data_inputs = [repo.get_data_from_id(_id) for _id in ids]
        return data_container, context


# `ids` and `data` are placeholders for your own id list and id-to-object mapping.
# Swap in any other class that inherits from `BaseDataRepository` once you move
# to a real database (e.g. SQL) rather than a cheap "InMemory" stub.
context = ExecutionContext('caching_folder').set_service_locator({
    BaseDataRepository: InMemoryDataRepository(ids, data)
})
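The comment above mentions replacing the in-memory stub with a real database. As a rough illustration of what that could look like, here is a minimal sketch using the standard-library `sqlite3` module. The `SQLiteDataRepository` class, the `samples` table, and its `id`/`payload` columns are all hypothetical names made up for this example; they are not part of Neuraxle. The abstract base class is repeated so the snippet runs on its own, without Neuraxle installed.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import List


class BaseDataRepository(ABC):
    """Same interface as in the answer, repeated so this sketch is self-contained."""

    @abstractmethod
    def get_all_ids(self) -> List[int]:
        pass

    @abstractmethod
    def get_data_from_id(self, _id: int) -> object:
        pass


class SQLiteDataRepository(BaseDataRepository):
    """Hypothetical repository: payloads stay in the database until requested by id."""

    def __init__(self, connection: sqlite3.Connection):
        self.connection = connection

    def get_all_ids(self) -> List[int]:
        # Only the ids are read here, not the payloads.
        rows = self.connection.execute("SELECT id FROM samples ORDER BY id").fetchall()
        return [row[0] for row in rows]

    def get_data_from_id(self, _id: int) -> object:
        # The payload for a single id is fetched on demand.
        row = self.connection.execute(
            "SELECT payload FROM samples WHERE id = ?", (_id,)
        ).fetchone()
        if row is None:
            raise KeyError(_id)
        return row[0]


# Build a throwaway in-memory database to exercise the repository.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, payload TEXT)")
connection.executemany(
    "INSERT INTO samples (id, payload) VALUES (?, ?)",
    [(1, "first"), (2, "second"), (3, "third")],
)
connection.commit()

repo = SQLiteDataRepository(connection)
ids = repo.get_all_ids()
# Load only the first two ids, as a step would for one mini-batch.
loaded = [repo.get_data_from_id(_id) for _id in ids[:2]]
```

Because this class exposes the same two methods as `InMemoryDataRepository`, it could presumably be registered in the service locator in its place, so that only ids travel through the pipeline until `ConvertIDsToLoadedData` resolves them.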
For updates, see the issue I opened for this question here: