How to incorporate prior information in pomegranate? In other words: does pomegranate support incremental learning?

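Suppose I use pomegranate to fit a model to the data available at the time. Once more data comes in, I want to update the model accordingly. In other words, is it possible with pomegranate to update an existing model with new data without overwriting the previous parameters? To be clear: I don't mean out-of-core learning. My question is about data that becomes available at different points in time, not about data that is too large to fit in memory at a single point in time.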

Here's what I've tried:

>>> from pomegranate.distributions import BetaDistribution

>>> # suppose a coin generated the following data, where 1 is head and 0 is tail
>>> data1 = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]

>>> # as usual, we fit a Beta distribution to infer the bias of the coin
>>> model = BetaDistribution(1, 1)
>>> model.summarize(data1)  # compute sufficient statistics

>>> # presume we have seen all the data available so far,
>>> # we can now estimate the parameters
>>> model.from_summaries()

>>> # this results in the following model (so far so good)
>>> model
{
    "class" :"Distribution",
    "name" :"BetaDistribution",
    "parameters" :[
        3.0,
        7.0
    ],
    "frozen" :false
}

>>> # now suppose the coin is flipped a few more times, getting the following data
>>> data2 = [0, 1, 0, 0, 1]

>>> # we would like to update the model parameters accordingly
>>> model.summarize(data2)

>>> # but this fits only data2, overriding the previous parameters
>>> model.from_summaries()
>>> model
{
    "class" :"Distribution",
    "name" :"BetaDistribution",
    "parameters" :[
        2.0,
        3.0
    ],
    "frozen" :false
}


>>> # however I want to get the result that corresponds to the following,
>>> # but ideally without having to "drag along" data1
>>> data3 = data1 + data2
>>> model.fit(data3)
>>> model  # this should be the final model
{
    "class" :"Distribution",
    "name" :"BetaDistribution",
    "parameters" :[
        5.0,
        10.0
    ],
    "frozen" :false
}
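In the transcript, the fitted parameters are simply the head and tail counts, so the desired [5.0, 10.0] can be reached by adding each new batch's counts to the current ones instead of refitting on data1 + data2. Below is a minimal pure-Python sketch of that idea. BetaCounts and its partial_fit method are hypothetical names for illustration, not pomegranate API. The defaults of 0 mirror the transcript, where fitting BetaDistribution(1, 1) on data1 gave [3.0, 7.0], i.e. the (1, 1) was not added to the counts.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class BetaCounts:
    """Hypothetical helper (not pomegranate API): running Beta parameters
    for a coin, where 1 is head and 0 is tail."""

    alpha: float = 0.0  # heads seen so far, plus any prior pseudo-count
    beta: float = 0.0   # tails seen so far, plus any prior pseudo-count

    def partial_fit(self, data: Iterable[int]) -> "BetaCounts":
        # Only the new batch is scanned; earlier batches survive as counts.
        for x in data:
            if x == 1:
                self.alpha += 1
            else:
                self.beta += 1
        return self

    @property
    def parameters(self) -> List[float]:
        return [self.alpha, self.beta]


data1 = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]
data2 = [0, 1, 0, 0, 1]

model = BetaCounts()
model.partial_fit(data1)
print(model.parameters)  # [3.0, 7.0]
model.partial_fit(data2)  # data1 is no longer needed here
print(model.parameters)  # [5.0, 10.0], same as fitting data1 + data2
```

Starting from BetaCounts(1.0, 1.0) instead treats (1, 1) as prior pseudo-counts, which is the textbook conjugate Beta-Bernoulli update; the first batch then gives [4.0, 8.0].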

Edit:

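Another way to ask this question:

Does pomegranate support incremental or online learning? Basically, I'm looking for something similar to scikit-learn's partial_fit().
Given what pomegranate already supports, I feel like I'm overlooking something. Any help?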
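Indeed, the problem is in from_summaries. In BetaDistribution it does: self.summaries = [0, 0]
All the from_summaries methods are destructive: they replace the distribution's parameters with values computed from the summaries, and then wipe the summaries. The summaries can always be updated with more observations, but the parameters cannot.
I think this is a poor design. It would be better to treat the summaries as accumulators of observations, and the parameters as derived, cached values.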
If you do this:
model = BetaDistribution(1, 1)
model.summarize(data1)
model.summarize(data2)
model.from_summaries()
model

you'll find that it does produce the same result as model.summarize(data1 + data2).
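Batching works here because the summaries for this model are plain head/tail counts, and counts are additive. A minimal sketch using collections.Counter as a stand-in for those summaries (an illustration only, not what pomegranate uses internally):

```python
from collections import Counter

data1 = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]
data2 = [0, 1, 0, 0, 1]

# Sufficient statistics for a coin: number of heads (1) and tails (0).
stats1 = Counter(data1)
stats2 = Counter(data2)

# Counts add up, so merging per-batch summaries gives the same result
# as summarizing the concatenated data in one pass.
merged = stats1 + stats2
assert merged == Counter(data1 + data2)
print(merged[1], merged[0])  # 5 10
```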
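Thanks @dan-d! So you're saying that, as of now, pomegranate does not support incremental learning, right? I'd have to "drag along" all the data and fit a brand-new model on whatever data is available at that point.
It's because of its poor design: from_summaries clears the summaries when it computes the parameters. That should only happen when clear_summaries is called explicitly. Fixing it would only require removing a few lines from the end of each from_summaries method.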
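To illustrate the proposed fix, here is a toy class that mimics the summarize / from_summaries / clear_summaries pattern discussed above. ToyBeta and its destructive flag are hypothetical, and this is not pomegranate's source; it only shows that once from_summaries stops clearing the summaries, two rounds of summarize + from_summaries end at the combined-data result instead of the data2-only one.

```python
class ToyBeta:
    """Hypothetical stand-in for the summarize/from_summaries pattern;
    not pomegranate's implementation."""

    def __init__(self, alpha, beta, destructive=True):
        self.parameters = [alpha, beta]
        self.summaries = [0, 0]  # [heads, tails] accumulated so far
        self.destructive = destructive

    def summarize(self, items):
        # Accumulate sufficient statistics; repeated calls add up.
        for x in items:
            self.summaries[0 if x == 1 else 1] += 1

    def from_summaries(self):
        # Parameters are derived from the accumulated counts.
        self.parameters = [float(s) for s in self.summaries]
        if self.destructive:
            self.clear_summaries()  # the behaviour criticized above

    def clear_summaries(self):
        self.summaries = [0, 0]


data1 = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]
data2 = [0, 1, 0, 0, 1]

results = {}
for destructive in (True, False):
    m = ToyBeta(1, 1, destructive=destructive)
    m.summarize(data1)
    m.from_summaries()
    m.summarize(data2)
    m.from_summaries()
    results[destructive] = m.parameters
    print(destructive, m.parameters)
# True [2.0, 3.0]
# False [5.0, 10.0]
```

With the clearing step removed, resetting becomes an explicit choice made by calling clear_summaries(), which is the design the comment above argues for.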