Python: scoring strategy of sklearn.model_selection.GridSearchCV for LatentDirichletAllocation


python scikit-learn nlp


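I am trying to apply GridSearchCV to LatentDirichletAllocation using the sklearn library.

In my current pipeline, GridSearchCV uses the approximate log-likelihood as the score to decide which model is best. What I would like to do is change the scoring method so that it is based on the model's perplexity instead.

According to the sklearn documentation, GridSearchCV accepts a scoring argument that I could use. However, I don't know how to apply perplexity as a scoring method, and I can't find any examples online of people doing it. Is this possible?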

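By default, GridSearchCV uses the score() function of the final estimator in the pipeline.

make_scorer can be used here, but to compute perplexity you also need other data from the fitted model, and providing that through make_scorer can get a bit complicated.

Instead, you can write a wrapper around LDA and re-implement the score() function in it so that it returns the perplexity. Something along these lines: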

from sklearn.decomposition import LatentDirichletAllocation

class MyLDAWithPerplexityScorer(LatentDirichletAllocation):

    def score(self, X, y=None):
        # You can change the options passed to perplexity here
        score = super().perplexity(X, sub_sampling=False)

        # Lower perplexity is better, but GridSearchCV maximizes the score, so negate it
        return -score

You can then use it in your code in place of LatentDirichletAllocation, like this:

...
...
...
lda_model = MyLDAWithPerplexityScorer(n_components=number_of_topics,
                                      ....
                                      ....
                                      n_jobs=-1,
                                      )
...
...
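To make the above concrete, here is a minimal end-to-end sketch of the wrapper inside a grid search. The toy corpus, the step names "vect" and "lda", the parameter grid, and cv=3 are all assumptions made up for illustration; they are not taken from the asker's pipeline, so adapt them to your own setup.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class MyLDAWithPerplexityScorer(LatentDirichletAllocation):
    def score(self, X, y=None):
        # Negate so that a lower perplexity gives a higher score.
        return -self.perplexity(X, sub_sampling=False)


# Toy corpus, purely illustrative (hypothetical data).
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "the stock market fell today",
    "investors sold shares and bonds",
    "my dog chased the cat",
    "bond prices rose as stocks fell",
] * 5

# Hypothetical pipeline: bag-of-words counts followed by the wrapped LDA.
pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("lda", MyLDAWithPerplexityScorer(random_state=0)),
])

# Assumed grid over the number of topics only.
param_grid = {"lda__n_components": [2, 3, 4]}

# No scoring argument: GridSearchCV falls back to pipeline.score(),
# which delegates to the overridden score() of the last step.
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(docs)

print(search.best_params_)
print(search.best_score_)  # negated perplexity, averaged over folds
```

Because the subclass does not override __init__, GridSearchCV should be able to clone it and set its parameters like any other estimator.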

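The score and perplexity values seem to be buggy and depend on the number of topics. As a result, the grid search will end up giving you the smallest number of topics.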
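If you want to check this behaviour on your own data, one option is to fit LDA for a few topic counts and print the held-out score and perplexity side by side. This is only a rough sketch: the documents, the train/test split, and the topic counts (2, 4, 8) are hypothetical, and what you observe may differ with your corpus and sklearn version.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical training and held-out documents.
train_docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "the stock market fell today",
    "investors sold shares and bonds",
] * 5
test_docs = [
    "my dog chased the cat",
    "bond prices rose as stocks fell",
]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)

results = {}
for k in (2, 4, 8):
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(X_train)
    # score() is the approximate log-likelihood bound; perplexity() is derived from it.
    results[k] = (lda.score(X_test), lda.perplexity(X_test))
    print(k, results[k])
```

If both numbers get steadily worse as the topic count grows, the grid search is likely just rewarding the smallest model, and a different selection criterion may be worth considering.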
