Python sklearn LabelEncoder inverse transform with Surprise predictions

python machine-learning

I have a dataset, and I am using sklearn's LabelEncoder to encode its categorical columns (from strings to numbers).
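For context, the encoding step is not shown in the question, so the following is only a minimal sketch of what it might look like. The ids and the name `user_encoder` are hypothetical. The point it illustrates is that the fitted encoder object has to be kept around, because `inverse_transform` can only map codes back to the strings the encoder saw during `fit`.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical raw user ids; the real dataset is not shown in the question.
raw_user_ids = ["carol", "alice", "bob", "alice"]

# Fit one encoder per column and keep it for decoding later.
user_encoder = LabelEncoder()
encoded_user_ids = user_encoder.fit_transform(raw_user_ids)

# LabelEncoder sorts the distinct labels, so code i corresponds to classes_[i].
decoded_user_ids = user_encoder.inverse_transform(encoded_user_ids)
```

Presumably a second encoder would be fitted on the content id column in the same way, since each encoder only knows the labels of the column it was fitted on.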

Then I use the Surprise library to train a recommender system model.

I get the predictions with the following code:

from surprise import Dataset, Reader, SVD

# A reader is still needed, but only the rating_scale param is required.
reader = Reader(rating_scale=(1, 100))

# The columns must correspond to user id, item id and ratings (in that order).
data = Dataset.load_from_df(df_categorized, reader)

# First train an SVD algorithm on the dataset.
trainset = data.build_full_trainset()
algo = SVD()
algo.fit(trainset)

# Then predict ratings for all pairs (u, i) that are NOT in the training set.
testset = trainset.build_anti_testset()
predictions = algo.test(testset)

top_n = get_top_n(predictions, n=5)

I used the get_top_n function, shown below, to obtain the top-N recommendations:

from collections import defaultdict

def get_top_n(predictions, n=5):
    '''Return the top-N recommendations for each user from a set of predictions.

    Args:
        predictions(list of Prediction objects): The list of predictions, as
            returned by the test method of an algorithm.
        n(int): The number of recommendations to output for each user. Default
            is 5.

    Returns:
    A dict where keys are user (raw) ids and values are lists of tuples:
        [(raw item id, rating estimation), ...] of size n.
    '''

    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n

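Because the data was first encoded with the label encoder, top_n has the encoded userId values as keys and the encoded content ids as the recommendations.

How can I inverse-transform the data back to the original, unencoded values?

I tried taking the keys of top_n (the encoded userIds) and passing them to the label encoder's inverse_transform method, but because they are in a dict, they do not have the index that the method needs in order to work. The same goes for the recommended content, which is stored as the values of the dict.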

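Can you share your output?

@VivekKumar, the output has the form {userIdCategorized: [contentIdCategorized, ...], ...}, where each key is the encoded version of a userId and each value is the list of recommended contentIdCategorized entries.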
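One possible way to decode such a dict is sketched below. It assumes two things that the question does not show: that separate fitted encoders exist for the user and content columns (called `user_encoder` and `item_encoder` here, both hypothetical names), and that each value of top_n is a list of (encoded item id, estimate) tuples, as produced by get_top_n above. `inverse_transform` expects an array-like of codes, so a single key is wrapped in a one-element list and a user's item ids are decoded in one call.

```python
from sklearn.preprocessing import LabelEncoder


def decode_top_n(top_n, user_encoder, item_encoder):
    """Map encoded user and item ids in a top_n dict back to raw labels.

    user_encoder and item_encoder are assumed to be the fitted
    LabelEncoder objects that produced the encoded columns.
    """
    decoded = {}
    for encoded_uid, user_ratings in top_n.items():
        # inverse_transform needs an array-like, so wrap the single key.
        raw_uid = user_encoder.inverse_transform([encoded_uid]).tolist()[0]
        if user_ratings:
            encoded_iids = [iid for iid, _ in user_ratings]
            raw_iids = item_encoder.inverse_transform(encoded_iids).tolist()
        else:
            raw_iids = []
        # Pair each decoded item id with its original rating estimate.
        decoded[raw_uid] = [
            (raw_iid, est) for raw_iid, (_, est) in zip(raw_iids, user_ratings)
        ]
    return decoded


# Hypothetical data standing in for the real encoders and predictions.
user_encoder = LabelEncoder().fit(["alice", "bob", "carol"])
item_encoder = LabelEncoder().fit(["book_a", "book_b", "book_c"])
top_n = {0: [(1, 4.5), (2, 3.9)], 2: [(0, 4.1)]}

decoded = decode_top_n(top_n, user_encoder, item_encoder)
```

If a single encoder had been fitted on several columns in turn, only the labels from its last `fit` call would be recoverable, so this sketch would not apply as written.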