Java: Recommending users to other users based on their preferences using Apache Mahout

This is my first question on stackoverflow.com, so apologies if I make any mistakes.

Right now, I am trying to build a recommendation engine in Java using Apache Mahout. I have an input file like the one below (much larger, of course):

For each user, I want to recommend some other users based on how they rated items. For example, at the end of my program the output would be:

userID1  similar to userID2  with score of 0.8 (this score could be a value between 0 and 1 or a percentage; the only requirement is that it is reasonable)
userID1  similar to userID3  with score of 0.7
userID2  similar to userID1  with score of 0.8
userID2  similar to userID4  with score of 0.5
userID3  similar to userID1  with score of 0.7
userID4  similar to userID2  with score of 0.5

and so on. To do this, I wrote the following code:

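import java.io.File;
import java.io.IOException;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.UserBasedRecommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

// FileDataModel throws IOException; the similarity and recommender calls throw TasteException.
public void RecommenderFunction() throws IOException, TasteException
{
    DataModel model = new FileDataModel(new File("data/dataset.csv"));
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0, similarity, model);
    UserBasedRecommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

    for (LongPrimitiveIterator users = model.getUserIDs(); users.hasNext();)
    {
        long userId = users.nextLong();
        // I want to find all similar user IDs, not a subset of them. That's why I pass 100 as the second argument.
        long[] recommendedUserIDs = recommender.mostSimilarUserIDs(userId, 100);

        for (long recID : recommendedUserIDs)
        {
            System.out.println("user:" + userId + " similar with:" + recID);
        }
    }
}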

This is my dataset.csv file:

1,10,1.0
1,11,2.0
1,12,5.0
1,13,5.0
1,14,5.0
1,15,4.0
1,16,5.0
1,17,1.0
1,18,5.0
2,10,1.0
2,11,2.0
2,15,5.0
2,16,4.5
2,17,1.0
2,18,5.0
3,11,2.5
3,12,4.5
3,13,4.0
3,14,3.0
3,15,3.5
3,16,4.5
3,17,4.0
3,18,5.0
4,10,5.0
4,11,5.0
4,12,5.0
4,13,0.0
4,14,2.0
4,15,3.0
4,16,1.0
4,17,4.0
4,18,1.0
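Each row of this file is `userID,itemID,rating`, the comma-separated layout Mahout's `FileDataModel` reads. To make the structure concrete, here is a plain-Java sketch (no Mahout) that reads the same rows into one itemID → rating map per user, which is what a user-user similarity compares. The class `RatingsLoader` and its methods are hypothetical names for illustration; this is not how `FileDataModel` works internally.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: turns "userID,itemID,rating" rows into userID -> (itemID -> rating).
public class RatingsLoader {

    /** Parses CSV rows; blank lines are skipped. TreeMaps keep IDs in ascending order. */
    public static Map<Long, Map<Long, Double>> parseLines(List<String> lines) {
        Map<Long, Map<Long, Double>> ratings = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] f = trimmed.split(",");
            long userId = Long.parseLong(f[0].trim());
            long itemId = Long.parseLong(f[1].trim());
            double rating = Double.parseDouble(f[2].trim());
            ratings.computeIfAbsent(userId, k -> new TreeMap<>()).put(itemId, rating);
        }
        return ratings;
    }

    /** Reads a whole ratings file, e.g. the data/dataset.csv path used in the question. */
    public static Map<Long, Map<Long, Double>> load(Path path) throws IOException {
        return parseLines(Files.readAllLines(path));
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("1,10,1.0", "1,11,2.0", "2,10,1.0", "4,13,0.0");
        // prints {1={10=1.0, 11=2.0}, 2={10=1.0}, 4={13=0.0}}
        System.out.println(parseLines(sample));
    }
}
```

A rating of 0.0 (user 4, item 13) is stored like any other value here, so it takes part in similarity calculations.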

Here is the output of my program for this dataset:

user:1 similar with:2
user:1 similar with:3
user:1 similar with:4
user:2 similar with:1
user:2 similar with:3
user:2 similar with:4
user:3 similar with:2
user:3 similar with:1
user:3 similar with:4
user:4 similar with:3
user:4 similar with:1
user:4 similar with:2

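I know that because I pass 100 as the second argument of the function above, the recommender returns all the users that are similar to each user. This is where my problem starts: my program can tell me which users are similar to each other, but I cannot find a way to get their similarity scores. How can I do that?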
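One way to see what such a score means: Pearson correlation between two users' ratings, taken over the items both of them rated. The plain-Java sketch below (no Mahout) computes it for every pair in dataset.csv and prints it next to each similar user, highest first. `PearsonSketch` and its methods are hypothetical names, and the assumption that this matches Mahout's default, unweighted `PearsonCorrelationSimilarity` is mine, so compare it with `similarity.userSimilarity(userId, recID)` before relying on it. For users 1 and 2, who share items 10, 11, 15, 16, 17 and 18, r comes out at roughly 0.968.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: Pearson correlation over co-rated items, printed for every user pair.
public class PearsonSketch {

    public static final String DATA =
        "1,10,1.0 1,11,2.0 1,12,5.0 1,13,5.0 1,14,5.0 1,15,4.0 1,16,5.0 1,17,1.0 1,18,5.0 "
        + "2,10,1.0 2,11,2.0 2,15,5.0 2,16,4.5 2,17,1.0 2,18,5.0 "
        + "3,11,2.5 3,12,4.5 3,13,4.0 3,14,3.0 3,15,3.5 3,16,4.5 3,17,4.0 3,18,5.0 "
        + "4,10,5.0 4,11,5.0 4,12,5.0 4,13,0.0 4,14,2.0 4,15,3.0 4,16,1.0 4,17,4.0 4,18,1.0";

    /** Parses whitespace-separated "user,item,rating" triples into userID -> (itemID -> rating). */
    public static Map<Long, Map<Long, Double>> parse(String data) {
        Map<Long, Map<Long, Double>> ratings = new TreeMap<>();
        for (String row : data.trim().split("\\s+")) {
            String[] f = row.split(",");
            ratings.computeIfAbsent(Long.parseLong(f[0]), k -> new TreeMap<>())
                   .put(Long.parseLong(f[1]), Double.parseDouble(f[2]));
        }
        return ratings;
    }

    /** Pearson r over co-rated items; NaN if fewer than two shared items or zero variance. */
    public static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        List<double[]> pairs = new ArrayList<>();
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                pairs.add(new double[] {e.getValue(), other});
            }
        }
        int n = pairs.size();
        if (n < 2) {
            return Double.NaN;
        }
        double meanX = 0, meanY = 0;
        for (double[] p : pairs) {
            meanX += p[0] / n;
            meanY += p[1] / n;
        }
        double sxy = 0, sxx = 0, syy = 0;
        for (double[] p : pairs) {
            double dx = p[0] - meanX, dy = p[1] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return (sxx == 0 || syy == 0) ? Double.NaN : sxy / Math.sqrt(sxx * syy);
    }

    /** Ranks an undefined (NaN) score below every real score. */
    private static double sortKey(double score) {
        return Double.isNaN(score) ? Double.NEGATIVE_INFINITY : score;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> ratings = parse(DATA);
        for (long u : ratings.keySet()) {
            List<Map.Entry<Long, Double>> ranked = new ArrayList<>();
            for (long v : ratings.keySet()) {
                if (v != u) {
                    ranked.add(new AbstractMap.SimpleEntry<Long, Double>(v, pearson(ratings.get(u), ratings.get(v))));
                }
            }
            // Highest similarity first.
            ranked.sort((x, y) -> Double.compare(sortKey(y.getValue()), sortKey(x.getValue())));
            for (Map.Entry<Long, Double> e : ranked) {
                System.out.printf("user:%d similar with:%d score of: %.3f%n", u, e.getKey(), e.getValue());
            }
        }
    }
}
```

Pearson ranges from -1 to 1, so a negative score means the two users tend to rate in opposite directions; if a 0 to 1 score is required, that mapping is a separate design choice.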

EDIT

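I think the Pearson correlation similarity results could be used to validate the recommendations. Is my logic wrong? What I mean is that I modified the code above as follows: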

public void RecommenderFunction() throws IOException, TasteException
{
    // same as above.
    for (LongPrimitiveIterator users = model.getUserIDs(); users.hasNext();)
    {
        // same as above.

        for (long recID : recommendedUserIDs)
        {
            // confidence score of recommendation is the Pearson correlation score of two users. Am I wrong?
            System.out.println("user:" + userId + " similar with:" + recID + " score of: " + similarity.userSimilarity(userId, recID));
        }
    }
}

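This is a good start. Keep in mind that the user similarity values are used to create the item recommendations, so you cannot reuse those same similarity scores to validate the quality of the recommendations. Now that you have user similarity scores, you can use Mahout to generate item recommendations for all users. Once that works, you can test the quality of your recommendations by hiding some of the data from your recommender, seeing what it predicts for those hidden ratings, and measuring how close the predictions are. This is one form of recommender evaluation (among many), called prediction accuracy. A commonly used metric is RMSE, root mean squared error. With a metric like this, you can see how well your recommender performs.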
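A minimal sketch of that evaluation loop, in plain Java so it runs without Mahout: each known rating is hidden in turn (leave-one-out), predicted from the remaining ratings, and the errors are combined as RMSE = sqrt(mean((predicted - actual)^2)). The predictor, a similarity-weighted average of ratings from positively correlated users, and the class name `HoldOutRmse` are assumptions for illustration, not Mahout's algorithm. Within Mahout, `RMSRecommenderEvaluator` in `org.apache.mahout.cf.taste.impl.eval` performs this kind of measurement using a random training/test split rather than leave-one-out.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: leave-one-out RMSE for a simple user-based rating predictor.
public class HoldOutRmse {

    public static final String DATA =
        "1,10,1.0 1,11,2.0 1,12,5.0 1,13,5.0 1,14,5.0 1,15,4.0 1,16,5.0 1,17,1.0 1,18,5.0 "
        + "2,10,1.0 2,11,2.0 2,15,5.0 2,16,4.5 2,17,1.0 2,18,5.0 "
        + "3,11,2.5 3,12,4.5 3,13,4.0 3,14,3.0 3,15,3.5 3,16,4.5 3,17,4.0 3,18,5.0 "
        + "4,10,5.0 4,11,5.0 4,12,5.0 4,13,0.0 4,14,2.0 4,15,3.0 4,16,1.0 4,17,4.0 4,18,1.0";

    public static Map<Long, Map<Long, Double>> parse(String data) {
        Map<Long, Map<Long, Double>> ratings = new TreeMap<>();
        for (String row : data.trim().split("\\s+")) {
            String[] f = row.split(",");
            ratings.computeIfAbsent(Long.parseLong(f[0]), k -> new TreeMap<>())
                   .put(Long.parseLong(f[1]), Double.parseDouble(f[2]));
        }
        return ratings;
    }

    /** Pearson correlation over co-rated items (single-pass sums); NaN when undefined. */
    public static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        double sx = 0, sy = 0, sxy = 0, sxx = 0, syy = 0;
        int n = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double y = b.get(e.getKey());
            if (y == null) continue;
            double x = e.getValue();
            sx += x; sy += y; sxy += x * y; sxx += x * x; syy += y * y;
            n++;
        }
        if (n < 2) return Double.NaN;
        double cov = sxy - sx * sy / n;
        double varX = sxx - sx * sx / n, varY = syy - sy * sy / n;
        return (varX <= 0 || varY <= 0) ? Double.NaN : cov / Math.sqrt(varX * varY);
    }

    /** Similarity-weighted average of other users' ratings of the item; NaN if nobody usable rated it. */
    public static double predict(Map<Long, Map<Long, Double>> ratings, long u, long item) {
        double weighted = 0, totalWeight = 0;
        for (Map.Entry<Long, Map<Long, Double>> e : ratings.entrySet()) {
            if (e.getKey() == u || !e.getValue().containsKey(item)) continue;
            double sim = pearson(ratings.get(u), e.getValue());
            if (sim > 0) { // NaN > 0 is false, so undefined similarities are skipped too
                weighted += sim * e.getValue().get(item);
                totalWeight += sim;
            }
        }
        return totalWeight == 0 ? Double.NaN : weighted / totalWeight;
    }

    /** Hides each rating in turn, predicts it, restores it, and returns RMSE over predicted ratings. */
    public static double leaveOneOutRmse(Map<Long, Map<Long, Double>> ratings) {
        double sumSq = 0;
        int count = 0;
        for (long u : ratings.keySet()) {
            for (long item : new ArrayList<>(ratings.get(u).keySet())) {
                double actual = ratings.get(u).remove(item); // hide the rating
                double predicted = predict(ratings, u, item);
                ratings.get(u).put(item, actual);            // restore it
                if (!Double.isNaN(predicted)) {
                    sumSq += (predicted - actual) * (predicted - actual);
                    count++;
                }
            }
        }
        return count == 0 ? Double.NaN : Math.sqrt(sumSq / count);
    }

    public static void main(String[] args) {
        System.out.printf("leave-one-out RMSE: %.3f%n", leaveOneOutRmse(parse(DATA)));
    }
}
```

Ratings with no usable neighbor are skipped rather than counted, which flatters the RMSE on sparse data; a fuller evaluation would also report how many ratings could be predicted at all.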
