User-based and item-based collaborative filtering in Python scikit (Python, scikit-learn, collaborative filtering, recommender systems)


python scikit-learn


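I am trying to implement a recommender system based on users' ratings, which I believe is the most common kind. I read a lot and shortlisted Surprise, a Python scikit-based recommender system.

Although I am able to import the data and run predictions, this is not what I actually want.

What I have right now: I can pass a user id, an item id and a rating, and get the probability of that user giving the rating I passed.

What I really want to do: pass a user id and get back a list of items that this user is likely to like/rate, based on the data.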

from surprise import Reader, Dataset
from surprise import SVD

# Define the format of each line in the data file
reader = Reader(line_format='user item rating timestamp', sep='\t')
# Load the data from the file using the reader format
data = Dataset.load_from_file('./data/ecomm/e.data', reader=reader)

algo = SVD()

# Build a trainset from all available ratings and fit the algorithm on it.
# No fold split is needed because the model is trained on the full trainset.
trainset = data.build_full_trainset()
algo.fit(trainset)

# Inputs are: user_id, item_id and (optionally) the true rating r_ui.
# Raw ids loaded from a file are strings.
print(algo.predict('3', '107', r_ui=1))

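Sample lines from the data file (listed at the end of this post)

The first column is the user id, the second is the item id, the third is the rating, followed by the timestamp.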

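You need to iterate over all possible item_id values for a single user id and predict the rating for each of them. Then collect the items with the highest predicted ratings and recommend those to the user.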
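The loop described above could be sketched roughly as follows. This is a minimal sketch and not part of Surprise: `rank_items_for_user`, `toy_estimate` and the toy scores are hypothetical names and numbers used only for illustration. With a fitted Surprise algorithm, the `estimate` callable could be something like `lambda u, i: algo.predict(u, i).est`.

```python
import heapq


def rank_items_for_user(estimate, user_id, item_ids, n=10):
    """Return the n (item_id, estimated_rating) pairs with the highest estimates.

    `estimate` is any callable (user_id, item_id) -> float.
    """
    scored = ((item_id, estimate(user_id, item_id)) for item_id in item_ids)
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


# Toy stand-in for a trained model (made-up numbers, illustration only).
toy_scores = {('3', '107'): 2.5, ('3', '242'): 4.1, ('3', '302'): 3.3}


def toy_estimate(user_id, item_id):
    return toy_scores[(user_id, item_id)]


top_items = rank_items_for_user(toy_estimate, '3', ['107', '242', '302'], n=2)
print(top_items)
```

Using `heapq.nlargest` avoids sorting every candidate when only a few recommendations are needed.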

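But make sure that the (user_id, item_id) pair is not already in the training dataset. Something like:

build_anti_testset()

Return a list of ratings that can be used as a testset in the test() method.

The ratings are all the ratings that are not in the trainset, i.e. all the ratings r_ui where the user u is known, the item i is known, but the rating r_ui is not in the trainset. As r_ui is unknown, it is either replaced by the fill value or assumed to be equal to the mean of all ratings, global_mean.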
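To make the idea of an anti-testset concrete, here is a small pure-Python sketch of the same concept. It is not Surprise's implementation: the `anti_testset` function and the toy ratings are assumptions for illustration, and the fill value is set to the global mean by hand.

```python
def anti_testset(ratings, fill):
    """Return (user, item, fill) for every known user/item pair without a rating.

    `ratings` is an iterable of (user, item, rating) tuples.
    """
    ratings = list(ratings)
    known = {(u, i) for u, i, _ in ratings}
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    return [(u, i, fill) for u in users for i in items if (u, i) not in known]


# Toy ratings (illustration only): user '196' rated both items, user '186' only one.
ratings = [('196', '242', 3.0), ('186', '302', 3.0), ('196', '302', 4.0)]
global_mean = sum(r for _, _, r in ratings) / len(ratings)

pairs = anti_testset(ratings, fill=global_mean)
print(pairs)  # the only missing pair is user '186' with item '242'
```

The number of pairs grows with users × items, so for a large dataset it may be preferable to generate candidates for one user at a time.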

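After that, you can pass these pairs to the test() or predict() method and collect the predicted ratings, then take the top N recommendations for a particular user from that data.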
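Putting these steps together, a top-N helper might look like the sketch below. `get_top_n` and `recommend_with_surprise` are hypothetical helper names, not Surprise functions. The sketch assumes the data file format from the question and that each prediction is a 5-field tuple (uid, iid, r_ui, est, details), as returned by Surprise's test() method. The `Prediction` namedtuple is a stand-in so the helper can be tried without the library.

```python
from collections import defaultdict, namedtuple


def get_top_n(predictions, n=10):
    """Map each user id to its n highest-estimated (item_id, estimate) pairs."""
    by_user = defaultdict(list)
    for uid, iid, _true_r, est, _details in predictions:
        by_user[uid].append((iid, est))
    return {
        uid: sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]
        for uid, pairs in by_user.items()
    }


def recommend_with_surprise(data_path, n=10):
    """Fit SVD on the full dataset and return the top-n unseen items per user."""
    # Imported lazily so get_top_n stays usable without Surprise installed.
    from surprise import Dataset, Reader, SVD

    reader = Reader(line_format='user item rating timestamp', sep='\t')
    data = Dataset.load_from_file(data_path, reader=reader)
    trainset = data.build_full_trainset()
    algo = SVD()
    algo.fit(trainset)
    # Every (user, item) pair that has no rating in the trainset
    testset = trainset.build_anti_testset()
    return get_top_n(algo.test(testset), n=n)


# Stand-in with the same field order as Surprise's predictions (toy values).
Prediction = namedtuple('Prediction', ['uid', 'iid', 'r_ui', 'est', 'details'])
demo = [
    Prediction('3', '107', None, 2.5, {}),
    Prediction('3', '242', None, 4.1, {}),
    Prediction('3', '302', None, 3.3, {}),
    Prediction('6', '86', None, 3.9, {}),
]
top = get_top_n(demo, n=2)
print(top['3'])
```

With the real data, something like `recommend_with_surprise('./data/ecomm/e.data').get('3', [])` would return the list for user '3', assuming that user appears in the file.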

An example of this is given here.

Sample lines from the data file mentioned in the question:

196 242 3   881250949
186 302 3   891717742
22  377 1   878887116
244 51  2   880606923
166 346 1   886397596
298 474 4   884182806
115 265 2   881171488
253 465 5   891628467
305 451 3   886324817
6   86  3   883603013