Python SVD script: formatting a matrix for machine learning


I am working with the MovieLens dataset. The format of ratings.dat/csv is:

  • UserID, MovieID, Rating, Timestamp
  • 1,1,5.052234234
Initial dataset:

 user    movie   rating
    1       43      3
    1       57      2
    2       219     4
It needs to be pivoted into:

user        1   2
movie   43  3   0
        57  2   0
        219 0   4
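To make recommendations, the matrix needs users as rows and movies (movieId) as columns so that similarity can be checked, as shown in this tutorial: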

The output I get looks like this:

8003 636e 756d 7079 2e63 6f72 652e 6d75
6c74 6961 7272 6179 0a5f 7265 636f 6e73 
7472 7563 740a 7100 636e 756d 7079 0a6e 
6461 7272 6179 0a71 014b 0085 7102 4301
6271 0387 7104 5271 0528 4b01 4dce 024d  
...
...
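As far as I understand, to check similarity and then make recommendations, I need a matrix in which the first row (userId="1") holds a 0-5 (rating) value for each movie.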
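The layout described above can be sketched with pandas on the three sample ratings from the question. This is only an illustration, not the script's own approach: the column names `user`, `movie` and `rating` are assumptions, and `pivot_table` builds a dense table, so for the full MovieLens file a sparse matrix is probably the better choice.

```python
import pandas as pd

# The three sample ratings from the question (column names are assumed).
ratings = pd.DataFrame({
    'user': [1, 1, 2],
    'movie': [43, 57, 219],
    'rating': [3, 2, 4],
})

# Rows = users, columns = movies; user/movie pairs with no rating become 0.
user_movie = ratings.pivot_table(index='user', columns='movie',
                                 values='rating', fill_value=0)
print(user_movie)
```

Each row of `user_movie` is then one user's rating vector, which is the shape a similarity check between users expects.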

Comments:

  • What is your question? :)
  • The output is not what I described, nor what the tutorial shows in its picture. The ratings have values 0-5.

The Python script (I used both the .dat and .csv files):

import pickle

import numpy as np
import pandas as pd
import scipy.sparse as sp
from scipy.sparse.linalg import svds

# Columns: user ID, movie ID, rating, timestamp (the file has no header row).
data_file = pd.read_csv('rat.csv', sep=',', header=None)
users = np.unique(data_file[0])
movies = np.unique(data_file[1])

number_of_rows = len(users)
number_of_columns = len(movies)

# Map each raw movie/user ID to a consecutive column/row index.
movie_indices = {movie: i for i, movie in enumerate(movies)}
user_indices = {user: i for i, user in enumerate(users)}

# SciPy sparse matrix (users x movies) that stores the ratings.
V = sp.lil_matrix((number_of_rows, number_of_columns))

# Add the ratings to the sparse matrix (the timestamp column is not needed,
# and the rating is stored as-is rather than truncated to an integer).
for u, i, r in zip(data_file[0], data_file[1], data_file[2]):
    V[user_indices[u], movie_indices[i]] = r

# These operations take a long time, so save the processed data.
with open('movielens_1M.pickle', 'wb') as handle:
    pickle.dump(V, handle)

# Get the 10 largest SVD components of the ratings matrix.
u, s, vt = svds(V.tocsr(), k=10)

with open('movielens_1M_svd_u.pickle', 'wb') as handle:
    pickle.dump(u, handle)
with open('movielens_1M_svd_s.pickle', 'wb') as handle:
    pickle.dump(s, handle)
with open('movielens_1M_svd_vt.pickle', 'wb') as handle:
    pickle.dump(vt, handle)

# Rebuild the rank-10 approximation of the ratings matrix: U * diag(s) * Vt.
s_diag_matrix = np.diag(s)
X_lr = np.dot(np.dot(u, s_diag_matrix), vt)

with open('movielens.pickle', 'wb') as handle:
    pickle.dump(X_lr, handle)
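
A note on the output shown in the question: the bytes 80 03 followed by the ASCII text numpy.core.multiarray and _reconstruct appear to be the start of a pickled NumPy array viewed as raw bytes, so the file is not meant to be read directly. The sketch below, which uses a hypothetical toy matrix and a temporary file name that are not part of the original script, saves a low-rank reconstruction the same way and then reads it back with `pickle.load` to get the users x movies array.

```python
import os
import pickle
import tempfile

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Hypothetical toy ratings (rows = users, columns = movies, 0 = not rated).
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 0],
    [0, 1, 5, 4, 4],
    [1, 0, 4, 5, 0],
], dtype=float)

# Truncated SVD; k must be smaller than min(ratings.shape).
u, s, vt = svds(csr_matrix(ratings), k=2)
X_lr = u @ np.diag(s) @ vt  # low-rank approximation, same shape as ratings

# Save the array the same way the script does (temporary, made-up file name).
path = os.path.join(tempfile.mkdtemp(), 'toy_movielens.pickle')
with open(path, 'wb') as handle:
    pickle.dump(X_lr, handle)

with open(path, 'rb') as handle:
    raw_header = handle.read(2)     # binary pickle header, not readable text
    handle.seek(0)
    restored = pickle.load(handle)  # the actual NumPy array

# Rank the movies that user 0 has not rated by their reconstructed score.
user = 0
unrated = np.flatnonzero(ratings[user] == 0)
recommended = unrated[np.argsort(restored[user, unrated])[::-1]]

print(raw_header)
print(np.round(restored, 2))
print(recommended)
```

Row 0 of `restored` corresponds to the first user and holds one predicted score per movie, which is the matrix layout the question asks for.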