SQL: counting like/dislike matches for a recommendation system (user-based collaborative filtering)


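The idea is that users leave likes and dislikes on different items. For a selected user, I need a list of the other users together with the number of items they rated the same way, so that I can measure how similar they are.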

RATING Column:
1 = like,
0 = dislike
Full table:

+---------+---------+--------+--------------------------------------------------+
| USER_ID | ITEM_ID | RATING |                      -EXAMPLE-                   |
+---------+---------+--------+--------------------------------------------------+
|       1 |       1 |      1 |-+
|       1 |       2 |      1 | |
|       1 |       3 |      1 | +-[1,1,1,0,0] user_1 vector of ratings
|       1 |       4 |      0 | |  |     | | 
|       1 |       5 |      0 |-+  |     | |     
|       3 |       1 |      1 |----+     + + total_match with user_1 = 3 [1,0,0]
|       3 |       2 |      0 |          | |        
|       3 |       3 |      0 |          | |       
|       3 |       4 |      0 |----------+ |
|       3 |       5 |      0 |------------+
|       4 |       1 |      1 |
|       4 |       2 |      1 |
|       4 |       3 |      1 |
|       4 |       4 |      0 |
|       4 |       5 |      0 |
+---------+---------+--------+
Match calculation:

user_3 likes_match with user_1 = 1
user_3 dislikes_match with user_1 = 2
total_match = likes_match + dislikes_match = 3
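
As a sanity check on the arithmetic, the same counts can be reproduced outside SQL. This is only a minimal sketch: `match_counts` is a hypothetical helper name, and the vectors are copied by hand from the table above (items 1 to 5 in order).

```python
def match_counts(target, other):
    """Return (likes_match, dislikes_match, total_match) for two rating vectors.

    Both vectors are assumed to cover the same items in the same order,
    with 1 = like and 0 = dislike.
    """
    likes = sum(1 for t, o in zip(target, other) if t == 1 and o == 1)
    dislikes = sum(1 for t, o in zip(target, other) if t == 0 and o == 0)
    return likes, dislikes, likes + dislikes


# Rating vectors taken from the example table.
user_1 = [1, 1, 1, 0, 0]
user_3 = [1, 0, 0, 0, 0]
user_4 = [1, 1, 1, 0, 0]

print(match_counts(user_1, user_3))  # (1, 2, 3)
print(match_counts(user_1, user_4))  # (3, 2, 5)
```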
How can I write an SQL query that produces the following result:

+---------+-------------+----------------+-------------+
| user_id | likes_match | dislikes_match | total_match |
+---------+-------------+----------------+-------------+
|       3 |           1 |              2 |           3 |
|       4 |           3 |              2 |           5 |
+---------+-------------+----------------+-------------+

Any ideas?

You may need several nested subqueries to get the result you want; see the code below (the ratings table is named `have` here):

select res1.user_id,
       sum(res1.likes_match1) as likes_match,
       sum(res1.dislikes_match1) as dislikes_match,
       sum(res1.likes_match1) + sum(res1.dislikes_match1) as total_match
from (
    -- one row per (user, rating flag): split the count into likes and dislikes
    select res.user_id,
           case when res.rating = 1 then count(res.rating) else 0 end as likes_match1,
           case when res.rating = 0 then count(res.rating) else 0 end as dislikes_match1
    from (
        -- one row per item that another user rated the same way as user 1;
        -- rating is 1 for a shared like and 0 for a shared dislike
        select b.user_id as user_id,
               case when a.rating = 1 and b.rating = 1 then 1 else 0 end as rating
        from have a
        inner join have b
           on a.item_id = b.item_id
          and a.user_id = 1
          and b.user_id <> 1
          and a.rating = b.rating
    ) as res
    group by res.user_id, res.rating
) as res1
group by res1.user_id
;

This uses SQLite, but it should not take much to make it work on other databases.

Given the following table:

CREATE TABLE ratings(user_id INTEGER, item_id INTEGER, rating INTEGER
                   , PRIMARY KEY(user_id, item_id)) WITHOUT ROWID;
INSERT INTO ratings VALUES(1,1,1);
INSERT INTO ratings VALUES(1,2,1);
INSERT INTO ratings VALUES(1,3,1);
INSERT INTO ratings VALUES(1,4,0);
INSERT INTO ratings VALUES(1,5,0);
INSERT INTO ratings VALUES(3,1,1);
INSERT INTO ratings VALUES(3,2,0);
INSERT INTO ratings VALUES(3,3,0);
INSERT INTO ratings VALUES(3,4,0);
INSERT INTO ratings VALUES(3,5,0);
INSERT INTO ratings VALUES(4,1,1);
INSERT INTO ratings VALUES(4,2,1);
INSERT INTO ratings VALUES(4,3,1);
INSERT INTO ratings VALUES(4,4,0);
INSERT INTO ratings VALUES(4,5,0);
this query produces the requested result:

SELECT r1.user_id AS user_id
     , sum(r1.rating) AS likes_match
     , sum(CASE r1.rating WHEN 0 THEN 1 ELSE 0 END) AS dislikes_match
     , count(*) AS total_match
FROM ratings AS r1
JOIN ratings AS r2 ON r2.user_id = 1
                  AND r1.item_id = r2.item_id
                  AND r1.rating = r2.rating
WHERE r1.user_id <> 1
GROUP BY r1.user_id
ORDER BY r1.user_id;
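
To try the query without hard-coding the selected user, it can be wrapped in a small script. The following is a minimal sketch using Python's standard `sqlite3` module and an in-memory database; `make_db` and `similar_users` are hypothetical helper names, the `:target` named parameter replaces the literal `1`, and `WITHOUT ROWID` is dropped from the table definition only to keep the sketch short.

```python
import sqlite3

# Sample data copied from the question: (user_id, item_id, rating).
ROWS = [
    (1, 1, 1), (1, 2, 1), (1, 3, 1), (1, 4, 0), (1, 5, 0),
    (3, 1, 1), (3, 2, 0), (3, 3, 0), (3, 4, 0), (3, 5, 0),
    (4, 1, 1), (4, 2, 1), (4, 3, 1), (4, 4, 0), (4, 5, 0),
]

# Same query as above, with the selected user passed as a named parameter.
QUERY = """
SELECT r1.user_id AS user_id
     , sum(r1.rating) AS likes_match
     , sum(CASE r1.rating WHEN 0 THEN 1 ELSE 0 END) AS dislikes_match
     , count(*) AS total_match
FROM ratings AS r1
JOIN ratings AS r2 ON r2.user_id = :target
                  AND r1.item_id = r2.item_id
                  AND r1.rating = r2.rating
WHERE r1.user_id <> :target
GROUP BY r1.user_id
ORDER BY r1.user_id
"""


def make_db():
    """Create an in-memory database holding the sample ratings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ratings(user_id INTEGER, item_id INTEGER, rating INTEGER,"
        " PRIMARY KEY(user_id, item_id))"
    )
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", ROWS)
    return conn


def similar_users(conn, target):
    """Return (user_id, likes_match, dislikes_match, total_match) rows."""
    return conn.execute(QUERY, {"target": target}).fetchall()


conn = make_db()
print(similar_users(conn, 1))  # [(3, 1, 2, 3), (4, 3, 2, 5)]
```

Note that a user who shares no matching rating with the selected user does not appear in the output at all, because the inner join drops them.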

Are you familiar with the concept of a self-join?
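
For readers who are not, a self-join simply joins a table to itself under two aliases. The sketch below, which assumes the sample data from the question loaded into an in-memory SQLite database through Python's standard `sqlite3` module, lists the raw rows that survive such a join before any counting is done; the variable name `matched` is only illustrative.

```python
import sqlite3

# Sample data copied from the question: (user_id, item_id, rating).
ROWS = [
    (1, 1, 1), (1, 2, 1), (1, 3, 1), (1, 4, 0), (1, 5, 0),
    (3, 1, 1), (3, 2, 0), (3, 3, 0), (3, 4, 0), (3, 5, 0),
    (4, 1, 1), (4, 2, 1), (4, 3, 1), (4, 4, 0), (4, 5, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings(user_id INTEGER, item_id INTEGER, rating INTEGER)")
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", ROWS)

# r2 is pinned to the selected user (1); r1 ranges over everyone else.
# A row survives the join only when both users rated the same item the same way.
matched = conn.execute("""
    SELECT r1.user_id, r1.item_id, r1.rating
    FROM ratings AS r1
    JOIN ratings AS r2 ON r2.user_id = 1
                      AND r1.item_id = r2.item_id
                      AND r1.rating = r2.rating
    WHERE r1.user_id <> 1
    ORDER BY r1.user_id, r1.item_id
""").fetchall()

for row in matched:
    print(row)
```

Grouping these rows by `user_id` and counting them, split by rating, gives the `likes_match`, `dislikes_match`, and `total_match` columns asked for in the question.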