SQL EXISTS with a many-to-many relationship
I'm using Amazon Redshift and can't get a query right. Say I have a lot of projects, and each project requires one or more skills, so there is a many-to-many relationship between projects and skills. Users create projects, so each project has exactly one creator. For each project, I want to get all the other projects by the same creator that share at least one skill with it. So I wanted to write something like this:
SELECT p1.project_id, p2.project_id
FROM projects p1
JOIN projects p2 on p1.creator = p2.creator
WHERE EXISTS (SELECT 0
from skills sk1, skills sk2
where sk1.project_id = p1.project_id
and sk2.project_id = p2.project_id
and sk1.skill = sk2.skill)
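The problem is that this is really slow (it fails with a disk full error).
The following works, but it is also slow (it takes about half an hour):
SELECT distinct p1.project_id, p2.project_id
FROM projects p1
JOIN projects p2 on
p1.creator = p2.creator and
p2.project_id > p1.project_id
join skills sk1 on
sk1.project_id = p1.project_id
join skills sk2 on
sk2.project_id = p2.project_id and
sk1.skill = sk2.skill
The problem with this one is that if I want to aggregate some attributes of the second project, I have to use it as a subquery.
Is there a better way? I had expected the first query to be faster, since it only has to find a single match.

One simple problem with your query is that you allow a project to join to itself. This means every project that has at least one skill is returned, paired with itself. Fix this by making sure the two joined projects are not the same: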
JOIN projects p2 on
p1.creator = p2.creator and
p2.project_id > p1.project_id
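Note that I used > rather than != so that two matching projects are joined in only one direction. Otherwise every pair of projects would be returned twice.
A join-based solution would then look like this: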
SELECT distinct p1.project_id, p2.project_id
FROM projects p1
JOIN projects p2 on
p1.creator = p2.creator and
p2.project_id > p1.project_id
join skills sk1 on
sk1.project_id = p1.project_id
join skills sk2 on
sk2.project_id = p2.project_id and
sk1.skill = sk2.skill
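On the aggregation concern from the question: the skill joins produce one row per shared skill, so aggregating attributes of p2 directly over the joined rows counts a project once for every skill it shares. Below is a minimal sketch of one way around that, run against an in-memory SQLite database purely as a stand-in for Redshift. The sample rows and the budget column are made up for illustration; only the table and column names come from the queries above. It uses <> instead of > in the aggregate queries, on the assumption that each project should see all of its partners rather than only those with a higher project_id.

```python
import sqlite3

# Toy data (hypothetical): projects 1 and 2 share a creator and two skills,
# project 3 shares the creator but no skill, project 4 has another creator.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (project_id INTEGER PRIMARY KEY, creator TEXT, budget INTEGER);
CREATE TABLE skills (project_id INTEGER, skill TEXT);
INSERT INTO projects VALUES (1,'ann',100),(2,'ann',200),(3,'ann',400),(4,'bob',800);
INSERT INTO skills VALUES (1,'sql'),(1,'python'),(2,'sql'),(2,'python'),(3,'java'),(4,'sql');
""")

# Shared join logic; {op} is '>' (each pair once) or '<>' (both directions).
JOINS = """
FROM projects p1
JOIN projects p2 ON p1.creator = p2.creator AND p2.project_id {op} p1.project_id
JOIN skills sk1 ON sk1.project_id = p1.project_id
JOIN skills sk2 ON sk2.project_id = p2.project_id AND sk1.skill = sk2.skill
"""

# The join-based query from the answer: every matching pair, one direction only.
pairs = conn.execute(
    "SELECT DISTINCT p1.project_id, p2.project_id"
    + JOINS.format(op=">") + "ORDER BY 1, 2"
).fetchall()

# Naive aggregate over the joined rows: project 2 is counted once per shared skill.
naive = conn.execute(
    "SELECT p1.project_id, SUM(p2.budget)"
    + JOINS.format(op="<>") + "GROUP BY p1.project_id ORDER BY 1"
).fetchall()

# Collapse to distinct pairs first (carrying the needed p2 attributes), then aggregate.
deduped = conn.execute(
    "SELECT project_id, COUNT(*) AS related_projects, SUM(other_budget) AS related_budget "
    "FROM (SELECT DISTINCT p1.project_id AS project_id, p2.project_id AS other_id, "
    "p2.budget AS other_budget" + JOINS.format(op="<>") + ") pairs "
    "GROUP BY project_id ORDER BY 1"
).fetchall()

print(pairs)    # [(1, 2)]
print(naive)    # [(1, 400), (2, 200)]  (inflated: two shared skills)
print(deduped)  # [(1, 1, 200), (2, 1, 100)]
```

This still needs a derived table, so it does not remove the subquery the question wanted to avoid; it only shows why the DISTINCT step has to happen before the aggregate. Whether it performs acceptably on Redshift for large tables is something the sketch says nothing about.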
Are there indexes on the join columns?