Querying the IMDB database with MySQL


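I wrote a SQL query to answer the following question:

Find all actors in the IMDB database who made more movies with Yash Chopra than with any other director.

Sample schema: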

person
(pid *
,name
);

m_cast
(mid *
,pid *
);

m_director
(mid*
,pid*
);

* = (component of) PRIMARY KEY
Here is my query:

WITH common_actors AS 
        (SELECT A.actor_id as actors, B.director_id as director_id, B.movies as movies_with_director,
        B.director_id as yash_chops_id, B.movies as movies_with_yash_chops FROM
        (SELECT M_Cast.PID as actor_id, M_Director.PID as director_id, COUNT(*) as movies from M_Cast
        left join M_Director  
        ON M_Cast.MID = M_Director.MID
        GROUP BY actor_id, director_id) A
        JOIN
        (SELECT M_Cast.PID as actor_id, M_Director.PID as director_id, COUNT(*) as movies from M_Cast
        left join M_Director  
        ON M_Cast.MID = M_Director.MID
        GROUP BY actor_id, director_id
        )B
        ON A.actor_id = B.actor_id
        WHERE B.director_id in (SELECT PID FROM Person WHERE Name LIKE 
        '%Yash%Chopra%'))

SELECT distinct actors as actor_id, movies_with_yash_chops as total_movies FROM common_actors
    WHERE actors NOT IN (SELECT actors FROM common_actors WHERE movies_with_director > movies_with_yash_chops)
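The result I get from this has 430 rows. However, the expected result has 243 rows. Can anyone tell me where my query goes wrong? Is my approach correct?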

Sample result:

    Actor name
  0 Sharib Hashmi
  1 Kulbir Badesron
  2 Gurdas Maan
  3 Parikshat Sahni
...
242 Ramlal Shyamlal

Thanks in advance.

Consider the following:

DROP TABLE IF EXISTS person;

CREATE TABLE person
(person_id SERIAL PRIMARY KEY
,name VARCHAR(20) NOT NULL UNIQUE
);

DROP TABLE IF EXISTS movie;

CREATE TABLE movie
(movie_id SERIAL PRIMARY KEY
,title VARCHAR(50) NOT NULL UNIQUE
);

DROP TABLE IF EXISTS m_cast;

CREATE TABLE m_cast
(movie_id INT NOT NULL
,person_id INT NOT NULL
,PRIMARY KEY(movie_id,person_id)
);

DROP TABLE IF EXISTS m_director;

CREATE TABLE m_director
(movie_id INT NOT NULL
,person_id INT NOT NULL
,PRIMARY KEY(movie_id,person_id)
);

INSERT INTO person (name) VALUES
('Steven Feelberg'),
('Manly Kubrick'),
('Alfred Spatchcock'),
('Fred Pitt'),
('Raphael DiMaggio'),
('Bill Smith');

INSERT INTO movie VALUES
(1,'Feelberg\'s Movie with Fred & Raph'),
(2,'Feelberg and Fred Ride Again'),
(3,'Kubrick shoots DiMaggio'),
(4,'Kubrick\'s Movie with Bill Smith'),
(5,'Spatchcock Presents Bill Smith');

INSERT INTO m_director VALUES
(1,1),
(2,1),
(3,2),
(4,2),
(5,3);

INSERT INTO m_cast VALUES
(1,4),
(1,5),
(2,4),
(3,5),
(4,6),
(5,6);
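I've included the movie table just for ease of reference. It is not relevant to the actual problem. Also, note that this model assumes that an actor is listed only once per movie, regardless of whether they play multiple roles in it.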
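
On a real dataset that lacks such a key, the same (movie, person) pair may appear more than once, which would inflate any COUNT(*) over the joined rows. A quick check, sketched against the toy m_cast table above (the column names are the ones defined here and are not necessarily those of the real IMDB tables):

```sql
-- Lists any (movie, person) pair that appears more than once in the cast table.
-- With the composite primary key defined above, this returns no rows.
SELECT movie_id
     , person_id
     , COUNT(*) AS times_listed
  FROM m_cast
 GROUP BY movie_id, person_id
HAVING COUNT(*) > 1;
```

If a check like this does return rows on the real data, counting with COUNT(DISTINCT movie_id), as the query below does, avoids the double counting.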

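The following query asks "How often has each actor worked with each director?"

An actor is anyone who appears in the cast of any movie. A director is anyone who has directed any movie.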

SELECT a.name actor
     , d.name director
     , COUNT(DISTINCT ma.movie_id) total
  FROM person d
  JOIN m_director md 
    ON md.person_id = d.person_id
  JOIN person a
  LEFT
  JOIN m_cast ma 
    ON ma.person_id = a.person_id
   AND ma.movie_id = md.movie_id
  JOIN m_cast x
    ON x.person_id = a.person_id
 GROUP
    BY actor
     , director;
     
+-------------------+-------------------+-------+
| actor             | director          | total |
+-------------------+-------------------+-------+
| Fred Pitt         | Alfred Spatchcock |     0 |
| Fred Pitt         | Manly Kubrick     |     0 |
| Fred Pitt         | Steven Feelberg   |     2 |
| Raphael DiMaggio  | Alfred Spatchcock |     0 |
| Raphael DiMaggio  | Manly Kubrick     |     1 |
| Raphael DiMaggio  | Steven Feelberg   |     1 |
| Bill Smith        | Alfred Spatchcock |     1 |
| Bill Smith        | Manly Kubrick     |     1 |
| Bill Smith        | Steven Feelberg   |     0 |
+-------------------+-------------------+-------+
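By inspection, we can see that:

The only actor who has worked with Feelberg more than with any other director is Fred Pitt.
Raphael DiMaggio and Bill Smith have each worked equally often with two directors, although not the same two directors.

Edit: While I'm not seriously advocating this as a solution, the following simply demonstrates that the kernel provided above really is all that's needed to solve the problem: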

SELECT x.*
  FROM 
     ( SELECT a.* 
 FROM 
    ( SELECT a.name actor
           , d.name director
           , COUNT(DISTINCT ma.movie_id) total
        FROM person d
        JOIN m_director md 
          ON md.person_id = d.person_id
        JOIN person a
        LEFT
        JOIN m_cast ma 
          ON ma.person_id = a.person_id
         AND ma.movie_id = md.movie_id
        JOIN m_cast x
          ON x.person_id = a.person_id
       GROUP
          BY actor
           , director
    ) a
 LEFT
 JOIN
    ( SELECT a.name actor
           , d.name director
           , COUNT(DISTINCT ma.movie_id) total
        FROM person d
        JOIN m_director md 
          ON md.person_id = d.person_id
        JOIN person a
        LEFT
        JOIN m_cast ma 
          ON ma.person_id = a.person_id
         AND ma.movie_id = md.movie_id
        JOIN m_cast x
          ON x.person_id = a.person_id
       GROUP
          BY actor
           , director
    ) b
   ON b.actor = a.actor
  AND b.director <> a.director 
  AND b.total > a.total
WHERE b.actor IS NULL
) x
LEFT JOIN
     ( SELECT a.* 
 FROM 
    ( SELECT a.name actor
           , d.name director
           , COUNT(DISTINCT ma.movie_id) total
        FROM person d
        JOIN m_director md 
          ON md.person_id = d.person_id
        JOIN person a
        LEFT
        JOIN m_cast ma 
          ON ma.person_id = a.person_id
         AND ma.movie_id = md.movie_id
        JOIN m_cast x
          ON x.person_id = a.person_id
       GROUP
          BY actor
           , director
    ) a
 LEFT
 JOIN
    ( SELECT a.name actor
           , d.name director
           , COUNT(DISTINCT ma.movie_id) total
        FROM person d
        JOIN m_director md 
          ON md.person_id = d.person_id
        JOIN person a
        LEFT
        JOIN m_cast ma 
          ON ma.person_id = a.person_id
         AND ma.movie_id = md.movie_id
        JOIN m_cast x
          ON x.person_id = a.person_id
       GROUP
          BY actor
           , director
    ) b
   ON b.actor = a.actor
  AND b.director <> a.director 
  AND b.total > a.total
WHERE b.actor IS NULL
) y
ON y.actor = x.actor AND y.director <> x.director
WHERE y.actor IS NULL;

+-----------+-----------------+-------+
| actor     | director        | total |
+-----------+-----------------+-------+
| Fred Pitt | Steven Feelberg |     2 |
+-----------+-----------------+-------+
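This returns a list of each actor together with the director they have worked with most often. In this case, because Bill Smith and Raphael DiMaggio are each tied between two directors, they are excluded from the result.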


The answer to your question is then simply to select from this list all the rows in which Yash Chopra is listed as the director.
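
As a rough illustration of that last step, here is a more compact sketch against the toy schema above. It is not the query from the answer: it assumes MySQL 8.0 or later (for WITH), the name pair_totals is made up for this example, and 'Steven Feelberg' stands in for the director of interest. It also treats a tie with another director as "not more than any other", which matches the exclusion of Bill Smith and Raphael DiMaggio above.

```sql
-- Sketch only: requires MySQL 8.0+ for common table expressions.
-- pair_totals is a hypothetical name; 'Steven Feelberg' is a stand-in director.
WITH pair_totals AS (
    -- One row per actor/director pair that shares at least one movie
    SELECT mc.person_id AS actor_id
         , md.person_id AS director_id
         , COUNT(DISTINCT mc.movie_id) AS total
      FROM m_cast mc
      JOIN m_director md
        ON md.movie_id = mc.movie_id
     GROUP BY mc.person_id, md.person_id
)
SELECT a.name AS actor
     , d.name AS director
     , t.total
  FROM pair_totals t
  JOIN person a
    ON a.person_id = t.actor_id
  JOIN person d
    ON d.person_id = t.director_id
 WHERE d.name = 'Steven Feelberg'
   AND NOT EXISTS (
         -- Reject the actor if any other director has an equal or higher total
         SELECT 1
           FROM pair_totals o
          WHERE o.actor_id = t.actor_id
            AND o.director_id <> t.director_id
            AND o.total >= t.total
       );
```

On the sample data this keeps only Fred Pitt with Steven Feelberg (total 2). Pairs with a total of zero never appear in pair_totals, which is fine here because a zero can never be an actor's strict maximum.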

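Hey, thanks for the answer, but if you look at the query I wrote, I have already listed every possible actor-director combination along with its movie count. The thing is, I'm really confused about how to find the actors who made more movies with one particular director than with any other director. In my case, I joined the table to itself, with the condition that the right-hand table contains only that particular director. That way I can find all the actors who made more movies with some other director and filter them out, so the remaining actors should be the answer.

I get it now! Thank you very much for taking the time to answer this.

How do I accept this answer? Sorry, I'm a noob.

I didn't vote, though!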