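Can you show some of the data behind your question? Give a short example, even if it slightly extends my sample as a guide.

@DRapp: Suppose you have a poll (1, "Gender?") with two options (1, "Male"; 2, "Female"). Suppose 50% of users answered 1 and 50% answered 2. Now suppose you have another poll (2, "Do you like football?") with two options (3, "Yes"; 4, "No"), and 60% of all users answered 3. Among males (the users who answered 1), however, 75% answered 3. The deviation is 15 (75 - 60). So, given a particular option ID (option 122 in my query above), we want a list of all other option IDs sorted by deviation, which measures how strongly two options are associated.

I was able to use your query with very few modifications to produce exactly the results I need, and execution time dropped from about a minute to about a second. The only change that really mattered was: "v.target_votes / t.target_votes AS subgroup_percent, v.all_votes / t.all_votes AS total_percent"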
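The deviation described in the comment above can be checked with a small worked example. This is only a sketch: the 40 users and their answers are made up so that the percentages match the comment (50/50 gender split, 60% "yes" overall, 75% "yes" among men), and the `share` helper is a hypothetical name, not something from the thread.

```python
# Answers per user as {poll_id: option_id}. The data is invented to
# reproduce the percentages from the comment, not taken from a real table.
answers = {}
for user_id in range(1, 41):
    is_male = user_id <= 20                                 # 20 men, 20 women
    likes_football = user_id <= 15 or 21 <= user_id <= 29   # 15 men + 9 women
    answers[user_id] = {1: 1 if is_male else 2, 2: 3 if likes_football else 4}


def share(users, poll_id, option_id):
    """Fraction of the given users who picked option_id in poll_id."""
    voters = [u for u in users if poll_id in answers[u]]
    return sum(answers[u][poll_id] == option_id for u in voters) / len(voters)


everyone = list(answers)
men = [u for u in everyone if answers[u][1] == 1]

total_percent = share(everyone, 2, 3)   # 24 / 40
subgroup_percent = share(men, 2, 3)     # 15 / 20
deviation = abs(total_percent - subgroup_percent)
print(round(total_percent, 2), round(subgroup_percent, 2), round(deviation, 2))
# prints: 0.6 0.75 0.15
```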
(SELECT p_id as poll_id, o_id AS option_id, description, optCount AS option_count, subgroup_percent, total_percent, ABS(total_percent - subgroup_percent) AS deviation
FROM(
   SELECT poll_id AS p_id, 
       option_id AS o_id, 
       (SELECT description FROM `option` WHERE id = o_id) AS description,
       COUNT(*) AS optCount, 
       (SELECT COUNT(*) FROM response INNER JOIN user_ids_122 ON response.user_id = user_ids_122.user_id WHERE option_id = o_id ) / 
       (SELECT COUNT(*) FROM response INNER JOIN user_ids_122 ON response.user_id = user_ids_122.user_id WHERE poll_id = p_id) AS subgroup_percent,
       (SELECT COUNT(*) FROM response WHERE option_id = o_id) / 
       (SELECT COUNT(*) FROM response WHERE poll_id = p_id) AS total_percent
   FROM response 
   INNER JOIN user_ids_122 ON response.user_id = user_ids_122.user_id 
   WHERE poll_id < '61'
   GROUP BY option_id DESC
   ) AS derived_table_122
)
ORDER BY deviation DESC, option_count DESC
id  select_type  table  type  possible_keys  key  key_len  ref  rows  Extra
1   PRIMARY     <derived2>  ALL     NULL    NULL    NULL    NULL    121     Using filesort
2   DERIVED     user_ids_122    ALL     NULL    NULL    NULL    NULL    74  Using temporary; Using filesort
2   DERIVED     response    ref     poll_id,user_id     user_id     4   correlated.user_ids_122.user_id     780     Using where
7   DEPENDENT SUBQUERY  response    ref     poll_id     poll_id     4   func    7800    Using index
6   DEPENDENT SUBQUERY  response    ref     option_id   option_id   4   func    7800    Using index
5   DEPENDENT SUBQUERY  user_ids_122    ALL     NULL    NULL    NULL    NULL    74   
5   DEPENDENT SUBQUERY  response    ref     poll_id,user_id     poll_id     4   func    7800    Using where
4   DEPENDENT SUBQUERY  user_ids_122    ALL     NULL    NULL    NULL    NULL    74   
4   DEPENDENT SUBQUERY  response    ref     user_id,option_id   user_id     4   correlated.user_ids_122.user_id     780     Using where
3   DEPENDENT SUBQUERY  option  eq_ref  PRIMARY     PRIMARY     4   func    1 
id (INT)   poll_id (INT)   user_id (INT)   option_id (INT)   created (DATETIME)
7          7               1               14                2011-03-17 09:25:10
id (INT)   poll_id (INT)   text (TEXT)     description (TEXT)
14         7               No              people who dislike country music 
id (INT)   email (TEXT)         created (DATETIME)
1          user@example.com     2011-02-15 11:16:03
-- Compute the average you're looking for.
select ..., agg1, agg2, avg(...)
from (
     -- Use max() to merge the retrieved aggregates as individual rows.
     -- (This will be faster than joins if you're dealing with tons of rows.)
     select ..., max(agg1) as agg1, max(agg2) as agg2, ...
     from (
          -- Compute individual aggregates without nested loops.
          select ..., count(*) as agg1, null as agg2, ...
          from ...
          where ...
          group by ...
          union all
          select ..., null as agg1, count(*) as agg2, ...
          from ...
          where ...
          group by ...
          union all
          ...
          ) as aggs
     group by ...
     ) as merged  -- "rows" is a reserved word in MySQL 8.0, so use another alias
group by ...
(SELECT COUNT(*) FROM response WHERE option_id = o_id) / 
(SELECT COUNT(*) FROM response WHERE poll_id = p_id) as total_percent
SELECT [fields you need],
       -- per-option count divided by per-poll count
       MAX(total_responses_by_option_id) / MAX(total_responses_by_poll_id) as total_percent
FROM (
    SELECT [fields you need],
           COUNT(*) as total_responses_by_option_id,
           NULL as total_responses_by_poll_id
    FROM response
    [join/where as needed]
    GROUP BY [fields you need]
    UNION ALL
    SELECT [fields you need],
           NULL as total_responses_by_option_id,
           COUNT(*) as total_responses_by_poll_id
    FROM response
    [join/where as needed]
    GROUP BY [fields you need]
    ) as agg
GROUP BY [fields you need];
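A runnable sketch of the UNION ALL plus MAX() merge may make the template easier to follow. It uses Python's `sqlite3` module rather than MySQL so that it is self-contained. The column names `by_option` and `by_poll`, the cut-down two-column tables, and the counts are assumptions made for illustration. One detail the template leaves open is how the per-poll rows get an `option_id` to group on; here the second branch joins `option` to `response` so that each option carries its poll's total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE `option` (id INTEGER PRIMARY KEY, poll_id INTEGER);
    CREATE TABLE response (poll_id INTEGER, option_id INTEGER);
""")

# (poll_id, option_id, number of responses): illustrative counts only.
counts = [(1, 1, 30), (1, 2, 12), (1, 3, 5), (1, 4, 3),
          (2, 5, 8), (2, 6, 12), (2, 7, 10)]
for poll_id, option_id, n in counts:
    conn.execute("INSERT INTO `option` VALUES (?, ?)", (option_id, poll_id))
    conn.executemany("INSERT INTO response VALUES (?, ?)",
                     [(poll_id, option_id)] * n)

sql = """
SELECT poll_id, option_id,
       1.0 * MAX(by_option) / MAX(by_poll) AS total_percent
FROM (
    -- responses per option
    SELECT poll_id, option_id, COUNT(*) AS by_option, NULL AS by_poll
    FROM response
    GROUP BY poll_id, option_id
    UNION ALL
    -- responses per poll, attached to every option of that poll
    SELECT o.poll_id, o.id AS option_id, NULL AS by_option, COUNT(*) AS by_poll
    FROM `option` o
    JOIN response r ON r.poll_id = o.poll_id
    GROUP BY o.poll_id, o.id
) AS agg
GROUP BY poll_id, option_id
ORDER BY poll_id, option_id
"""
# MAX() skips NULLs, so each group keeps the one non-NULL value per column.
total_percent = {(p, o): pct for p, o, pct in conn.execute(sql)}
for key, pct in total_percent.items():
    print(key, round(pct, 2))
```

The `1.0 *` factor is there because SQLite divides integers as integers; MySQL's `/` operator already returns a fractional result.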
(Result from PreQuery 1 on just the poll counts)
Poll   Count
  1    50
  2    30

(Result from PreQuery 2 on poll AND Option)
Poll  Option Count
  1     1      30
  1     2      12
  1     3       5
  1     4       3
  2     5       8
  2     6      12
  2     7      10

Final join should have
Poll  Option  Description  PerPollAndOption  SubGroup_Percent PerPollResponse
  1     1     Descrip 1         30             .60               50
  1     2     Descrip 2         12             .24               50
  1     3     Descrip 3          5             .10               50
  1     4     Descrip 4          3             .06               50
  2     5     Descrip 5          8             .27               30
  2     6     Descrip 6         12             .40               30
  2     7     Descrip 7         10             .33               30
SELECT 
      ByPoll.Poll_ID,
      ByPollOption.Option_ID,
      o.Description,
      ByPollOption.PerPollAndOption,
      ByPollOption.PerPollAndOption / ByPoll.PerPollResponse as SubGroup_Percent,
      ByPoll.PerPollResponse
FROM
   (  -- PreQuery 1: total responses per poll
      select
            Poll_ID, 
            COUNT(*) as PerPollResponse
         from 
            response
         where
            Poll_ID < '61'
         group by 
            Poll_ID ) ByPoll
   JOIN (   -- PreQuery 2: responses per poll AND option
            select r.Poll_ID,
                   r.Option_ID,
                   COUNT(*) as PerPollAndOption
                from
                   response r
                where
                   r.Poll_ID < '61'
                group by
                   r.Poll_ID,
                   r.Option_ID ) ByPollOption
      ON ByPoll.Poll_ID = ByPollOption.Poll_ID
   -- `option` is a reserved word in MySQL, so it has to be backtick-quoted
   JOIN `option` o
      ON ByPollOption.Option_ID = o.id
WITH 
-- users of interest : target group
uids AS (
    SELECT DISTINCT user_id 
        FROM    options 
        JOIN    responses USING (option_id)
        WHERE   poll_id=22
    ),
-- votes of everyone and target group
votes AS (
    SELECT poll_id, option_id, sum(all_votes) AS all_votes, sum(target_votes) AS target_votes
        FROM (
            SELECT option_id, count(*) AS all_votes, count(uids.user_id) AS target_votes
                FROM        responses 
                LEFT JOIN   uids USING (user_id)
                GROUP BY option_id
        ) v
        JOIN    options     USING (option_id)
        GROUP BY poll_id, option_id
    ),
-- totals for all polls (reuse previous result)
totals AS (
    SELECT poll_id, sum(all_votes) AS all_votes, sum(target_votes) AS target_votes
        FROM votes
        GROUP BY poll_id
    ),
poll_options AS (
    SELECT poll_id, count(*) AS poll_option_count
        FROM options 
        GROUP BY poll_id
    )
-- reuse previous tables to get some stats
SELECT  *, ABS(total_percent - subgroup_percent) AS deviation
    FROM (
        SELECT
            poll_id,
            option_id,
            v.target_votes / v.all_votes AS subgroup_percent,
            t.target_votes / t.all_votes AS total_percent,
            poll_option_count
        FROM votes  v
        JOIN totals t           USING (poll_id)
        JOIN poll_options po    USING (poll_id)
    ) AS foo
    ORDER BY deviation DESC, poll_option_count DESC;

                                                                                  QUERY PLAN                                                                                
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=14910.46..14910.56 rows=40 width=144) (actual time=299.844..299.862 rows=200 loops=1)
   Sort Key: (abs(((t.target_votes / t.all_votes) - (v.target_votes / v.all_votes)))), po.poll_option_count
   Sort Method:  quicksort  Memory: 52kB
   CTE uids
     ->  HashAggregate  (cost=1801.43..1850.52 rows=4909 width=4) (actual time=3.935..4.793 rows=4860 loops=1)
           ->  Nested Loop  (cost=0.00..1789.16 rows=4909 width=4) (actual time=0.029..2.555 rows=4860 loops=1)
                 ->  Seq Scan on options  (cost=0.00..3.50 rows=5 width=4) (actual time=0.008..0.032 rows=5 loops=1)
                       Filter: (poll_id = 22)
                 ->  Index Scan using responses_option_id_key on responses  (cost=0.00..344.86 rows=982 width=8) (actual time=0.012..0.298 rows=972 loops=5)
                       Index Cond: (public.responses.option_id = public.options.option_id)
   CTE votes
     ->  HashAggregate  (cost=13029.43..13032.43 rows=200 width=24) (actual time=298.255..298.317 rows=200 loops=1)
           ->  Hash Join  (cost=13019.68..13027.43 rows=200 width=24) (actual time=297.953..298.103 rows=200 loops=1)
                 Hash Cond: (public.responses.option_id = public.options.option_id)
                 ->  HashAggregate  (cost=13014.18..13017.18 rows=200 width=8) (actual time=297.839..297.879 rows=200 loops=1)
                       ->  Merge Left Join  (cost=399.13..11541.43 rows=196366 width=8) (actual time=9.301..230.467 rows=196366 loops=1)
                             Merge Cond: (public.responses.user_id = uids.user_id)
                             ->  Index Scan using responses_pkey on responses  (cost=0.00..8585.75 rows=196366 width=8) (actual time=0.015..121.971 rows=196366 loops=1)
                             ->  Sort  (cost=399.13..411.40 rows=4909 width=4) (actual time=9.281..22.044 rows=137645 loops=1)
                                   Sort Key: uids.user_id
                                   Sort Method:  quicksort  Memory: 420kB
                                   ->  CTE Scan on uids  (cost=0.00..98.18 rows=4909 width=4) (actual time=3.937..6.549 rows=4860 loops=1)
                 ->  Hash  (cost=3.00..3.00 rows=200 width=8) (actual time=0.095..0.095 rows=200 loops=1)
                       ->  Seq Scan on options  (cost=0.00..3.00 rows=200 width=8) (actual time=0.007..0.043 rows=200 loops=1)
   CTE totals
     ->  HashAggregate  (cost=5.50..8.50 rows=200 width=68) (actual time=298.629..298.640 rows=40 loops=1)
           ->  CTE Scan on votes  (cost=0.00..4.00 rows=200 width=68) (actual time=298.257..298.425 rows=200 loops=1)
   CTE poll_options
     ->  HashAggregate  (cost=4.00..4.50 rows=40 width=4) (actual time=0.091..0.101 rows=40 loops=1)
           ->  Seq Scan on options  (cost=0.00..3.00 rows=200 width=4) (actual time=0.005..0.020 rows=200 loops=1)
   ->  Hash Join  (cost=6.95..13.45 rows=40 width=144) (actual time=298.994..299.554 rows=200 loops=1)
         Hash Cond: (t.poll_id = v.poll_id)
         ->  CTE Scan on totals t  (cost=0.00..4.00 rows=200 width=68) (actual time=298.632..298.669 rows=40 loops=1)
         ->  Hash  (cost=6.45..6.45 rows=40 width=84) (actual time=0.335..0.335 rows=200 loops=1)
               ->  Hash Join  (cost=1.30..6.45 rows=40 width=84) (actual time=0.140..0.263 rows=200 loops=1)
                     Hash Cond: (v.poll_id = po.poll_id)
                     ->  CTE Scan on votes v  (cost=0.00..4.00 rows=200 width=72) (actual time=0.001..0.030 rows=200 loops=1)
                     ->  Hash  (cost=0.80..0.80 rows=40 width=12) (actual time=0.130..0.130 rows=40 loops=1)
                           ->  CTE Scan on poll_options po  (cost=0.00..0.80 rows=40 width=12) (actual time=0.093..0.119 rows=40 loops=1)
 Total runtime: 300.132 ms
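
The comments mention that the asker got the intended numbers by dividing each option's votes by the poll totals (`v.target_votes / t.target_votes` and `v.all_votes / t.all_votes`). The sketch below applies that change to a cut-down version of the CTE query and runs it through Python's `sqlite3` module. The schema (a single `response` table that already carries `poll_id`) and the 40 made-up users are assumptions chosen to reproduce the gender/football example, with option 1 ("Male") as the target group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE response (user_id INTEGER, poll_id INTEGER, option_id INTEGER)")
rows = []
for user_id in range(1, 41):
    is_male = user_id <= 20                                 # 20 men, 20 women
    likes_football = user_id <= 15 or 21 <= user_id <= 29   # 15 men + 9 women
    rows.append((user_id, 1, 1 if is_male else 2))
    rows.append((user_id, 2, 3 if likes_football else 4))
conn.executemany("INSERT INTO response VALUES (?, ?, ?)", rows)

sql = """
WITH uids AS (            -- target group: users who chose the given option
    SELECT DISTINCT user_id FROM response WHERE option_id = ?
),
votes AS (                -- per option: all votes and target-group votes
    SELECT r.poll_id, r.option_id,
           COUNT(*)         AS all_votes,
           COUNT(u.user_id) AS target_votes   -- counts only matched rows
    FROM response r
    LEFT JOIN uids u ON u.user_id = r.user_id
    GROUP BY r.poll_id, r.option_id
),
totals AS (               -- per poll: sums of the two counters
    SELECT poll_id, SUM(all_votes) AS all_votes, SUM(target_votes) AS target_votes
    FROM votes
    GROUP BY poll_id
)
SELECT v.poll_id, v.option_id,
       1.0 * v.target_votes / t.target_votes AS subgroup_percent,
       1.0 * v.all_votes / t.all_votes       AS total_percent,
       ABS(1.0 * v.all_votes / t.all_votes
           - 1.0 * v.target_votes / t.target_votes) AS deviation
FROM votes v
JOIN totals t ON t.poll_id = v.poll_id
ORDER BY deviation DESC, v.option_id
"""
# Key each result row by option_id: (poll_id, option_id, subgroup, total, deviation)
stats = {row[1]: row for row in conn.execute(sql, (1,))}
for row in stats.values():
    print(row[0], row[1], round(row[2], 2), round(row[3], 2), round(row[4], 2))
```

For option 3 ("Yes") this gives a subgroup share of 0.75 against a total share of 0.60, which is the deviation of 15 points from the example in the comments.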