SQL: Understanding EXPLAIN when using CTEs - trying to get a query to complete


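I have been struggling with a query and have tried several variations to get the result I want, without success. I'm hoping that if I share the variants I tried along with their EXPLAIN output, someone may be able to give me a pointer.

Postgres 11.6

In the code blocks below, dimension1 is a field that exists on every table I reference. The date exists only in the sessions table, so to get data for a specific date I created a CTE, filter_sessions, that selects only the dimension1 values that occur on the given date, and then I join it to my other tables. This lets my query select data for a specific date, in this case February 6.

Here is my original attempt. It uses CTEs, which I prefer for readability, and if it would just run I could write less code, but it doesn't: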

with 

filter_sessions as (
select 
    dimension1,
    dimension2,
    date,
    channel_grouping,
    device_category,
    user_type
from ga_flagship_ecom.sessions
where date >= '2020-02-06'
and date <= '2020-02-06'
),

ee as (
select 
    e.dimension1,
    e.dimension3,
    case when sum(case when e.metric1 = 0 then 1 else 0 end) > 0 then 1 else 0 end as zero_val_product, -- roll up to event level

    -- approximation for inferring if the product is a download and hence sees all the checkout steps
    case when sum(case when lower(product_name) ~ 'digital|download|file' then 1 else 0 end) > 0 then 1 else 0 end as download
from ga_flagship_ecom.ecom e
join filter_sessions f on f.dimension1 = e.dimension1
group by 1,2
),

ecom_events as (
select 
    ev.dimension1,
    ev.dimension3,
    ev.event_action,
    ev.event_label,
    ee.zero_val_product,
    ee.download
from ga_flagship_ecom.events ev 
join ee on ee.dimension1 = ev.dimension1 and ee.dimension3 = ev.dimension3
where ev.event_category = 'ecom'
)

select 
    s.date,
    lower(s.channel_grouping) as channel_grouping,
    lower(s.device_category) as device_category,
    lower(s.user_type) as user_type,
    lower(ev.event_action) as event_action,
    lower(coalesce(ev.event_label, 'na')) as event_label,
    ev.zero_val_product,
    ev.download,
    count(distinct s.dimension1) as sessions,
    count(distinct s.dimension2) as daily_users
from filter_sessions s
join ecom_events ev on ev.dimension1 = s.dimension1
group by 1,2,3,4,5,6,7,8;

Here is the EXPLAIN output for a variant that uses a WHERE filter instead of the inner join:

GroupAggregate  (cost=222818.84..222818.89 rows=1 width=188)
  Group Key: s.date, (lower((s.channel_grouping)::text)), (lower((s.device_category)::text)), (lower((s.user_type)::text)), (lower((ev.event_action)::text)), (lower((COALESCE(ev.event_label, 'na'::character varying))::text)), ev.zero_val_product, ev.download
  CTE filter_sessions
    ->  Index Scan using sessions_date_idx on sessions  (cost=0.56..2.78 rows=1 width=76)
          Index Cond: ((date >= '2020-02-06'::date) AND (date <= '2020-02-06'::date))
  CTE ee
    ->  GroupAggregate  (cost=47604.63..47606.31 rows=48 width=38)
          Group Key: e.dimension1, e.dimension3
          ->  Sort  (cost=47604.63..47604.75 rows=48 width=51)
                Sort Key: e.dimension1, e.dimension3
                ->  Nested Loop  (cost=0.58..47603.29 rows=48 width=51)
                      ->  HashAggregate  (cost=0.02..0.03 rows=1 width=32)
                            Group Key: (filter_sessions.dimension1)::text
                            ->  CTE Scan on filter_sessions  (cost=0.00..0.02 rows=1 width=32)
                      ->  Index Scan using ecom_dimension1_idx on ecom e  (cost=0.56..47602.77 rows=48 width=51)
                            Index Cond: ((dimension1)::text = (filter_sessions.dimension1)::text)
  CTE ecom_events
    ->  Hash Join  (cost=1.68..175209.67 rows=1 width=60)
          Hash Cond: (((ev_1.dimension1)::text = (ee.dimension1)::text) AND (ev_1.dimension3 = ee.dimension3))
          ->  Seq Scan on events ev_1  (cost=0.00..150210.69 rows=3332973 width=52)
                Filter: ((event_category)::text = 'ecom'::text)
          ->  Hash  (cost=0.96..0.96 rows=48 width=48)
                ->  CTE Scan on ee  (cost=0.00..0.96 rows=48 width=48)
  ->  Sort  (cost=0.08..0.08 rows=1 width=236)
        Sort Key: s.date, (lower((s.channel_grouping)::text)), (lower((s.device_category)::text)), (lower((s.user_type)::text)), (lower((ev.event_action)::text)), (lower((COALESCE(ev.event_label, 'na'::character varying))::text)), ev.zero_val_product, ev.download
        ->  Nested Loop  (cost=0.00..0.07 rows=1 width=236)
              Join Filter: ((s.dimension1)::text = (ev.dimension1)::text)
              ->  CTE Scan on filter_sessions s  (cost=0.00..0.02 rows=1 width=164)
              ->  CTE Scan on ecom_events ev  (cost=0.00..0.02 rows=1 width=104)
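
Splitting the join out into a separate ee_base CTE also failed (I was really optimistic that this would work). Here is the EXPLAIN output for that attempt: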

GroupAggregate  (cost=222818.33..222818.38 rows=1 width=188)
  Group Key: s.date, (lower((s.channel_grouping)::text)), (lower((s.device_category)::text)), (lower((s.user_type)::text)), (lower((ev.event_action)::text)), (lower((COALESCE(ev.event_label, 'na'::character varying))::text)), ev.zero_val_product, ev.download
  CTE filter_sessions
    ->  Index Scan using sessions_date_idx on sessions  (cost=0.56..2.78 rows=1 width=76)
          Index Cond: ((date >= '2020-02-06'::date) AND (date <= '2020-02-06'::date))
  CTE ee_base
    ->  Nested Loop  (cost=0.56..47603.39 rows=48 width=66)
          ->  CTE Scan on filter_sessions f  (cost=0.00..0.02 rows=1 width=32)
          ->  Index Scan using ecom_dimension1_idx on ecom e  (cost=0.56..47602.77 rows=48 width=51)
                Index Cond: ((dimension1)::text = (f.dimension1)::text)
  CTE ee
    ->  HashAggregate  (cost=1.68..2.40 rows=48 width=48)
          Group Key: ee_base.dimension1, ee_base.dimension3
          ->  CTE Scan on ee_base  (cost=0.00..0.96 rows=48 width=76)
  CTE ecom_events
    ->  Hash Join  (cost=1.68..175209.67 rows=1 width=60)
          Hash Cond: (((ev_1.dimension1)::text = (ee.dimension1)::text) AND (ev_1.dimension3 = ee.dimension3))
          ->  Seq Scan on events ev_1  (cost=0.00..150210.69 rows=3332973 width=52)
                Filter: ((event_category)::text = 'ecom'::text)
          ->  Hash  (cost=0.96..0.96 rows=48 width=48)
                ->  CTE Scan on ee  (cost=0.00..0.96 rows=48 width=48)
  ->  Sort  (cost=0.08..0.08 rows=1 width=236)
        Sort Key: s.date, (lower((s.channel_grouping)::text)), (lower((s.device_category)::text)), (lower((s.user_type)::text)), (lower((ev.event_action)::text)), (lower((COALESCE(ev.event_label, 'na'::character varying))::text)), ev.zero_val_product, ev.download
        ->  Nested Loop  (cost=0.00..0.07 rows=1 width=236)
              Join Filter: ((s.dimension1)::text = (ev.dimension1)::text)
              ->  CTE Scan on filter_sessions s  (cost=0.00..0.02 rows=1 width=164)
              ->  CTE Scan on ecom_events ev  (cost=0.00..0.02 rows=1 width=104)
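
The plans above are plain EXPLAIN output, so every number in them is a planner estimate. In particular, the scan on sessions for the chosen date is estimated at rows=1, and the join strategies further down are built on that figure. Whether the estimate is accurate cannot be told from the output shown. A minimal sketch of how it could be checked, assuming the table and column names from the query above and sufficient privileges to analyze the table:

```sql
-- Refresh the planner statistics for the table that drives the date filter.
analyze ga_flagship_ecom.sessions;

-- EXPLAIN with ANALYZE actually executes the statement, so it is run here
-- only on the small date filter, not on the full query that never finishes.
-- BUFFERS adds shared-buffer hit/read counts to each plan node.
explain (analyze, buffers)
select dimension1, dimension2
from ga_flagship_ecom.sessions
where date >= '2020-02-06'
and date <= '2020-02-06';
```

If the actual row count reported for the index scan is far above the estimate of 1, the misestimate rather than the CTE structure alone may explain the poor plan.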
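
What does work is creating a temp table. But I would really like to find a way around this, in this order of preference:

  • Use CTEs only
  • Use a combination of CTEs and subqueries
  • Finally, as a fallback, just use a temp table for filter_sessions

  • Is there anything else I can do here?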
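
The question does not show the temp-table code that worked, so the following is only a sketch of what that fallback might look like. The index name is hypothetical, and the temp table reuses the name filter_sessions so that the remaining CTEs could reference it unchanged once the filter_sessions CTE is dropped from the WITH clause.

```sql
-- Materialize the day's sessions once, for the current database session only.
create temp table filter_sessions as
select
    dimension1,
    dimension2,
    date,
    channel_grouping,
    device_category,
    user_type
from ga_flagship_ecom.sessions
where date >= '2020-02-06'
and date <= '2020-02-06';

-- Hypothetical index name; supports the joins on dimension1.
create index filter_sessions_dimension1_idx on filter_sessions (dimension1);

-- Autovacuum does not process temp tables, so gather statistics explicitly.
analyze filter_sessions;
```

Unlike a CTE scan, a temp table that has been analyzed gives the planner real row counts and column statistics to work with.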

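    You can simply rewrite the CTEs as temporary views. In Postgres 11 each CTE is planned on its own and materialized, whereas a view is expanded into the main query plan.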
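
The same idea can be tried without creating any objects, by writing the CTEs as inline subqueries, which the planner also folds into the main plan. The following is an untested sketch that assumes the table and column names from the question. It keeps the IN filter inside the aggregate subquery, because the date restriction would otherwise not reach the ecom table. Whether the planner then picks a better plan is not guaranteed.

```sql
-- Sketch only: the CTEs from the question rewritten as inline subqueries.
select
    s.date,
    lower(s.channel_grouping) as channel_grouping,
    lower(s.device_category) as device_category,
    lower(s.user_type) as user_type,
    lower(ev.event_action) as event_action,
    lower(coalesce(ev.event_label, 'na')) as event_label,
    ee.zero_val_product,
    ee.download,
    count(distinct s.dimension1) as sessions,
    count(distinct s.dimension2) as daily_users
from ga_flagship_ecom.sessions s
join (
    -- product flags per (dimension1, dimension3), limited to the day's sessions
    select
        e.dimension1,
        e.dimension3,
        case when sum(case when e.metric1 = 0 then 1 else 0 end) > 0 then 1 else 0 end as zero_val_product,
        case when sum(case when lower(e.product_name) ~ 'digital|download|file' then 1 else 0 end) > 0 then 1 else 0 end as download
    from ga_flagship_ecom.ecom e
    where e.dimension1 in (
        select f.dimension1
        from ga_flagship_ecom.sessions f
        where f.date >= '2020-02-06'
        and f.date <= '2020-02-06'
    )
    group by 1, 2
) ee on ee.dimension1 = s.dimension1
join ga_flagship_ecom.events ev
    on ev.dimension1 = ee.dimension1
    and ev.dimension3 = ee.dimension3
where s.date >= '2020-02-06'
and s.date <= '2020-02-06'
and ev.event_category = 'ecom'
group by 1, 2, 3, 4, 5, 6, 7, 8;
```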


    Additional EXPLAIN output for a CTE version of the query:

    GroupAggregate  (cost=107619.19..107619.24 rows=1 width=188)
      Group Key: s.date, (lower((s.channel_grouping)::text)), (lower((s.device_category)::text)), (lower((s.user_type)::text)), (lower((ev.event_action)::text)), (lower((COALESCE(ev.event_label, 'na'::character varying))::text)), ev.zero_val_product, ev.download
      CTE filter_sessions
        ->  Index Scan using sessions_date_idx on sessions  (cost=0.56..2.78 rows=1 width=76)
              Index Cond: ((date >= '2020-02-06'::date) AND (date <= '2020-02-06'::date))
      CTE ee
        ->  GroupAggregate  (cost=47606.05..47606.08 rows=1 width=38)
              Group Key: e.dimension1, e.dimension3
              ->  Sort  (cost=47606.05..47606.05 rows=1 width=51)
                    Sort Key: e.dimension1, e.dimension3
                    ->  Nested Loop  (cost=1.12..47606.04 rows=1 width=51)
                          ->  Index Only Scan using sessions_date_idx on sessions sessions_1  (cost=0.56..2.78 rows=1 width=22)
                                Index Cond: ((date >= '2020-02-06'::date) AND (date <= '2020-02-06'::date))
                          ->  Index Scan using ecom_dimension1_idx on ecom e  (cost=0.56..47602.77 rows=48 width=51)
                                Index Cond: ((dimension1)::text = (sessions_1.dimension1)::text)
      CTE ecom_events
        ->  Nested Loop  (cost=0.56..60010.25 rows=1 width=60)
              ->  CTE Scan on ee  (cost=0.00..0.02 rows=1 width=48)
              ->  Index Scan using events_pk on events ev_1  (cost=0.56..60010.22 rows=1 width=52)
                    Index Cond: (((dimension1)::text = (ee.dimension1)::text) AND (dimension3 = ee.dimension3))
                    Filter: ((event_category)::text = 'ecom'::text)
      ->  Sort  (cost=0.08..0.08 rows=1 width=236)
            Sort Key: s.date, (lower((s.channel_grouping)::text)), (lower((s.device_category)::text)), (lower((s.user_type)::text)), (lower((ev.event_action)::text)), (lower((COALESCE(ev.event_label, 'na'::character varying))::text)), ev.zero_val_product, ev.download
            ->  Nested Loop  (cost=0.00..0.07 rows=1 width=236)
                  Join Filter: ((s.dimension1)::text = (ev.dimension1)::text)
                  ->  CTE Scan on filter_sessions s  (cost=0.00..0.02 rows=1 width=164)
                  ->  CTE Scan on ecom_events ev  (cost=0.00..0.02 rows=1 width=104)
    
    The variant of ee that filters with WHERE ... IN instead of the inner join:

    ee as (
    select 
        e.dimension1,
        e.dimension3,
        case when sum(case when e.metric1 = 0 then 1 else 0 end) > 0 then 1 else 0 end as zero_val_product, -- roll up to event level
    
        -- approximation for inferring if the product is a download and hence sees all the checkout steps
        case when sum(case when lower(product_name) ~ 'digital|download|file' then 1 else 0 end) > 0 then 1 else 0 end as download
    from ga_flagship_ecom.ecom e
    --join filter_sessions f on f.dimension1 = e.dimension1
    where e.dimension1 in (select dimension1 from filter_sessions)
    group by 1,2
    ),
    
    
    The ee_base variant, which separates the join from the aggregation:

    ee_base as (
    select 
        e.dimension1,
        e.dimension3,
        e.metric1,
        lower(product_name) as product_name
    from ga_flagship_ecom.ecom e
    join filter_sessions f on f.dimension1 = e.dimension1
    ),
    
    
    ee as (
    select 
        dimension1,
        dimension3,
        case when sum(case when metric1 = 0 then 1 else 0 end) > 0 then 1 else 0 end as zero_val_product, -- roll up to event level
    
        -- approximation for inferring if the product is a download and hence sees all the checkout steps
        case when sum(case when product_name ~ 'digital|download|file' then 1 else 0 end) > 0 then 1 else 0 end as download
    from ee_base
    group by 1,2
    ),
    
    
    The temporary-view rewrite (the date column is named zdate here):

    CREATE TEMP VIEW filter_sessions as
    select
        dimension1,
        dimension2,
        zdate,
        channel_grouping,
        device_category,
        user_type
    from ga_flagship_ecom.sessions
    where zdate >= '2020-02-06'
    and zdate <= '2020-02-06'
            ;
    
    CREATE TEMP VIEW ee as
    select
        e.dimension1,
        e.dimension3,
        case when sum(case when e.metric1 = 0 then 1 else 0 end) > 0 then 1 else 0 end as zero_val_product, -- roll up to event level
    
        -- approximation for inferring if the product is a download and hence sees all the checkout steps
        case when sum(case when lower(product_name) ~ 'digital|download|file' then 1 else 0 end) > 0 then 1 else 0 end as download
    from ga_flagship_ecom.ecom e
    join filter_sessions f on f.dimension1 = e.dimension1
    group by 1,2
            ;
    
    CREATE TEMP VIEW ecom_events as
    select
        ev.dimension1,
        ev.dimension3,
        ev.event_action,
        ev.event_label,
        ee.zero_val_product,
        ee.download
    from ga_flagship_ecom.events ev
    join ee on ee.dimension1 = ev.dimension1 and ee.dimension3 = ev.dimension3
    where ev.event_category = 'ecom'
            ;
    select
        s.zdate,
        lower(s.channel_grouping) as channel_grouping,
        lower(s.device_category) as device_category,
        lower(s.user_type) as user_type,
        lower(ev.event_action) as event_action,
        lower(coalesce(ev.event_label, 'na')) as event_label,
        ev.zero_val_product,
        ev.download,
        count(distinct s.dimension1) as sessions,
        count(distinct s.dimension2) as daily_users
    from filter_sessions s
    join ecom_events ev on ev.dimension1 = s.dimension1
    group by 1,2,3,4,5,6,7,8;
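
If upgrading is an option, PostgreSQL 12 and later can inline CTEs, which would allow the CTE-only form the question prefers. There, a CTE that is referenced once and has no side effects is inlined automatically, while a CTE referenced more than once, like filter_sessions, is still materialized unless NOT MATERIALIZED is specified. The sketch below assumes the question's schema and is not valid on 11.6, where the NOT MATERIALIZED keyword is rejected as a syntax error.

```sql
-- Requires PostgreSQL 12 or later.
with

filter_sessions as not materialized (   -- referenced twice, so inlining must be requested
select
    dimension1,
    dimension2,
    date,
    channel_grouping,
    device_category,
    user_type
from ga_flagship_ecom.sessions
where date >= '2020-02-06'
and date <= '2020-02-06'
),

ee as (                                 -- referenced once: inlined automatically on 12+
select
    e.dimension1,
    e.dimension3,
    case when sum(case when e.metric1 = 0 then 1 else 0 end) > 0 then 1 else 0 end as zero_val_product,
    case when sum(case when lower(product_name) ~ 'digital|download|file' then 1 else 0 end) > 0 then 1 else 0 end as download
from ga_flagship_ecom.ecom e
join filter_sessions f on f.dimension1 = e.dimension1
group by 1,2
),

ecom_events as (                        -- referenced once: inlined automatically on 12+
select
    ev.dimension1,
    ev.dimension3,
    ev.event_action,
    ev.event_label,
    ee.zero_val_product,
    ee.download
from ga_flagship_ecom.events ev
join ee on ee.dimension1 = ev.dimension1 and ee.dimension3 = ev.dimension3
where ev.event_category = 'ecom'
)

select
    s.date,
    lower(s.channel_grouping) as channel_grouping,
    lower(s.device_category) as device_category,
    lower(s.user_type) as user_type,
    lower(ev.event_action) as event_action,
    lower(coalesce(ev.event_label, 'na')) as event_label,
    ev.zero_val_product,
    ev.download,
    count(distinct s.dimension1) as sessions,
    count(distinct s.dimension2) as daily_users
from filter_sessions s
join ecom_events ev on ev.dimension1 = s.dimension1
group by 1,2,3,4,5,6,7,8;
```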