Snowflake cloud data platform: Need help understanding why multiple left joins aren't returning what I expect in Snowflake

I'm having some issues with multiple left joins not doing what I expect them to.

select 
    sent.id,
    sent.ts,
    sent.email,
    delivered.ts,
    type.label,
    min(opens.ts) as first_open,
    count(opens.id) as open_count,
    min(clicks.ts) as first_click,
    count(clicks.id) as click_count
from sent
inner join type on type.id = sent.type_id
left outer join delivered on (delivered.id = sent.id)
left outer join opens on (opens.id = sent.id)
left outer join clicks on (clicks.id = sent.id)
where sent.id = 'a1b1c1d1e1'
group by 
    sent.id,
    sent.ts,
    sent.email,
    delivered.ts,
    type.label,
    opens.id,
    clicks.id
;
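A message is sent and then delivered; this is 1-to-1, although a delivery may not exist.

The message can then be opened (multiple times) and clicked (multiple times), all tied back to sent.id.

If I join only opens, it works fine, and likewise if I join only clicks.

When I add clicks to the join, first_click and click_count show the same values as the opens columns.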

I get:

1,2020-01-01 00:00:00,a@b.com,2020-01-01 00:00:00,test,2020-01-01 01:00:00,4,2020-01-01 01:00:00,4

When it should be:

1,2020-01-01 00:00:00,a@b.com,2020-01-01 00:00:00,test,2020-01-01 01:00:00,4,2020-01-01 02:00:00,1


I've tried running without the query cache (
ALTER SESSION SET USE_CACHED_RESULT = false;
), and I built a basic mirror in MySQL to prove that the joins are correct.

So, I'm trying to bridge the gap between the problem description and the results you mention.

Starting with known data:

create or replace table sent (id text, ts timestamp_ntz, email text, type_id number);
create or replace table type (id number, label text);
create or replace table delivered(id text, ts timestamp_ntz);
create or replace table opens(id text, ts timestamp_ntz);
create or replace table clicks(id text, ts timestamp_ntz);

insert into sent values ('a1b1c1d1e1', '2020-01-01 01:00', 'a@b.com', 1);
insert into delivered values ('a1b1c1d1e1', '2020-01-01 02:00');
insert into type values (1, 'test');
insert into opens values ('a1b1c1d1e1', '2020-01-01 03:00'),('a1b1c1d1e1', '2020-01-01 04:00'),('a1b1c1d1e1', '2020-01-01 05:00'),('a1b1c1d1e1', '2020-01-01 06:00');
insert into clicks values ('a1b1c1d1e1', '2020-01-01 07:00');

select 
    sent.id
    ,sent.ts
    ,sent.email
    ,delivered.ts
    ,type.label
    ,min(opens.ts) as first_open
    ,count(opens.id) as open_count
    ,min(clicks.ts) as first_click
    ,count(clicks.id) as click_count
from sent
join type on type.id = sent.type_id
left join delivered on (delivered.id = sent.id)
left join opens on (opens.id = sent.id)
left join clicks on (clicks.id = sent.id)
where sent.id = 'a1b1c1d1e1'
group by 1,2,3,4, 5;
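I swapped the column names in the GROUP BY for their positions because I like it that way, but you do not need
opens.id
or
clicks.id
there, because those columns are not among the selected non-aggregate columns.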

 ID TS  EMAIL   TS  LABEL   FIRST_OPEN  OPEN_COUNT  FIRST_CLICK CLICK_COUNT
 a1b1c1d1e1 2020-01-01 01:00:00.000 a@b.com 2020-01-01 02:00:00.000 test    2020-01-01 03:00:00.000 4   2020-01-01 07:00:00.000 4
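I'm not sure what behavior you are expecting to change, but printing all the rows to see what is happening may help you understand why you are not getting what you want: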

select 
    sent.id
    ,sent.ts
    ,sent.email
    ,delivered.ts
    ,type.label
    ,opens.ts as open_ts
    ,clicks.ts as click_ts
    --,min(opens.ts) as first_open
    --,count(opens.id) as open_count
    --,min(clicks.ts) as first_click
    --,count(clicks.id) as click_count
from sent
join type on type.id = sent.type_id
left join delivered on (delivered.id = sent.id)
left join opens on (opens.id = sent.id)
left join clicks on (clicks.id = sent.id)
where sent.id = 'a1b1c1d1e1'
--group by 1,2,3,4, 5;
which gives me:

 ID TS  EMAIL   TS  LABEL   OPEN_TS CLICK_TS
 a1b1c1d1e1 2020-01-01 01:00:00.000 a@b.com 2020-01-01 02:00:00.000 test    2020-01-01 03:00:00.000 2020-01-01 07:00:00.000
 a1b1c1d1e1 2020-01-01 01:00:00.000 a@b.com 2020-01-01 02:00:00.000 test    2020-01-01 04:00:00.000 2020-01-01 07:00:00.000
 a1b1c1d1e1 2020-01-01 01:00:00.000 a@b.com 2020-01-01 02:00:00.000 test    2020-01-01 05:00:00.000 2020-01-01 07:00:00.000
 a1b1c1d1e1 2020-01-01 01:00:00.000 a@b.com 2020-01-01 02:00:00.000 test    2020-01-01 06:00:00.000 2020-01-01 07:00:00.000
This is what I would expect from either a left join or a normal inner join.
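To make the row multiplication concrete, here is a minimal sketch that reproduces it outside Snowflake. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in engine, stores timestamps as text, and adds a second, hypothetical click row that is not part of the sample data above, purely to show how the counts scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not Snowflake
conn.executescript("""
    create table sent (id text, ts text, email text, type_id integer);
    create table opens (id text, ts text);
    create table clicks (id text, ts text);
    insert into sent values ('a1b1c1d1e1', '2020-01-01 01:00', 'a@b.com', 1);
    insert into opens values
        ('a1b1c1d1e1', '2020-01-01 03:00'), ('a1b1c1d1e1', '2020-01-01 04:00'),
        ('a1b1c1d1e1', '2020-01-01 05:00'), ('a1b1c1d1e1', '2020-01-01 06:00');
    insert into clicks values ('a1b1c1d1e1', '2020-01-01 07:00');
""")

FANOUT_SQL = """
    select count(*)         as joined_rows,
           count(opens.id)  as open_count,
           count(clicks.id) as click_count
    from sent
    left join opens  on opens.id  = sent.id
    left join clicks on clicks.id = sent.id
    where sent.id = 'a1b1c1d1e1'
"""

def counts(connection):
    """Return (joined_rows, open_count, click_count) for the sample message."""
    return connection.execute(FANOUT_SQL).fetchone()

# 4 opens x 1 click: every open row is paired with the single click row.
one_click = counts(conn)

# Hypothetical second click (not in the original data): 4 opens x 2 clicks.
conn.execute("insert into clicks values ('a1b1c1d1e1', '2020-01-01 08:00')")
two_clicks = counts(conn)

print(one_click)   # (4, 4, 4)
print(two_clicks)  # (8, 8, 8)
```

Both counts follow the number of joined rows (opens times clicks), not the number of rows in either child table, which is why click_count matches open_count in the aggregated query.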
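Feel free to update the question with the SQL that gives you the incomplete results, along with its output in the same form as listed above, to get a better explanation.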
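If the goal is per-message counts (4 opens and 1 click for the sample data), one common approach is to aggregate opens and clicks separately before joining, so that each derived table contributes at most one row per sent.id. The sketch below is an assumption about what is wanted, not something confirmed by the question. It runs on SQLite through Python's sqlite3 module as a stand-in for Snowflake, stores timestamps as text, and the aliases o and c are illustrative; the derived-table pattern is plain SQL, but it has not been verified on Snowflake here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not Snowflake
conn.executescript("""
    create table sent (id text, ts text, email text, type_id integer);
    create table type (id integer, label text);
    create table delivered (id text, ts text);
    create table opens (id text, ts text);
    create table clicks (id text, ts text);
    insert into sent values ('a1b1c1d1e1', '2020-01-01 01:00', 'a@b.com', 1);
    insert into delivered values ('a1b1c1d1e1', '2020-01-01 02:00');
    insert into type values (1, 'test');
    insert into opens values
        ('a1b1c1d1e1', '2020-01-01 03:00'), ('a1b1c1d1e1', '2020-01-01 04:00'),
        ('a1b1c1d1e1', '2020-01-01 05:00'), ('a1b1c1d1e1', '2020-01-01 06:00');
    insert into clicks values ('a1b1c1d1e1', '2020-01-01 07:00');
""")

# Each derived table is grouped by id, so joining it cannot multiply rows.
PREAGG_SQL = """
    select
        sent.id, sent.ts, sent.email, delivered.ts, type.label,
        o.first_open,  coalesce(o.open_count, 0)  as open_count,
        c.first_click, coalesce(c.click_count, 0) as click_count
    from sent
    join type on type.id = sent.type_id
    left join delivered on delivered.id = sent.id
    left join (
        select id, min(ts) as first_open, count(*) as open_count
        from opens group by id
    ) o on o.id = sent.id
    left join (
        select id, min(ts) as first_click, count(*) as click_count
        from clicks group by id
    ) c on c.id = sent.id
    where sent.id = 'a1b1c1d1e1'
"""

row = conn.execute(PREAGG_SQL).fetchone()
print(row[5:])  # ('2020-01-01 03:00', 4, '2020-01-01 07:00', 1)
```

Because the outer query no longer aggregates, it needs no GROUP BY, and a message with no opens or clicks still appears with counts of 0 thanks to the left joins and coalesce.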

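This can't be answered without sample data to reproduce the problem. The only thing that comes to mind about your code: do you really want to group by opens.id and clicks.id? That makes no sense for the aggregates.

There are also problems with the question itself. Your SQL select has
sent.id
as the first column, but you have a where clause of
where sent.id='a1b1c1d1e1'
, and your example output shows
1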