I get different results every time I run a query that uses the lead function in SQL Impala
I have the following code:
select *, lead(session_end_type) over (partition by user_id, session_id order by user_id, session_id, log_time) as next_session_end_type
from table_name;
However, each time I run it, it seems to produce a different result.
What causes this difference?
Thanks in advance.
(I checked whether the query produces different results with the following code:
create table t1 as
select *, lead(session_end_type) over (partition by user_id, session_id order by user_id, session_id, log_time) as next_session_end_type
from table_name;
create table t2 as
select *, lead(session_end_type) over (partition by user_id, session_id order by user_id, session_id, log_time) as next_session_end_type
from table_name;
select count (*) from
(
select * from t1
union
select * from t2
) as t;
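This produces a row count that differs from the row counts of t1 and t2, which means the results in t1 and t2 are different.)

First, there is no need to repeat the partition by columns in the order by. You can simplify this to: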
lead(session_end_type) over (partition by user_id, session_id order by log_time) as next_session_end_type
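Second, if log_time is not unique for a given user_id/session_id, then the result is unstable. Remember that SQL tables represent unordered sets, so if there are ties in the sort keys, there is no "natural" order to fall back on.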
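To make the effect of a tie concrete, here is a minimal sketch. The three rows are made-up sample data (not taken from the asker's table), chosen so that two rows in the same session share one log_time:

```sql
-- Hypothetical sample data: the first two rows tie on log_time.
with sample as (
  select 1 as user_id, 10 as session_id,
         '2020-01-01 00:00:00' as log_time, 'timeout' as session_end_type
  union all
  select 1, 10, '2020-01-01 00:00:00', 'logout'
  union all
  select 1, 10, '2020-01-01 00:05:00', 'close'
)
select *,
       lead(session_end_type) over (partition by user_id, session_id
                                    order by log_time) as next_session_end_type
from sample;
-- The engine is free to place either tied row first, so the 'timeout' row
-- may see 'logout' or 'close' as its next value from one run to the next.
```

On a data set this small the engine may well return the same order every time; the point is only that nothing in the query guarantees it.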
You can check for such ties with:
select user_id, session_id, log_time, count(*)
from table_name
group by user_id, session_id, log_time
having count(*) > 1
order by count(*) desc;
If you do have a column that uniquely identifies each row (or each row within a user/session), then include that column in the order by:
lead(session_end_type) over (partition by user_id, session_id
order by log_time, <make it stable column>) as next_session_end_type
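As an illustration only, suppose the table had a unique column named event_id. That name is an assumption, since the question does not show the table's columns. The full query might then look like this:

```sql
-- event_id is a hypothetical unique column used purely as a tie-breaker;
-- substitute whatever column uniquely identifies a row in your table.
select *,
       lead(session_end_type) over (partition by user_id, session_id
                                    order by log_time, event_id) as next_session_end_type
from table_name;
```

With a unique tie-breaker, every row has exactly one position within its partition, so repeated runs should assign the same next_session_end_type to each row.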
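Is the combination of user_id, session_id, log_time unique in the table? Otherwise you may get slightly different orderings, which lead to different lead values and therefore different counts.
@ThorstenKettner Thank you! I thought they were unique, but it turns out they are not! I did not expect logging errors!
By the way, there is no need to order by user_id, session_id; those columns are already in the partition by.
@dnoeth Oh, thanks for letting me know :)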