I get different results every time I run a query that uses the lead function in Impala SQL


I have the following code:

select *, lead(session_end_type) over (partition by user_id, session_id order by user_id, session_id, log_time) as next_session_end_type
from table_name;
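However, it seems to produce a different result every time I run it.

What causes the difference?

Thanks in advance.

(I checked whether the code produces different results by running the following: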

create table t1 as
select *, lead(session_end_type) over (partition by user_id, session_id order by user_id, session_id, log_time) as next_session_end_type
from table_name;

create table t2 as
select *, lead(session_end_type) over (partition by user_id, session_id order by user_id, session_id, log_time) as next_session_end_type
from table_name;

select count (*) from
(
    select * from t1
    union
    select * from t2
) as t;

The resulting row count differs from the row count of t1 and from that of t2, which means the contents of t1 and t2 are not the same.)

First, there is no need to repeat the partition by columns in the order by. You can simplify it to:

lead(session_end_type) over (partition by user_id, session_id order by log_time) as next_session_end_type
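Second, if log_time is not unique for a given user_id/session_id, then the result is unstable. Remember, SQL tables represent unordered sets, so if there are ties in the sort key, there is no "natural" ordering to fall back on.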
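
As a small illustration of the tie problem, here is a sketch with made-up sample rows (the values are hypothetical and the timestamps are plain strings for brevity). Two rows in the same session share a log_time, so the engine is free to return them in either order:

```sql
-- Hypothetical sample data: the last two rows tie on log_time.
with sample as (
    select 1 as user_id, 10 as session_id,
           '2020-01-01 00:00:00' as log_time, 'click' as session_end_type
    union all
    select 1, 10, '2020-01-01 00:00:05', 'timeout'
    union all
    select 1, 10, '2020-01-01 00:00:05', 'logout'
)
select *,
       lead(session_end_type) over (partition by user_id, session_id
                                    order by log_time) as next_session_end_type
from sample;
-- The 'click' row may get either 'timeout' or 'logout' as its next value,
-- depending on which of the two tied rows happens to be placed first.
```

With a large table processed in parallel, the order among tied rows can change from one run to the next, which would explain why t1 and t2 differ.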

You can check for such ties with:

select user_id, session_id, log_time, count(*)
from table_name
group by user_id, session_id, log_time
having count(*) > 1
order by count(*) desc;
If you do have a column that uniquely identifies each row (or each row within a user/session), then include that column in the order by:

lead(session_end_type) over (partition by user_id, session_id
                             order by log_time, <make it stable column>) as next_session_end_type
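
For concreteness, the full query might look like the sketch below. The column event_id is an assumption made for illustration only; it stands in for whatever unique key the table actually has:

```sql
-- event_id is a hypothetical unique column used as a tiebreaker;
-- replace it with a real column that is unique within each session.
select *,
       lead(session_end_type) over (partition by user_id, session_id
                                    order by log_time, event_id) as next_session_end_type
from table_name;
```

If no such column exists, listing the remaining columns in the order by makes the result deterministic except for rows that are complete duplicates of each other.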

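Is the combination of user_id, session_id, log_time unique in the table? Otherwise you may get slightly different orderings, leading to different lead values, leading to different counts.

@ThorstenKettner Thanks! I thought they were unique, but it turns out they are not! I didn't expect errors in the log!

By the way, there is no need to order by user_id, session_id; those columns are already in the partition by.

@dnoeth Oh, thanks for letting me know :)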