Hive SQL: filter out rows with duplicate values in a specific column

I have a Hive table with the following data (the first row is the header):

session,ts,status,color
a,1,new,red
a,2,check,blue
a,3,new,green
a,4,amount,blue
a,5,end,blue
b,1,new,red
b,2,bottle,blue
b,3,end,blue
c,4,check,blue
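I'm having trouble writing a SQL query that meets the following conditions:
1. Return all rows of every session that contains a row with status = 'new'.
2. If a session has more than one row with status = 'new', keep only the first one.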

The output would be:

a,1,new,red
a,2,check,blue
a,4,amount,blue
a,5,end,blue
b,1,new,red
b,2,bottle,blue
b,3,end,blue
The rows a,3,new,green and c,4,check,blue are omitted.
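Before writing any SQL, it can help to pin the two rules down as a plain reference implementation and check it against the expected output above. The sketch below is my reading of the rules, not the asker's code. It assumes that "first" means the lowest ts within a session and that ts values are unique per session. If two 'new' rows shared the lowest ts, this version would keep both.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int, str, str]  # (session, ts, status, color)

# Sample data from the question.
ROWS: List[Row] = [
    ("a", 1, "new", "red"),
    ("a", 2, "check", "blue"),
    ("a", 3, "new", "green"),
    ("a", 4, "amount", "blue"),
    ("a", 5, "end", "blue"),
    ("b", 1, "new", "red"),
    ("b", 2, "bottle", "blue"),
    ("b", 3, "end", "blue"),
    ("c", 4, "check", "blue"),
]


def filter_rows(rows: List[Row]) -> List[Row]:
    # Earliest ts of a 'new' row per session; sessions missing here have no 'new' row.
    first_new_ts: Dict[str, int] = {}
    for session, ts, status, _ in rows:
        if status == "new":
            first_new_ts[session] = min(ts, first_new_ts.get(session, ts))

    kept: List[Row] = []
    for row in sorted(rows, key=lambda r: (r[0], r[1])):
        session, ts, status, _ = row
        if session not in first_new_ts:
            continue  # rule 1: the session never has status 'new'
        if status == "new" and ts != first_new_ts[session]:
            continue  # rule 2: a later 'new' row in the same session
        kept.append(row)
    return kept


if __name__ == "__main__":
    for row in filter_rows(ROWS):
        print(",".join(map(str, row)))
```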

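I've written the query below. It does the job if you only look at the session, ts and status columns, but I don't like the second part of the UNION because it uses a GROUP BY: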

select  session, ts, status from mp_logon3
where status!='new'
and session in (select distinct a.session from mp_logon3 a 
where a.status = 'new'
) 
union
select session, min(ts), status from mp_logon3
where status='new'
and session in (select distinct b.session from mp_logon3 b
where b.status = 'new'
)
group by session, status 
However, as soon as I add the color column it breaks: you get two rows for session=a and status=new, one green and one red.

select  session, ts, status, color from mp_logon3
where status!='new'
and session in (select distinct a.session from mp_logon3 a 
where a.status = 'new'
) 
union
select session, min(ts), status, color from mp_logon3
where status='new'
and session in (select distinct b.session from mp_logon3 b
where b.status = 'new'
)
group by session, status, color
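The second branch breaks because color becomes part of the grouping key. Red and green fall into different groups, so MIN(ts) is computed separately for each color and both 'new' rows of session a come back. The sketch below reproduces this with Python's built-in sqlite3 module and an in-memory SQLite table. SQLite is an assumption standing in for Hive here, and the redundant `session in (...)` filter is left out because `status = 'new'` already implies it.

```python
import sqlite3

# Sample data from the question: (session, ts, status, color).
ROWS = [
    ("a", 1, "new", "red"),
    ("a", 2, "check", "blue"),
    ("a", 3, "new", "green"),
    ("a", 4, "amount", "blue"),
    ("a", 5, "end", "blue"),
    ("b", 1, "new", "red"),
    ("b", 2, "bottle", "blue"),
    ("b", 3, "end", "blue"),
    ("c", 4, "check", "blue"),
]

# The 'new' branch of the UNION, with color added to the GROUP BY.
NEW_BRANCH = """
SELECT session, MIN(ts), status, color
FROM mp_logon3
WHERE status = 'new'
GROUP BY session, status, color
ORDER BY session, MIN(ts)
"""


def run(query):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mp_logon3 (session TEXT, ts INTEGER, status TEXT, color TEXT)"
        )
        conn.executemany("INSERT INTO mp_logon3 VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


new_rows = run(NEW_BRANCH)
for row in new_rows:
    print(row)  # session a appears twice: once for red, once for green
```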
Finally, is there a better way to write the whole query, possibly without the UNION? If this were Teradata SQL, I would write:

select  session, ts, status, color
from mp_logon3
where status='new'
and session in (select distinct a.session from mp_logon3 a 
where a.status = 'new'
) 
qualify row_number() over (partition by session,status order by ts)=1
union
select  session, ts, status, color from mp_logon3
where status!='new'
and session in (select distinct a.session from mp_logon3 a 
where a.status = 'new'
) 

Here is a HiveQL solution to your problem:

WITH sessions
AS (SELECT DISTINCT session
    FROM mp_logon3
    WHERE STATUS = 'new')
,logons
AS (SELECT session
        ,ts
        ,STATUS
        ,color
        ,row_number() OVER (
            PARTITION BY session
            ,STATUS ORDER BY ts
            ) AS r_num
    FROM mp_logon3)
SELECT l.session, l.ts, l.STATUS, l.color  -- explicit list, so the helper r_num column is not returned
FROM logons l
INNER JOIN sessions s ON (s.session = l.session)
WHERE l.STATUS <> 'new'
    OR l.r_num = 1
ORDER BY l.session
    ,l.ts;   
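To check the answer's approach against the sample data, the sketch below runs the same CTE + ROW_NUMBER() + INNER JOIN query in an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for Hive here as an assumption, and it needs version 3.25 or later for window functions. The query selects the four data columns explicitly rather than `l.*`, because `l.*` would also return the helper r_num column.

```python
import sqlite3

# Sample data from the question: (session, ts, status, color).
ROWS = [
    ("a", 1, "new", "red"),
    ("a", 2, "check", "blue"),
    ("a", 3, "new", "green"),
    ("a", 4, "amount", "blue"),
    ("a", 5, "end", "blue"),
    ("b", 1, "new", "red"),
    ("b", 2, "bottle", "blue"),
    ("b", 3, "end", "blue"),
    ("c", 4, "check", "blue"),
]

ANSWER_QUERY = """
WITH sessions AS (
    -- sessions that contain at least one 'new' row
    SELECT DISTINCT session FROM mp_logon3 WHERE STATUS = 'new'
),
logons AS (
    -- number each (session, status) pair's rows by ts
    SELECT session, ts, STATUS, color,
           ROW_NUMBER() OVER (PARTITION BY session, STATUS ORDER BY ts) AS r_num
    FROM mp_logon3
)
SELECT l.session, l.ts, l.STATUS, l.color
FROM logons l
INNER JOIN sessions s ON s.session = l.session
WHERE l.STATUS <> 'new' OR l.r_num = 1
ORDER BY l.session, l.ts
"""


def run(query):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mp_logon3 (session TEXT, ts INTEGER, status TEXT, color TEXT)"
        )
        conn.executemany("INSERT INTO mp_logon3 VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


result = run(ANSWER_QUERY)
for row in result:
    print(",".join(map(str, row)))
```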

Which DBMS are you using: Oracle, SQL Server, MySQL, ...?
Thanks for the comment, it's actually HiveSQL. I've specified that now.