Hive SQL: filter out rows with duplicate values in a specific column
I have a Hive table with the following data (the first row is the header):
session,ts,status,color
a,1,new,red
a,2,check,blue
a,3,new,green
a,4,amount,blue
a,5,end,blue
b,1,new,red
b,2,bottle,blue
b,3,end,blue
c,4,check,blue
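I'm having trouble writing an SQL query that meets the following conditions:
1. Return all rows of any session that contains the status new.
2. If a session contains multiple rows with status=new, keep only the first one and drop the rest.
The output would be: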
a,1,new,red
a,2,check,blue
a,4,amount,blue
a,5,end,blue
b,1,new,red
b,2,bottle,blue
b,3,end,blue
Rows a,3,new,green and c,4,check,blue are omitted.
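To pin down exactly what the two rules mean, here is a minimal plain-Python sketch (not Hive; the function name `filter_sessions` is hypothetical) that reproduces the expected output from the sample rows. It assumes `ts` is unique within a session, so "the first new row" is unambiguous.

```python
# Sample rows from the question: (session, ts, status, color)
ROWS = [
    ("a", 1, "new", "red"),
    ("a", 2, "check", "blue"),
    ("a", 3, "new", "green"),
    ("a", 4, "amount", "blue"),
    ("a", 5, "end", "blue"),
    ("b", 1, "new", "red"),
    ("b", 2, "bottle", "blue"),
    ("b", 3, "end", "blue"),
    ("c", 4, "check", "blue"),
]


def filter_sessions(rows):
    """Keep sessions that have a 'new' row; within them keep only the earliest 'new' row."""
    # Earliest ts of a 'new' row per session (sessions without one never appear here).
    first_new_ts = {}
    for session, ts, status, _color in rows:
        if status == "new":
            if session not in first_new_ts or ts < first_new_ts[session]:
                first_new_ts[session] = ts

    result = []
    for row in rows:
        session, ts, status, _color = row
        if session not in first_new_ts:
            continue  # rule 1: the session never had status 'new'
        if status == "new" and ts != first_new_ts[session]:
            continue  # rule 2: drop every 'new' row except the earliest one
        result.append(row)
    # Same ordering as the expected output: by session, then ts.
    return sorted(result, key=lambda r: (r[0], r[1]))


for row in filter_sessions(ROWS):
    print(row)
```

If two `new` rows in one session could share the same `ts`, this sketch would keep both; a `row_number()`-based query would keep only one of them.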
I have written the query below. It does the job if you only look at the session, ts and status columns, but I don't like the second query because it has a GROUP BY:
select session, ts, status from mp_logon3
where status!='new'
and session in (select distinct a.session from mp_logon3 a
where a.status = 'new'
)
union
select session, min(ts), status from mp_logon3
where status='new'
and session in (select distinct b.session from mp_logon3 b
where b.status = 'new'
)
group by session, status
However, as soon as I add the color column it breaks: you get two rows for session=a and status=new, one green and one red.
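To see why adding color breaks the GROUP BY, here is a small sketch that loads the sample rows into an in-memory SQLite database (used only as a local stand-in for Hive; the table name `mp_logon3` is from the question, the column types are assumed) and runs just the `status='new'` half of the query, grouped by color as well.

```python
import sqlite3

# Sample rows from the question: (session, ts, status, color)
ROWS = [
    ("a", 1, "new", "red"),
    ("a", 2, "check", "blue"),
    ("a", 3, "new", "green"),
    ("a", 4, "amount", "blue"),
    ("a", 5, "end", "blue"),
    ("b", 1, "new", "red"),
    ("b", 2, "bottle", "blue"),
    ("b", 3, "end", "blue"),
    ("c", 4, "check", "blue"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mp_logon3 (session TEXT, ts INTEGER, status TEXT, color TEXT)")
conn.executemany("INSERT INTO mp_logon3 VALUES (?, ?, ?, ?)", ROWS)

# color is part of the grouping key, so each (session, status, color)
# combination is its own group and min(ts) is computed per color.
grouped = conn.execute(
    """
    SELECT session, min(ts), status, color
    FROM mp_logon3
    WHERE status = 'new'
    GROUP BY session, status, color
    ORDER BY session, min(ts)
    """
).fetchall()

for row in grouped:
    print(row)
```

Session a's red and green rows fall into different groups, so both survive. Picking "the row with the smallest ts" together with its other columns is what a window function such as `row_number()` does, which is why the answer below uses one.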
select session, ts, status, color from mp_logon3
where status!='new'
and session in (select distinct a.session from mp_logon3 a
where a.status = 'new'
)
union
select session, min(ts), status, color from mp_logon3
where status='new'
and session in (select distinct b.session from mp_logon3 b
where b.status = 'new'
)
group by session, status, color
Finally, is there a better way to write the whole query, perhaps without the union? If I were using Teradata SQL, I would write:
select session, ts, status, color
from mp_logon3
where status='new'
and session in (select distinct a.session from mp_logon3 a
where a.status = 'new'
)
qualify row_number() over (partition by session,status order by ts)=1
union
select session, ts, status, color from mp_logon3
where status!='new'
and session in (select distinct a.session from mp_logon3 a
where a.status = 'new'
)
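One way to drop both the union and the join is to compute a per-session "has a new row" flag with a windowed aggregate and filter in an outer query. `QUALIFY` is Teradata syntax that many Hive versions do not accept, which is why the filter lives in the outer `WHERE`. The sketch below is an alternative, not the asker's or answerer's method; it runs the query in an in-memory SQLite database (a stand-in for Hive, requiring SQLite 3.25+ for window functions) using only `CASE`, `max() OVER` and `row_number() OVER`, which Hive's windowing functions also provide. The names `has_new` and `UNION_FREE_QUERY` are hypothetical.

```python
import sqlite3

# Sample rows from the question: (session, ts, status, color)
ROWS = [
    ("a", 1, "new", "red"),
    ("a", 2, "check", "blue"),
    ("a", 3, "new", "green"),
    ("a", 4, "amount", "blue"),
    ("a", 5, "end", "blue"),
    ("b", 1, "new", "red"),
    ("b", 2, "bottle", "blue"),
    ("b", 3, "end", "blue"),
    ("c", 4, "check", "blue"),
]

# Single scan, no UNION, no JOIN:
#   has_new = 1 on every row of a session that contains at least one 'new' row
#   r_num   = rank of the row among rows with the same (session, status), by ts
UNION_FREE_QUERY = """
SELECT session, ts, status, color
FROM (
    SELECT session, ts, status, color,
           max(CASE WHEN status = 'new' THEN 1 ELSE 0 END)
               OVER (PARTITION BY session) AS has_new,
           row_number() OVER (PARTITION BY session, status ORDER BY ts) AS r_num
    FROM mp_logon3
) t
WHERE has_new = 1
  AND (status <> 'new' OR r_num = 1)
ORDER BY session, ts
"""

conn = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
conn.execute("CREATE TABLE mp_logon3 (session TEXT, ts INTEGER, status TEXT, color TEXT)")
conn.executemany("INSERT INTO mp_logon3 VALUES (?, ?, ?, ?)", ROWS)

result = conn.execute(UNION_FREE_QUERY).fetchall()
for row in result:
    print(row)
```

The `has_new` flag replaces the `session in (select ...)` subquery, and `r_num = 1` keeps only the earliest `new` row per session while leaving the other statuses untouched.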
Here is a HiveQL solution to your problem:
WITH sessions
AS (SELECT DISTINCT session
FROM mp_logon3
WHERE STATUS = 'new')
,logons
AS (SELECT session
,ts
,STATUS
,color
,row_number() OVER (
PARTITION BY session
,STATUS ORDER BY ts
) AS r_num
FROM mp_logon3)
SELECT l.*
FROM logons l
INNER JOIN sessions s ON (s.session = l.session)
WHERE l.STATUS <> 'new'
OR l.r_num = 1
ORDER BY l.session
,l.ts;
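As a quick check, the answer's query can be run unchanged (apart from layout) against the sample rows. This sketch uses an in-memory SQLite database as a local stand-in for Hive (SQLite 3.25+ is needed for `row_number()`); the variable name `HIVE_ANSWER_QUERY` is hypothetical.

```python
import sqlite3

# Sample rows from the question: (session, ts, status, color)
ROWS = [
    ("a", 1, "new", "red"),
    ("a", 2, "check", "blue"),
    ("a", 3, "new", "green"),
    ("a", 4, "amount", "blue"),
    ("a", 5, "end", "blue"),
    ("b", 1, "new", "red"),
    ("b", 2, "bottle", "blue"),
    ("b", 3, "end", "blue"),
    ("c", 4, "check", "blue"),
]

# The answer's query: 'sessions' holds every session with a 'new' row,
# 'logons' numbers rows within each (session, status) by ts.
HIVE_ANSWER_QUERY = """
WITH sessions
AS (SELECT DISTINCT session
    FROM mp_logon3
    WHERE STATUS = 'new')
,logons
AS (SELECT session, ts, STATUS, color,
           row_number() OVER (PARTITION BY session, STATUS ORDER BY ts) AS r_num
    FROM mp_logon3)
SELECT l.*
FROM logons l
INNER JOIN sessions s ON (s.session = l.session)
WHERE l.STATUS <> 'new'
   OR l.r_num = 1
ORDER BY l.session, l.ts
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mp_logon3 (session TEXT, ts INTEGER, status TEXT, color TEXT)")
conn.executemany("INSERT INTO mp_logon3 VALUES (?, ?, ?, ?)", ROWS)

answer_rows = conn.execute(HIVE_ANSWER_QUERY).fetchall()
for row in answer_rows:
    print(row)
```

The rows match the expected output, with one extra trailing column: `l.*` also returns the helper column `r_num`. List `l.session, l.ts, l.STATUS, l.color` explicitly if that column is unwanted.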
Comment: Which DBMS? Oracle, SQL Server, MySQL, etc.? Reply: Thanks for the comment. It's actually HiveSQL; I've specified that in the question now.