SQL CASE statement based on number of occurrences
I have a table t1 with email addresses, users, and domains:
email user domain
joe123@domain.com joe123 domain.com
sue234@email.net sue234 email.net
... ... ...
Another table, t2, shows whether an email sent to a given address was opened:
Opened Email
0 joe123@domain.com
1 sue234@email.net
0 jack55@mybarber.com
... ...
I want to join t1.domain onto t2, but only for domains that occur more than 100 times.
I can create a table with the occurrence counts:
SELECT domain, count(domain) cntDomain
from table1
group by domain
This gives:
domain cntDomain
domain.com 5000
email.net 4300
mybarber.com 67
The resulting table would look like this:
Opened Email domain
0 joe123@domain.com domain.com
1 sue234@email.net email.net
0 jack55@mybarber.com other
... ...
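But I can't work out the join. I assume it will be a left join that creates an 'other' value for the infrequent domains, with a CASE statement that keeps the domain if it occurs more than 100 times and uses 'other' if not.

It is unclear whether all the emails in the first table are also in the second table. If they are, you can do the following: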
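select t2.*, t1.domain
from table2 t2
inner join table1 t1 on t2.email = t1.email
inner join
(
    -- domains that occur more than 100 times in table1;
    -- this subquery has no email column, so it is joined on domain
    SELECT domain, count(1) cntDomain
    from table1
    group by domain
    having count(1) > 100
) d on t1.domain = d.domain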
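Alternatively, with a window function that counts the rows per domain:
-- domain is a column of table1, so the count is taken there
select t2.*, t1.domain
from (select t1.*, count(*) over (partition by domain) as cnt
      from table1 t1
     ) t1 join
     table2 t2
     on t1.email = t2.email
where cnt > 100;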
If not, we can check for the domain in the email address itself:
select t2.*, t1.domain
from table2 t2 left join
(select t1.domain, count(*) as cnt
from table1 t1
group by t1.domain
) t1
on t2.email like '%@' + t1.domain and
cnt > 100;
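To see why the cnt > 100 test sits in the ON clause of the left join rather than in a WHERE clause, here is a small sketch that runs the same shape of query in SQLite through Python's sqlite3 module. None of the following comes from the original posts: the toy rows, the threshold of 2 instead of 100, the helper name join_on_suffix, and the use of SQLite's || where SQL Server uses + for string concatenation.

```python
import sqlite3

# Toy data (assumed): domain.com occurs 3 times, mybarber.com once.
TABLE1 = [
    ("a1@domain.com", "a1", "domain.com"),
    ("a2@domain.com", "a2", "domain.com"),
    ("a3@domain.com", "a3", "domain.com"),
    ("jack55@mybarber.com", "jack55", "mybarber.com"),
]
TABLE2 = [(0, "a1@domain.com"), (0, "jack55@mybarber.com")]


def join_on_suffix(threshold, filter_in_on):
    """Left join table2 to per-domain counts by matching the email suffix.

    filter_in_on=True puts the count filter in the ON clause (as in the
    query above); False moves it to a WHERE clause for comparison.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (email TEXT, user TEXT, domain TEXT)")
    con.execute("CREATE TABLE table2 (Opened INTEGER, Email TEXT)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", TABLE1)
    con.executemany("INSERT INTO table2 VALUES (?, ?)", TABLE2)
    keyword = "AND" if filter_in_on else "WHERE"
    query = f"""
        SELECT t2.Email, t1.domain
        FROM table2 t2
        LEFT JOIN (
            SELECT domain, COUNT(*) AS cnt
            FROM table1
            GROUP BY domain
        ) t1
          ON t2.Email LIKE ('%@' || t1.domain)
        {keyword} t1.cnt > ?
        ORDER BY t2.Email
    """
    rows = con.execute(query, (threshold,)).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(join_on_suffix(2, True))   # rare domain kept, with domain NULL
    print(join_on_suffix(2, False))  # rare domain row filtered out
```

With the filter in the ON clause, every row of table2 survives and rare domains come back as NULL, which can then be mapped to 'other'. With the filter in WHERE, those rows disappear.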
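Expect the LIKE-join version to perform really, really poorly.

This approach uses an inner query to get the counts, then uses a CASE statement to interpret the count as either the domain or the string 'Other', as appropriate. I tested it on some play data to make sure it works, but I have no opinion on its performance.
It feels a bit awkward, because t1 is queried twice: once to get the domain and once to get the count. Either way, it gets the job done.
If the threshold changes, you can swap the number 100 for another number or a variable.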
select
t2.Opened
, t2.Email
, case when t3.cntDomain > 100 then t3.domain else 'Other' end as domain
from t2
left outer join t1 on t2.Email = t1.email
left outer join (
select t1.domain, count(1) cntDomain
from t1
left outer join t2 on t1.email = t2.email
group by t1.domain
) as t3 on t1.domain = t3.domain
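Edit
If you don't like the CASE statement, this approach may be more elegant. Modify the inner query with a HAVING clause. Now, because of the left join, t3.domain will be null whenever the count does not exceed the threshold. Add a little ISNULL in the select list to coalesce the null, and you're done.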
select
t2.Opened
, t2.Email
, ISNULL(t3.domain, 'Other') as domain
from t2
left outer join t1 on t2.Email = t1.email
left outer join (
select t1.domain, count(1) cntDomain
from t1
left outer join t2 on t1.email = t2.email
group by t1.domain
having count(1) > 100
) as t3 on t1.domain = t3.domain
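As a quick sanity check of the HAVING plus null-coalescing idea, here is a minimal sketch on toy data, run in SQLite through Python's sqlite3 module. The following are my own assumptions and not part of the answer: the sample rows, the threshold of 2 instead of 100, the helper name label_domains, COALESCE standing in for SQL Server's ISNULL, and counting from t1 alone rather than from t1 left joined to t2.

```python
import sqlite3

# Toy data (assumed): two domains with 3 rows each, one domain with 1 row.
T1 = [
    ("a1@domain.com", "a1", "domain.com"),
    ("a2@domain.com", "a2", "domain.com"),
    ("a3@domain.com", "a3", "domain.com"),
    ("b1@email.net", "b1", "email.net"),
    ("b2@email.net", "b2", "email.net"),
    ("b3@email.net", "b3", "email.net"),
    ("jack55@mybarber.com", "jack55", "mybarber.com"),
]
# The last address does not appear in t1 at all.
T2 = [
    (0, "a1@domain.com"),
    (1, "b1@email.net"),
    (0, "jack55@mybarber.com"),
    (1, "nobody@nowhere.org"),
]


def label_domains(t1_rows, t2_rows, threshold):
    """Return (Opened, Email, domain) for each t2 row, ordered by Email.

    domain is the real domain when it occurs more than `threshold`
    times in t1, and 'Other' in every other case.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (email TEXT, user TEXT, domain TEXT)")
    con.execute("CREATE TABLE t2 (Opened INTEGER, Email TEXT)")
    con.executemany("INSERT INTO t1 VALUES (?, ?, ?)", t1_rows)
    con.executemany("INSERT INTO t2 VALUES (?, ?)", t2_rows)
    query = """
        SELECT t2.Opened, t2.Email, COALESCE(t3.domain, 'Other') AS domain
        FROM t2
        LEFT JOIN t1 ON t2.Email = t1.email
        LEFT JOIN (
            SELECT domain, COUNT(*) AS cntDomain
            FROM t1
            GROUP BY domain
            HAVING COUNT(*) > ?
        ) AS t3 ON t1.domain = t3.domain
        ORDER BY t2.Email
    """
    rows = con.execute(query, (threshold,)).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in label_domains(T1, T2, 2):
        print(row)
```

Both the rare domain and the address that is missing from t1 end up as 'Other', because either way the second left join finds no matching row in t3.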
Cheers

I think the query below should solve your problem:
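SELECT t2.opened,
       t2.Email,
       CASE WHEN tempt1.email IS NULL THEN 'Other' ELSE tempt1.domain END AS domain
FROM t2
LEFT JOIN (SELECT email, domain
           FROM t1
           -- keep only emails whose domain occurs more than 100 times;
           -- selecting email while grouping by domain alone is not valid
           WHERE domain IN (SELECT domain
                            FROM t1
                            GROUP BY domain
                            HAVING count(domain) > 100)) tempt1
       ON t2.Email = tempt1.email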
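You need a having count(*) > 100.
You probably want the first join condition of the second query to be t2.email like '%@' + t1.domain, to keep subdomains separate.
@Allan. That makes good sense. Thank you very much. It would also affect things like gmail.com and mail.com.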
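The point about gmail.com and mail.com can be checked with a small sketch, again using SQLite through Python's sqlite3 module. The helper name matching_domains and the sample addresses are made up for illustration, and SQLite's || replaces SQL Server's + for concatenation.

```python
import sqlite3


def matching_domains(email, domains, anchor_at_sign):
    """Return the domains that `email` matches under a LIKE suffix test.

    anchor_at_sign=True uses the pattern '%@' || domain,
    anchor_at_sign=False uses the looser '%' || domain.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE d (domain TEXT)")
    con.executemany("INSERT INTO d VALUES (?)", [(x,) for x in domains])
    prefix = "%@" if anchor_at_sign else "%"
    rows = con.execute(
        "SELECT domain FROM d WHERE ? LIKE (? || domain) ORDER BY domain",
        (email, prefix),
    ).fetchall()
    con.close()
    return [r[0] for r in rows]


if __name__ == "__main__":
    domains = ["gmail.com", "mail.com"]
    # Without the '@', a gmail.com address also matches mail.com.
    print(matching_domains("joe@gmail.com", domains, False))
    # With the '@', only the exact domain matches.
    print(matching_domains("joe@gmail.com", domains, True))
```

Without the '@' in the pattern, one email can join to several domain rows and be duplicated in the output. Anchoring on '@' also stops an address at sub.mail.com from being counted under mail.com.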