SQL CASE statement based on number of occurrences

I have a table t1 with email addresses, users and domains:

     email                user         domain
joe123@domain.com        joe123        domain.com
sue234@email.net         sue234        email.net
      ...                  ...          ...
Another table, t2, shows whether the email sent to each address was opened:

  Opened             Email
    0            joe123@domain.com
    1            sue234@email.net
    0            jack55@mybarber.com
   ...               ...
I want to join t1.domain onto t2, but only for domains that occur more than 100 times.

I can create a table with the occurrence counts:

SELECT domain, count(domain) cntDomain
from table1
group by domain
which results in:

   domain         cntDomain
 domain.com       5000
 email.net        4300
 mybarber.com     67

The resulting table should look like this:

  Opened             Email                 domain
    0            joe123@domain.com         domain.com
    1            sue234@email.net          email.net
    0            jack55@mybarber.com       other
   ...               ...                   ...

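But I can't figure out the join. I assume it will be a left join that produces an "other" value for infrequent domains, with a CASE statement: if the domain occurs more than 100 times, use that domain; if not, use "other".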
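Before writing the join, it can help to check the counting step on toy data. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for SQL Server. The per-domain sizes (150, 120 and 67 addresses) and the names `THRESHOLD`, `build_db` and `frequent_domains` are hypothetical. The query is the count query above with the cut-off added as a `HAVING` clause.

```python
import sqlite3

THRESHOLD = 100  # the question's cut-off; hypothetical constant name


def build_db(counts):
    """Create a hypothetical table1; `counts` maps each domain to its number of addresses."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (email TEXT, user TEXT, domain TEXT)")
    for domain, n in counts.items():
        rows = [(f"user{i}@{domain}", f"user{i}", domain) for i in range(n)]
        con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    return con


def frequent_domains(con, threshold=THRESHOLD):
    """Count rows per domain and keep only the domains above the threshold."""
    sql = """
        SELECT domain, COUNT(domain) AS cntDomain
        FROM table1
        GROUP BY domain
        HAVING COUNT(domain) > ?
        ORDER BY domain
    """
    return con.execute(sql, (threshold,)).fetchall()


con = build_db({"domain.com": 150, "email.net": 120, "mybarber.com": 67})
print(frequent_domains(con))  # [('domain.com', 150), ('email.net', 120)]
```

Passing the threshold as a parameter means the same query serves any cut-off.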

It's not clear whether every email in the first table is also in the second. If it is, you can do:

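select t2.*, d.domain
from table2 t2
inner join table1 t1 on t2.email = t1.email
inner join
(
    -- domains that occur more than 100 times (no email column here)
    SELECT domain, count(1) cntDomain
    from table1
    group by domain
    having count(1) > 100
) d on t1.domain = d.domain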
Or, using a window function on the table that has the domain column:

select t2.*, t1.domain
from (select t1.*, count(*) over (partition by domain) as cnt
      from table1 t1
     ) t1 join
     table2 t2
     on t1.email = t2.email
where t1.cnt > 100;
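The window-function approach can be tried the same way. In this sketch, `COUNT(*) OVER (PARTITION BY domain)` is computed over table1, which has the domain column. Each address row then carries its domain's count, and the threshold can go in `WHERE` without a `GROUP BY`. It uses SQLite through Python's `sqlite3` as a stand-in for SQL Server, which needs SQLite 3.25 or later for window functions. `build_db` and the sample data are hypothetical.

```python
import sqlite3


def build_db(counts):
    """Hypothetical table1/table2; every address in table1 gets one row in table2."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (email TEXT, user TEXT, domain TEXT)")
    con.execute("CREATE TABLE table2 (opened INTEGER, email TEXT)")
    for domain, n in counts.items():
        for i in range(n):
            email = f"user{i}@{domain}"
            con.execute("INSERT INTO table1 VALUES (?, ?, ?)", (email, f"user{i}", domain))
            con.execute("INSERT INTO table2 VALUES (?, ?)", (i % 2, email))
    return con


# The windowed count is attached to every table1 row; no GROUP BY is needed.
QUERY = """
    SELECT t2.*, t1.domain, t1.cnt
    FROM (SELECT t1.*, COUNT(*) OVER (PARTITION BY domain) AS cnt
          FROM table1 t1) t1
    JOIN table2 t2 ON t1.email = t2.email
    WHERE t1.cnt > ?
"""

con = build_db({"domain.com": 150, "email.net": 120, "mybarber.com": 67})
rows = con.execute(QUERY, (100,)).fetchall()
print(len(rows))  # 270
```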
If not, we can look for the domain in the email address itself:

select t2.*, t1.domain
from table2 t2 left join
     (select t1.domain, count(*) as cnt
      from table1 t1
      group by t1.domain
     ) t1
     on t2.email like '%@' + t1.domain and
        cnt > 100;
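Here is a sketch of this pattern-matching join in SQLite (via Python's `sqlite3`). It adds a hypothetical table2 address, `stranger@domain.com`, that is missing from table1, so the effect of matching on the domain alone is visible. `build_db` and the data are made up. SQLite concatenates with `||` where SQL Server uses `+`. In SQLite, `||` binds more tightly than `LIKE`, so no parentheses are needed.

```python
import sqlite3


def build_db(counts, extra_table2_emails):
    """Hypothetical data; table2 may also hold addresses that are missing from table1."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (email TEXT, user TEXT, domain TEXT)")
    con.execute("CREATE TABLE table2 (opened INTEGER, email TEXT)")
    for domain, n in counts.items():
        for i in range(n):
            email = f"user{i}@{domain}"
            con.execute("INSERT INTO table1 VALUES (?, ?, ?)", (email, f"user{i}", domain))
            con.execute("INSERT INTO table2 VALUES (?, ?)", (i % 2, email))
    for email in extra_table2_emails:
        con.execute("INSERT INTO table2 VALUES (0, ?)", (email,))
    return con


# Match each address to a frequent domain by its '@domain' suffix.
QUERY = """
    SELECT t2.*, t1.domain
    FROM table2 t2 LEFT JOIN
         (SELECT t1.domain, COUNT(*) AS cnt
          FROM table1 t1
          GROUP BY t1.domain) t1
         ON t2.email LIKE '%@' || t1.domain AND t1.cnt > ?
"""

con = build_db({"domain.com": 150, "email.net": 120, "mybarber.com": 67},
               ["stranger@domain.com"])
rows = con.execute(QUERY, (100,)).fetchall()
domain_of = {email: domain for _opened, email, domain in rows}
print(domain_of["stranger@domain.com"])  # domain.com (matched by its domain alone)
print(domain_of["user0@mybarber.com"])   # None (domain below the threshold)
```

Infrequent domains come back as NULL (`None` in Python). Wrapping the column in `ISNULL`/`COALESCE` would turn them into "other".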

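Expect this version to perform very, very badly: the join condition is a LIKE pattern that starts with a wildcard, so it cannot use an index seek.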

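This approach uses an inner query to get the counts, then a CASE statement to turn each count into either the domain or the string 'Other', as appropriate. I tested it on some play data to make sure it works, but I can't vouch for its performance.

It feels a bit clumsy because t1 is queried twice: once to get the domain and once to get the count. Either way, it gets the job done.

If the threshold changes, you can swap the number 100 for another number or a variable.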

select 
  t2.Opened
, t2.Email
, case when t3.cntDomain > 100 then t3.domain else 'Other' end as domain
from t2
left outer join t1 on t2.Email = t1.email
left outer join (
    select t1.domain, count(1) cntDomain
    from t1
    left outer join t2 on t1.email = t2.email
    group by t1.domain
) as t3 on t1.domain = t3.domain
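The CASE version can be checked against toy data too. This sketch again uses SQLite through Python's `sqlite3` as a stand-in for SQL Server. The tables follow the answer's t1/t2 naming, while `build_db` and the sample sizes are hypothetical. The threshold is passed as a query parameter instead of the literal 100.

```python
import sqlite3
from collections import Counter


def build_db(counts):
    """Hypothetical t1/t2 in the answer's naming; one t2 row per t1 address."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (email TEXT, user TEXT, domain TEXT)")
    con.execute("CREATE TABLE t2 (Opened INTEGER, Email TEXT)")
    for domain, n in counts.items():
        for i in range(n):
            email = f"user{i}@{domain}"
            con.execute("INSERT INTO t1 VALUES (?, ?, ?)", (email, f"user{i}", domain))
            con.execute("INSERT INTO t2 VALUES (?, ?)", (i % 2, email))
    return con


# The answer's CASE query, with the threshold as a parameter.
QUERY = """
    SELECT t2.Opened, t2.Email,
           CASE WHEN t3.cntDomain > ? THEN t3.domain ELSE 'Other' END AS domain
    FROM t2
    LEFT OUTER JOIN t1 ON t2.Email = t1.email
    LEFT OUTER JOIN (
        SELECT t1.domain, COUNT(1) AS cntDomain
        FROM t1
        LEFT OUTER JOIN t2 ON t1.email = t2.email
        GROUP BY t1.domain
    ) AS t3 ON t1.domain = t3.domain
"""

con = build_db({"domain.com": 150, "email.net": 120, "mybarber.com": 67})
labels = Counter(row[2] for row in con.execute(QUERY, (100,)))
print(labels)  # Counter({'domain.com': 150, 'email.net': 120, 'Other': 67})
```

The inner query joins t1 to t2, so an address that appears several times in t2 is counted once per appearance. Dropping that join would count rows of t1 only.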
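EDIT

If you don't like the CASE statement, this approach may be more elegant. Modify the inner query with a HAVING clause. Because of the left join, t3.domain is now NULL whenever the count is below the threshold. Add a little ISNULL to the SELECT to coalesce the NULLs, and there you have it.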

select 
  t2.Opened
, t2.Email
, ISNULL(t3.domain, 'Other') as domain
from t2
left outer join t1 on t2.Email = t1.email
left outer join (
    select t1.domain, count(1) cntDomain
    from t1
    left outer join t2 on t1.email = t2.email
    group by t1.domain
    having count(1) > 100
) as t3 on t1.domain = t3.domain
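A sketch of the HAVING variant in SQLite (via Python's `sqlite3`) follows. SQLite has no two-argument `ISNULL` function, so the sketch swaps in `COALESCE`, which SQL Server also supports. `build_db`, `label_counts` and the data are hypothetical.

```python
import sqlite3
from collections import Counter


def build_db(counts):
    """Hypothetical t1/t2 in the answer's naming; one t2 row per t1 address."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (email TEXT, user TEXT, domain TEXT)")
    con.execute("CREATE TABLE t2 (Opened INTEGER, Email TEXT)")
    for domain, n in counts.items():
        for i in range(n):
            email = f"user{i}@{domain}"
            con.execute("INSERT INTO t1 VALUES (?, ?, ?)", (email, f"user{i}", domain))
            con.execute("INSERT INTO t2 VALUES (?, ?)", (i % 2, email))
    return con


# HAVING removes infrequent domains from t3, so the LEFT JOIN yields NULL for
# them and COALESCE turns that NULL into 'Other'.
QUERY = """
    SELECT t2.Opened, t2.Email, COALESCE(t3.domain, 'Other') AS domain
    FROM t2
    LEFT OUTER JOIN t1 ON t2.Email = t1.email
    LEFT OUTER JOIN (
        SELECT t1.domain, COUNT(1) AS cntDomain
        FROM t1
        LEFT OUTER JOIN t2 ON t1.email = t2.email
        GROUP BY t1.domain
        HAVING COUNT(1) > ?
    ) AS t3 ON t1.domain = t3.domain
"""


def label_counts(con, threshold):
    """Count how many t2 rows end up under each domain label."""
    return Counter(row[2] for row in con.execute(QUERY, (threshold,)))


con = build_db({"domain.com": 150, "email.net": 120, "mybarber.com": 67})
print(label_counts(con, 100))  # Counter({'domain.com': 150, 'email.net': 120, 'Other': 67})
print(label_counts(con, 50))   # mybarber.com now clears the threshold as well
```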

Cheers

I think the following query should solve your problem:

SELECT t2.opened,
       t2.Email,
       CASE WHEN tempt1.email IS NULL THEN 'Other' ELSE tempt1.domain END AS domain
FROM t2
LEFT JOIN (SELECT email, domain
           FROM t1
           WHERE domain IN (SELECT domain
                            FROM t1
                            GROUP BY domain
                            HAVING COUNT(domain) > 100)) tempt1
       ON t2.Email = tempt1.email

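You need count(*) > 100.

You might want to make the first join condition of the second query t2.email like '%@' + t1.domain, to keep subdomains separate. @Allan. That makes sense. Thank you very much. It would also affect things like gmail.com and mail.com.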
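The difference the `@` makes can be shown with a few `LIKE` evaluations. This sketch runs them in SQLite through Python's `sqlite3`, where `||` plays the role of SQL Server's `+` and binds more tightly than `LIKE`. The helper `like_match` and the sample addresses are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")


def like_match(email, prefix, domain):
    """Evaluate SQLite's `email LIKE prefix || domain` and return a bool."""
    (hit,) = con.execute("SELECT ? LIKE ? || ?", (email, prefix, domain)).fetchone()
    return bool(hit)


# Without the '@', 'mail.com' also matches Gmail addresses, and a parent domain
# also matches its subdomains.
print(like_match("joe@gmail.com", "%", "mail.com"))             # True
print(like_match("joe@gmail.com", "%@", "mail.com"))            # False
print(like_match("ann@mail.example.com", "%", "example.com"))   # True
print(like_match("ann@mail.example.com", "%@", "example.com"))  # False
```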