Count common users across multiple tables (SQL, Hive, HiveQL)


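I have 4 tables, as shown below.

Mainly, I want to know how many of the users in table 1 are also in tables 2, 3 and 4. Likewise, for table 2 I want to know how many of its users are in tables 1, 3 and 4, and the same for tables 3 and 4.

Basically, every possible combination. The final result I want is as follows: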

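One way I tried is to LEFT JOIN table1 with the other tables and then COUNT, which gives me the first row of the output. But doing this for every possible combination is not efficient, so I am looking for alternatives.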

My code for this:

-- Anchored on table1, so this yields only the first row of the desired output
SELECT
  COUNT(DISTINCT A.id) AS table1,
  COUNT(DISTINCT B.id) AS table2,
  COUNT(DISTINCT C.id) AS table3,
  COUNT(DISTINCT D.id) AS table4
FROM table1 A
LEFT JOIN table2 B ON A.id = B.id
LEFT JOIN table3 C ON A.id = C.id
LEFT JOIN table4 D ON A.id = D.id

(This fiddle is for MySQL, but I am looking for a generic SQL approach rather than anything database-specific.)
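To see why this only covers part of the question, here is a small sketch that runs the query above against an in-memory SQLite database from Python. The table contents are hypothetical (not the question's data), and SQLite stands in for MySQL/Hive only because it ships with Python; the query itself uses nothing dialect-specific.

```python
import sqlite3

# Hypothetical sample data (not the question's data): one id column per table.
SAMPLE = {
    "table1": [1, 2, 3, 4, 5],
    "table2": [2, 3, 6],
    "table3": [3, 4, 7],
    "table4": [5, 8],
}

conn = sqlite3.connect(":memory:")
for name, ids in SAMPLE.items():
    conn.execute(f"CREATE TABLE {name} (id INTEGER)")
    conn.executemany(f"INSERT INTO {name} (id) VALUES (?)", [(i,) for i in ids])

# The query from the question: everything is anchored on table1,
# so the result is a single row (the "table1" row of the desired output).
row = conn.execute("""
    SELECT COUNT(DISTINCT A.id) AS table1,
           COUNT(DISTINCT B.id) AS table2,
           COUNT(DISTINCT C.id) AS table3,
           COUNT(DISTINCT D.id) AS table4
    FROM table1 A
    LEFT JOIN table2 B ON A.id = B.id
    LEFT JOIN table3 C ON A.id = C.id
    LEFT JOIN table4 D ON A.id = D.id
""").fetchone()
print(row)  # (5, 2, 2, 1)
```

Producing the remaining rows means re-running the query with table2, table3 and table4 as the anchor in turn.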

Use UNION ALL to stack one LEFT JOIN query per anchor table:

select 'table1' as col1, count(table1.id) as tbl1, count(table2.id) as tbl2, count(table3.id) as tbl3, count(table4.id) as tbl4
from table1
left join table2 on table1.id=table2.id
left join table3 on table1.id=table3.id
left join table4 on table1.id=table4.id
union all
select 'table2' ,count(table1.id),count(table2.id),count(table3.id),count(table4.id) 
from table2
left join table1 on table2.id=table1.id
left join table3 on table2.id=table3.id
left join table4 on table2.id=table4.id
union all
select 'table3' ,count(table1.id),count(table2.id),count(table3.id),count(table4.id) 
from table3
left join table1 on table3.id=table1.id
left join table2 on table3.id=table2.id
left join table4 on table3.id=table4.id
union all
select 'table4' ,count(table1.id),count(table2.id),count(table3.id),count(table4.id) 
from table4
left join table1 on table4.id=table1.id
left join table2 on table4.id=table2.id
left join table3 on table4.id=table3.id

Output:

col1    tbl1    tbl2    tbl3    tbl4
table1   8      3        2       2
table2   3      6        1       0
table3   2      1        5       0
table4   2      0        0       4
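Because the four SELECTs differ only in which table is the anchor, the query can be generated rather than written by hand. The sketch below is illustrative only: `union_all_query` is a hypothetical helper, every table is assumed to have an `id` column that is unique within that table, and the sample data is made up (it does not reproduce the output above). Note that with N tables the generated query contains N SELECTs with N-1 joins each, so it grows quadratically.

```python
import sqlite3

# Hypothetical sample data (not the question's data); ids are unique within each table.
SAMPLE = {
    "table1": [1, 2, 3, 4, 5],
    "table2": [2, 3, 6],
    "table3": [3, 4, 7],
    "table4": [5, 8],
}


def union_all_query(tables):
    """Hypothetical helper: one LEFT JOIN block per anchor table, stacked with UNION ALL."""
    blocks = []
    for anchor in tables:
        counts = ", ".join(f"COUNT({t}.id) AS {t}" for t in tables)
        joins = " ".join(
            f"LEFT JOIN {t} ON {anchor}.id = {t}.id" for t in tables if t != anchor
        )
        blocks.append(f"SELECT '{anchor}' AS col1, {counts} FROM {anchor} {joins}")
    return "\nUNION ALL\n".join(blocks)


conn = sqlite3.connect(":memory:")
for name, ids in SAMPLE.items():
    conn.execute(f"CREATE TABLE {name} (id INTEGER)")
    conn.executemany(f"INSERT INTO {name} (id) VALUES (?)", [(i,) for i in ids])

tables = list(SAMPLE)
# UNION ALL does not promise any row order, so key each row by its label.
matrix = {r[0]: r[1:] for r in conn.execute(union_all_query(tables))}
for name in tables:
    print(name, *matrix[name])
```

With this data the result is a symmetric matrix whose diagonal holds each table's row count (5, 3, 3, 2).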

I would suggest:

with t as (
      select 'table1' as which, id from table1 union all
      select 'table2' as which, id from table2 union all
      select 'table3' as which, id from table3 union all
      select 'table4' as which, id from table4
     )
select ta.which,
       sum(case when tb.which = 'table1' then 1 else 0 end) as cnt_table1,
       sum(case when tb.which = 'table2' then 1 else 0 end) as cnt_table2,
       sum(case when tb.which = 'table3' then 1 else 0 end) as cnt_table3,
       sum(case when tb.which = 'table4' then 1 else 0 end) as cnt_table4
from t ta left join
     t tb
     on ta.id = tb.id
group by ta.which;

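Note: this assumes that id is unique within each table. Given the column name and the sample data, that is a reasonable assumption. If there are duplicates, however, you can change union all to union in the CTE.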

This structure also generalizes easily to more tables.
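The effect of choosing union all versus union in the CTE can be checked on a tiny example. In this sketch (SQLite via Python, with hypothetical data where id 1 repeats in table1 and id 2 repeats in table2), a three-table version of the query is run with both set operators. With union all, duplicates multiply through the self-join and inflate the counts; with union, each (table, id) pair is counted once.

```python
import sqlite3

# Hypothetical data with duplicates: id 1 repeats in table1, id 2 repeats in table2.
SAMPLE = {
    "table1": [1, 1, 2, 3],
    "table2": [1, 2, 2],
    "table3": [3, 4],
}

# Three-table version of the answer's query; {op} is the set operator used in the CTE.
QUERY = """
WITH t AS (
  SELECT 'table1' AS which, id FROM table1 {op}
  SELECT 'table2' AS which, id FROM table2 {op}
  SELECT 'table3' AS which, id FROM table3
)
SELECT ta.which,
       SUM(CASE WHEN tb.which = 'table1' THEN 1 ELSE 0 END) AS cnt_table1,
       SUM(CASE WHEN tb.which = 'table2' THEN 1 ELSE 0 END) AS cnt_table2,
       SUM(CASE WHEN tb.which = 'table3' THEN 1 ELSE 0 END) AS cnt_table3
FROM t ta LEFT JOIN t tb ON ta.id = tb.id
GROUP BY ta.which
"""

conn = sqlite3.connect(":memory:")
for name, ids in SAMPLE.items():
    conn.execute(f"CREATE TABLE {name} (id INTEGER)")
    conn.executemany(f"INSERT INTO {name} (id) VALUES (?)", [(i,) for i in ids])


def run(op):
    # Key rows by table name; GROUP BY output order is not guaranteed.
    return {r[0]: r[1:] for r in conn.execute(QUERY.format(op=op))}


with_union_all = run("UNION ALL")  # duplicates multiply through the self-join
with_union = run("UNION")          # each (table, id) pair counted once
print(with_union_all["table1"], with_union["table1"])  # (6, 4, 1) (3, 2, 1)
```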

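This is the answer/code I was looking for. It looks cleaner and smarter, and it shortens the code considerably. I would still have to write the code 4 times for 4 tables, though, and I have more than 20 such tables, so I was looking for a way to shorten it further. I upvoted your effort.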
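For the 20+ table case, one option is to generate the self-join query from a list of table names, which can itself be read from the catalog. The sketch below uses SQLite's `sqlite_master` table for that; in Hive the list would instead come from something like `SHOW TABLES`. The helper name `pivot_query`, the table names, and the data (table{k} holding the multiples of k up to 60) are all hypothetical.

```python
import sqlite3


def pivot_query(tables):
    """Hypothetical helper: the answer's self-join pivot, generated for any table list.
    Uses UNION (not UNION ALL) in the CTE, so duplicate ids within a table count once."""
    cte = "\n  UNION\n  ".join(f"SELECT '{t}' AS which, id FROM {t}" for t in tables)
    sums = ",\n       ".join(
        f"SUM(CASE WHEN tb.which = '{t}' THEN 1 ELSE 0 END) AS cnt_{t}" for t in tables
    )
    return (
        f"WITH t AS (\n  {cte}\n)\n"
        f"SELECT ta.which,\n       {sums}\n"
        "FROM t ta LEFT JOIN t tb ON ta.id = tb.id\n"
        "GROUP BY ta.which"
    )


# Hypothetical data: 20 tables, where table{k} holds the multiples of k up to 60.
data = {f"table{k}": set(range(k, 61, k)) for k in range(1, 21)}

conn = sqlite3.connect(":memory:")
for name, ids in data.items():
    conn.execute(f"CREATE TABLE {name} (id INTEGER)")
    conn.executemany(f"INSERT INTO {name} (id) VALUES (?)", [(i,) for i in sorted(ids)])

# Take the table list from the catalog instead of typing 20 names by hand.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

result = {r[0]: r[1:] for r in conn.execute(pivot_query(tables))}

# One cell checked by hand: ids in both table2 and table3 are the multiples of 6 up to 60.
print(result["table2"][tables.index("table3")])  # 10
```

The SQL text still grows with the number of tables, but only the list of names has to be maintained; the result has one row per table and one count column per table.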