Hadoop: multiple joins with count on Hive tables


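I have 4 different tables, and I want to combine them into one table that holds, for each source table, a column of its values and the number of times each value occurs in that column. All columns are strings.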

For example:

TableA
col1
20190204
20190204
20190204
20190205
20190205
20190205

TableB
col1
20200204
20200204
20200204
20200204
20200205
20200205
20200205

TableC
col1
20210204
20210204
20210204
20210204
20210205
20210205
20210205

TableD
col1
20220204
20220204
20220204
20220204
20220205
20220205
20220205

TableE -- All the 4 tables will go into here
TableE is empty and needs to be populated with the dates from the other tables and the number of times they occur in those tables. For example:
col1(tablea)  col2  col3(tableb)  col4  col5(tablec)  col6  col7(tabled)  col8
20190204      3     20200204      4     20210204      4     20220204      4
20190205      3     20200205      3     20210205      3     20220205      3
    etc...
I'm new to Hue, so I tried the following:

insert overwrite into tablee (
tablee.tablea.date, tablee.tablea.datecount,
tablee.tablebdate, tablee.tableb.datecount,
tablee.tablecdate, tablee.tablec.datecount,
tablee.tableddate, tablee.tablea.datedcount,
select tablea.date, count(tablea.date),  
tableb.date, count(tableb.date),
tablec.date, count(tablec.date),
tabled.date, count(tabled.date)
)
from tablea, tableb, tablec, tabled
left join tablee on (tablea.date=tablee.date)
left join tablee on (tableb.date=tablee.date)
left join tablee on (tablec.date=tablee.date)
left join tablee on (tabled.date=tablee.date);

But I can't get it to work. Does anyone have any suggestions?

Please check whether the query below gives the result set you want:

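-- Count the occurrences of each col1 value per table, then full outer join
-- the per-table counts so that every col1 value from every table is kept.
-- Grouping by the column name avoids relying on positional "group by 1".
select * from (select col1, count(*) as cnt from tablea group by col1) a
full outer join
(select col1, count(*) as cnt from tableb group by col1) b on a.col1 = b.col1
full outer join
(select col1, count(*) as cnt from tablec group by col1) c on b.col1 = c.col1
full outer join
(select col1, count(*) as cnt from tabled group by col1) d on c.col1 = d.col1;
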
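First compute the grouped counts for each table, then do a full outer join so that every col1 value from each table is included in the result set. Finally, if the result set is what you need, the select statement can be converted into an insert into / insert overwrite statement.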
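
As a rough sketch of that last step, the select could be wrapped like this. It assumes tablee already exists with eight string columns, a value column and a count column for each source table in this order. That layout is an assumption and is not stated in the question. The cnt alias and the casts to string are likewise illustrative.

-- Sketch only: assumes tablee has eight string columns in this order:
-- (a value, a count, b value, b count, c value, c count, d value, d count).
insert overwrite table tablee
select a.col1, cast(a.cnt as string),
       b.col1, cast(b.cnt as string),
       c.col1, cast(c.cnt as string),
       d.col1, cast(d.cnt as string)
from (select col1, count(*) as cnt from tablea group by col1) a
full outer join
     (select col1, count(*) as cnt from tableb group by col1) b on a.col1 = b.col1
full outer join
     (select col1, count(*) as cnt from tablec group by col1) c on b.col1 = c.col1
full outer join
     (select col1, count(*) as cnt from tabled group by col1) d on c.col1 = d.col1;

Note that the joins match rows only where the col1 values are equal. With the sample data above, no date appears in more than one table, so each date would land on its own row with NULLs in the other tables' columns rather than side by side. Use insert into instead of insert overwrite if existing rows in tablee should be kept.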