SQL query takes a long time to run. Is there any way to simplify it?
Tags: sql, hive, hue
The code I am running is below. It takes too long to run. Is there any way to make it run faster?
SELECT
a.data_date as day
, sum(a.column1) + sum(a.column2) as total
, sum(a.column1) as part1
, sum(a.column2) as part2
, sum(b.column1) as alien
FROM table1 a
INNER JOIN table1 b
ON a.data_date = b.data_date AND a.column3 = b.column3
WHERE a.data_date ='20131001'
and a.column3 = 12345
and a.column4 is not NULL
and b.column4 is NULL
GROUP BY
a.data_date
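As far as I can tell, you don't need the JOIN at all. You can get the same result with a single reference to the table.
Since this is the same table, I believe you can remove the join. It would be best to provide sample data and the expected result, then we can help you better. Cheers =)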
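One way to read the comments above is conditional aggregation: scan table1 once and let a CASE expression decide which rows feed each sum. This is only a sketch, and it rests on an assumption the question does not confirm: that part1, part2 and alien are meant to be the plain sums over the two subsets. The original self-join pairs every row where column4 is not NULL with every row where column4 is NULL, so its sums are repeated once per matching row on the other side, and it returns no row at all when either subset is empty. The single-scan version differs in both respects.

```sql
-- Sketch only: single scan of table1, no self-join.
-- Assumes plain per-subset sums are wanted (see the note above).
SELECT
  data_date AS day
, sum(CASE WHEN column4 IS NOT NULL THEN column1 END)
  + sum(CASE WHEN column4 IS NOT NULL THEN column2 END) AS total
, sum(CASE WHEN column4 IS NOT NULL THEN column1 END) AS part1
, sum(CASE WHEN column4 IS NOT NULL THEN column2 END) AS part2
  -- rows where column4 is NULL play the role of alias b in the original query
, sum(CASE WHEN column4 IS NULL THEN column1 END) AS alien
FROM table1
WHERE data_date = '20131001'
  AND column3 = 12345
GROUP BY
  data_date
```

A CASE without an ELSE branch yields NULL for non-matching rows, and sum() skips NULLs, so each sum only sees its own subset.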
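Optimization techniques also depend on the size of the tables. The small table should be used first, and you should try to put that table in the distributed cache. To make the query faster, apply the where condition before the join rather than after it, so that the join has less data to process. You can try the following: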
set hive.auto.convert.join=true;
select
a.data_date as day
, sum(a.column1) + sum(a.column2) as total
, sum(a.column1) as part1
, sum(a.column2) as part2
, sum(b.column1) as alien
from table1 b
-- the alias a only exists outside the subquery, so the columns inside it are unqualified
inner join (select * from table1 WHERE data_date ='20131001'
and column3 = 12345
and column4 is not NULL
)a
on (a.data_date = b.data_date AND a.column3 = b.column3)
where b.column4 is NULL
GROUP BY
a.data_date
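No doubt creating an index would help. Have you looked at the execution plan? In SQL Server, press Ctrl+L to display the execution plan. It shows where most of the query's resources go and where an index could improve the query. Keep in mind that it only tells you how to improve that one query, not the whole database.
It looks pretty clean… as long as the self-join produces one row per table1 record (is a.data_date + a.column3 unique?)
The thing is, by using a.column4 is not NULL and b.column4 is NULL, I get different column1 data, which is what I am looking for.
Something like SELECT a.data_date as day, sum(a.column1) + sum(a.column2) as total, sum(a.column1) as part1, sum(a.column2) as part2 FROM table1 a WHERE a.data_date = '20131001' and a.column3 = 12345 and a.column4 is not NULL GROUP BY a.data_date will give most of the result; a subquery could probably be added for sum(b.column1) as alien.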
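The subquery idea in the last comment could be sketched roughly as follows: aggregate each subset on its own, then join the two one-row results. The derived-table layout is an assumption, not the commenter's exact query. Because each side is reduced to one row per day before the join, the sums are not repeated by the row fan-out of the original self-join, so the numbers can differ from the original query when several rows match on either side.

```sql
-- Sketch only: pre-aggregate both subsets, then join one row to one row.
SELECT
  a.data_date AS day
, a.part1 + a.part2 AS total
, a.part1
, a.part2
, b.alien
FROM (
  -- rows where column4 is set
  SELECT data_date, sum(column1) AS part1, sum(column2) AS part2
  FROM table1
  WHERE data_date = '20131001' AND column3 = 12345 AND column4 IS NOT NULL
  GROUP BY data_date
) a
INNER JOIN (
  -- rows where column4 is NULL
  SELECT data_date, sum(column1) AS alien
  FROM table1
  WHERE data_date = '20131001' AND column3 = 12345 AND column4 IS NULL
  GROUP BY data_date
) b
ON a.data_date = b.data_date
```

The INNER JOIN keeps the original behaviour of returning nothing when either subset is empty. A LEFT OUTER JOIN would keep the day row and leave alien as NULL instead.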