SQL: How can this computation be made more efficient in Hive?

The following code runs on Hive:

select day,count(mdn)*5 as number from
(select distinct a.mdn,a.day from 
flow a
left outer join
flow b
on a.day=date_add(b.day,-1) and a.mdn=b.mdn
left outer join
flow c
on a.day=date_add(c.day,-2) and a.mdn=c.mdn
left outer join
flow d
on a.day=date_add(d.day,-3) and a.mdn=d.mdn
where b.mdn is null  and c.mdn is null  and d.mdn is null)t 
group by day

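The logic is to select each mdn that appears on a given day but not on any of the following three days, and then count those mdn values per day. Because the query joins the same large table, flow, to itself three times, it is very inefficient. How can it be simplified so that it runs efficiently?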

Well, you can use lead() to look at the next day on which each mdn appears and compare the dates:

select f.*
from (select f.*,
             lead(f.day) over (partition by f.mdn order by f.day) as next_day
      from flow f
     ) f
where next_day > date_add(day, 3) or next_day is null;
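
To get the per-day totals the question asks for, the same lead() filter can be wrapped in the original aggregation. The following is only a sketch, not a tested replacement: it assumes flow.day is a DATE or a 'yyyy-MM-dd' string (as date_add() already requires in the question), and the inner select distinct is an added step that collapses repeated (mdn, day) rows before the window function runs.

```sql
-- Sketch: count, per day, the mdn values that do not reappear
-- on any of the following three days, scanning flow only once.
select t.day,
       count(t.mdn) * 5 as number
from (select f.mdn,
             f.day,
             -- next day on which this mdn appears, or null if it never does
             lead(f.day) over (partition by f.mdn order by f.day) as next_day
      from (select distinct mdn, day from flow) f   -- one row per (mdn, day)
     ) t
where t.next_day is null                   -- mdn never appears again
   or t.next_day > date_add(t.day, 3)      -- next appearance is at least 4 days later
group by t.day;
```

Because each mdn's days are distinct after the inner query, next_day is always strictly later than day, so a single comparison against date_add(day, 3) covers all three of the original anti-joins.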

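Should next_date in the where clause be next_day, and should lead(f.day) be changed to lead(f.day, 3) to cover the next three days?

@pring. Yes and no. The column should be next_day, but lead(f.day) is correct as written: the code only needs to look ahead to the next day on which the mdn appears and compare it with the current day.

Thanks! I get it!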