Arrays: collect_set with CASE in Hive
Is there a way to rewrite the CASE statement below so that I don't have to write collect_set(...)[0] repeatedly and can instead get the same result with a single collect_set?
select id,collect_set(name)[0] as name,sum(salary),
cASE WHEN month(to_date(from_unixtime(unix_timestamp(collect_set(date1)[0], 'dd-MM-yyyy'))))
IN (01,02,03) THEN
CONCAT(CONCAT(year(to_date(from_unixtime(unix_timestamp(collect_set(date1)[0], 'dd-MM-yyyy'))))-1,'-'),
substr(year(to_date(from_unixtime(unix_timestamp(collect_set(date1)[0], 'dd-MM-yyyy')))),3,4))
ELSE CONCAT(CONCAT(year(to_date(from_unixtime(unix_timestamp(collect_set(date1)[0], 'dd-MM-yyyy')))),'-'),
SUBSTR(year(to_date(from_unixtime(unix_timestamp(collect_set(date1)[0], 'dd-MM-yyyy'))))+1,3,4))
END as fy from testing_1.collect_set_test group by id;
I wrote the query below:
select collect_set(CASE WHEN month(to_date(from_unixtime(unix_timestamp(date1), 'dd-MM-yyyy')))
IN (01,02,03) THEN CONCAT(CONCAT(year(to_date(from_unixtime(unix_timestamp(date1), 'dd-MM-yyyy')))-1,'-'),
substr(year(to_date(from_unixtime(unix_timestamp(date1), 'dd-MM-yyyy'))),3,4))
ELSE
CONCAT(CONCAT(year(to_date(from_unixtime(unix_timestamp(date1), 'dd-MM-yyyy'))),'-'),
SUBSTR(year(to_date(from_unixtime(unix_timestamp(date1), 'dd-MM-yyyy')))+1,3,4))) [0]
END as fy from testing_1.collect_set_test group by id;
But it fails with this error:
FAILED: ParseException line 1:446 missing KW_END at ')' near ']' in selection target
line 1:452 cannot recognize input near 'END' 'as' 'fy' in selection target
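Can anyone guide me on how to rewrite this?

Move all the aggregations, including the GROUP BY and the date conversion, into a subquery, and compute fy in the outer query on top of it: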
select id, name, salary,
cASE WHEN month(date1)
IN (01,02,03) THEN CONCAT(CONCAT(year(date1)-1,'-'),
substr(year(date1),3,4))
ELSE CONCAT(CONCAT(year(date1),'-'),
SUBSTR(year(date1)+1,3,4))
END as fy
from
(select to_date(from_unixtime(unix_timestamp(collect_set(date1)[0], 'dd-MM-yyyy'))) as date1,
collect_set(name)[0] as name,
sum(salary) as salary,
id
from testing_1.collect_set_test group by id) s
;
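
For reference, the ParseException in the question's attempt appears to come from closing the collect_set( call and applying [0] before END, so the CASE expression is never terminated inside the call. Below is a minimal sketch of the single-collect_set form with END moved inside the parentheses. It assumes date1 is a string in dd-MM-yyyy format, and it also moves the format argument inside unix_timestamp, which is the function that parses the string. Note that the date behind fy is then not guaranteed to be the same row that collect_set(date1)[0] would have returned.

```sql
-- Sketch only: assumes date1 is a 'dd-MM-yyyy' string, as in the question.
SELECT id,
       collect_set(name)[0] AS name,
       sum(salary)          AS salary,
       collect_set(
         CASE WHEN month(to_date(from_unixtime(unix_timestamp(date1, 'dd-MM-yyyy')))) IN (1, 2, 3)
              THEN concat(year(to_date(from_unixtime(unix_timestamp(date1, 'dd-MM-yyyy')))) - 1, '-',
                          substr(year(to_date(from_unixtime(unix_timestamp(date1, 'dd-MM-yyyy')))), 3, 2))
              ELSE concat(year(to_date(from_unixtime(unix_timestamp(date1, 'dd-MM-yyyy')))), '-',
                          substr(year(to_date(from_unixtime(unix_timestamp(date1, 'dd-MM-yyyy')))) + 1, 3, 2))
         END   -- END closes the CASE before collect_set's closing parenthesis
       )[0] AS fy
FROM testing_1.collect_set_test
GROUP BY id;
```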
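Hi @leftjoin, yes, I have already tried a subquery, but it takes more time to process my full data, so I am trying to do this without a subquery and without multiple collect_set calls.

@Varun Why are you using collect_set? I don't see any ordering before collect_set in your query, so it seems collect_set(date1)[0] can pick any date from the group.

Yes, it can pick any date. I just want to leave the date1 column out of the GROUP BY, so I have to keep it inside collect_set.

@Varun Then use max(date) or min() in the same GROUP BY along with the sum() aggregation; the optimizer will execute max() only once. It will work faster than collect_set.

select id, collect_set(name)[0] as name, max(date1), sum(salary),
CASE WHEN month(to_date(from_unixtime(unix_timestamp(date1), 'dd-MM-yyyy')))
IN (01,02,03) THEN CONCAT(CONCAT(year(to_date(from_unixtime(unix_timestamp(date1), 'dd-MM-yyyy')))-1,'-'),
substr(year(to_date(from_unixtime(unix_timestamp(date1), 'dd-MM-yyyy'))),3,4))
ELSE CONCAT(CONCAT(year(to_date(from_unixtime(unix_timestamp(date1), 'dd-MM-yyyy'))),'-'),
SUBSTR(year(to_date(from_unixtime(unix_timestamp(date1)[0], 'dd-MM-yyyy')))+1,3,4))
END as fy from testing_1.collect_set_test group by id;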
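
One possible way to apply the max() suggestion in a single GROUP BY is sketched below. It rests on several assumptions not stated in the thread: date1 is a dd-MM-yyyy string, the fiscal year runs from April to March, and add_months is available (Hive 1.1.0 or later). Shifting the date back three months makes year() return the starting year of the fiscal year directly, so the CASE is no longer needed. Unlike collect_set(date1)[0], this deliberately picks the latest date in each group rather than an arbitrary one.

```sql
-- Sketch only: assumes date1 is a 'dd-MM-yyyy' string and an April-to-March fiscal year.
-- The string is parsed before max() so that the comparison is chronological.
SELECT id,
       collect_set(name)[0] AS name,
       sum(salary)          AS salary,
       concat(
         -- starting year of the fiscal year, e.g. 2019 for a date in January 2020
         year(add_months(max(to_date(from_unixtime(unix_timestamp(date1, 'dd-MM-yyyy')))), -3)),
         '-',
         -- last two digits of the following year, e.g. '20'
         substr(year(add_months(max(to_date(from_unixtime(unix_timestamp(date1, 'dd-MM-yyyy')))), -3)) + 1, 3, 2)
       ) AS fy
FROM testing_1.collect_set_test
GROUP BY id;
```

As a check of the arithmetic, a date of 15-01-2020 shifts to October 2019 and gives 2019-20, while 15-04-2020 shifts to January 2020 and gives 2020-21, matching what the CASE expression in the question computes.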