Arrays: collect_set with CASE in Hive


arrays hive


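Is there a way to rewrite the CASE statement below so that I don't have to write collect_set(...)[0] over and over, and can instead get the same result with a single collect_set?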

    select id, collect_set(name)[0] as name, sum(salary),
    CASE WHEN month(to_date(from_unixtime(unix_timestamp(collect_set(date1)[0], 'dd-MM-yyyy'))))
    IN (01,02,03) THEN
    CONCAT(CONCAT(year(to_date(from_unixtime(unix_timestamp(collect_set(date1)[0], 'dd-MM-yyyy'))))-1,'-'),
    substr(year(to_date(from_unixtime(unix_timestamp(collect_set(date1)[0], 'dd-MM-yyyy')))),3,4))
     ELSE CONCAT(CONCAT(year(to_date(from_unixtime(unix_timestamp(collect_set(date1)[0], 'dd-MM-yyyy')))),'-'),
    SUBSTR(year(to_date(from_unixtime(unix_timestamp(collect_set(date1)[0], 'dd-MM-yyyy'))))+1,3,4))
     END as fy from testing_1.collect_set_test group by id;

I wrote the query below:

    select collect_set(CASE WHEN month(to_date(from_unixtime(unix_timestamp(date1), 'dd-MM-yyyy')))
    IN (01,02,03) THEN CONCAT(CONCAT(year(to_date(from_unixtime(unix_timestamp(date1), 'dd-MM-yyyy')))-1,'-'),
    substr(year(to_date(from_unixtime(unix_timestamp(date1), 'dd-MM-yyyy'))),3,4))
    ELSE
     CONCAT(CONCAT(year(to_date(from_unixtime(unix_timestamp(date1), 'dd-MM-yyyy'))),'-'),
     SUBSTR(year(to_date(from_unixtime(unix_timestamp(date1), 'dd-MM-yyyy')))+1,3,4))) [0]
     END as fy from testing_1.collect_set_test group by id;

But it fails with:

    FAILED: ParseException line 1:446 missing KW_END at ')' near ']' in selection target
    line 1:452 cannot recognize input near 'END' 'as' 'fy' in selection target

Can anyone guide me on how to rewrite this?
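The ParseException comes from where END sits. In the attempt above, the CASE is opened inside collect_set( ... ), but END only appears after )[0]. The parser therefore reaches the closing parenthesis of collect_set while it is still expecting END. There is also a second issue: the format string is passed to from_unixtime rather than to unix_timestamp. The one-argument form unix_timestamp(date1) expects yyyy-MM-dd HH:mm:ss, so it will not parse dd-MM-yyyy strings.

The sketch below keeps the single-collect_set idea. It puts the whole CASE inside collect_set and indexes the resulting array afterwards. It assumes date1 is a STRING in dd-MM-yyyy format and that any fy value from the group is acceptable.

```sql
-- Sketch: the CASE is evaluated per row, and collect_set gathers the distinct fy values.
-- [0] then picks one element; element order in collect_set is not guaranteed.
-- Assumes date1 is a STRING in dd-MM-yyyy format.
SELECT id,
       collect_set(name)[0] AS name,
       sum(salary)          AS salary,
       collect_set(
         CASE WHEN month(to_date(from_unixtime(unix_timestamp(date1, 'dd-MM-yyyy')))) IN (1, 2, 3)
              THEN concat(year(to_date(from_unixtime(unix_timestamp(date1, 'dd-MM-yyyy')))) - 1, '-',
                          substr(year(to_date(from_unixtime(unix_timestamp(date1, 'dd-MM-yyyy')))), 3, 2))
              ELSE concat(year(to_date(from_unixtime(unix_timestamp(date1, 'dd-MM-yyyy')))), '-',
                          substr(year(to_date(from_unixtime(unix_timestamp(date1, 'dd-MM-yyyy')))) + 1, 3, 2))
         END
       )[0] AS fy
FROM testing_1.collect_set_test
GROUP BY id;
```

Because name and fy come from two separate collect_set calls, the fy returned for an id is not guaranteed to belong to the same row as the returned name.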

Move all the aggregations, together with the GROUP BY and the date conversion, into a subquery, and calculate fy in the outer query:

    select id, name, salary,
           CASE WHEN month(date1) IN (01,02,03)
                THEN CONCAT(CONCAT(year(date1)-1,'-'),
                            substr(year(date1),3,4))
                ELSE CONCAT(CONCAT(year(date1),'-'),
                            SUBSTR(year(date1)+1,3,4))
           END as fy
      from
           (select to_date(from_unixtime(unix_timestamp(collect_set(date1)[0], 'dd-MM-yyyy'))) as date1,
                   collect_set(name)[0] as name,
                   sum(salary) as salary,
                   id
              from testing_1.collect_set_test group by id) s
    ;
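The outer CASE produces an April-to-March fiscal-year label such as 2020-21. An alternative is to shift the date back three months, so that January to March dates fall into the previous calendar year. The label can then be computed without a CASE. This technique is not part of the answer above. The sketch assumes Hive 1.1.0 or later, where add_months is available, and reuses the same subquery s.

```sql
-- Alternative fy calculation (not from the original answer).
-- For an April-March fiscal year, the start year is the calendar year of (date1 - 3 months).
-- Example: 2021-02-15 -> 2020-11-15 -> '2020-21'; 2021-05-10 -> 2021-02-10 -> '2021-22'.
SELECT id, name, salary,
       concat(year(add_months(date1, -3)), '-',
              substr(year(add_months(date1, -3)) + 1, 3, 2)) AS fy
FROM (SELECT to_date(from_unixtime(unix_timestamp(collect_set(date1)[0], 'dd-MM-yyyy'))) AS date1,
             collect_set(name)[0] AS name,
             sum(salary)          AS salary,
             id
      FROM testing_1.collect_set_test
      GROUP BY id) s;
```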

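Comments:

Hi @leftjoin, yes, I have already tried it with a subquery, but it takes more time to process my full data. So I am trying to do it without a subquery and without multiple collect_set calls.

@Varun Why are you using collect_set? I don't see any ordering before collect_set in your query, so it seems collect_set(date1)[0] can pick any date from the group.

Yes... it can pick any date... I just want to leave the date1 column out of the GROUP BY, so I have to keep it inside collect_set.

@Varun Then use max(date) or min() in the same GROUP BY, together with the sum() aggregation. The optimizer will execute max() only once, so it will work faster.

select id, collect_set(name)[0] as name, max(date1), sum(salary), CASE WHEN month(to_date(from_unixtime(unix_timestamp(date1), 'dd-MM-yyyy'))) IN (01,02,03) THEN … ELSE … END as fy from testing_1.collect_set_test group by id;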
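Here is a sketch of the max() suggestion as a single query, with no subquery and no collect_set on date1. It assumes date1 is a STRING in dd-MM-yyyy format. On such strings, max(date1) compares text rather than dates: for example, '31-01-2020' sorts after '01-12-2021'. The sketch therefore takes max() of the parsed value, unix_timestamp(date1, 'dd-MM-yyyy'), which gives the latest date in each group. The aggregate still appears several times in the query text. The claim that Hive evaluates it only once comes from the comment above and should be checked with EXPLAIN on your Hive version.

```sql
-- Assumes date1 is a STRING in dd-MM-yyyy format.
-- max() over the parsed timestamp picks the chronologically latest date per id;
-- max(date1) on the raw string would compare text, not dates.
SELECT id,
       collect_set(name)[0] AS name,
       sum(salary)          AS salary,
       CASE WHEN month(from_unixtime(max(unix_timestamp(date1, 'dd-MM-yyyy')))) IN (1, 2, 3)
            THEN concat(year(from_unixtime(max(unix_timestamp(date1, 'dd-MM-yyyy')))) - 1, '-',
                        substr(year(from_unixtime(max(unix_timestamp(date1, 'dd-MM-yyyy')))), 3, 2))
            ELSE concat(year(from_unixtime(max(unix_timestamp(date1, 'dd-MM-yyyy')))), '-',
                        substr(year(from_unixtime(max(unix_timestamp(date1, 'dd-MM-yyyy')))) + 1, 3, 2))
       END AS fy
FROM testing_1.collect_set_test
GROUP BY id;
```

Swapping max for min gives the earliest date instead. Unlike collect_set(date1)[0], both choices are deterministic for a given group.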