Hadoop Hive: Subquery on GROUP BY


hadoop hive


I need help with a Hive query.

I have written this Hive query:

select to_date(from_unixtime(epoch)) as date, count1 , count2, count3 from table1 where count3=168

The result looks like this:

date       count1     count2     count3
7-15-2015  168        3           7
7-15-2015  168        1           5
7-15-2015  168        4           3
and similarly for other dates

Ultimately, I need a query that returns the median of count2 and count3 for each date. For example, the output I need is:

date       count1     count2     count3
7-15-2015  168        3           5
and similarly for other dates
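To make the expected numbers concrete, here is a small Python sketch (not Hive) that groups the three sample rows above by (date, count1) and takes the median of count2 and count3 per group. It assumes the rows are already loaded as tuples; the function name medians_by_date is hypothetical. It reproduces the desired row: the median of {3, 1, 4} is 3 and the median of {7, 5, 3} is 5.

```python
from collections import defaultdict
from statistics import median

# Sample rows copied from the question: (date, count1, count2, count3)
rows = [
    ("7-15-2015", 168, 3, 7),
    ("7-15-2015", 168, 1, 5),
    ("7-15-2015", 168, 4, 3),
]

def medians_by_date(rows):
    """Group rows by (date, count1); return the median of count2 and count3 per group."""
    groups = defaultdict(lambda: ([], []))  # key -> (count2 values, count3 values)
    for date, count1, count2, count3 in rows:
        c2, c3 = groups[(date, count1)]
        c2.append(count2)
        c3.append(count3)
    return {key: (median(c2), median(c3)) for key, (c2, c3) in groups.items()}

for (date, count1), (m2, m3) in medians_by_date(rows).items():
    print(date, count1, m2, m3)  # 7-15-2015 168 3 5
```

This is a single grouping pass with an aggregate per group, which suggests the Hive version can likewise be one GROUP BY with an aggregate function rather than a subquery.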

I know I need to GROUP BY date and then write a subquery on top of that, but I can't get the query right.

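Can anyone help me?

The median is the second quartile, the fifth decile, and the 50th percentile. You can compute the 50th percentile directly with Hive's percentile() aggregate function, grouping by the date expression and count1, so no subquery is needed: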

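-- percentile(col, p) returns the exact p-th percentile of an integer column within each group;
-- `date` is backquoted because DATE is a reserved keyword in newer Hive versions
select to_date(from_unixtime(epoch)) as `date`
 , count1
 , percentile(count2, 0.5) as median_ct2
 , percentile(count3, 0.5) as median_ct3
from table1
where count1 = 168
group by to_date(from_unixtime(epoch)), count1;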
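As a rough illustration of what percentile(count2, 0.5) computes for one group, here is a Python sketch of an exact percentile with linear interpolation between the two nearest ranks. It assumes the rank position is p * (n - 1), which is believed to match Hive's exact percentile() but is not taken from the answer above; the function name percentile and the None-for-empty convention are hypothetical stand-ins for the Hive function and SQL NULL.

```python
import math

def percentile(values, p):
    """Exact p-th percentile, interpolating linearly between neighbouring ranks.

    Assumption: rank position = p * (n - 1), as Hive's percentile() is believed to use.
    """
    if not values:
        return None  # assumed analogue of SQL NULL for an empty group
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    # When pos is a whole number, lo == hi and this is just s[pos]
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

print(percentile([3, 1, 4], 0.5))     # 3.0 (median of count2 for 7-15-2015)
print(percentile([7, 5, 3], 0.5))     # 5.0 (median of count3 for 7-15-2015)
print(percentile([1, 2, 3, 4], 0.5))  # 2.5 (even count: average of the two middle values)
```

With an odd number of rows the result is the middle value; with an even number it falls halfway between the two middle values, which is the usual definition of the median.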


Does this solution work for you? If so, could you accept it so the community can benefit?