Hive: How to split column values into separate columns

Input:

     name year run
 1.  a    2008 4
 2.  a    2009 3
 3.  a    2008 4
 4.  b    2009 8
 5.  b    2008 5

Desired output in Hive:

     name 2008 2009
 1.  a    8    3
 2.  b    5    8

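For a fixed set of years:

select name,
       max(case when year=2008 then run end) as year_2008,
       max(case when year=2009 then run end) as year_2009
       -- ... and so on, one column per year
  from my_table
 group by name;

Hive cannot generate columns like these dynamically, but you can first select the distinct years and then generate this SQL with a shell script.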
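The generation step described above could be sketched as follows. This is a minimal illustration in Python rather than shell; the function name `build_pivot_sql` and its parameters are hypothetical, and the list of years is assumed to have been fetched beforehand (for example from a `select distinct year from my_table` query).

```python
def build_pivot_sql(years, table="my_table", agg="max"):
    """Build a HiveQL pivot query with one aggregated CASE column per year.

    `years` is assumed to come from a prior "select distinct year" query.
    `table` and `agg` are illustrative parameters, not part of the original answer.
    """
    # De-duplicate and sort so the generated columns appear in a stable order.
    unique_years = sorted({int(y) for y in years})
    columns = [
        f"       {agg}(case when year={y} then run end) as year_{y}"
        for y in unique_years
    ]
    return (
        "select name,\n"
        + ",\n".join(columns)
        + f"\n  from {table}\n group by name;"
    )


if __name__ == "__main__":
    # Prints a query equivalent to the hand-written one for 2008 and 2009.
    print(build_pivot_sql([2008, 2009]))
```

The printed text can then be saved to a file or passed to Hive in whatever way your environment runs queries.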

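As I understand it, you need to pivot the total runs for each year into per-year columns.

What you need is the sum function, not max:

select name,
       sum(case when year=2008 then run else 0 end) as `2008_run`,
       sum(case when year=2009 then run else 0 end) as `2009_run`
  from my_table t1
 group by name;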
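To see why the choice of aggregate matters, here is a small sketch that replays the sample input from the question in plain Python. The helper `pivot` is hypothetical and only mimics the grouping logic of the query; it is not Hive code.

```python
from collections import defaultdict

# Sample rows from the question: (name, year, run).
ROWS = [
    ("a", 2008, 4),
    ("a", 2009, 3),
    ("a", 2008, 4),
    ("b", 2009, 8),
    ("b", 2008, 5),
]


def pivot(rows, agg):
    """Group runs by (name, year) and apply `agg` (e.g. sum or max) to each group."""
    groups = defaultdict(list)
    for name, year, run in rows:
        groups[(name, year)].append(run)
    result = defaultdict(dict)
    for (name, year), runs in groups.items():
        result[name][year] = agg(runs)
    return {name: dict(columns) for name, columns in result.items()}


if __name__ == "__main__":
    # With sum, player "a" gets 4 + 4 = 8 for 2008, matching the desired output.
    print(pivot(ROWS, sum))
    # With max, the two 2008 rows for "a" collapse to 4 instead.
    print(pivot(ROWS, max))
```

Only the sum variant reproduces the value 8 for player a in 2008 shown in the desired output.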

To find the top five run scorers for each year:

with table1 as
(
select name, year, sum(run) as RunsPerYear from my_table group by name, year
),
table2 as
(
select name, year, RunsPerYear, dense_rank() over (partition by year order by RunsPerYear desc) as rnk from table1
)
select name, year, RunsPerYear from table2 where rnk<=5;
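The ranking logic of a top-N-per-year query can be checked against the sample data with a short sketch. The function `top_n_per_year` is a hypothetical stand-in that mimics a dense rank partitioned by year and ordered by total runs descending; it is an illustration, not how Hive evaluates the window function.

```python
from collections import defaultdict

# Sample rows from the question: (name, year, run).
ROWS = [
    ("a", 2008, 4),
    ("a", 2009, 3),
    ("a", 2008, 4),
    ("b", 2009, 8),
    ("b", 2008, 5),
]


def top_n_per_year(rows, n=5):
    """Return (name, year, total_runs, rank) for players with dense rank <= n in each year."""
    # Step 1: total runs per (name, year), like the first CTE.
    totals = defaultdict(int)
    for name, year, run in rows:
        totals[(name, year)] += run

    # Step 2: collect each year's players, like "partition by year".
    by_year = defaultdict(list)
    for (name, year), total in totals.items():
        by_year[year].append((name, total))

    result = []
    for year in sorted(by_year):
        # Dense rank: equal totals share a rank and no rank numbers are skipped.
        distinct_totals = sorted({total for _, total in by_year[year]}, reverse=True)
        rank_of = {total: i + 1 for i, total in enumerate(distinct_totals)}
        for name, total in sorted(by_year[year], key=lambda p: (-p[1], p[0])):
            if rank_of[total] <= n:
                result.append((name, year, total, rank_of[total]))
    return result


if __name__ == "__main__":
    for row in top_n_per_year(ROWS, n=5):
        print(row)
```

Because dense rank lets tied players share a rank, a year can return more than n rows when totals are equal.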

How can I use this syntax to find the top 5 batsmen by runs across all years?

Compute dense_rank() over (partition by name order by run DESC) as the rank in a subquery, select from that subquery instead of from my_table, and filter on the rank in the WHERE clause.