Hadoop: How can I find the nearest neighbor in Hive? Is there a window function for this?


Given a table

Expected result:

ID0, ID1
1,2
4,5
6,7
8,7

For each ID above with Flag=0, we want to find another ID with Flag=1 that has the same "State" and "City" and the closest Price.

I have two naive ideas:

Method 1

Use a left outer join of the table with itself
    on (a.State = b.State and a.City = b.City and a.Flag = 0 and b.Flag = 1),
    where a.Flag = 0 and b.Flag = 1,

    and then use RANK() OVER (PARTITION BY a.State, a.City ORDER BY a.Price - b.Price) AS rank
    where rank = 1

Method 2

Use a left outer join of the table with itself
    on (a.State = b.State and a.City = b.City and a.Flag = 0 and b.Flag = 1),
    where a.Flag = 0 and b.Flag = 1,

    and then use DISTRIBUTE BY a.State, a.City SORT BY Price_Diff ASC LIMIT 1

What is the best way to find the nearest neighbor in Hive? Any helpful hints would be greatly appreciated.

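-- Every (flag=0, flag=1) pair in the same country and city, smallest price gap first.
-- Grouping by both ids leaves one row per pair, so min() returns that pair's own gap.
select a.id, b.id, min(abs(b.price - a.price)) as delta
from data as a
     inner join data as b
           on a.country = b.country and
              a.flag = 0 and b.flag = 1 and
              a.city = b.city
group by a.id, b.id
order by delta asc;
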
This returns:

1   2   1  <---
8   7   2  <---
6   7   3  <--- 
4   5   4  <--- 
8   9   10
6   9   15
1   3   100
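
The grouped query above lists every candidate pair, because grouping by both a.id and b.id leaves one row per pair. One way to keep only the closest flag=1 row for each flag=0 ID without a window function is to compute the minimum gap per id0 and join it back to the pairs. The following is a minimal sketch of that idea, not the answerer's query. It runs the SQL through Python's sqlite3 as a stand-in for Hive, uses only the four sample rows quoted in this thread, and assumes the column names id, state, city, price, flag (the queries in this answer use country where the question says State). The names ROWS, MIN_JOIN_QUERY and closest_by_min_join are hypothetical.

```python
import sqlite3

# The four sample rows quoted in this thread, read as (id, state, city, price, flag).
ROWS = [
    (6, "NY", "C", 24, 0),
    (7, "NY", "C", 27, 1),
    (8, "NY", "C", 29, 0),
    (9, "NY", "C", 39, 1),
]

# Step 1 (pairs): every flag=0 row paired with every flag=1 row in the same state and city.
# Step 2 (best): the smallest price gap for each flag=0 id.
# Step 3: join back to keep only the pair(s) that reach that smallest gap.
MIN_JOIN_QUERY = """
with pairs as (
    select a.id as id0, b.id as id1, abs(b.price - a.price) as delta
    from data as a
         inner join data as b
               on a.state = b.state and a.city = b.city
    where a.flag = 0 and b.flag = 1
),
best as (
    select id0, min(delta) as min_delta
    from pairs
    group by id0
)
select p.id0, p.id1, p.delta
from pairs as p
     inner join best as m
           on p.id0 = m.id0 and p.delta = m.min_delta
order by p.id0
"""

def closest_by_min_join(rows):
    """Load rows into an in-memory table named data and run the min-and-join query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table data (id integer, state text, city text, price integer, flag integer)"
    )
    conn.executemany("insert into data values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(MIN_JOIN_QUERY).fetchall()
    conn.close()
    return result

print(closest_by_min_join(ROWS))  # [(6, 7, 3), (8, 7, 2)]
```

If two flag=1 rows are equally close to the same flag=0 row, this version returns both pairs, so a tie-breaking rule would still be needed to get exactly one match per ID.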
Ranking the pairs with rank() (the second query, shown further below) returns:

   id0 id1 prc rank
    1   2   1   1  <---
    1   3   100 2
    4   5   4   1  <---
    8   7   2   1  <--- 
    6   7   3   2
    8   9   10  3
    6   9   15  4
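Among the pairs (6,7), (6,9), (8,7), and (8,9), the smallest price difference is in (8,7), so only that pair gets rank 1: the ranking is partitioned by country and city, not by id0, which makes the join ambiguous.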
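
The ambiguity described above comes from ranking all pairs of a city together. A minimal sketch of one possible fix is to partition the window by the flag=0 ID instead, so each ID ranks only its own candidates, and then keep the first row. This is an illustration under assumptions, not the answerer's query: it runs the SQL through Python's sqlite3 (which needs SQLite 3.25 or later for window functions) as a stand-in for Hive, uses only the four sample rows quoted in this thread, and assumes the column names id, state, city, price, flag. The names ROWS, WINDOW_QUERY and nearest_neighbors are hypothetical.

```python
import sqlite3

# The four sample rows quoted in this thread, read as (id, state, city, price, flag).
ROWS = [
    (6, "NY", "C", 24, 0),
    (7, "NY", "C", 27, 1),
    (8, "NY", "C", 29, 0),
    (9, "NY", "C", 39, 1),
]

# Partition by a.id so every flag=0 row gets its own ranking of flag=1 candidates.
# row_number() with b.id as a tie-breaker guarantees exactly one rn = 1 row per id0.
WINDOW_QUERY = """
select id0, id1, delta
from (
    select a.id as id0,
           b.id as id1,
           abs(b.price - a.price) as delta,
           row_number() over (
               partition by a.id
               order by abs(b.price - a.price), b.id
           ) as rn
    from data as a
         inner join data as b
               on a.state = b.state and a.city = b.city
    where a.flag = 0 and b.flag = 1
) ranked
where rn = 1
order by id0
"""

def nearest_neighbors(rows):
    """Load rows into an in-memory table named data and run the windowed query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table data (id integer, state text, city text, price integer, flag integer)"
    )
    conn.executemany("insert into data values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(WINDOW_QUERY).fetchall()
    conn.close()
    return result

print(nearest_neighbors(ROWS))  # [(6, 7, 3), (8, 7, 2)]
```

On these rows both 6 and 8 are matched to 7, which agrees with the 6,7 and 8,7 lines of the expected result. Using row_number() rather than rank() is a design choice: rank() would return several rows for one ID when two candidates are equally close.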

I think you'll like this video on the topic:

-- Rank every pair by price gap within its (country, city) partition.
select a.id as id0, b.id as id1, abs(b.price - a.price) as delta,
       rank() over (partition by a.country, a.city order by abs(b.price - a.price)) as rnk
from data as a
     inner join data as b
           on a.country = b.country and
              a.flag = 0 and b.flag = 1 and
              a.city = b.city;

Sample rows from the table (columns: ID, State, City, Price, Flag):

6,NY,C,24,0
7,NY,C,27,1
8,NY,C,29,0
9,NY,C,39,1