Hadoop Hive/Impala: find the lowest-level child node in a hierarchy table

Tags: hadoop, hive, hiveql, impala


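I have a scenario where I need to find the lowest-level child node in a hierarchy table that has a parent node ID and a child node ID, as shown below. The source table is available in both Hive and Impala. Please suggest a Hive/Impala query that finds the lowest-level child node for each parent node in the source table.

I tried a recursive CTE query in Impala, but I guess it is not supported.

Thanks in advance.

Source table: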

+-------------+--------------+
|child_node_id|parent_node_id|
+-------------+--------------+
|     C1      |      P1      |
+-------------+--------------+
|     C2      |      P2      |   
+-------------+--------------+
|     C11     |      C1      |
+-------------+--------------+
|     C12     |      C11     |
+-------------+--------------+
|     123     |      C12     |
+-------------+--------------+

Expected output:

+-------------+--------------+
|parent_node  |lowest_l_child|
+-------------+--------------+
|     P1      |      123     | 
+-------------+--------------+
|     P2      |      C2      |
+-------------+--------------+
|     C1      |      123     |
+-------------+--------------+
|     C11     |      123     |
+-------------+--------------+
|     C12     |      123     |
+-------------+--------------+

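Hive does not support recursive CTE queries, so please refer to [link] for one option.

Another option is to loop over a pair of queries from a shell script to find the lowest-level child for every parent.

Steps:

1) Initialization (one-time run). Judging from the initial state shown below, this creates an empty result table res and loads hier_temp with the (parent, child) pairs of the source table.

2) Query to find the lowest-level children

-- A row is finished when its child never appears as a parent in hier_temp.
insert into table res
select H1.parent, H1.child
from hier_temp H1
left outer join hier_temp H2 on H1.child = H2.parent
where H2.child is null;

3) Overwrite the temp table with the next-level children

-- Move every unfinished parent one level down. If that level is already
-- resolved in res, jump straight to its lowest-level child.
insert overwrite table hier_temp
select H1.parent as parent, coalesce(H3.child, H2.child) as child
from hier_temp H1
left outer join hier_temp H2 on H1.child = H2.parent
left outer join res H3 on H2.child = H3.parent
where H2.child is not null;

Create a shell script that runs steps 2 and 3 in sequence inside a loop (a conditional statement with break and continue will do the job) until the hier_temp table is empty.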
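
The loop described above could be driven roughly as follows. This is a minimal sketch, not the answerer's script: it uses Python instead of a shell script, and the names `run_hive`, `resolve`, and `max_loops` are assumptions made for illustration. It also assumes the `hive` CLI is on the PATH and that `hive -S -e` prints only the count on stdout. The `max_loops` guard is an addition, since a cycle in the hierarchy would keep hier_temp from ever becoming empty.

```python
import subprocess
from typing import Callable

# Step 2: move finished rows (child is not a parent of anything) into res.
STEP2 = """
insert into table res
select H1.parent, H1.child
from hier_temp H1
left outer join hier_temp H2 on H1.child = H2.parent
where H2.child is null
"""

# Step 3: push every unfinished parent one level further down.
STEP3 = """
insert overwrite table hier_temp
select H1.parent as parent, coalesce(H3.child, H2.child) as child
from hier_temp H1
left outer join hier_temp H2 on H1.child = H2.parent
left outer join res H3 on H2.child = H3.parent
where H2.child is not null
"""

COUNT = "select count(*) from hier_temp"


def run_hive(query: str) -> str:
    """Run one statement through the hive CLI and return its stdout."""
    done = subprocess.run(
        ["hive", "-S", "-e", query],
        check=True, capture_output=True, text=True,
    )
    return done.stdout.strip()


def resolve(run: Callable[[str], str] = run_hive, max_loops: int = 100) -> int:
    """Repeat steps 2 and 3 until hier_temp is empty; return the loop count."""
    loops = 0
    while int(run(COUNT)) > 0:
        if loops >= max_loops:
            raise RuntimeError("hier_temp never emptied; the hierarchy may contain a cycle")
        run(STEP2)
        run(STEP3)
        loops += 1
    return loops
```

Passing the query runner in as a parameter keeps the loop logic separate from the CLI call, so the same function can be exercised with a stub before it is pointed at a real cluster.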

Here are the res and hier_temp tables for the given test data, before the first loop:

hive> select * from res;
OK
Time taken: 0.131 seconds
hive> select * from hier_temp;
OK
C1      C11
C11     C12
C12     123
P1      C1
P2      C2
Time taken: 0.108 seconds, Fetched: 5 row(s)

Results after loop 1 of the queries in steps 2 and 3:

hive> select * from res;
OK
C12     123
P2      C2
Time taken: 0.137 seconds, Fetched: 2 row(s)
hive> select * from hier_temp;
OK
P1      C11
C1      123
C11     123
Time taken: 0.155 seconds, Fetched: 3 row(s)

Results after loop 2 of the queries in steps 2 and 3:

hive> select * from res;
OK
C12     123
P2      C2
C1      123
C11     123
Time taken: 0.11 seconds, Fetched: 4 row(s)
hive> select * from hier_temp;
OK
P1      123
Time taken: 0.111 seconds, Fetched: 1 row(s)

Results after loop 3 of the queries in steps 2 and 3:

hive> select * from res;
OK
P1      123
C12     123
P2      C2
C1      123
C11     123
Time taken: 0.115 seconds, Fetched: 5 row(s)
hive> select * from hier_temp;
OK
Time taken: 0.16 seconds

This produces the expected result, but you may need to consider the execution time, since every loop runs both queries again.
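
To check the logic, or to estimate how many loops a data set needs before running it on a cluster, the two steps can be simulated in memory. The sketch below is an illustration only: the function name `lowest_children`, the `max_loops` guard, and the in-memory lists standing in for the res and hier_temp tables are all assumptions, and it mirrors the joins of steps 2 and 3 rather than running any Hive code.

```python
from collections import defaultdict

# (parent, child) pairs from the source table in the question.
EDGES = [("P1", "C1"), ("P2", "C2"), ("C1", "C11"), ("C11", "C12"), ("C12", "123")]


def lowest_children(edges, max_loops=100):
    """Simulate the res / hier_temp loop; return (res rows, number of loops)."""
    hier_temp = list(edges)
    res = []
    loops = 0
    while hier_temp:
        if loops >= max_loops:
            raise ValueError("hier_temp never emptied; the hierarchy may contain a cycle")
        loops += 1

        # Index hier_temp by parent, the in-memory stand-in for the self join.
        children = defaultdict(list)
        for parent, child in hier_temp:
            children[parent].append(child)

        # Step 2: rows whose child is not a parent of anything are finished.
        res.extend((p, c) for p, c in hier_temp if c not in children)

        # Index res by parent, the stand-in for the join against res (H3).
        resolved = defaultdict(list)
        for parent, child in res:
            resolved[parent].append(child)

        # Step 3: move each unfinished parent one level down, taking the
        # already resolved lowest child when there is one (the coalesce).
        next_rows = []
        for parent, child in hier_temp:
            for grandchild in children.get(child, []):
                for leaf in resolved.get(grandchild, [grandchild]):
                    next_rows.append((parent, leaf))
        hier_temp = next_rows
    return res, loops


if __name__ == "__main__":
    rows, loops = lowest_children(EDGES)
    for parent, leaf in rows:
        print(parent, leaf)
    print("loops:", loops)
```

For the sample data this needs three loops and ends with the five parent-to-lowest-child pairs listed in the expected output.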


Hope this helps.


Comments:

@gobrewers14 - please help with this problem statement.

Thank you for your response! I will try this solution. In the output we are getting C1-->C12, but the child node of C12 is 123. I want the output to contain C1-->123. Please help correct the query.

Please help me modify this query to get the required result.

Can you check now? I have added one more join. It is best to check with more examples to make sure the query has no problems.

This approach works for the sample data. I will try running it on the actual data set. Thank you very much!!