SQL: In Hive, how do I explode child tags under the same parent tag in XML?

Tags: sql, xml, xpath, hive, hiveql

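In the Hive query below, I need to map child tags to their parent tag in XML content where several parent tags carry the same value. So far, because the parent tag value "ABCD" is repeated here, a cross join is happening: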

with your_data as (
select  '<ParentArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string>111</string>
            <string></string>
        </Value>
    </ParentFieldArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string/>
            <string>444</string>
            <string>555</string>
        </Value>
    </ParentFieldArray>
</ParentArray>' as xmlinfo
)
select name, case when value='NULL' then '' else value end value
  from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo 
          from your_data d
       ) d
       lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as  Name
       lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as value;
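
The duplication can be reproduced outside Hive. The following is a minimal Python sketch, not part of the original query, that uses the standard-library `xml.etree.ElementTree` to mimic the two lateral views; the variable names are illustrative, and the XML is the sample above after the `regexp_replace` step (empty strings already replaced by `NULL`). Because the name predicate matches both `ParentFieldArray` elements, each name row is paired with all five values.

```python
import xml.etree.ElementTree as ET

# Sample document after the NULL-placeholder substitution.
XML = """<ParentArray>
  <ParentFieldArray>
    <Name>ABCD</Name>
    <Value><string>111</string><string>NULL</string></Value>
  </ParentFieldArray>
  <ParentFieldArray>
    <Name>ABCD</Name>
    <Value><string>NULL</string><string>444</string><string>555</string></Value>
  </ParentFieldArray>
</ParentArray>"""

root = ET.fromstring(XML)

# Equivalent of the first lateral view: one row per Name text.
names = [n.text for n in root.findall("ParentFieldArray/Name")]

# Equivalent of the second lateral view: filtering by name only.
# The predicate matches BOTH parents, so every name row gets all values.
rows = [
    (name, s.text)
    for name in names
    for s in root.findall(f"ParentFieldArray[Name='{name}']/Value/string")
]

print(len(names), "name rows,", len(rows), "joined rows")
```

Two name rows times five matching values gives ten joined rows instead of the five that are wanted.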
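
In addition to the name, you can use posexplode() instead of explode() to get the position. Then filter the array by position in the second XPATH. In that case the name filter may not be needed, but debug it on a larger dataset. I used both the name and the index filter, and it works for your data example. Positions in XPATH start at 1, whereas positions in Hive posexplode start at 0, which is why pos+1 is used: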

with your_data as (
select  '<ParentArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string>111</string>
            <string></string>
        </Value>
    </ParentFieldArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string/>
            <string>444</string>
            <string>555</string>
        </Value>
    </ParentFieldArray>
</ParentArray>' as xmlinfo
)
select name, pos+1 as pos, case when value='NULL' then '' else value end value
  from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo 
          from your_data d
       ) d
       lateral view outer posexplode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as  pos, Name
       lateral view outer explode(XPATH(xmlinfo, concat('((ParentArray/ParentFieldArray)[',pf.pos+1, '])[Name="', pf.Name, '"]/Value/string/text()'))) vl as value;
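
The pos+1 adjustment can be checked with a small Python sketch, again using only `xml.etree.ElementTree` and illustrative variable names. It is an approximation of the query above rather than a translation: `enumerate()` stands in for posexplode (both are 0-based), and the sketch filters by position only, since ElementTree supports just a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Sample document after the NULL-placeholder substitution.
XML = """<ParentArray>
  <ParentFieldArray>
    <Name>ABCD</Name>
    <Value><string>111</string><string>NULL</string></Value>
  </ParentFieldArray>
  <ParentFieldArray>
    <Name>ABCD</Name>
    <Value><string>NULL</string><string>444</string><string>555</string></Value>
  </ParentFieldArray>
</ParentArray>"""

root = ET.fromstring(XML)

rows = []
# enumerate() is 0-based, like posexplode; XPath positions are 1-based.
for pos, name_el in enumerate(root.findall("ParentFieldArray/Name")):
    xpath_pos = pos + 1
    # Select only the values of the parent at this position.
    for s in root.findall(f"ParentFieldArray[{xpath_pos}]/Value/string"):
        value = "" if s.text == "NULL" else s.text
        rows.append((name_el.text, xpath_pos, value))

for row in rows:
    print(row)
```

Each parent now contributes only its own values, so the sketch yields five rows rather than ten.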

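Thanks for the details above on the index filter. However, there is another scenario where the approach above still cross joins for one index and returns null values for all the other indexes. @leftjoin could you look at the scenario below?
@RKR I can't look at it right now. I will do it later, or someone else will help.
@RKR Fixed. It was a bug in the XPATH. Use parentheses ().
Thank you very much for your help. @leftjoin I came across another scenario. Could you please look into that as well?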
Result of the query in the answer:
name    pos value
ABCD    1   111
ABCD    1   
ABCD    2   
ABCD    2   444
ABCD    2   555