Hive: using the like operator on a struct data type


I have a table with a column that holds an array of structs. Is there a way to filter the records on this column using the like operator?

hive> desc location;
location_list           array<struct<city:string,state:string>>

hive> select * from location;
row1 : [{"city":"Hudson","state":"NY"},{"city":"San Jose","state":"CA"},{"city":"Albany","state":"NY"}]
row2 : [{"city":"San Jose","state":"CA"},{"city":"San Diego","state":"CA"}]
I tried running a query like the one below to keep only the records that have the state "NY":

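hive> select * from location where location_list like '%"NY"%';
FAILED: SemanticException [Error 10014]: Line 1:29 Wrong arguments ''%"NY"%'': No matching method for class org.apache.hadoop.hive.ql.udf.UDFLike with (array<struct<city:string,state:string>>, string). Possible choices: _FUNC_(string, string)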

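Note: I can achieve this by applying lateral view and explode to this struct column. I am trying to avoid that, though, because I need to join this table with another one that does not accept lateral views.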

Good question. You can do this in the following efficient (and neat) way:
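A minimal sketch of such a query, assuming the `location` table and `location_list` column from the question (reconstructed from the explanation that follows, not the answerer's verbatim code):

```sql
-- location_list.state projects the state field out of every struct in the
-- array, yielding an array<string>; array_contains then tests that array
-- for an exact match on 'NY'.
select *
from location
where array_contains(location_list.state, 'NY');
```

Because no lateral view is involved, the filter stays a plain where clause and the table can still be joined as usual.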


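In this case, location_list.state produces an array of strings (the states, in your case), so you can use the array_contains UDF to check for a value. This looks for an exact value, so you cannot do pattern matching the way the like operator does, but you should be able to achieve what you need.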
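If a pattern match is genuinely required, one possible workaround (an assumption, not part of the original answer) is to flatten the projected string array with concat_ws and apply like to the resulting string. A sketch against the same `location` table, with '|' as an arbitrarily chosen separator:

```sql
-- concat_ws accepts an array<string>, so the projected states are joined
-- into a single string such as 'NY|CA|NY', which like can then match.
-- Assumption: '|' never occurs inside a state value.
select *
from location
where concat_ws('|', location_list.state) like '%NY%';
```

Note that '%NY%' matches any state value containing "NY" as a substring, so this is looser than array_contains; tighten the pattern if only whole values should match.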
Demo using an array:

select my_array  
from
( --emulation of your dataset. Just replace this subquery with your table
 select array(named_struct("city","Hudson","state","NY"),named_struct("city","San Jose","state","CA"),named_struct("city","Albany","state","NY")) as my_array
 union all
 select array(named_struct("city","San Jose","state","CA"),named_struct("city","San Diego","state","CA")) as my_array
)s
where array_contains(my_array.state,'NY') 
;
Result:

OK
[{"city":"Hudson","state":"NY"},{"city":"San Jose","state":"CA"},{"city":"Albany","state":"NY"}]
Time taken: 34.055 seconds, Fetched: 1 row(s)