Hive string search


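I am trying to query data in Hive (think of it as a table column holding long strings). The specific requirement is to filter strings that follow a pattern: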

example string1: "Some content. AB: xyz-p1 CD: rst-p1"

example string2: "Some content. AB: xyz-p2 CD: rst-p2"

example string3: "Some content. AB: xyz-p1 CD: xyz-p1"

example string4: "Some content. AB: xyz-p2 CD: xyz-p2"

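(p1 and p2 are patterns; treat them as strings. AB: and CD: are fixed (constant) strings. xyz and rst are also strings, but not constants.)

I need string1 and string2 to be part of the Hive query result, but not string3 and string4. More formally, the strings AB: and CD: must not be followed by the same pattern (xyz or rst).

My initial attempt was: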

select * from tableName where (col1 like '%AB: %-p1%' or col1 like '%CD: %-p2%') 
and (col1 not like '%AB: %-p1%') and (col2 not like '%CD: %-p2%')

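However, this does not seem to give the expected result.

Use a regular expression to extract the two fields; then you can simply compare them. As an example of extracting the fields with a regular expression, I have tested this: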

Select 
 regexp_extract('Some content. AB: xyz-p1 CD: rst-p1','^.*AB:\\s(.*)\\sCD:.*',1) 
   AS pattern1,
 regexp_extract('Some content. AB: xyz-p1 CD: rst-p1','^.*CD:\\s(.*)',1) 
   AS pattern2;

Your full query should look something like this (untested):

Select * FROM tablename
WHERE regexp_extract(col1,'^.*AB:\\s(.*)\\sCD:.*',1)  <> regexp_extract(col1,'^.*CD:\\s(.*)',1) ;

For more information on the regexp_extract function, see the Hive documentation.
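Since the full query is untested, here is a quick sanity check of the two patterns against the four sample strings from the question. It is only a sketch: it runs in Python rather than Hive, on the assumption that Python's `re` module treats these particular patterns the same way Hive's Java regex engine does. The helper names `extract` and `keep` are made up for illustration, and returning an empty string on no match is my assumption about how `regexp_extract` behaves.

```python
import re

# Same patterns as in the Hive query. In Python raw strings a single
# backslash is enough; Hive string literals need the doubled "\\s".
AB_PATTERN = re.compile(r"^.*AB:\s(.*)\sCD:.*")
CD_PATTERN = re.compile(r"^.*CD:\s(.*)")


def extract(pattern, text):
    """Rough stand-in for regexp_extract(text, pattern, 1).

    Returns capture group 1, or '' when the pattern does not match
    (assumed to mirror Hive's behaviour).
    """
    match = pattern.search(text)
    return match.group(1) if match else ""


def keep(text):
    """True when the values following 'AB:' and 'CD:' differ."""
    return extract(AB_PATTERN, text) != extract(CD_PATTERN, text)


samples = [
    "Some content. AB: xyz-p1 CD: rst-p1",  # string1
    "Some content. AB: xyz-p2 CD: rst-p2",  # string2
    "Some content. AB: xyz-p1 CD: xyz-p1",  # string3
    "Some content. AB: xyz-p2 CD: xyz-p2",  # string4
]

# Emulates the WHERE clause: keep rows whose two extracted fields differ.
kept = [s for s in samples if keep(s)]

for s in kept:
    print(s)
```

With these inputs only string1 and string2 survive the filter, which matches what the question asks for. Note that the comparison is on the whole extracted value (for example `xyz-p1` against `rst-p1`), so it relies on both fields carrying the same `-p1` / `-p2` suffix, as they do in the samples.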