Hadoop: Using RLIKE in Hive to find regular-expression patterns


I want to filter a column for rows that contain words such as head, att, and space. I am using the following query:

select * from tablename where (column_name like '%head%' or column_name like '%att%' or column_name like '%space%')
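The problem with this query is that it also matches words such as headache, attitude, and spaceship. I only want rows that contain the specific words themselves (head, att, space). I tried adding a space after each word: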

select * from tablename where (column_name like '%head %' or column_name like '%att %' or column_name like '%space %')
But this does not match a row where head appears at the end of the sentence.
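
To make the failure concrete, here is a minimal sketch in Python rather than Hive. It assumes that LIKE '%head %' behaves like a plain case-sensitive substring test for "head " (with the trailing space); the helper name like_contains is made up for illustration.

```python
# Stand-in for Hive's  column_name LIKE '%head %'
# Assumption: with only leading/trailing % wildcards, LIKE is a substring test.
rows = ["Tom's head", "my head is big", "I am having headache"]

def like_contains(value, needle):
    """Return True if needle occurs anywhere in value."""
    return needle in value

matched = [r for r in rows if like_contains(r, "head ")]
print(matched)  # ['my head is big']
```

"Tom's head" is dropped because nothing follows head, so the trailing space in the pattern has nothing to match.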

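I think this can be solved with something like RLIKE in Hive. I tried it, but without much success.

Can anyone help me use RLIKE to keep only the rows that contain words such as head, att, and space?

Thanks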

Adding an update

Suppose the input is as follows:

Tom's head
my head is big
I am having headache
att is bad
attitude is bad
bad is att
There is more space
spaceship
space is looking cool
The output should be:

Tom's head
my head is big
att is bad
bad is att
There is more space
space is looking cool
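The following rows should be dropped, because I am only interested in the words head, att, and space when they appear as whole words in the sentence. I am not interested in matching headache, attitude, or spaceship.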

I am having headache
attitude is bad
spaceship
Thanks

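RLIKE uses the regular-expression syntax familiar from most programming languages (in Hive, Java regular expressions).

^head$ matches only when the whole column value is exactly head: ^ anchors the start of the string and $ anchors the end.

For example, to match values that start with h and end with d, you can use ^h.*d$. Applied to the question above, the anchored query looks like this (note that it only matches rows whose entire value is head, att, or space):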

SELECT * FROM tablename 
WHERE
(
  column_name RLIKE '^head$' OR
  column_name RLIKE '^att$' OR
  column_name RLIKE '^space$'
);
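
As a quick check of what the anchored patterns do, here is a small sketch that runs them over the sample rows from the question. It uses Python's re module as a stand-in for Hive's Java regex engine; that substitution is an assumption, though ^, $, and alternation behave the same way on these plain ASCII inputs.

```python
import re

# Sample rows from the question.
rows = [
    "Tom's head", "my head is big", "I am having headache",
    "att is bad", "attitude is bad", "bad is att",
    "There is more space", "spaceship", "space is looking cool",
]

# Same three anchored patterns as the query above, combined with alternation.
anchored = re.compile(r"^head$|^att$|^space$")

matched = [r for r in rows if anchored.search(r)]
print(matched)  # []

# Only values that are exactly one of the words match.
exact = [v for v in ["head", "att", "space", "head "] if anchored.search(v)]
print(exact)  # ['head', 'att', 'space']
```

None of the sample rows match, because each one contains more than the bare word.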


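A word boundary (\b) fits this scenario: it matches the word whether it appears at the start, in the middle, or at the end of the string.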

with aa as
(select 'Toms head' as col1
union all
select 'head as in headache' as col1
union all
select 'headache as in head' as col1
union all
select 'my head is big' as col1
union all
select 'I am having headache' as col1
union all
select 'att is bad' as col1
union all
select 'attitude is bad' as col1
union all
select 'bad is att' as col1
union all
select 'There is more space' as col1
union all
select 'spaceship' as col1
union all
select 'space is looking cool' as col1)
select col1 from aa
where regexp(col1,'\\bhead\\b|\\batt\\b|\\bspace\\b')
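
The word-boundary pattern can also be checked against the question's sample rows outside Hive. The sketch below uses Python's re module as a stand-in (an assumption; Hive itself uses Java regular expressions, but \b behaves the same on these ASCII inputs). The backslashes are doubled in the Hive query because the backslash is an escape character inside a Hive string literal; a Python raw string needs only one.

```python
import re

# Sample rows from the question.
rows = [
    "Tom's head", "my head is big", "I am having headache",
    "att is bad", "attitude is bad", "bad is att",
    "There is more space", "spaceship", "space is looking cool",
]

# Same pattern as in the Hive query, written as a Python raw string.
word = re.compile(r"\bhead\b|\batt\b|\bspace\b")

matched = [r for r in rows if word.search(r)]
for r in matched:
    print(r)
```

This keeps the six rows listed as the expected output in the question and drops the headache, attitude, and spaceship rows.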

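Thanks for the reply. head can also appear inside the column value rather than at the start, as in Tom's head. Will this work for that too?

No, it won't. I'll get back to you with an updated statement.

Can you add all the test cases to your question? That way I can test my statement and give you the right answer.

Thanks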