Regex: regular expression for Twitter data in Hive
I have the following Twitter data. The data has two parts:
@Username
and the tweet or text:
RT @username: Stay behind, or take the jump (anything in text or tags and emoji)#@name
@名字
JJJJJJJJJJ
Dhdkeueh
Sjdyeh
@克杜迪维
Hshedhdkdjfnfjfkfmfmhdkalshsh+)#和#(#)(63+kdjdj
If you do not want the ":" in the username, use '^RT\\s(\\S*):\\s(.*)$',
or '^RT\\s(\\S*):?\\s(.*)$' if the ":" is optional:
with your_data as (
select 'RT @username Stay behind, or take the jump (anything in text or tags and emoji)' as str
)
select regexp_extract(str,'^RT\\s(\\S*):?\\s(.*)$',1) as username,
regexp_extract(str,'^RT\\s(\\S*):?\\s(.*)$',2) as tweet
from your_data;
Result:
OK
username tweet
@username: Stay behind, or take the jump (anything in text or tags and emoji)
Time taken: 1.092 seconds, Fetched: 1 row(s)
OK
username tweet
@username Stay behind, or take the jump (anything in text or tags and emoji)
Time taken: 28.587 seconds, Fetched: 1 row(s)
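One caveat about the optional-colon pattern: `\\S*` is greedy, so when the ":" is present it is captured inside the first group and `:?` matches nothing. A minimal sketch of one way around this, assuming the lazy quantifier `\\S*?` is acceptable (this is a substitution, not part of the answer above); the column names `with_colon`, `without_colon`, `username_1`, `username_2` and `tweet_1` are made up for illustration:

```sql
-- Sketch only: shortened sample strings, hypothetical column names.
with your_data as (
  select 'RT @username: Stay behind, or take the jump' as with_colon,
         'RT @username Stay behind, or take the jump'  as without_colon
)
-- \\S*? is lazy: it stops as soon as an optional ":" followed by
-- whitespace can match, so the ":" stays outside group 1.
select regexp_extract(with_colon,    '^RT\\s(\\S*?):?\\s(.*)$', 1) as username_1,
       regexp_extract(without_colon, '^RT\\s(\\S*?):?\\s(.*)$', 1) as username_2,
       regexp_extract(with_colon,    '^RT\\s(\\S*?):?\\s(.*)$', 2) as tweet_1
from your_data;
```

With this pattern both username columns should come out as @username, whether or not the ":" is there.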
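I have a data file with many lines in the pattern I wrote. The data in the tweet part spans many lines, has many spaces, and is also in different languages.
@Ajazsheikh It is still not clear what the problem is. Please provide a clear example and a problem statement.
rt@username:hi wassup https//wusksksihsii#wensdayhajsh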
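For input like the last comment (lowercase "rt", no space after it, no space after the ":"), a more tolerant pattern might help. The following is a rough sketch, not a tested solution for the real file: it assumes each tweet already arrives as a single string value (a plain text table splits records on newlines, so multi-line tweets would need to be loaded differently), and it uses the Java regex inline flags `(?i)` for case-insensitive matching and `(?s)` so that `.` also matches line breaks. The column names and the second sample string are invented for illustration.

```sql
-- Sketch only: assumes the whole tweet is in one string value.
with your_data as (
  select 'rt@username:hi wassup https//wusksksihsii#wensdayhajsh' as compact_str,
         'RT @username: first line\nsecond   line'                as multiline_str
)
-- (?is)      : ignore case, let "." match newlines
-- \\s*       : whitespace after RT and after ":" is optional
-- [^\\s:]+   : username is everything up to whitespace or ":"
select regexp_extract(compact_str,   '(?is)^RT\\s*(@[^\\s:]+):?\\s*(.*)$', 1) as username_1,
       regexp_extract(compact_str,   '(?is)^RT\\s*(@[^\\s:]+):?\\s*(.*)$', 2) as tweet_1,
       regexp_extract(multiline_str, '(?is)^RT\\s*(@[^\\s:]+):?\\s*(.*)$', 1) as username_2,
       regexp_extract(multiline_str, '(?is)^RT\\s*(@[^\\s:]+):?\\s*(.*)$', 2) as tweet_2
from your_data;
```

Because the username class only excludes whitespace and ":", it does not care which language or script the name is written in.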