Regex for Twitter data in Hive


I have the following Twitter data.

The data splits into two parts:

@Username
and the tweet or text:

RT @username: Stay behind, or take the jump (anything in text or tags and emoji)#@name
@名字 JJJJJJJJJJ Dhdkeueh Sjdyeh @克杜迪维

Hshedhdkdjfnfjfkfmfmhdkalshsh+)#和#(#)(63+kdjdj
If you do not want the ":" in the username, use:

'^RT\\s(\\S*):\\s(.*)$'

If the ":" is optional, use '^RT\\s(\\S*):?\\s(.*)$', for example:

with your_data as (
  select 'RT @username Stay behind, or take the jump (anything in text or tags and emoji)' as str
)

select regexp_extract(str,'^RT\\s(\\S*):?\\s(.*)$',1) as username,
       regexp_extract(str,'^RT\\s(\\S*):?\\s(.*)$',2) as tweet
  from your_data;
Result:

OK
username        tweet
@username:      Stay behind, or take the jump (anything in text or tags and emoji)
Time taken: 1.092 seconds, Fetched: 1 row(s)
OK
username        tweet
@username       Stay behind, or take the jump (anything in text or tags and emoji)
Time taken: 28.587 seconds, Fetched: 1 row(s)
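
One caveat with the optional-colon pattern: `\\S*` is greedy, so when the colon is present it is captured as part of the username, which is consistent with the `@username:` row in the output above. A minimal sketch of one way around this, assuming Hive's `regexp_extract` follows Java regex semantics, is to make the first group lazy with `\\S*?`. The two rows in `your_data` are made-up samples for illustration, not the original data.

```sql
-- \\S*? is lazy: it stops before an optional ':' so that the colon is
-- consumed by ':?' instead of being captured in group 1.
with your_data as (
  select 'RT @username: Stay behind, or take the jump' as str
  union all
  select 'RT @username Stay behind, or take the jump' as str
)
select regexp_extract(str, '^RT\\s(\\S*?):?\\s(.*)$', 1) as username,
       regexp_extract(str, '^RT\\s(\\S*?):?\\s(.*)$', 2) as tweet
  from your_data;
```

Under these assumptions, both rows should yield `@username` without the trailing colon.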

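I have a data file with many lines in the pattern I described. The tweet part spans many lines, contains many spaces, and is also in different languages.

@Ajazsheikh It is still not clear what the problem is. Please provide a clear example and a problem statement.

rt@username:hi wassup https//wusksksihsii#wensdayhajsh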
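
The last sample differs from the earlier ones: `rt` is lowercase and there is no space after the colon, so the patterns in the answer would not match it. The following is only a sketch of a more tolerant pattern, assuming Java regex inline flags are honored by `regexp_extract` and that each tweet is already stored as a single row. The pattern itself is an illustration and not part of the original answer.

```sql
-- (?i)      case-insensitive, so "rt" and "RT" both match
-- (?s)      lets '.' match line breaks, in case a tweet spans several lines
-- \\s*      tolerates zero or more spaces after RT and after the colon
-- [^\\s:]+  captures the handle up to the first whitespace or ':'
with your_data as (
  select 'rt@username:hi wassup https//wusksksihsii#wensdayhajsh' as str
)
select regexp_extract(str, '(?is)^RT\\s*(@[^\\s:]+):?\\s*(.*)$', 1) as username,
       regexp_extract(str, '(?is)^RT\\s*(@[^\\s:]+):?\\s*(.*)$', 2) as tweet
  from your_data;
```

Because the handle is matched with a negated class rather than `\\w`, usernames written in non-Latin scripts should be captured as well.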