Regex: a regular expression for Twitter data in Hive

I have the following Twitter data.

The data has two parts:

@Username

and the tweet or text:

RT @username: Stay behind, or take the jump (anything in text or tags and emoji)#@name

@name JJJJJJJJJJ Dhdkeueh Sjdyeh @克杜迪维

Hshedhdkdjfnfjfkfmfmhdkalshsh+)#&#(#)(63+kdjdj

Result:

OK
username        tweet
@username:      Stay behind, or take the jump (anything in text or tags and emoji)
Time taken: 1.092 seconds, Fetched: 1 row(s)

If you do not want the ':' in the username, use '^RT\\s(\\S*):\\s(.*)$', or use '^RT\\s(\\S*):?\\s(.*)$' if the ':' is optional:

-- One sample row standing in for the real tweet table
with your_data as (
  select 'RT @username Stay behind, or take the jump (anything in text or tags and emoji)' as str
)
-- Group 1 is the username, group 2 is the rest of the tweet
select regexp_extract(str, '^RT\\s(\\S*):?\\s(.*)$', 1) as username,
       regexp_extract(str, '^RT\\s(\\S*):?\\s(.*)$', 2) as tweet
  from your_data;

Result:

OK
username        tweet
@username       Stay behind, or take the jump (anything in text or tags and emoji)
Time taken: 28.587 seconds, Fetched: 1 row(s)
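
To see how the pattern variants discussed here treat the ':' after the username, the following is a minimal sketch in Python. Python's re module is only a stand-in for the Java regex engine that Hive uses; for these simple patterns the two are expected to agree, but that is an assumption, not something tested on Hive. The names PATTERNS and extract, and the "lazy_username" variant, are illustrative and do not come from the thread. Patterns are written with single backslashes here; in a Hive string literal they are doubled, as in the query above.

```python
import re

# Sample strings taken from the thread, with and without ':' after the username.
WITH_COLON = "RT @username: Stay behind, or take the jump (anything in text or tags and emoji)"
NO_COLON = "RT @username Stay behind, or take the jump (anything in text or tags and emoji)"

PATTERNS = {
    # ':' must be present and is kept out of group 1
    "colon_required": r"^RT\s(\S*):\s(.*)$",
    # ':' is optional, but the greedy \S* swallows it when it is present
    "colon_optional": r"^RT\s(\S*):?\s(.*)$",
    # Hypothetical variant (not from the thread): a lazy \S*? leaves the ':' to ':?'
    "lazy_username": r"^RT\s(\S*?):?\s(.*)$",
}

def extract(pattern, text):
    """Return (username, tweet) from groups 1 and 2, or None when nothing matches."""
    m = re.search(pattern, text)
    return (m.group(1), m.group(2)) if m else None

for name, pattern in PATTERNS.items():
    print(name, extract(pattern, WITH_COLON), extract(pattern, NO_COLON))
```

In this sketch the optional-colon pattern returns "@username:" for the input that contains a colon, because \S* is greedy and takes the ':' before ':?' is tried. If the colon should be dropped whether or not it is present, the lazy variant (or a character class such as [^\s:]*) may be a better fit.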

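I have a data file with many rows in the pattern I described. In the tweet part the data spans many lines, contains many spaces, and is also in different languages.

@Ajazsheikh It is still not clear what the problem is. Please provide a clear example and a problem statement.

rt@username:hi wassup https//wusksksihsii#wensdayhajsh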
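
The sample in the last comment starts with a lowercase "rt" and has no whitespace around the username, so the patterns from the answer would not match it. The following Python sketch shows one possible looser pattern, assuming the whole tweet arrives as a single string value. The pattern, the name LOOSE, and the helper split_retweet are hypothetical and not from the thread. The inline flags (?i) (case-insensitive) and (?s) (dot matches newlines) also exist in Java regex, so a doubled-backslash version might work in regexp_extract, but that has not been verified on Hive.

```python
import re

# Hypothetical pattern, not from the thread:
#   (?i)  lets "RT" also match "rt"
#   (?s)  lets '.' match newlines, so a multi-line tweet stays in one group
#   \s*   makes the whitespace around the username optional
#   [^\s:]+ keeps the ':' out of the username
LOOSE = re.compile(r"(?is)^RT\s*(@[^\s:]+):?\s*(.*)$")

def split_retweet(text):
    """Return (username, tweet), or None when the text is not in the RT form."""
    m = LOOSE.search(text)
    return (m.group(1), m.group(2)) if m else None

print(split_retweet("rt@username:hi wassup https//wusksksihsii#wensdayhajsh"))
print(split_retweet("RT @username: first line\n   second   line"))
```

Whether a multi-line tweet reaches the regex as one value depends on how the file is loaded into the table, which the thread does not describe.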