Apache Pig / Pig Latin: filtering numbers < 5 and >= 5 in a chararray (text and numbers)

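How can I filter or group people with less than 5 years of experience and those with 5 or more years? I am new to Pig Latin. The ID (e.g. BUS2003) should stay as it is.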

Input data

ID,Experience
BUS2003,More than 17 years teaching experience
BUS1303,2 years teaching experience
BUS4543,13 plus years of teaching experience; 4 plus years of corporate experience
BUS2103,4 year + 6 years in business
BUS2913,8 yrs teaching experience
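I know how to load the data with PigStorage or CSVLoader, but because the words and numbers are mixed together in one field, I am having trouble solving this.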

Expected result:

**Less than five years**
BUS1303,2 years teaching experience
BUS2103,4 year + 6 years in business

**Equal or greater than five years**
BUS2003,More than 17 years teaching experience
BUS4543,13 plus years of teaching experience; 4 plus years of corporate experience
BUS2913,8 yrs teaching experience

Thanks in advance.

You have to extract the number (the first one in the field) and then do a SPLIT on it. This should give you what you need.

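-- Load the ID and the free-text experience field.
A = LOAD 'input.txt' USING PigStorage(',') AS (a1:chararray, a2:chararray);
-- Extract the first run of digits and cast it to int (\\d+ needs at least one digit; \\d* can match an empty string).
B = FOREACH A GENERATE a1, a2, (int)REGEX_EXTRACT(a2, '(\\d+)', 1) AS years;
-- SPLIT is a standalone statement, not an assignment; fields are referenced by name, without the relation prefix.
SPLIT B INTO C1 IF years < 5, C2 IF years >= 5;
DUMP C1;
DUMP C2;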
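
One thing the approach above leaves open is what happens to rows where no number can be extracted, such as the `ID,Experience` header line if it is present in the file. For those rows the extracted value is null, so neither comparison is true and the row lands in neither output. The following is a minimal sketch of one way to make those rows visible, assuming the same `input.txt` layout; the alias names `under5`, `atleast5`, and `unknown` and the field name `years` are made up for illustration.

```pig
-- Sketch only: same load and extraction as above, with hypothetical alias names.
A = LOAD 'input.txt' USING PigStorage(',') AS (a1:chararray, a2:chararray);
B = FOREACH A GENERATE a1, a2, (int)REGEX_EXTRACT(a2, '(\\d+)', 1) AS years;

-- Rows without any digits have a null "years" and are routed to "unknown"
-- instead of being silently dropped by the two numeric conditions.
SPLIT B INTO under5   IF years < 5,
             atleast5 IF years >= 5,
             unknown  IF years IS NULL;

DUMP unknown;
```

Inspecting `unknown` is a quick way to check whether any experience strings spell the number out in words or omit it entirely.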