Apache Pig: regular expression to extract the first part of a string
I need to extract the postcode district from the input data below.
AB55 4
DD7 6LL
DD5 2HI
My code:
A = load 'data' as postcode:chararray;
B = foreach A {
code_district = REGEX_EXTRACT(postcode,'<SOME EXP>',1);
generate code_district;
};
dump B;
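What should the regular expression be to extract the first part of the string?
You can try either of the regular expressions below.
Option 1: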
A = LOAD 'input' as postcode:chararray;
code_district = FOREACH A GENERATE REGEX_EXTRACT(postcode,'(\\w+).*',1);
DUMP code_district;
Option 2:
A = LOAD 'input' as postcode:chararray;
code_district = FOREACH A GENERATE REGEX_EXTRACT(postcode,'([a-zA-Z0-9]+).*',1);
DUMP code_district;
Output:
(AB55)
(DD7)
(DD5)
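Because Pig's REGEX_EXTRACT is built on java.util.regex, both patterns can be checked outside a Pig job. The following is a minimal Java sketch under that assumption. The class name PostcodeOptionsCheck and the helper firstGroup are hypothetical names introduced for illustration, and the helper uses find(), which may differ from how a given Pig version applies the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PostcodeOptionsCheck {
    // Hypothetical helper: capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String option1 = "(\\w+).*";          // \w is [a-zA-Z_0-9] by default
        String option2 = "([a-zA-Z0-9]+).*";  // same set without the underscore
        // Sample postcodes taken from the question.
        for (String postcode : new String[] {"AB55 4", "DD7 6LL", "DD5 2HI"}) {
            System.out.println(postcode + " -> "
                    + firstGroup(option1, postcode) + " / "
                    + firstGroup(option2, postcode));
        }
    }
}
```

On the sample data the two options agree. They differ only for input containing an underscore, which \w accepts and [a-zA-Z0-9] does not.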
This does not work for non-ASCII characters (e.g. ISO-8859-9 data).
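One possible workaround, sketched here in plain Java on the assumption that REGEX_EXTRACT uses java.util.regex: replace the ASCII-only classes with the Unicode categories \p{L} (letters) and \p{N} (numbers). The class name UnicodePostcodeSketch, the helper firstGroup, and the sample value with a leading Ş are hypothetical, and the input is assumed to have already been decoded correctly into a string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodePostcodeSketch {
    // Hypothetical helper: capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String asciiOnly = "(\\w+).*";                 // default \w covers ASCII only
        String unicodeAware = "([\\p{L}\\p{N}]+).*";   // any Unicode letter or number
        String sample = "\u015E34 5";                  // made-up value starting with Ş (U+015E)
        // The ASCII-only pattern skips the leading non-ASCII letter.
        System.out.println(firstGroup(asciiOnly, sample));
        // The Unicode-aware pattern keeps the whole first token.
        System.out.println(firstGroup(unicodeAware, sample));
    }
}
```

In a Pig script the same pattern would be written as '([\\p{L}\\p{N}]+).*' in place of the Option 1 or Option 2 expression. This has not been run against a Pig cluster here, so treat it as a starting point.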