Apache Pig: regular expression to extract the first part of a string

I need to extract the postcode district from the input data below:

AB55 4
DD7 6LL
DD5 2HI

My code:

A = load 'data' as postcode:chararray;
B = foreach A {
code_district = REGEX_EXTRACT(postcode,'<SOME EXP>',1);
generate code_district;
};
dump B;

What regular expression should I use to extract the first part of the string?

Could you try one of the regular expressions below?

Option 1:

A = LOAD 'input' as postcode:chararray;
code_district = FOREACH A GENERATE REGEX_EXTRACT(postcode,'(\\w+).*',1);
DUMP code_district;

Option 2:

A = LOAD 'input' as postcode:chararray;
code_district = FOREACH A GENERATE REGEX_EXTRACT(postcode,'([a-zA-Z0-9]+).*',1);
DUMP code_district;

Output:

(AB55)
(DD7)
(DD5)
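
Pig evaluates REGEX_EXTRACT with Java regular expressions, so the two patterns can be checked outside Pig before running a job. The following is only a minimal sketch: the class `PostcodeDistrict` and its `extract` method are hypothetical helpers, not part of Pig, and the sketch assumes the pattern is matched against the whole input string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Pig: it only mimics pulling
// capture group 1 out of a whole-string match.
final class PostcodeDistrict {
    // The two patterns from the answer, as the regex engine sees them.
    static final String WORD_CLASS = "(\\w+).*";
    static final String EXPLICIT_CLASS = "([a-zA-Z0-9]+).*";

    // Returns capture group 1 if the pattern matches the whole input, else null.
    static String extract(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample rows taken from the question.
        String[] postcodes = {"AB55 4", "DD7 6LL", "DD5 2HI"};
        for (String postcode : postcodes) {
            System.out.println(extract(postcode, WORD_CLASS)
                    + " " + extract(postcode, EXPLICIT_CLASS));
        }
    }
}
```

For the three sample rows both patterns return the same district, because the greedy `+` stops at the first space and `.*` consumes the rest of the line.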

This does not work for non-ASCII characters, for example those in ISO-8859-9.
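
The limitation follows from Java's defaults: `\w` and `[a-zA-Z0-9]` cover ASCII only. One possible workaround, shown here as a sketch and not as something the thread confirms, is to use the Unicode property classes `\p{L}` (letters) and `\p{N}` (digits). The class `UnicodeDistrict` and the input value below are made up for illustration, and the sketch assumes the data has already been decoded correctly into Unicode strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of Pig.
final class UnicodeDistrict {
    // ASCII-only pattern from the answer above.
    static final String ASCII_WORD = "(\\w+).*";
    // Assumed alternative: any Unicode letter or digit in the first group.
    static final String UNICODE_WORD = "([\\p{L}\\p{N}]+).*";

    // Returns capture group 1 if the pattern matches the whole input, else null.
    static String extract(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up value starting with U+0130 (capital I with dot above),
        // a letter that ISO-8859-9 can encode.
        String value = "\u0130Z35 1";
        System.out.println(extract(value, ASCII_WORD) == null);  // ASCII pattern fails to match
        System.out.println("\u0130Z35".equals(extract(value, UNICODE_WORD)));
    }
}
```

In a Pig script the same idea would presumably be written as `'([\\p{L}\\p{N}]+).*'` inside REGEX_EXTRACT. That form is untested here, and it only helps if the loader has decoded the ISO-8859-9 bytes correctly in the first place.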