Apache Pig: problem with REPLACE


Here is what my data looks like:

199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245

unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985

199.120.110.21 - - [01/Jul/1995:00:00:09 -0400] "GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0" 200 4085

burger.letters.com - - [01/Jul/1995:00:00:11 -0400] "GET /shuttle/countdown/liftoff.html HTTP/1.0" 304 0
Here is the Pig code:

loadFulldata = LOAD '/root/Kennadi-Project/Kennadi-data.txt' USING PigStorage(',') AS (fullline:chararray);

extractData = FOREACH loadFulldata GENERATE FLATTEN (REGEX_EXTRACT_ALL(fullline,'(.*) - - (.*) -(.*)] "(.*)" (.*) (.*)'));

rowdata = FOREACH extractData GENERATE $0 as host,$1 as datetime,$2 as timezone,$3 as responseurl,$4 as responsecode,$5 as responsedata;
My data then looks like this:

(199.72.81.55,[01/Jul/1995:00:00:01,0400,GET /history/apollo/ HTTP/1.0,200,6245)
(unicomp6.unicomp.net,[01/Jul/1995:00:00:06,0400,GET /shuttle/countdown/ HTTP/1.0,200,3985)
(199.120.110.21,[01/Jul/1995:00:00:09,0400,GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0,200,4085)
(burger.letters.com,[01/Jul/1995:00:00:11,0400,GET /shuttle/countdown/liftoff.html HTTP/1.0,304,0)
(199.120.110.21,[01/Jul/1995:00:00:11,0400,GET /shuttle/missions/sts-73/sts-73-patch-small.gif HTTP/1.0,200,4179)
(burger.letters.com,[01/Jul/1995:00:00:12,0400,GET /images/NASA-logosmall.gif HTTP/1.0,304,0)
When I use REGEX_EXTRACT_ALL, I cannot get rid of the "[" in the data. How can I do that?

I also tried to remove the "[" with the REPLACE function, as follows:

rowdata = FOREACH extractData GENERATE $0 as host,$1 as datadatetime,$2 as timezone,$3 as responseurl,$4 as responsecode,$5 as responsedata;

newdata = FOREACH rowdata GENERATE REPLACE(datadatetime,'[','');
But I get the following warnings:

2016-01-05 05:10:13,758 [main] WARN  org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning USING_OVERLOADED_FUNCTION 1 time(s).
2016-01-05 05:10:13,758 [main] WARN  org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).

I think this is because I have not defined a data type for datadatetime. How can I define data types in a FOREACH?

You have a problem. You try to solve it with regular expressions. Now you have two problems.

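Seriously though, after trying it out, this looks like nothing more than a problem with the regular expression. Escaping the brackets so that they sit outside the capture groups: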

REGEX_EXTRACT_ALL(fullline,'(.*) - - \\[(.*) -(.*)\\] "(.*)" (.*) (.*)')
did the job for me.

Result:

(199.72.81.55,01/Jul/1995:00:00:01,0400,GET /history/apollo/ HTTP/1.0,200,6245)
(unicomp6.unicomp.net,01/Jul/1995:00:00:06,0400,GET /shuttle/countdown/ HTTP/1.0,200,3985)
(199.120.110.21,01/Jul/1995:00:00:09,0400,GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0,200,4085)
(burger.letters.com,01/Jul/1995:00:00:11,0400,GET /shuttle/countdown/liftoff.html HTTP/1.0,304,0)
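
For completeness, here is a rough sketch of the question's script with the escaped pattern dropped in. The path, loader and pattern come from the posts above. The typed AS (...) schema on the FLATTEN is my own addition, and the field name datadatetime is borrowed from the question's second attempt. I have not run this against the full log, so treat it as a starting point.

```pig
-- Load every log line into a single chararray field (loader as in the question).
loadFulldata = LOAD '/root/Kennadi-Project/Kennadi-data.txt' USING PigStorage(',') AS (fullline:chararray);

-- \\[ and \\] match the literal brackets but stay outside the capture groups,
-- so the second group holds only the timestamp.
-- Assumption: declaring the schema on the FLATTEN gives every field a name and type.
extractData = FOREACH loadFulldata GENERATE
    FLATTEN(REGEX_EXTRACT_ALL(fullline, '(.*) - - \\[(.*) -(.*)\\] "(.*)" (.*) (.*)'))
    AS (host:chararray, datadatetime:chararray, timezone:chararray,
        responseurl:chararray, responsecode:chararray, responsedata:chararray);

DUMP extractData;
```

With the fields already declared as chararray here, the separate renaming FOREACH from the question should no longer be needed.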

Have you tried this? rowdata = FOREACH extractData GENERATE $0 as host:chararray, $1 as datetime:chararray ....
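
Along the same lines, here is a hedged sketch of the REPLACE route from the question, keeping the original pattern. It makes two assumptions. First, that explicit (chararray) casts are an acceptable way to give the fields a type, which should take care of the implicit-cast warning. Second, that the bracket is escaped in the REPLACE call, because REPLACE treats its second argument as a regular expression and a bare '[' is not a valid pattern. Only three of the six fields are carried through to keep it short.

```pig
loadFulldata = LOAD '/root/Kennadi-Project/Kennadi-data.txt' USING PigStorage(',') AS (fullline:chararray);

-- Original pattern from the question: the "[" still ends up in the second group.
extractData = FOREACH loadFulldata GENERATE FLATTEN(REGEX_EXTRACT_ALL(fullline, '(.*) - - (.*) -(.*)] "(.*)" (.*) (.*)'));

-- Explicit casts so each field has a declared chararray type.
rowdata = FOREACH extractData GENERATE (chararray)$0 AS host, (chararray)$1 AS datadatetime, (chararray)$2 AS timezone;

-- Escape the bracket: the second argument of REPLACE is a regular expression.
newdata = FOREACH rowdata GENERATE host, REPLACE(datadatetime, '\\[', '') AS datadatetime, timezone;

DUMP newdata;
```

Fixing the extraction pattern itself, as in the answer above, is the simpler option. This version only helps if the pattern has to stay as it is.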