Pig: with REGEX_EXTRACT_ALL, Pig implicitly casts chararray and int data types to bytearray

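I have sample web log data containing the IP, the date in dd/mmm/yyyy format, the URL, and other details generated in the web log. I am trying to split each log line into multiple fields: IP, date, and URL. Here is the script I wrote in Pig: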

A = Load 'weblogs_rebuild sample.txt' using TextLoader() as Log:chararray;
B = foreach A generate flatten (REGEX_EXTRACT_ALL(Log, '([\\S]+)[\\s+-]+[\\[]+([\\d]+)[/]+([\\w]+)[/]+([\\d]+)(.*)[\\]]+[\\s+]+[\\"]+([\\w\\s+/\\d.]+)[\\"]+[\\s+]+(.*)')) as (field1:chararray,date:int,month:chararray,year:int,timefield:chararray,useraction:chararray,userfiled:chararray);
When I press Enter after defining relation B, I get this warning:

org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning NO_LOAD_FUNCTION_FOR_CASTING_BYTEARRAY 14 time(s).

If I run the statement for relation B again and press Enter, the warning becomes:

org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning NO_LOAD_FUNCTION_FOR_CASTING_BYTEARRAY 21 time(s).

When I describe B, every field is shown as bytearray:

B: {field1: bytearray,date: bytearray,month: bytearray,year: bytearray,timefield: bytearray,useraction: bytearray,userfiled: bytearray}
I don't understand why the count increases by 7 each time, or why the chararray and int data types are converted to bytearray.

This is what I see when I dump B:

(364.635.03.677,26,Oct,2011,:22:39:30 -0500,GET /feeds/press HTTP/1.1,200 0 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 ( .NET CLR 3.5.30729)")
Sample web log:

323.81.303.680 - - [25/Oct/2011:01:41:00 -0500] "GET /download/download6.zip HTTP/1.1" 200 0 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.19) Gecko/2010031422 Firefox/3.0.19"
668.667.44.3 - - [25/Oct/2011:07:38:30 -0500] "GET /download/download3.zip HTTP/1.1" 200 0 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) Gecko/20070719 CentOS/1.5.0.12-3.el5.centos Firefox/1.5.0.12"
13.386.648.380 - - [25/Oct/2011:17:06:00 -0500] "GET /download/download6.zip HTTP/1.1" 200 0 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6.3; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; InfoPath.2)"
06.670.03.40 - - [26/Oct/2011:13:24:00 -0500] "GET /product/demos/product2 HTTP/1.1" 200 0 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3"
18.656.618.46 - - [26/Oct/2011:17:15:30 -0500] "GET /download/download4.zip HTTP/1.1" 200 0 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_3; en-us) AppleWebKit/531.22.7 (KHTML, like Gecko) Version/4.0.5 Safari/531.22.7"
14.688.663.667 - - [26/Oct/2011:21:02:30 -0500] "GET /news HTTP/1.1" 200 0 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)"
13.07.338.684 - - [26/Oct/2011:21:02:30 -0500] "GET /download HTTP/1.1" 200 0 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; InfoPath.2; .NET CLR 3.5.30729; .NET CLR 3.0.30729; OfficeLiveConnector.1.4; OfficeLivePatch.1.3)"
14.688.663.667 - - [26/Oct/2011:21:02:30 -0500] "GET /news HTTP/1.1" 200 0 "/news" "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)"
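
To check the pattern itself outside Pig, here is a minimal sketch in Python that applies the same regular expression to one of the sample lines. It assumes that Python's `re` module treats this pattern the same way as the Java regex engine Pig uses, and that REGEX_EXTRACT_ALL only returns groups when the whole line matches (hence `fullmatch`). The helper name `extract_all` and the `FIELD_NAMES` list are hypothetical, introduced only for this illustration.

```python
import re

# Same pattern as in the Pig script, with the Pig string escapes undone
# (for example, \\S in the Pig literal becomes \S in the regex).
PATTERN = re.compile(
    r'([\S]+)[\s+-]+[\[]+([\d]+)[/]+([\w]+)[/]+([\d]+)(.*)[\]]+'
    r'[\s+]+[\"]+([\w\s+/\d.]+)[\"]+[\s+]+(.*)'
)

# Field names taken from the AS clause of relation B.
FIELD_NAMES = ['field1', 'date', 'month', 'year',
               'timefield', 'useraction', 'userfiled']

# One line copied from the sample web log.
LINE = ('14.688.663.667 - - [26/Oct/2011:21:02:30 -0500] "GET /news HTTP/1.1" '
        '200 0 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; '
        'http://help.yahoo.com/help/us/ysearch/slurp)"')


def extract_all(line):
    """Return every capture group as a tuple, or None if the whole line does not match."""
    match = PATTERN.fullmatch(line)
    return match.groups() if match else None


fields = extract_all(LINE)
for name, value in zip(FIELD_NAMES, fields):
    print(f'{name}: {value}')
```

The pattern has seven capture groups, one per field in the AS clause, and every group comes back as a string. This only shows that the regex splits the line as intended; it does not by itself explain the bytearray types or the warning count reported by Pig.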