Apache Pig: extracting query parameters from web logs
Tags: apache-pig, amazon-cloudfront


I am analyzing AWS CloudFront access logs.

I have code that loads the lines of a log file:

    raw_logs2 = LOAD 'file:///home/ec2-user/ENWRZAC68E00M.2011-02-28-18.72jA8eGh'
      USING PigStorage('\t')
      AS (
        date: chararray, time: chararray, x_edge_location: chararray, sc_bytes: int,
        c_ip: chararray, cs_method: chararray, cs_host: chararray, cs_uri_stem: chararray,
        sc_status: chararray, cs_referer: chararray, cs_user_agent: chararray, cs_uri_query: chararray
      );
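
Now I am trying to parse the query string parameters (name/value pairs):

    p=searchresults&s=homesforsale&gad=&gci=FOUNTAIN%2520VALLEY&gst=CA&gzi=&k=fountainvalleyca&ts=1298918206&

How do I add extra columns to the raw_logs2 relation for the p, s, and gci values in the query string?

A quick way is to use:

    raw_logs =
      FOREACH raw_logs2  -- names the input relation; GENERATE alone is not a complete statement
      GENERATE
        *,
        FLATTEN(REGEX_EXTRACT_ALL(cs_uri_query, 'p=(.+?)&s=(.+?)&.+?gci=(.+?)&.+?'))
          AS (p:CHARARRAY, s:CHARARRAY, gci:CHARARRAY);

Thanks for the reply, Roman. I expanded the example and got the following:

    raw_logs = GENERATE *,
      FLATTEN(REGEX_EXTRACT_ALL(cs_uri_query,
        'p=(.+?)&s=(.+?)&w=(.+?)&h=(.+?)&ad=(.+?)&gad=(.+)&gci=(.+?)&gst=(.+)&gzi=(.+)&kw=(.+)'))
      AS (p:CHARARRAY, s:CHARARRAY, w:CHARARRAY, h:CHARARRAY, ad:CHARARRAY,
          gad:CHARARRAY, gci:CHARARRAY, gst:CHARARRAY, gzi:CHARARRAY, kw:CHARARRAY);

I am using Pig 0.6.0 and get an error at GENERATE saying that it expected something else. Given that I load raw_logs2 from the log file, how does the call know that cs_uri_query comes from it? Update: I upgraded to 0.8.0 and added a FOREACH before it, and it works well. Thanks.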
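
One caveat with a single REGEX_EXTRACT_ALL pattern is that it only matches when the parameters appear in exactly the order the pattern spells out. If that is not guaranteed in your logs, a possible alternative is to pull each parameter out separately with REGEX_EXTRACT. The following is a rough sketch, not a tested solution: it assumes Pig 0.8.0 or later (where REGEX_EXTRACT is a built-in) and the raw_logs2 relation from the question, and the alias name params is made up for illustration. Each pattern is anchored with ^ and $ so that it covers the whole query string.

```pig
-- Sketch only: extract p, s and gci independently of their position.
-- Assumes Pig 0.8.0+ and the raw_logs2 relation loaded in the question.
-- 'params' is a hypothetical alias name.
params =
  FOREACH raw_logs2
  GENERATE
    *,
    -- (?:.*&)? skips any earlier parameters; ([^&]*) captures up to the next '&'
    REGEX_EXTRACT(cs_uri_query, '^(?:.*&)?p=([^&]*).*$', 1)   AS p:chararray,
    REGEX_EXTRACT(cs_uri_query, '^(?:.*&)?s=([^&]*).*$', 1)   AS s:chararray,
    REGEX_EXTRACT(cs_uri_query, '^(?:.*&)?gci=([^&]*).*$', 1) AS gci:chararray;
```

The trade-off is one regex evaluation per parameter instead of one per row, in exchange for not depending on parameter order. Values are returned as they appear in the log (for example still URL-encoded), so any decoding would be a separate step.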