Logging: Combining Apache URI and referer in Pig (tags: logging, apache-pig)

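I'll start by saying that I am a sysadmin by trade and a Pig newbie, so please be gentle.

I am trying to use Pig to parse Apache web logs from our CDN. For one application we have three different call types, which can be gleaned from the URI, and three different application/version strings, which are the result of inconsistencies in the app's development. I need to collect them and produce a report detailing the number of calls of each type for each app/version.

The call type will be one of the following: valid, wms, tile. The application name in the userAgent field can look like any of the following: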

APPLICATION%20NAME/0.0 CFNetwork/609.1.4 Darwin/13.0.0

Android Application Name 0.0.0 SCH-I605-Android 4.1.2, SDK XX

Application Name 0.0.0 iPhone OS 6.1.3-iPhone, XXX.XX.XXX.XX.XXXX, XXXXXXXX 0.0

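This is what I had working before I discovered the inconsistency in the userAgent naming. It is probably a hack at best, but it produces what is needed.

Thanks for any help.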

register file:/home/hadoop/lib/pig/piggybank.jar
DEFINE LogLoader org.apache.pig.piggybank.storage.apachelog.CombinedLogLoader();
DEFINE DayExtractor org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor('yyyy-MM-dd');
DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT;
logs = LOAD '$INPUT' USING LogLoader as (remoteAddr, remoteLogname, user, time, method, uri, proto, status, bytes, referer,userAgent);
FILTERED = FILTER logs by userAgent matches '.*MapKit.*' OR userAgent matches '.*Darwin.*' or userAgent matches '.*Android.*';
DARWINONLY = FOREACH FILTERED GENERATE DayExtractor(time) as day, uri, bytes, userAgent;
FILTERVALID = FILTER DARWINONLY BY uri matches '.*valid.*';
FILTERTILE = FILTER DARWINONLY BY uri matches '.*tile.*';
FILTERWMS = FILTER DARWINONLY BY uri matches '.*wms.*';
VALIDAPPTIME = FOREACH FILTERVALID GENERATE day as validframeday, EXTRACT(userAgent, '([^\\s]+)') as validframeapp,bytes as validbytes;
WMSAPPTIME = FOREACH FILTERWMS GENERATE day as wmsday, EXTRACT(userAgent, '([^\\s]+)') as wmsapp,  bytes as wmsbytes;
TILEAPPTIME = FOREACH FILTERTILE GENERATE day as tileday, EXTRACT(userAgent, '([^\\s]+)') as tileapp, bytes as tilebytes;
GROUPWMS = GROUP WMSAPPTIME BY ($0,$1);
GROUPTILE = GROUP TILEAPPTIME BY ($0,$1);
GROUPVALID = GROUP VALIDAPPTIME BY ($0,$1);
WMSAPPCOUNT = FOREACH GROUPWMS GENERATE FLATTEN(group), COUNT($1) as wmsnum, SUM(WMSAPPTIME.wmsbytes) as wmstotalbytes;
VALIDAPPCOUNT = FOREACH GROUPVALID GENERATE FLATTEN(group), COUNT($1) as validnum, SUM(VALIDAPPTIME.validbytes) as validtotalbytes;
TILEAPPCOUNT = FOREACH GROUPTILE GENERATE FLATTEN(group), COUNT($1) as tilenum, SUM(TILEAPPTIME.tilebytes) as tiletotalbytes:int;
Y = COGROUP VALIDAPPCOUNT BY (validframeday,validframeapp), WMSAPPCOUNT BY (wmsday,wmsapp), TILEAPPCOUNT BY (tileday,tileapp);
Z = FOREACH Y GENERATE group as dailyapp, VALIDAPPCOUNT.validnum, VALIDAPPCOUNT.validtotalbytes, WMSAPPCOUNT.wmsnum, WMSAPPCOUNT.wmstotalbytes, TILEAPPCOUNT.tilenum, TILEAPPCOUNT.tiletotalbytes;
STORE Z into '$OUTPUT';
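
One way to reason about the three userAgent formats described above is to prototype a single normalizing regex outside Pig before porting it. The following is only a minimal sketch in Python, not Pig: the pattern, the function names, the choice to uppercase the application name, and the sample app names and versions are all assumptions made for illustration. Unlike the three independent FILTER statements above, it assigns each URI to the first call type it contains.

```python
import re
from collections import Counter

# Hypothetical pattern (an assumption, not taken from the original script):
# an optional "Android " prefix, the app name, then a dotted version that is
# separated from the name by "/" or a space.
UA_PATTERN = re.compile(
    r'^(?:Android\s+)?(?P<app>.+?)[/ ](?P<version>\d+(?:\.\d+)*)(?:\s|$)'
)

CALL_TYPES = ('valid', 'wms', 'tile')


def normalize_user_agent(user_agent):
    """Return (app, version), or None if the string does not fit the pattern."""
    cleaned = user_agent.replace('%20', ' ')  # undo the URL-encoded space
    match = UA_PATTERN.match(cleaned)
    if match is None:
        return None
    # Uppercasing is an assumed way to make the three spellings compare equal.
    return match.group('app').strip().upper(), match.group('version')


def call_type(uri):
    """Return the first call type contained in the URI, or None."""
    for name in CALL_TYPES:
        if name in uri:
            return name
    return None


def count_calls(records):
    """Count calls per (app, version, call type) from (uri, userAgent) pairs."""
    counts = Counter()
    for uri, user_agent in records:
        kind = call_type(uri)
        app = normalize_user_agent(user_agent)
        if kind is None or app is None:
            continue  # skip lines that fit neither a call type nor the pattern
        counts[(app[0], app[1], kind)] += 1
    return counts
```

If the pattern holds up against real log lines, the same idea could probably be moved into the Pig script with the built-in REGEX_EXTRACT and REPLACE functions, although the named groups would need adapting to Java regex syntax and the result should be checked against the actual data.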

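Could you add to or rewrite your question to state what your specific problem is? Is it using a regex to separate the data by call type? Also, you should remove any code that is not directly related to your question; reading through all of it is a bit overwhelming.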