Warning: file_get_contents(/data/phpspider/zhask/data//catemap/4/r/82.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181

Warning: file_get_contents(/data/phpspider/zhask/data//catemap/8/logging/2.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
如何使用R解析Web服务器日志?_R_Logging_Webmin - Fatal编程技术网

如何使用R解析Web服务器日志?

如何使用R解析Web服务器日志?,r,logging,webmin,R,Logging,Webmin,使用R解析这样的日志文件的最佳方法是什么 - - - [20/Nov/2011:01:16:29 +0100] "POST /csw/servlet/cswservlet HTTP/1.1" 200 279 - - - [20/Nov/2011:01:16:29 +0100] "GET /DescargaFenomenos/index.jsp HTTP/1.1" 200 11769 - - - [20/Nov/2011:01:16:29 +0100] "GET /IDEE-ServicesSea

使用R解析这样的日志文件的最佳方法是什么

- - - [20/Nov/2011:01:16:29 +0100] "POST /csw/servlet/cswservlet HTTP/1.1" 200 279
- - - [20/Nov/2011:01:16:29 +0100] "GET /DescargaFenomenos/index.jsp HTTP/1.1" 200 11769
- - - [20/Nov/2011:01:16:29 +0100] "GET /IDEE-ServicesSearch/ServicesSearch.html?locale=es HTTP/1.1" 200 1665
- - - [20/Nov/2011:01:16:29 +0100] "GET /search/indexLayout.jsp?PAGELANGUAGE=es HTTP/1.1" 200 9874
- - - [20/Nov/2011:01:16:29 +0100] "GET /clientesIGN/wmsGenericClient/index.html?lang=ES HTTP/1.1" 200 12058
- - - [20/Nov/2011:01:16:30 +0100] "POST /csw/servlet/cswservlet HTTP/1.1" 200 258038
- - - [20/Nov/2011:01:17:09 +0100] "GET //DescargaFenomenos/index.jsp HTTP/1.1" 200 11769
- - - [20/Nov/2011:01:17:33 +0100] "GET //DescargaFenomenos/index.jsp HTTP/1.1" 200 11769
- - - [20/Nov/2011:01:17:33 +0100] "GET //show.do?to=pideep_pidee.ES HTTP/1.1" 200 26647
192.168.69.10, 62.97.81.202 - - [20/Nov/2011:01:17:34 +0100] "POST /csw/?locale=es HTTP/1.0" 200 2536
192.168.69.10, 62.97.81.202 - - [20/Nov/2011:01:17:34 +0100] "GET /DescargaFenomenos/index.jsp HTTP/1.0" 200 11769
192.168.69.10, 62.97.81.202 - - [20/Nov/2011:01:17:34 +0100] "GET /clientesIGN/wmsGenericClient/index.html?lang=ES HTTP/1.0" 200 12058
- - - [20/Nov/2011:01:17:39 +0100] "GET //csw/servlet/cswservlet?request=GetCapabilities&service=CSW&version=2.0.2 HTTP/1.1" 200 8867
- - - [20/Nov/2011:01:17:46 +0100] "GET //csw/servlet/cswservlet?request=GetCapabilities&service=CSW&version=2.0.2 HTTP/1.1" 200 8867
- - - [20/Nov/2011:01:18:10 +0100] "GET //show.do?to=pideep_pidee.ES HTTP/1.1" 200 26647
- - - [20/Nov/2011:01:19:01 +0100] "GET //DescargaFenomenos/index.jsp HTTP/1.1" 200 11769
我必须考虑边境案件,比如有2个IP在一条线上(内部和外部)。


谢谢

对于本例,用两个NA替换前导破折号,用空格替换逗号就足够了。然后可以使用
read.table()进行解析


datlog一个正则表达式可能可以用来解析它,类似于在perl中所做的。我的问题是,您希望数据最终看起来如何?准备编写bitchy regex。标记表达式和
gsub
是你的朋友。如果你想让生活更轻松,Apache有一种非常灵活的方法来指定日志文件的外观。这种“通用日志”格式令人痛苦,因为其中一半是空格分隔的,另一半是方括号分隔的,另一半是引号,另一半是逗号分隔的。。。这就是不合情理。请参阅以了解如何重新配置日志并使日志正常(假定访问web服务器)。如果查询中有逗号,这会失败吗?我不认为他们必须逃脱。显然,解析日期还有一些工作要做。如果你是指regex模式,那么我认为转义是不必要的,但与许多其他不必要的转义不同,它不会抛出错误。“解析”的请求有点含糊不清。你也可以想象想要从HTML请求中提取信息。不,我指的是GET路径中的逗号。日志格式为:破折号或一个或多个逗号分隔的ip地址,重复三次,方括号中的日期,带引号的请求,状态代码,大小。微小难弄的我懂了。它可能会损坏GET或POST启动序列,但不会因此而失败。您可以构建一个regex模式,在get/POST事件之前将其“隔离”到该部分。如果最多有2个ID,那么您可以使用sub而不是gsub,我将在答案中更改为sub。再次。。。解析需要一些语义的概念,而提问者并没有提供这些概念。
datlog <- readLines(textConnection('- - - [20/Nov/2011:01:16:29 +0100] "POST /csw/servlet/cswservlet HTTP/1.1" 200 279
- - - [20/Nov/2011:01:16:29 +0100] "GET /DescargaFenomenos/index.jsp HTTP/1.1" 200 11769
- - - [20/Nov/2011:01:16:29 +0100] "GET /IDEE-ServicesSearch/ServicesSearch.html?locale=es HTTP/1.1" 200 1665
- - - [20/Nov/2011:01:16:29 +0100] "GET /search/indexLayout.jsp?PAGELANGUAGE=es HTTP/1.1" 200 9874
- - - [20/Nov/2011:01:16:29 +0100] "GET /clientesIGN/wmsGenericClient/index.html?lang=ES HTTP/1.1" 200 12058
- - - [20/Nov/2011:01:16:30 +0100] "POST /csw/servlet/cswservlet HTTP/1.1" 200 258038
- - - [20/Nov/2011:01:17:09 +0100] "GET //DescargaFenomenos/index.jsp HTTP/1.1" 200 11769
- - - [20/Nov/2011:01:17:33 +0100] "GET //DescargaFenomenos/index.jsp HTTP/1.1" 200 11769
- - - [20/Nov/2011:01:17:33 +0100] "GET //show.do?to=pideep_pidee.ES HTTP/1.1" 200 26647
192.168.69.10, 62.97.81.202 - - [20/Nov/2011:01:17:34 +0100] "POST /csw/?locale=es HTTP/1.0" 200 2536
192.168.69.10, 62.97.81.202 - - [20/Nov/2011:01:17:34 +0100] "GET /DescargaFenomenos/index.jsp HTTP/1.0" 200 11769
192.168.69.10, 62.97.81.202 - - [20/Nov/2011:01:17:34 +0100] "GET /clientesIGN/wmsGenericClient/index.html?lang=ES HTTP/1.0" 200 12058
- - - [20/Nov/2011:01:17:39 +0100] "GET //csw/servlet/cswservlet?request=GetCapabilities&service=CSW&version=2.0.2 HTTP/1.1" 200 8867
- - - [20/Nov/2011:01:17:46 +0100] "GET //csw/servlet/cswservlet?request=GetCapabilities&service=CSW&version=2.0.2 HTTP/1.1" 200 8867
- - - [20/Nov/2011:01:18:10 +0100] "GET //show.do?to=pideep_pidee.ES HTTP/1.1" 200 26647
- - - [20/Nov/2011:01:19:01 +0100] "GET //DescargaFenomenos/index.jsp HTTP/1.1" 200 11769'))
 datlog <- gsub("^-", "NA NA", datlog)
 datlog <- sub("\\,", "   ", datlog)
 datlog<-read.table(text=datlog, fill=TRUE)
 datlog
datlog[['dtime']] <- as.POSIXct( paste( sub("\\[", "", datlog[[5]]), 
                                         sub("\\]", "", datlog[[6]]) ),
                                 format="%d/%b/%Y:%H:%M:%S %z")