Get Apache logs as CSV files


Is there a way to save all Apache logs as CSV files?

access.log->access_log.csv
error.log->error_log.csv
You can define a log format that makes Apache write its logs directly in comma-separated form.


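You may have to experiment for a while to get it right. For example, you will probably want a quoting character around each field, so that commas inside field values do not break the CSV.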
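
As a rough sketch of that approach, the shell snippet below writes a small configuration fragment that uses Apache's `LogFormat` and `CustomLog` directives. The nickname `csv`, the fragment file name, the log path, and the choice of fields are all placeholders picked for illustration; none of them come from the question.

```shell
# Hypothetical location for the fragment; adjust it to your server layout.
conf_file="${TMPDIR:-/tmp}/csv-log.conf"

cat > "$conf_file" <<'EOF'
# "csv" is an arbitrary nickname chosen for this sketch.
# Text fields are wrapped in double quotes, numeric fields are left bare.
LogFormat "\"%h\",\"%l\",\"%u\",\"%{%Y-%m-%d %H:%M:%S}t\",\"%r\",%>s,%b,\"%{Referer}i\",\"%{User-Agent}i\"" csv
CustomLog logs/access_log.csv csv
EOF

# Show the fragment so it can be reviewed before it is included in the config.
cat "$conf_file"
```

One caveat: Apache escapes a double quote inside the request line or a header value with a backslash rather than by doubling it, so a strict CSV reader may still stumble over such lines.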

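If you want to look at log files that were written in the past, or that come from an Apache server whose configuration you cannot access, or if you do not want to change the log format for some other reason: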

I have written a script that converts default Apache access log files into a format that LibreOffice Calc can read:

#!/bin/bash

# Reformat Apache's access logs so that they can be read as CSV files,
# with a space as the column delimiter and double quotes around values
# that contain spaces but represent a single column.
# Reads the log on stdin and writes the result to stdout.

# 1)  Add a double quote at the beginning of the line. The first column is the
#     IP address. Unquoted, IP addresses that have 3 digits in every group but
#     the first could be interpreted as numbers, with the dots marking groups
#     of thousands.

# 2a) End the IP address with a double quote.
# 2b) Surround with quotes the second column (the identd remote logname,
#     nearly always just "-") and the third column, which is the username.
# 2c) Reformat the date from "[09/Jul/2012:11:17:47" to "09.Jul 2012 11:17:47".

# 3)  Remove the timezone offset, e.g. "+0200]" (replace it with a double quote
#     to end the date column). Any +hhmm or -hhmm offset is accepted.

# 4)  The string that contains the command (5th column) sometimes contains a
#     string representation of binary rubbish. That is no problem as long as it
#     does not contain a double quote, which would mess up the column
#     boundaries. According to my web searches, CSV columns may contain double
#     quotes if they are escaped with a backslash. Although that is the case
#     with these problematic strings, LibreOffice does not accept it that way.
#     Therefore we escape every double quote with a second double quote, which
#     is the other valid option according to the CSV specifications, and
#     LibreOffice does accept that one. More precisely: every double quote that
#     has neither a space nor another double quote directly before it or
#     directly after it is replaced with two double quotes.

sed \
-e 's/^/"/' \
-e 's/ \([^ ]\{1,\}\) \([^ ]\{1,\}\) \[\([0-9]\{1,2\}\)\/\([a-zA-Z]\{1,3\}\)\/\([0-9]\{1,4\}\):/" "\1" "\2" "\3.\4 \5 /' \
-e 's/ [+-][0-9]\{4\}\] /" /' \
-e 's/\([^" ]\)"\([^" ]\)/\1""\2/g'
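
To see what these four expressions do, here is a small sketch that wraps them in a shell function and feeds it two made-up log lines. The function name `apache_to_csv` and the sample lines are assumptions for illustration only. The timezone expression is written as a generic ±hhmm match so the sample does not depend on one server's offset.

```shell
# Hypothetical wrapper around the four sed expressions described above.
apache_to_csv() {
  sed \
    -e 's/^/"/' \
    -e 's/ \([^ ]\{1,\}\) \([^ ]\{1,\}\) \[\([0-9]\{1,2\}\)\/\([a-zA-Z]\{1,3\}\)\/\([0-9]\{1,4\}\):/" "\1" "\2" "\3.\4 \5 /' \
    -e 's/ [+-][0-9]\{4\}\] /" /' \
    -e 's/\([^" ]\)"\([^" ]\)/\1""\2/g'
}

# An ordinary request (made-up sample line).
printf '%s\n' '192.168.100.200 - frank [09/Jul/2012:11:17:47 +0200] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"' | apache_to_csv
# prints: "192.168.100.200" "-" "frank" "09.Jul 2012 11:17:47" "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"

# A request containing a backslash-escaped double quote (step 4 doubles it).
printf '%s\n' '10.0.0.1 - - [09/Jul/2012:11:17:48 +0200] "GET /a\"b HTTP/1.1" 400 226 "-" "-"' | apache_to_csv
# prints: "10.0.0.1" "-" "-" "09.Jul 2012 11:17:48" "GET /a\""b HTTP/1.1" 400 226 "-" "-"
```

When importing the result into LibreOffice Calc, choose a space as the separator and the double quote as the string delimiter.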

This is really just a modification of @kaefert's answer. I am sure there is a cleaner way to do it, but this works very well:

alias aplogcsv="sed -e 's/^/\"/' \
                -e 's/:\([0-9]\{1,3\}\.\)\([0-9]\{1,3\}\.\)\([0-9]\{1,3\}\.\)\([0-9]\{1,3\}\)/\",\"\1\2\3\4/' \
                -e 's/ \([^ ]\{1,\}\) \([^ ]\{1,\}\) \[\([0-9]\{1,2\}\)\/\([a-zA-Z]\{1,3\}\)\/\([0-9]\{1,4\}\):/\",\"\1\" \"\2\" \"\3 \4 \5\",\" /' \
                -e 's/ \([0-9]\{1,2\}\):\([0-9]\{1,2\}\):\([0-9]\{1,2\}\)/\1:\2:\3/' \
                -e 's/ [+-][0-9]\{4\}\] /\",/' \
                -e 's/\"GET /\"GET\",\"/g' \
                -e 's/\"POST /\"POST\",\"/g' \
                -e 's/ HTTP\/1\.1\" \([0-9]\{1,3\}\) \([0-9]\{1,\}\) /\",\"HTTP\/1.1\",\1,\2,/' \
                -e 's/\"-\" //g'"
Then I use it like this, either directly on a log file or just as easily on the output of grep:

aplogcsv access.log > ~/access.log.csv
grep "25/Jan/2019" access.log | aplogcsv > ~/20190125.access.log.csv
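
Bash does not expand aliases in non-interactive scripts by default, so for cron jobs or other scripts a function may be more convenient. The sketch below restates the alias as a function and runs it on one made-up log line. The name `aplog_to_csv` and the sample line are assumptions for illustration. It uses a generic ±hhmm timezone match and an unbounded byte-count match so the sample is not tied to one server's settings.

```shell
# Hypothetical function form of the alias; reads stdin or the files given as arguments.
aplog_to_csv() {
  sed -e 's/^/"/' \
      -e 's/:\([0-9]\{1,3\}\.\)\([0-9]\{1,3\}\.\)\([0-9]\{1,3\}\.\)\([0-9]\{1,3\}\)/","\1\2\3\4/' \
      -e 's/ \([^ ]\{1,\}\) \([^ ]\{1,\}\) \[\([0-9]\{1,2\}\)\/\([a-zA-Z]\{1,3\}\)\/\([0-9]\{1,4\}\):/","\1" "\2" "\3 \4 \5"," /' \
      -e 's/ \([0-9]\{1,2\}\):\([0-9]\{1,2\}\):\([0-9]\{1,2\}\)/\1:\2:\3/' \
      -e 's/ [+-][0-9]\{4\}\] /",/' \
      -e 's/"GET /"GET","/g' \
      -e 's/"POST /"POST","/g' \
      -e 's/ HTTP\/1\.1" \([0-9]\{1,3\}\) \([0-9]\{1,\}\) /","HTTP\/1.1",\1,\2,/' \
      -e 's/"-" //g' "$@"
}

# Made-up sample line in the combined log format.
printf '%s\n' '203.0.113.5 - - [25/Jan/2019:10:15:32 -0700] "GET /index.html HTTP/1.1" 200 1234 "-" "Mozilla/5.0"' | aplog_to_csv
# prints: "203.0.113.5","25 Jan 2019","10:15:32","GET","/index.html","HTTP/1.1",200,1234,"Mozilla/5.0"
```

Note that the last expression drops every field that is just "-", so the ident, user, and referer columns disappear when they are empty, and a request with a real username would leave a stray space-separated value in the row. The `:IP` expression appears to be meant for lines that grep has prefixed with a file name when searching several logs at once.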

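Not quite perfect (it does not put commas between the fields), but this got me close.

Well, OpenOffice can use any delimiter you like; I used a space here. Some CSV files use commas (,), some use semicolons (;); it is not standardized.

Do you have a version that handles the default Apache error.log file?

No, I never needed one. But its format looks much simpler to me, so it should be easy to do.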
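
Following up on the last comment, here is a minimal sketch for error.log. It assumes the default Apache 2.4 error log layout of `[timestamp] [module:level] [pid ...] [client ...] message`, where the client part is optional; older or customized formats will not line up with these columns. The function name `error_to_csv` and the sample line are illustrative assumptions.

```shell
# Hypothetical converter for the assumed Apache 2.4 default error log layout.
error_to_csv() {
  sed \
    -e 's/"/""/g' \
    -e 's/^\[\([^]]*\)\] \[\([^]]*\)\] \[\([^]]*\)\] \(\[client \([^]]*\)\] \)\{0,1\}\(.*\)$/"\1","\2","\3","\5","\6"/' "$@"
}

# Made-up sample line with a client field.
printf '%s\n' '[Fri Sep 09 10:42:29.902022 2011] [core:error] [pid 35708:tid 4328636416] [client 72.15.99.187] File does not exist: /usr/local/apache2/htdocs/favicon.ico' | error_to_csv
# prints: "Fri Sep 09 10:42:29.902022 2011","core:error","pid 35708:tid 4328636416","72.15.99.187","File does not exist: /usr/local/apache2/htdocs/favicon.ico"
```

The first expression doubles any double quote already in the line, and the second wraps the three bracketed fields, the optional client address, and the remaining message in quotes. Lines without a client field get an empty fourth column.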