How can I parse every IP in an Apache access log and count each unique request from it, writing the result to a CSV file from a bash script?

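I have been trying to write a bash script that builds a CSV file listing every IP in an Apache access log, together with the unique requests each IP made and the number of times it made each one.

So far I have: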

#!/bin/bash

# Print the headers to the CSV file
printf "\tRequests\tIP\t\n" > memory.csv

# Create a text file named .access_log.tmp.2 with the IPs and how many requests they made in total - .access_log.tmp is the Apache access log in this case
awk '{ print $1 }' .access_log.tmp | sort -n | uniq -c | sort -nr | head -20 > ".access_log.tmp.2"

# Make it a CSV file
sed 's/[[:space:]]\+/;/g' .access_log.tmp.2 >> memory.csv

# Remove the leftover files
rm .access_log.tmp .access_log.tmp.2
This produces output like the following:
Requests          IP
20                10.0.0.1
15                10.0.0.2
This is what I would like it to look like:

IP              Requests
10.0.0.1        12 "GET / HTTP/1.1" 403 490 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                8 "GET /favicon.ico HTTP/1.1" 403 489 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"

10.0.0.2        13 "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
                2 "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
etc.
I don't know where to go from here.
Can anyone help?

Edit: as requested, adding the input and output files below.

What I have now:

10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET / HTTP/1.1" 403 490 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET /favicon.ico HTTP/1.1" 403 489 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:22:39 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:22:39 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:22:39 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.6 - - [17/Nov/2019:19:07:46 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:46 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:47 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
^ Input

Requests            IP
9                   10.0.0.6
5                   10.0.0.7
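^ Output

What I want:

The input is the same.


^ Output

There are two possible approaches: a single awk program, or a combined sort/uniq/awk pipeline. The second is easier to write: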

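  • Remove the attributes that are not needed (the timestamp and the two "-" fields)
  • Sort by IP and request information
  • Count the unique IP/request combinations
  • Sort the lines by decreasing request count
  • Format the output with awk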
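
The steps above could be sketched roughly as follows. This is a variant of the pipeline idea rather than the answer's exact commands: it assumes the Apache combined log format, keeps the IP and the request tab-separated so the sort keys can be stated explicitly, and groups the lines by IP before ordering each group by descending count. The temporary file, the variable names `log` and `report`, and the shortened user-agent strings (`UA-A`, `UA-B`) are placeholders for illustration only.

```shell
#!/bin/bash
# Sample input; in practice, point "log" at the real access log instead.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET / HTTP/1.1" 403 490 "-" "UA-A"
10.0.0.6 - - [17/Nov/2019:19:07:46 +0100] "GET / HTTP/1.1" 200 3477 "-" "UA-B"
10.0.0.6 - - [17/Nov/2019:19:07:47 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "UA-B"
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET / HTTP/1.1" 200 3477 "-" "UA-B"
EOF

report=$(
    # 1. Keep the IP and everything from the first double quote onward.
    awk '{ ip = $1; sub(/^[^"]*/, ""); print ip "\t" $0 }' "$log" |
    # 2-3. Sort so identical IP/request lines are adjacent, then count them.
    sort | uniq -c |
    # Move the count into its own tab-separated column: IP, count, request.
    awk '{ n = $1; sub(/^ *[0-9]+ /, ""); split($0, f, "\t"); print f[1] "\t" n "\t" f[2] }' |
    # 4. Group by IP, then order each group by descending count.
    sort -t "$(printf '\t')" -k1,1 -k2,2nr |
    # 5. Print the IP only on the first line of each group.
    awk -F '\t' '{ printf "%-16s%d %s\n", ($1 != prev ? $1 : ""), $2, $3; prev = $1 }'
)
printf '%s\n' "$report"
rm -f "$log"
```

Using a tab as the internal separator is a design choice: the request text contains spaces, quotes and commas, but is unlikely to contain a tab, so `sort -t` can address the IP and the count as separate keys.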

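  • Here is another, shorter awk solution (standard Linux gawk):

    A single scan of the file, a single sort, no string substitution, and only 3 fields to work with.

    The last four code blocks below are, in order: script.awk, the input file, the command to run, and the output.

    Please provide a sample input file and the corresponding output.
    Added.
    @Dudi Boy Works like a charm! Thank you very much. :)
    Slight mismatch in the output: the order is the reverse of what the OP's sample output requires (10.0.0.7 appears before 10.0.0.6).
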
    IP                            Requests
    10.0.0.6                      3 "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
                                  3 "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
                                  3 "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
    
    10.0.0.7                      1 "GET / HTTP/1.1" 403 490 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                                  1 "GET /favicon.ico HTTP/1.1" 403 489 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                                  1 "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                                  1 "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                                  1 "GET /favicon.ico HTTP/1.1" 404 486 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
    
    cat input |
        awk '{ $2 = $3 = $4 = $5 = "" ; print }' |
        sort |
        uniq -c |
        sort -k2.2nr -k1.1 |
        awk '
    {
        printf "%-20s %d", $2 != p ? $2 : "", $1 ;
        p=$2 ; for (i=3 ; i<=NF ; i++) printf " %s", $i ;
        printf "\n"
    }'
    
    #! /usr/bin/awk -f
    # Requires gawk: asorti() with a comparison function is a gawk extension.
    {
            ip = $1
            body = $6
            for (i=7 ; i<=NF ; i++) body = body " " $i
            n[ip, body]++
    }
    
    function sort_id_count(i1, v1, i2, v2)
    {
            ip1 = substr(v1, 1, index(v1, SUBSEP))
            ip2 = substr(v2, 1, index(v2, SUBSEP))
    
            if ( ip1 < ip2 ) return -1
            if ( ip1 > ip2 ) return +1 ;
    
            # Descending freq
            return n[v2]-n[v1]
    }
    
    BEGIN { OFS="," }
    
    END {
            na=0
            for (k in n) a[++na] = k ;
            asorti(a, ai, "sort_id_count") ;
            p="" ;
            for (ki = 1; ki <= na; ki++) {   # walk ai in sorted order; for-in order is unspecified
                    k1 = ai[ki]
                    k2 = a[k1]
                    ip = substr(k2, 1, index(k2, SUBSEP)-1)
                    body = substr(k2, index(k2, SUBSEP)+1)
                    if ( ip == p ) ip = "" ; else p=ip ;
                    printf "%-20s %d %s\n", ip, n[k2], body
            }
    }
    
    BEGIN {FS="( -)|(] \")"} # define field separator: " -" or "] \""
    { # read each input line
        ipLogsArr[$1,$4]++; # store array counting appearance IP+Log combination
        ipArr[$1]++; # store array counting appearance of IP
        ipLogsArrVal[$1,$4]=sprintf("%s&&&%03d&&&%s", $1, ipLogsArr[$1,$4], $4); # store array of IP+count+Log combination
    }
    END { # post processing after reading all input
        printf("%-14s %3s %s\n", "IP", "#", "log"); # output header
        count = asort(ipLogsArrVal); # sort array of IP+count+Log combination
        for (i = count; i >= 1; i--) { # for each element of the sorted array, iterate backward
            split(ipLogsArrVal[i],arr,"&&&"); # separate IP+count+Log into array arr
            ipOut = (currIp == arr[1]) ? "" : arr[1]; # ignore printed IP
            printf("%-14s %3d %s\n", ipOut, arr[2], arr[3]); # print current log
            currIp = arr[1]; # remember current IP, in order to prevent repeated output
        }
    }
    
    10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET / HTTP/1.1" 403 490 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
    10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET /favicon.ico HTTP/1.1" 403 489 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
    10.0.0.7 - - [17/Nov/2019:14:22:39 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
    10.0.0.7 - - [17/Nov/2019:14:22:39 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
    10.0.0.7 - - [17/Nov/2019:14:22:39 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
    10.0.0.6 - - [17/Nov/2019:19:07:46 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
    10.0.0.6 - - [17/Nov/2019:19:07:46 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
    10.0.0.6 - - [17/Nov/2019:19:07:47 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
    10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
    10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
    10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
    10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
    10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
    10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
    
    awk -f script.awk output.txt
    
    IP               # log
    10.0.0.7         1 GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                     1 GET /favicon.ico HTTP/1.1" 404 486 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                     1 GET /favicon.ico HTTP/1.1" 403 489 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                     1 GET / HTTP/1.1" 403 490 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                     1 GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
    10.0.0.6         3 GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
                     3 GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
                     2 GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
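
Both outputs above are column-aligned text rather than comma-separated values. If a real CSV file is wanted, as the question title suggests, something along these lines might work. It is only a sketch under a few assumptions: the log is in Apache combined format, the column names `IP,Requests,Request` are chosen here for illustration, and the request text is quoted in the common CSV style (wrapped in double quotes, with embedded quotes doubled). The temporary file, the variable names `log` and `csv`, and the `UA-A`/`UA-B` user agents are placeholders. It uses only POSIX awk features plus `sort`.

```shell
#!/bin/bash
# Sample input; in practice, point "log" at the real access log instead.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET / HTTP/1.1" 403 490 "-" "UA-A"
10.0.0.6 - - [17/Nov/2019:19:07:46 +0100] "GET / HTTP/1.1" 200 3477 "-" "UA-B"
10.0.0.6 - - [17/Nov/2019:19:07:47 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "UA-B"
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET / HTTP/1.1" 200 3477 "-" "UA-B"
EOF

csv=$(
    printf 'IP,Requests,Request\n'
    awk '
    {
        ip = $1
        sub(/^[^"]*/, "")              # keep only the part from the first quote onward
        count[ip SUBSEP $0]++          # one counter per IP/request combination
    }
    END {
        for (key in count) {
            split(key, part, SUBSEP)   # part[1] = IP, part[2] = request text
            req = part[2]
            gsub(/"/, "\"\"", req)     # CSV escaping: double every embedded quote
            printf "%s,%d,\"%s\"\n", part[1], count[key], req
        }
    }' "$log" |
    # Group by IP, then order each group by descending count.
    sort -t, -k1,1 -k2,2nr
)
printf '%s\n' "$csv"
rm -f "$log"
```

Commas inside the request text (for example in a user-agent string) do not disturb the sort, because only the first two comma-separated fields are used as keys. Unlike the aligned layouts, every row repeats the IP, which is usually what spreadsheet tools expect.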