
Linux Bash script to filter out non-adjacent duplicate lines in a log

Tags: linux, bash, logging, awk, scripting

I am trying to create a script that filters out duplicates in my logs while keeping the most recent instance of each message. A sample is below:

May 29 22:25:19 servername.com Fdm: this is error message 1 error code=0x98765
May 29 22:25:19 servername.com Fdm: this is just a message
May 29 22:25:19 servername.com Fdm: error code=12345 message 2
May 29 22:25:20 servername.com Vpxa: this is error message 1 error code=0x67890
May 29 22:25:20 servername.com Vpxa: just another message
May 29 22:25:30 servername.com Fdm: error code=34567 message 2
May 29 22:25:30 servername.com Fdm: another error message 3 76543
The logs are split into two files. I have already started on the script by merging the two files and sorting them by date with sort -s -r -k1.
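A minimal sketch of that merge-and-sort step, assuming the two logs live at /var/log/log1.txt and /var/log/log2.txt (hypothetical paths):

cat /var/log/log1.txt /var/log/log2.txt > /tmp/merged.log   # concatenate the two halves
sort -s -r -k1 /tmp/merged.log -o /tmp/merged.log           # stable reverse sort on the leading date

Note that a plain lexical sort of "May 29 ..." timestamps only orders correctly within a single month, since month names do not sort chronologically.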

I also managed to make the script prompt for the date I want and then filter on that date with grep.
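A sketch of that prompt-and-filter step, reusing the hypothetical /tmp/merged.log from above:

printf "Please key in the date: "
read -r logdate                                          # e.g. "May 29"
grep -e "$logdate" /tmp/merged.log > /tmp/filtered.log   # keep only that date's lines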

Now I just need a way to remove the non-adjacent duplicate lines, which also carry different timestamps. I tried awk, but my knowledge of it is not great. Can anyone help?

Also, one problem I ran into: there are otherwise-identical lines with different error codes that I would like to remove, but so far I can only do that with grep -v "constant part of the line". It would be great if there were a way to remove duplicates by percentage of similarity. I also cannot simply make the script ignore certain fields or columns, because the error codes appear in different fields/columns on different lines.
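For reference, the grep -v workaround mentioned above looks like this against the sample (the fixed string is just the constant part of one message; the file name is hypothetical):

grep -v "this is error message 1 error code=" /tmp/filtered.log   # drops every variant of that one message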

The expected output is below:

May 29 22:25:30 servername.com Fdm: another error message 3 76543
May 29 22:25:30 servername.com Fdm: error code=34567 message 2
May 29 22:25:20 servername.com Vpxa: this is error message 1 error code=0x67890

I only want the errors, but that part is easy with grep -i error. The only remaining problem is the duplicate lines with different error codes.

To remove identical lines that differ only in their timestamps, just test for duplicates on everything after the 15-character timestamp:

awk '!duplicates[substr($0,16)]++' "$filename"   # substr skips the fixed-width "May 29 22:25:19" prefix

If the logs were tab-delimited, you could pick exactly which columns the duplicate test is built from, which is a better solution than trying to compute a Levenshtein distance between lines.
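For example, if a tab separated the timestamp from the message, a sketch like this (the column layout is hypothetical) would build the duplicate test from the message column alone:

awk -F'\t' '!duplicates[$2]++' "$filename"   # key on the 2nd tab-separated column only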

You can achieve this with sort alone. Just operate on the fields from field 4 onward to catch the duplicates:

sort -uk4 file.txt
This gives you the first entry among the duplicates; if you want the last one, pipe the file through tac first:

tac file.txt | sort -uk4 
Example:

$ cat file.txt      
May 29 22:25:19 servername.com Fdm: [FF93DB90 verbose 'Cluster' opID=SWI-56f32f43] Updating inventory manager with 1 datastores
May 29 22:25:19 servername.com Fdm: [FF93DB90 verbose 'Invt' opID=SWI-56f32f43] [InventoryManagerImpl::UpdateDatastoreLockStatus] Lock state change to 4 for datastore /vmfs/volumes/531b5d83-9129a42b-f3f8-001e6849b050
May 29 22:25:19 servername.com Fdm: [FFB03B90 verbose 'Invt' opID=SWI-65391264] [DsStateChange::SaveToInventory] Processing locked error update for /vmfs/volumes/531b5d83-9129a42b-f3f8-001e6849b050 (<unset>) from __localhost__
May 29 22:25:20 servername.com Vpxa: [FFF3AB90 verbose 'vpxavpxaMoVm' opID=SWI-54ad408b] [VpxaMoVm::CheckMoVm] did not find a VM with ID 17 in the vmList
May 21 12:05:02 servername.com Fdm: [FF93DB90 verbose 'Invt' opID=SWI-56f32f43] [InventoryManagerImpl::UpdateDatastoreLockStatus] Lock state change to 4 for datastore /vmfs/volumes/531b5d83-9129a42b-f3f8-001e6849b050
May 29 22:25:20 servername.com Vpxa: [FFF3AB90 verbose 'vpxavpxaAlarm' opID=SWI-54ad408b] [VpxaAlarm] VM with vmid = 17 not found
May 30 07:50:07 servername.com Fdm: [FF93DB90 verbose 'Cluster' opID=SWI-56f32f43] Updating inventory manager with 1 datastores

$ sort -uk4 file.txt
May 29 22:25:19 servername.com Fdm: [FF93DB90 verbose 'Cluster' opID=SWI-56f32f43] Updating inventory manager with 1 datastores
May 29 22:25:19 servername.com Fdm: [FF93DB90 verbose 'Invt' opID=SWI-56f32f43] [InventoryManagerImpl::UpdateDatastoreLockStatus] Lock state change to 4 for datastore /vmfs/volumes/531b5d83-9129a42b-f3f8-001e6849b050
May 29 22:25:19 servername.com Fdm: [FFB03B90 verbose 'Invt' opID=SWI-65391264] [DsStateChange::SaveToInventory] Processing locked error update for /vmfs/volumes/531b5d83-9129a42b-f3f8-001e6849b050 (<unset>) from __localhost__
May 29 22:25:20 servername.com Vpxa: [FFF3AB90 verbose 'vpxavpxaAlarm' opID=SWI-54ad408b] [VpxaAlarm] VM with vmid = 17 not found
May 29 22:25:20 servername.com Vpxa: [FFF3AB90 verbose 'vpxavpxaMoVm' opID=SWI-54ad408b] [VpxaMoVm::CheckMoVm] did not find a VM with ID 17 in the vmList

$ tac file.txt | sort -uk4         
May 30 07:50:07 servername.com Fdm: [FF93DB90 verbose 'Cluster' opID=SWI-56f32f43] Updating inventory manager with 1 datastores
May 21 12:05:02 servername.com Fdm: [FF93DB90 verbose 'Invt' opID=SWI-56f32f43] [InventoryManagerImpl::UpdateDatastoreLockStatus] Lock state change to 4 for datastore /vmfs/volumes/531b5d83-9129a42b-f3f8-001e6849b050
May 29 22:25:19 servername.com Fdm: [FFB03B90 verbose 'Invt' opID=SWI-65391264] [DsStateChange::SaveToInventory] Processing locked error update for /vmfs/volumes/531b5d83-9129a42b-f3f8-001e6849b050 (<unset>) from __localhost__
May 29 22:25:20 servername.com Vpxa: [FFF3AB90 verbose 'vpxavpxaAlarm' opID=SWI-54ad408b] [VpxaAlarm] VM with vmid = 17 not found
May 29 22:25:20 servername.com Vpxa: [FFF3AB90 verbose 'vpxavpxaMoVm' opID=SWI-54ad408b] [VpxaMoVm::CheckMoVm] did not find a VM with ID 17 in the vmList

You can use sort -suk4 to skip the first three fields and drop the duplicates. The first three fields are the date string, so any two lines with identical text after that point will be deduplicated. You can then re-sort the output however you like:

sort -suk4 filename | sort -rs
Getting rid of the lines with different error codes is trickier, but I would suggest isolating the lines that carry error codes into their own file, then using something like the following, which moves the numeric error code to the front of each line, dedupes on everything past the timestamp (fields 5 onward after the move), and finally moves the code back:

sed 's/\(.*error code=\)\([0-9]*\)/\2 \1/' errorfile | sort -suk5 | sed 's/\([0-9]*\) \(.*error code=\)/\2\1/'

You haven't told us how you define "duplicate", but if you mean messages with the same timestamp, this will work:

$ tac file | awk '!seen[$1,$2,$3]++' | tac
May 29 22:25:19 servername.com Fdm: error code=12345 message 2
May 29 22:25:20 servername.com Vpxa: just another message
May 29 22:25:30 servername.com Fdm: another error message 3 76543

If that's not what you mean, then just change the index used in the awk array to whatever duplicate test you do want to apply.
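For instance, a sketch that instead keys on the host and daemon fields, so that only the newest line per daemon survives (a much coarser test, shown only to illustrate changing the index):

tac file | awk '!seen[$4,$5]++' | tac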

Given your most recent comment, maybe this is what you want:

$ tac file | awk '!/error/{next} {k=$0; sub(/([^:]+:){3}/,"",k); gsub(/[0-9]+/,"#",k)} !seen[k]++' | tac
May 29 22:25:20 servername.com Vpxa: this is error message 1 error code=0x67890
May 29 22:25:30 servername.com Fdm: error code=34567 message 2
May 29 22:25:30 servername.com Fdm: another error message 3 76543
The above works by creating a key, k, that is the part of each line after the first ':' that isn't part of the time field, with every sequence of digits changed to '#' (lines that don't contain "error" are skipped entirely). You can see the key generated for each line by printing it:
$ awk '!/error/{next} {k=$0; sub(/([^:]+:){3}/,"",k); gsub(/[0-9]+/,"#",k); print $0 ORS "\t -> key =", k}' file
May 29 22:25:19 servername.com Fdm: this is error message 1 error code=0x98765
         -> key =  this is error message # error code=#x#
May 29 22:25:19 servername.com Fdm: error code=12345 message 2
         -> key =  error code=# message #
May 29 22:25:20 servername.com Vpxa: this is error message 1 error code=0x67890
         -> key =  this is error message # error code=#x#
May 29 22:25:30 servername.com Fdm: error code=34567 message 2
         -> key =  error code=# message #
May 29 22:25:30 servername.com Fdm: another error message 3 76543
         -> key =  another error message # #
I managed to find a way. Just wanted to give you guys some more details on my problem, and here is the script I ended up with:

#!/usr/local/bin/bash

# start fresh; ignore errors if the temp files don't exist yet
rm /tmp/tmp.log /tmp/tmpfiltered.log 2> /dev/null

printf "Please key in full location of logs: "
read -r log1loc log2loc

# merge the two logs, then sort them (stable, reversed) on the leading date
cat "$log1loc" "$log2loc" > /tmp/tmp.log
sort -s -r -k1 /tmp/tmp.log -o /tmp/tmp.log

printf "Please key in the date: "
read -r logdate

while [[ $firstlineedit != "n" ]]
do
        # show the remaining error lines for that date
        grep -e "$logdate" /tmp/tmp.log | grep -i error | less

        # keep the newest line, then let the user edit it down to the
        # constant part that all of its variations share
        firstline=$(head -n 1 /tmp/tmp.log)
        head -n 1 /tmp/tmp.log >> /tmp/tmpfiltered.log
        read -p "Enter line to remove (enter n to quit): " -e -i "$firstline" firstlineedit

        # count the matches, then drop every line matching the edited pattern
        firstlinecount=$(grep -e "$logdate" /tmp/tmp.log | grep -i error | grep -o "$firstlineedit" | wc -l)
        grep -e "$logdate" /tmp/tmp.log | grep -i error | grep -v "$firstlineedit" > /tmp/tmp2.log
        mv /tmp/tmp2.log /tmp/tmp.log

        if [ "$firstlineedit" != "n" ]; then
                echo "That line and its variations have appeared $firstlinecount times in the log!"
        fi
done

less /tmp/tmpfiltered.log