Awk: split a field, then remove duplicates
Example file:
# cat test1
-rw-r--r-- 1 root root 19460 Feb 10 03:56 catalina.2015-02-10.log
-rw-r--r-- 1 root root 206868 May 4 15:05 catalina.2015-05-04.log
-rw-r--r-- 1 root root 922121 Jun 24 09:26 catalina.out
-rw-r--r-- 1 root root 0 Feb 10 02:27 host-manager.2015-02-10.log
-rw-r--r-- 1 root root 0 May 4 04:17 host-manager.2015-05-04.log
-rw-r--r-- 1 root root 2025 Feb 10 03:56 localhost.2015-02-10.log
-rw-r--r-- 1 root root 8323 May 4 15:05 localhost.2015-05-04.log
-rw-r--r-- 1 root root 873 Feb 10 03:56 localhost_access_log.2015-02-10.txt
-rw-r--r-- 1 root root 458600 May 4 23:59 localhost_access_log.2015-05-04.txt
-rw-r--r-- 1 root root 0 Feb 10 02:27 manager.2015-02-10.log
-rw-r--r-- 1 root root 0 May 4 04:17 manager.2015-05-04.log
Expected output:
catalina
host-manager
localhost
localhost_access_log
manager
Attempt 1 (works):
# awk '{split($9,a,"."); print a[1]}' test1 | awk '!z[$i]++'
catalina
host-manager
localhost
localhost_access_log
manager
Attempt 2 (works):
# awk '{split($9,a,"."); print a[1]}' test1 | uniq
catalina
host-manager
localhost
localhost_access_log
manager
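A side note on Attempt 2: uniq only removes duplicate lines that are adjacent, so it works here because the listing is already sorted by name and all names with the same prefix arrive together. The minimal sketch below, assuming a POSIX sh and awk, feeds a deliberately unsorted, hypothetical list of names through both filters to show the difference:

```shell
#!/bin/sh
# Sketch: uniq only collapses *adjacent* duplicate lines.
# The names below are hypothetical and deliberately unsorted.
unsorted=$(printf '%s\n' catalina.out manager.log catalina.2015-02-10.log)

# Same prefix extraction as Attempt 2, followed by uniq: the two
# "catalina" lines are not adjacent, so both survive.
via_uniq=$(printf '%s\n' "$unsorted" | awk '{ split($NF, a, "."); print a[1] }' | uniq)

# The !seen[$0]++ filter remembers every line already printed,
# so it drops repeats regardless of their position.
via_seen=$(printf '%s\n' "$unsorted" | awk '{ split($NF, a, "."); print a[1] }' | awk '!seen[$0]++')

printf 'uniq:\n%s\n\nseen:\n%s\n' "$via_uniq" "$via_seen"
```

With unsorted input, the uniq pipeline prints catalina, manager, catalina, while the seen-array filter prints catalina, manager.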
Attempt 3 (fails):
# awk '{split($9,a,"."); a[1]++} {for (i in a){print a[i]}}' test1
1
2015-02-10
log
1
2015-05-04
log
1
out
.
.
.
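The output of Attempt 3 can be explained by looking at a single line in isolation: split() clears and refills a[] for every record, so the for (i in a) loop prints every piece of every name, and a[1]++ turns the string prefix into the number 1 (a string such as "catalina" has the numeric value 0). The sketch below, assuming a POSIX awk, reproduces this for one name taken from test1:

```shell
#!/bin/sh
# Sketch: inspect what split() leaves in the array for one file name
# from test1, and what Attempt 3's a[1]++ then does to it.
name='catalina.2015-02-10.log'

out=$(awk -v name="$name" 'BEGIN {
    n = split(name, a, ".")          # a[1]="catalina", a[2]="2015-02-10", a[3]="log"
    printf "split returned %d pieces\n", n
    for (i = 1; i <= n; i++)         # numeric loop: fixed order, unlike "for (i in a)"
        printf "a[%d] = %s\n", i, a[i]
    a[1]++                           # "catalina" has numeric value 0, so a[1] becomes 1
    printf "after a[1]++: a[1] = %s\n", a[1]
}')
printf '%s\n' "$out"
```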
Question:
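I would like to split the 9th field and then show only the unique entries. However, I would like to do this in a single awk one-liner. I'm looking for help with my 3rd attempt.
You have to use an END block to print the results: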
awk '{split($NF,a,"."); b[a[1]]} END{for (i in b){print i}}' file
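Notes:
- I'm using $NF to catch the last field. This way it also works if you happen to have more or fewer than 9 fields (as long as no filename contains spaces, since awk splits fields on whitespace).
- We cannot loop directly through the a[] array, because it is the array holding the split data, and split() refills it on every line. For this we need to create another array, for example b[]. That's why we say b[a[1]]. That alone is enough: there is no need for b[a[1]]++ unless you want to keep track of how many times each item appears.
- The END block is executed after the whole file has been processed. Otherwise you check the results once per record (that is, once per line), which is why duplicate results show up.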
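As the notes mention, writing b[a[1]]++ instead of b[a[1]] also counts how often each prefix occurs. A minimal sketch, assuming a POSIX shell and awk, run on a few lines copied from test1 (the file name sample_ls.txt is hypothetical). Because for (i in b) visits keys in an unspecified order, the output is piped through sort:

```shell
#!/bin/sh
# Sketch of the END-block approach with the b[a[1]]++ counting variant.
# sample_ls.txt is a hypothetical file holding a few lines from test1.
cat > sample_ls.txt <<'EOF'
-rw-r--r-- 1 root root 19460 Feb 10 03:56 catalina.2015-02-10.log
-rw-r--r-- 1 root root 206868 May 4 15:05 catalina.2015-05-04.log
-rw-r--r-- 1 root root 922121 Jun 24 09:26 catalina.out
-rw-r--r-- 1 root root 0 Feb 10 02:27 manager.2015-02-10.log
EOF

# Collect prefixes into b[] while reading; print them once, after the
# whole file has been read. for (i in b) order is unspecified, so sort.
counts=$(awk '{ split($NF, a, "."); b[a[1]]++ }
              END { for (i in b) print i, b[i] }' sample_ls.txt | sort)
printf '%s\n' "$counts"

rm -f sample_ls.txt
```

On these four lines it prints "catalina 3" and "manager 1".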
Another, more idiomatic awk one-liner:
awk '!a[ $0 = substr($NF,1,index($NF,".")-1) ]++' file
Or, written more explicitly:
awk '{$0=substr($NF,1,index($NF,".")-1)} !a[$0]++' file
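- We use the !a[$0]++ line-deduplication trick: a[$0] is 0 the first time a value is seen, so the condition is true and the line is printed; the post-increment makes it false for every later occurrence.
- But first we change $0 to: substr($NF,1,index($NF,".")-1)
- That is, the whole line becomes the substring of the last field ($NF) up to the first dot (.), using substr() and index().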
A nice thing about this solution is that you don't need to wait for the whole file to be parsed: the split fields are deduplicated and printed on the fly, in the order they first appear.
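One caveat with substr($NF,1,index($NF,".")-1): for a name without any dot, index() returns 0 and the result is an empty string. Below is a sketch of the same streaming !seen[...]++ idea with a guard for that case, assuming a POSIX awk. The README line is a hypothetical addition to the listing:

```shell
#!/bin/sh
# Sketch: streaming dedup of name prefixes, guarding names with no dot.
# Without the guard, index() returns 0, substr($NF, 1, -1) yields an
# empty string, and all such names collapse into one blank line.
# The README line is hypothetical; the others come from test1.
listing='-rw-r--r-- 1 root root 922121 Jun 24 09:26 catalina.out
-rw-r--r-- 1 root root 0 Feb 10 02:27 manager.2015-02-10.log
-rw-r--r-- 1 root root 19460 Feb 10 03:56 catalina.2015-02-10.log
-rw-r--r-- 1 root root 120 Feb 10 03:56 README'

prefixes=$(printf '%s\n' "$listing" | awk '{
    dot = index($NF, ".")                       # position of the first dot, 0 if none
    key = dot ? substr($NF, 1, dot - 1) : $NF   # keep the whole name when there is no dot
    if (!seen[key]++) print key                 # print only the first occurrence, as it streams
}')
printf '%s\n' "$prefixes"
```

The output keeps first-seen order (catalina, manager, README), unlike the END-block version, whose order depends on for (i in b).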