Awk: split a field, then remove duplicates

Sample file:

# cat test1 
-rw-r--r-- 1 root root   19460 Feb 10 03:56 catalina.2015-02-10.log
-rw-r--r-- 1 root root  206868 May  4 15:05 catalina.2015-05-04.log
-rw-r--r-- 1 root root  922121 Jun 24 09:26 catalina.out
-rw-r--r-- 1 root root       0 Feb 10 02:27 host-manager.2015-02-10.log
-rw-r--r-- 1 root root       0 May  4 04:17 host-manager.2015-05-04.log
-rw-r--r-- 1 root root    2025 Feb 10 03:56 localhost.2015-02-10.log
-rw-r--r-- 1 root root    8323 May  4 15:05 localhost.2015-05-04.log
-rw-r--r-- 1 root root     873 Feb 10 03:56 localhost_access_log.2015-02-10.txt
-rw-r--r-- 1 root root  458600 May  4 23:59 localhost_access_log.2015-05-04.txt
-rw-r--r-- 1 root root       0 Feb 10 02:27 manager.2015-02-10.log
-rw-r--r-- 1 root root       0 May  4 04:17 manager.2015-05-04.log
Expected output:

catalina
host-manager
localhost
localhost_access_log
manager
Attempt 1 (works):

# awk '{split($9,a,"."); print a[1]}' test1 | awk '!z[$i]++'
catalina
host-manager
localhost
localhost_access_log
manager
Attempt 2 (works):

# awk '{split($9,a,"."); print a[1]}' test1 | uniq
catalina
host-manager
localhost
localhost_access_log
manager
Attempt 3 (fails):

# awk '{split($9,a,"."); a[1]++} {for (i in a){print a[i]}}' test1
1
2015-02-10
log
1
2015-05-04
log
1
out
.
.
.
Question:

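I want to split the 9th field and then display only the unique entries. However, I'd like to do this in a single `awk` one-liner. I'm looking for help with my third attempt.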

You have to print the results in an `END` block:

awk '{split($NF,a,"."); b[a[1]]} END{for (i in b){print i}}' file
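Notes:

  • I'm using `$NF` to grab the last field. That way it also works if you happen to have more or fewer than 9 fields (as long as no file name contains spaces, since awk's default field splitting would break such a name into several fields).
  • We can't loop over the `a[]` array directly, because it only holds the pieces of the current line: `split()` clears and refills it on every record. For that we need a second array, e.g. `b[]`, keyed by the prefix. That's why we write `b[a[1]]`. Referencing the element on its own is enough to create the key; `b[a[1]]++` is only needed if you also want to track how many times each item appears.
  • The `END` block runs after the whole file has been processed. Without it, the loop would run once per record (i.e. once per line) and print the keys collected so far each time, so the output would contain duplicates.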
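The last two notes can be tried together without the original `test1` file. Below is a minimal sketch that assumes a POSIX shell and any POSIX `awk`. The `listing` variable is a hypothetical stand-in for `test1`, copied from the listing in the question. The sketch uses the counting variant `b[a[1]]++` from the second note and prints each prefix with its count. Since `for (i in b)` visits keys in an unspecified order, the result is piped through `sort` so the output is deterministic.

```shell
#!/bin/sh
# Hypothetical stand-in for test1: the ls -l listing from the question.
listing='-rw-r--r-- 1 root root   19460 Feb 10 03:56 catalina.2015-02-10.log
-rw-r--r-- 1 root root  206868 May  4 15:05 catalina.2015-05-04.log
-rw-r--r-- 1 root root  922121 Jun 24 09:26 catalina.out
-rw-r--r-- 1 root root       0 Feb 10 02:27 host-manager.2015-02-10.log
-rw-r--r-- 1 root root       0 May  4 04:17 host-manager.2015-05-04.log
-rw-r--r-- 1 root root    2025 Feb 10 03:56 localhost.2015-02-10.log
-rw-r--r-- 1 root root    8323 May  4 15:05 localhost.2015-05-04.log
-rw-r--r-- 1 root root     873 Feb 10 03:56 localhost_access_log.2015-02-10.txt
-rw-r--r-- 1 root root  458600 May  4 23:59 localhost_access_log.2015-05-04.txt
-rw-r--r-- 1 root root       0 Feb 10 02:27 manager.2015-02-10.log
-rw-r--r-- 1 root root       0 May  4 04:17 manager.2015-05-04.log'

# split() refills a[] on every line; b[] accumulates across lines.
# b[a[1]]++ counts each prefix; the END block prints once, at the end.
# for (i in b) order is unspecified, so sort with a fixed locale.
counts=$(printf '%s\n' "$listing" |
  awk '{split($NF, a, "."); b[a[1]]++} END {for (i in b) print i, b[i]}' |
  LC_ALL=C sort)

printf '%s\n' "$counts"
```

If you drop the `++` and print only `i`, you get the answer's original output, again in an unspecified order.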

Another, more idiomatic `awk` one-liner:

awk '!a[ $0 = substr($NF,1,index($NF,".")-1) ]++' file
Or, written more explicitly:

awk '{$0=substr($NF,1,index($NF,".")-1)} !a[$0]++' file
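  • We use the `!a[$0]++` line-deduplication idiom: the pattern is true only the first time a given `$0` is seen, so each distinct line is printed once.
  • But first we change `$0` to `substr($NF,1,index($NF,".")-1)`:
    • The whole line becomes the substring of the last field, `$NF`, up to (but not including) the first dot (`.`), using `substr()` and `index()`.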
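To see what that expression extracts, here is a small sketch that assumes a POSIX shell and `awk`. The `names` variable is a hypothetical sample of last fields taken from the listing in the question. For each line it prints the last field, the position that `index()` returns, and the resulting prefix.

```shell
#!/bin/sh
# Hypothetical sample: a few last fields from the question's listing.
names='catalina.2015-02-10.log
catalina.out
host-manager.2015-02-10.log
localhost_access_log.2015-05-04.txt'

# index() returns the 1-based position of the first "." (0 if there is none);
# substr(s, 1, pos - 1) keeps everything before that dot.
parts=$(printf '%s\n' "$names" |
  awk '{ dot = index($NF, "."); print $NF, dot, substr($NF, 1, dot - 1) }')

printf '%s\n' "$parts"
```

For a name with no dot at all, `index()` returns 0 and the requested length becomes -1. gawk, for example, returns an empty string in that case, so such a line would collapse to an empty prefix.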

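One benefit of this solution is that you don't need to wait for the whole file to be parsed: each prefix is deduplicated and printed on the fly, as soon as its first occurrence is read.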
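Each prefix is printed the first time it is seen, so the output keeps first-occurrence order even when the input is not grouped. The minimal sketch below runs both one-liners from this answer on a shuffled sample and checks that they agree. It assumes a POSIX shell and `awk`; the shuffled `listing` is hypothetical, built from lines of the question's sample.

```shell
#!/bin/sh
# Hypothetical, deliberately unsorted sample built from the question's listing.
listing='-rw-r--r-- 1 root root   19460 Feb 10 03:56 catalina.2015-02-10.log
-rw-r--r-- 1 root root       0 Feb 10 02:27 manager.2015-02-10.log
-rw-r--r-- 1 root root  922121 Jun 24 09:26 catalina.out
-rw-r--r-- 1 root root    2025 Feb 10 03:56 localhost.2015-02-10.log
-rw-r--r-- 1 root root       0 May  4 04:17 manager.2015-05-04.log'

# Compact form: assign the prefix to $0 inside the subscript; the pattern
# is true only on the first occurrence, so the default action prints it once.
compact=$(printf '%s\n' "$listing" |
  awk '!a[ $0 = substr($NF,1,index($NF,".")-1) ]++')

# Explicit form: rewrite $0 first, then apply the !a[$0]++ pattern.
explicit=$(printf '%s\n' "$listing" |
  awk '{$0=substr($NF,1,index($NF,".")-1)} !a[$0]++')

printf '%s\n' "$compact"
```

Piping these prefixes through `uniq`, as in attempt 2, would not be enough for this unsorted input, because `uniq` only collapses adjacent repeated lines.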
