awk output when the first column contains spaces


I am trying to use awk to split columns and print a sentence, but the first column contains spaces.

My beginner code example:

$ awk '/Linux/ { print "The filename","\""$1"\"","is located in",$2 }' test.txt
The filename "The" is located in test
The filename "Some" is located in file
The filename "File" is located in name
The filename "Something_here" is located in /ABC
The filename "Another_test" is located in /DEFG
The filename "Label" is located in test
From the file test.txt:

Filename                               Folder         Type
-------------------------------------- -------------- ------
The test file                          /test/folder   Linux
Some file                              /              Linux
File name                              /Temp          Linux
Something_here                         /ABC           Linux
Another_test                           /DEFG          Linux
Label test                             /HIJK          Linux 
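What I want to achieve (including the quotes):

The filename "The test file" is located in /test/folder
The filename "Some file" is located in /
The filename "File name" is located in /Temp
The filename "Something_here" is located in /ABC
The filename "Another_test" is located in /DEFG
The filename "Label test" is located in /HIJK

The problem is that when I use a space or "/" as the delimiter, I cannot get the whole column content when printing.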

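I suggest a substitution based on a regular expression with backreferences, plus a command that filters out the header lines of the source file: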

$ cat test.txt | grep -E 'Linux[ ]*$' | sed -E 's%(.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$%The filename "\1\2" is located in \4%'
The filename "The test file" is located in /test/folder  
The filename "Some file" is located in /             
The filename "File name" is located in /Temp         
The filename "Something_here" is located in /ABC          
The filename "Another_test" is located in /DEFG         
The filename "Label test" is located in /HIJK
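A good reference for regular expressions (regex) is

Detailed explanation, as requested in the comments:

  • grep with the -E option accepts extended regular expressions (see the reference above). Here it keeps only the lines that end with the word "Linux", optionally followed by spaces.
  • The output of grep is piped into the input of sed.
  • sed is passed the -E option, like grep, so that it accepts extended regular expressions. The s command replaces the characters matched by the regular expression (the first part between the % characters: (.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$) with other characters (the second part between the % characters: The filename "\1\2" is located in \4).
  • The second part uses backreferences: "\" followed by a non-zero decimal digit n is replaced by the text matched by the n-th parenthesized subexpression of the regular expression. Here, \1 is replaced by the string matched by the first "(.+)", which is the file name without its last character; \2 is replaced by the match of the following "([^ ])", which is the last character of the file name (a trick to keep the spaces that follow out of the name); and \4 is replaced by the match of "(/.+)", which is the folder.
This is not a rigorous explanation, but it should at least give some further input.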
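To make the backreferences more concrete, here is a small sketch that runs the same regular expression on a single sample line and prints groups 1, 2 and 4 between brackets. The sample line is rebuilt with printf, and the column widths (38 and 14) are an assumption read off the dashed header line of test.txt.

```shell
# One data row, padded like the columns in test.txt (widths are assumed).
line=$(printf '%-38s %-14s %s' 'The test file' '/test/folder' 'Linux')

# Same regex as in the answer; the replacement only shows what each group captured.
out=$(printf '%s\n' "$line" |
  sed -E 's%(.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$%1=[\1] 2=[\2] 4=[\4]%')

printf '%s\n' "$out"
# prints: 1=[The test fil] 2=[e] 4=[/test/folder  ]
```

Group 4 keeps two trailing spaces because "(/.+)" is greedy and only the last space is left for "[ ]+". That is why the output lines of the command above end in whitespace.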

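Another solution is to pass several commands on the sed command line. You can then add a command that deletes the two header lines, which removes the need for the cat | grep pipeline. Here "1,2d" means "delete lines 1 and 2":

Note: the -E option switches to extended regular expressions. GNU sed has supported it for years, and it is now included in POSIX. On older systems where -E is not supported, -r can be used instead.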

$ sed -r '1,2d;s%(.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$%The filename "\1\2" is located in \4%' test.txt
The filename "The test file" is located in /test/folder  
The filename "Some file" is located in /             
The filename "File name" is located in /Temp         
The filename "Something_here" is located in /ABC          
The filename "Another_test" is located in /DEFG         
The filename "Label test" is located in /HIJK
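
If the trailing whitespace after the folder is unwanted, one possible variant (a sketch, not the answer's original command) captures the folder with "(/[^ ]*)" so that it stops at the first space, and ends the name group on a non-space character. The temporary file below stands in for test.txt; its column widths are assumed from the question.

```shell
# Recreate a small test.txt in a temporary file (column widths are assumed).
tmp=$(mktemp)
d='------------------------------------------------------------'
{
  printf '%-38s %-14s %s\n' 'Filename' 'Folder' 'Type'
  printf '%.38s %.14s %.6s\n' "$d" "$d" "$d"
  printf '%-38s %-14s %s\n' 'The test file' '/test/folder' 'Linux'
  printf '%-38s %-14s %s\n' 'Some file' '/' 'Linux'
} > "$tmp"

# (.*[^ ]) ends the name on a non-space; (/[^ ]*) stops the folder at the first space.
out=$(sed -E '1,2d;s%^(.*[^ ]) +(/[^ ]*) +Linux *$%The filename "\1" is located in \2%' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

This variant assumes folder names contain no spaces, which holds for the sample data.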

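GNU awk (like any POSIX awk) accepts a regular expression as the field separator, so it is enough to require two or more spaces between columns: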

awk '/Linux/ { print "The file \""$1"\" is in "$2"." }' FS="   *" test.txt
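
As a quick check of what the separator changes, this sketch counts the fields of one sample line with the default splitting and with a regex separator. It uses -F '  +' (a space followed by one or more spaces), which is meant to be equivalent to the FS value above. The sample line and its column widths are assumptions based on the question's file.

```shell
line=$(printf '%-38s %-14s %s' 'The test file' '/test/folder' 'Linux')

# Default splitting: every run of blanks separates fields.
default_nf=$(printf '%s\n' "$line" | awk '{ print NF }')

# Regex separator: only runs of two or more spaces separate fields.
regex_nf=$(printf '%s\n' "$line" | awk -F '  +' '{ print NF }')
first=$(printf '%s\n' "$line" | awk -F '  +' '{ print $1 }')

printf 'default NF=%s, regex NF=%s, $1=[%s]\n' "$default_nf" "$regex_nf" "$first"
# prints: default NF=5, regex NF=3, $1=[The test file]
```

One caveat of this approach: a value that fills its whole column is followed by a single space only, so it would be merged with the next column.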

It also offers fixed-width fields (see info gawk fieldwidths), which you could set dynamically from the lengths of the dashed line.
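
The idea of deriving the column widths from the dashed line can be sketched as follows. Note the swap: to stay runnable with any POSIX awk, this sketch cuts the columns with substr() instead of assigning gawk's FIELDWIDTHS variable. The temporary file stands in for test.txt, and the names start, width and dash are made up for the illustration.

```shell
# Recreate a small test.txt in a temporary file (column widths are assumed).
tmp=$(mktemp)
d='------------------------------------------------------------'
{
  printf '%-38s %-14s %s\n' 'Filename' 'Folder' 'Type'
  printf '%.38s %.14s %.6s\n' "$d" "$d" "$d"
  printf '%-38s %-14s %s\n' 'The test file' '/test/folder' 'Linux'
  printf '%-38s %-14s %s\n' 'Some file' '/' 'Linux'
} > "$tmp"

out=$(awk '
NR == 2 {                       # dashed line: each run of dashes is one column
    n = split($0, dash, " ")
    pos = 1
    for (i = 1; i <= n; i++) {
        start[i] = pos
        width[i] = length(dash[i])
        pos += width[i] + 1     # +1 for the single space between columns
    }
    next
}
NR > 2 && /Linux/ {
    name = substr($0, start[1], width[1])
    dir  = substr($0, start[2], width[2])
    sub(/ +$/, "", name)        # strip the column padding
    sub(/ +$/, "", dir)
    printf "The filename \"%s\" is located in %s\n", name, dir
}' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

Unlike the regex separator, this keeps working when a value fills its column completely, because the split depends only on positions.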

If you have GNU AWK, this should do it:

awk 'match($0, /([^\/]+)([^ ]+) *Linux/, arr) { sub(/ +$/, "", arr[1]); printf("The filename \"%s\" is located in %s\n", arr[1], arr[2]) }' test.txt
Explanation:


# match and store groups in 'arr'
#  - arr[1]: everything up until the first slash (including a lot of whitespace)
#  - arr[2]: first slash until space
#  - rest: also ensure there's 'Linux' after that
match($0, /([^\/]+)([^ ]+) *Linux/, arr) {

  # trim whitespace from the right hand side of the filename
  sub(/ +$/, "", arr[1]);

  # print
  printf("The filename \"%s\" is located in %s\n", arr[1], arr[2])
}

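Note that other versions of AWK also have a less powerful version of match (without the array argument). The same result can be achieved with them, but you have to write more code.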
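
As a rough sketch of what that extra code could look like, the version below uses only the two-argument match() together with RSTART, RLENGTH and substr(), which any POSIX awk should provide. It is one possible rewrite rather than the answer's own code. It assumes, like the gawk version, that the folder is the first "/" on the line followed by non-space characters; the temporary file stands in for test.txt.

```shell
# Recreate a small test.txt in a temporary file (column widths are assumed).
tmp=$(mktemp)
{
  printf '%-38s %-14s %s\n' 'Filename' 'Folder' 'Type'
  printf '%-38s %-14s %s\n' 'The test file' '/test/folder' 'Linux'
  printf '%-38s %-14s %s\n' 'Some file' '/' 'Linux'
} > "$tmp"

out=$(awk '
# Keep lines whose last column is Linux and that contain a slash-led folder.
/ +Linux *$/ && match($0, /\/[^ ]*/) {
    name = substr($0, 1, RSTART - 1)    # everything before the first slash
    dir  = substr($0, RSTART, RLENGTH)  # first slash up to the next space
    sub(/ +$/, "", name)                # trim the padding after the name
    printf "The filename \"%s\" is located in %s\n", name, dir
}' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```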

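Please provide the script or command you are using, and also the output you currently get.
@KimochiIku You can edit your post. Please move the content of your comment into your post.
Could you break your command down into what it does and how it works? Can awk also be used for this?
I updated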