Bash: I'm confused by awk, sed, etc.


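I've been trying to solve this, but so far without success. I have some command output that I need to chew up so that it is suitable for further processing.

My text is: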

1/2 [3] (27/03/2012 19:32:54) word word word word 4/5

What I need is to extract only the numbers from 1/2 [3] 4/5, so that it looks like this:

1 2 3 4 5

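So basically I'm trying to exclude every character that isn't a digit, such as "/", "[", "]" and so on. I tried awk with FS and I tried regular expressions, but none of my attempts worked.

Later I would like to add labels, such as first:1 second:2 third:3 and so on. Keep in mind that this is a file with many lines of the same structure, but I have already thought of using awk to sum each column: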

awk '{sum1+=$1 ; sum2+=$2 ;......etc} END {print "first:"sum1 " second:"sum2.....etc}'

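But first I need to extract only the relevant numbers. The date between "( )" can be omitted entirely, but it consists of digits too, so filtering on digits alone is not enough because it would match the date as well.

I hope you can help me.

Thanks in advance.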

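You can do something like sed -e 's/(.*)//' -e 's/[^0-9]/ /g'. It removes everything inside the parentheses, then replaces every non-digit character with a space. To get rid of the extra spaces, you can pipe the result to column -t: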

$ echo '1/2 [3] (27/03/2012 19:32:54) word word word word 4/5' | sed -e 's/(.*)//' -e 's/[^0-9]/ /g' | column -t
1  2  3  4  5
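One caveat worth illustrating: in sed's default (basic) regex syntax, `(.*)` is greedy and runs from the first "(" to the last ")" on the line. The sketch below uses a hypothetical line with two parenthesized groups (not from the question) to compare it with `([^)]*)`, which stops at the first closing parenthesis.

```shell
#!/usr/bin/env bash
# Hypothetical line with two parenthesized groups, to show the difference.
line='1/2 (a) 3 (b) 4/5'

# Greedy: "(.*)" spans from the first "(" to the last ")", so the 3 is lost.
greedy=$(printf '%s\n' "$line" | sed -e 's/(.*)//' -e 's/[^0-9][^0-9]*/ /g')

# Per group: "([^)]*)" stops at the first ")", so the 3 survives.
per_group=$(printf '%s\n' "$line" | sed -e 's/([^)]*)//g' -e 's/[^0-9][^0-9]*/ /g')

printf 'greedy:    %s\n' "$greedy"
printf 'per group: %s\n' "$per_group"
```

For the question's input, which has a single parenthesized date per line, both forms give the same result.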

If this is what you want, see below:

kent$  echo "1/2 [3] (27/03/2012 19:32:54) word word word word 4/5"|sed -r 's/\([^)]*\)//g; s/[^0-9]/ /g'
1 2  3                       4 5

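If you want it to look nicer, squeeze the runs of spaces as well. Use " +" rather than " *" here: " *" also matches the empty string, which adds stray spaces and would split multi-digit numbers.

kent$  echo "1/2 [3] (27/03/2012 19:32:54) word word word word 4/5"|sed -r 's/\([^)]*\)//g; s/[^0-9]/ /g;s/ +/ /g'
1 2 3 4 5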

This gives you the extracted numbers, excluding the text in parentheses:

digits=$(echo '1/2 [3] (27/03/2012 19:32:54) word word word word 4/5' |\
       sed 's/(.*)//' | grep -o '[0-9][0-9]*')
echo $digits

Or a pure sed solution:

echo '1/2 [3] (27/03/2012 19:32:54) word word word word 4/5' |\
sed -e 's/(.*)//' -e 's/[^0-9]/ /g' -e 's/[ \t][ \t]*/ /g'

Output:

1 2 3 4 5
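The question also asks for labels such as first:1 second:2. A minimal sketch of one way to add them, assuming every line yields exactly five numbers: let the shell split the extracted digits into separate arguments for `printf`. The variable name `labelled` is only illustrative.

```shell
#!/usr/bin/env bash
line='1/2 [3] (27/03/2012 19:32:54) word word word word 4/5'

# Drop the parenthesized date, then keep only the runs of digits (one per line).
digits=$(printf '%s\n' "$line" | sed 's/([^)]*)//g' | grep -o '[0-9][0-9]*')

# $digits is deliberately unquoted so the shell splits it into five arguments.
labelled=$(printf 'first:%s second:%s third:%s fourth:%s fifth:%s' $digits)
echo "$labelled"
```

Because `grep -o` matches whole runs of digits, multi-digit values such as 10/20 stay intact.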

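This:

sed -r 's/[(][^)]*[)]//g; s/[^0-9]+/ /g'

should work. It makes two passes: first it removes the parenthesized expression, then it replaces each run of non-digits with a single space.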
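Put together with the column-summing awk from the question, the two-pass sed might be used as in the sketch below. The two input lines are sample data in the question's format, and `-E` is used as the portable spelling of `-r`.

```shell
#!/usr/bin/env bash
# Two sample lines in the format described in the question.
input='1/2 [3] (27/03/2012 19:32:54) word word word word 4/5
10/20 [30] (27/03/2012 19:32:54) word word 40/50'

sums=$(printf '%s\n' "$input" |
  sed -E 's/[(][^)]*[)]//g; s/[^0-9]+/ /g' |
  awk '{ s1 += $1; s2 += $2; s3 += $3; s4 += $4; s5 += $5 }
       END { print "first:" s1, "second:" s2, "third:" s3, "fourth:" s4, "fifth:" s5 }')
echo "$sums"
```

After the sed step each line holds exactly five space-separated numbers, so awk's default field splitting is enough for the sums.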

 awk '{ first+=gensub("^([0-9]+)/.*","\\1","g",$0)
        second+=gensub("^[0-9]+/([0-9]+) .*","\\1","g",$0)
        third+=gensub("^[0-9]+/[0-9]+ \\[([0-9]+)\\].*","\\1","g",$0)
        fourth+=gensub("^.* ([0-9]+)/[0-9]+ *$","\\1","g",$0)
        fifth+=gensub("^.* [0-9]+/([0-9]+) *$","\\1","g",$0)
      }
      END { print "first: " first " second: " second " third: " third " fourth: " fourth " fifth: " fifth
      }'

might work for you (gensub requires GNU awk).
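Where GNU awk is not available, a similar result can be sketched with the standard `gsub` function, which edits the record in place. This is an alternative technique and not the answer's own method. The two input lines are sample data in the question's format.

```shell
#!/usr/bin/env bash
input='1/2 [3] (27/03/2012 19:32:54) word word word word 4/5
10/20 [30] (27/03/2012 19:32:54) word word 40/50'

sums=$(printf '%s\n' "$input" | awk '
  {
    gsub(/\([^)]*\)/, "")   # drop the parenthesized date and time
    gsub(/[^0-9]+/, " ")    # turn each run of non-digits into one space
    # $0 was modified, so awk has re-split it into the five number fields
    first += $1; second += $2; third += $3; fourth += $4; fifth += $5
  }
  END {
    print "first: " first " second: " second " third: " third " fourth: " fourth " fifth: " fifth
  }')
echo "$sums"
```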

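If you set up a fancy field separator, a single pass with awk is enough. Here any one of a slash, a space, an opening bracket or a closing bracket separates fields: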

awk -F '[][/ ]' '
  {s1+=$1; s2+=$2; s3+=$4; s4+=$(NF-1); s5+=$NF}
  END {printf("first:%d second:%d third:%d fourth:%d fifth:%d\n", s1, s2, s3, s4, s5)}
'
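To see why the third number is read from `$4` and not `$3`, it can help to print the fields this separator produces for the sample line. A small sketch (the labels f1, f2 and so on are only illustrative):

```shell
#!/usr/bin/env bash
line='1/2 [3] (27/03/2012 19:32:54) word word word word 4/5'

# Show the fields that carry the five numbers when FS is the class [][/ ].
fields=$(printf '%s\n' "$line" |
  awk -F '[][/ ]' '{ printf "f1=%s f2=%s f3=[%s] f4=%s penultimate=%s last=%s\n", $1, $2, $3, $4, $(NF-1), $NF }')
echo "$fields"
```

The space and the "[" before the 3 are two adjacent separators, so the field between them (`$3`) is empty. The date and the words land in the middle fields, which the sums never touch.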

TXR, with some error checking that is easy to add:

@(collect)
@  (cases)
@one/@two [@three] (@date @time) @(skip :greedy) @four/@five
@  (or)
@line
@  (throw error `badly formatted line: @line`)
@  (end)
@  (filter :tonumber one two three four five)
@(end)
@(bind (first second third fourth fifth)
       @(mapcar (op apply +) (list one two three four five)))
@(output)
first:@first second:@second third:@third fourth:@fourth fifth:@fifth
@(end)

$ txr data.txr -
foo bar junk
txr: unhandled exception of type error:
txr: ("badly formatted line: foo bar junk")
Aborted

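TXR is for robust programming. There is strong typing, so you can't treat a string as a number just because it contains digits. Variables must be bound before use, so a misspelled variable does not silently default to zero or blank, but produces an unbound variable error. Text extraction is performed with lots of specific context, to guard against misinterpreting input in one format as being in some other format.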
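
I was thinking of grep -o [0-9], but it fails if a number has two digits, e.g. 1/20 [35]...

I edited my answer and added another option based on pure sed.

1UP, this is pretty much what I came up with.

Awesome! Exactly what I needed! Here is what the whole command looks like, with test.txt containing:
10/20 [30] (date) word 40/50
10/20 [30] (date) word 40/50
So, when I run: cat test.txt | sed -r 's/[(][^)]*[)]//g; s/[^0-9]+/ /g' | awk '{sum1+=$1; sum2+=$2; sum3+=$3; sum4+=$4; sum5+=$5} END {print "first:"sum1, "second:"sum2, "third:"sum3, "fourth:"sum4, "fifth:"sum5}'
Output: first:20 second:40 third:60 fourth:80 fifth:100
Thanks a lot @MichałKosmulski, you guys are awesome. Is there a way to make comments look more like questions and answers, with code blocks, indentation, line breaks and so on?

You're welcome. As for formatting, have a look at this page:

Yes, I tried, but it doesn't seem to work: <code>test code

See? No line break! Anyway, thanks @MichałK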

Sample data for the TXR answer (data.txt):

1/2 [3] (27/03/2012 19:32:54) word word word word 4/5
10/20 [30] (27/03/2012 19:32:54) word word 40/50

Run:

$ txr data.txr data.txt
first:11 second:22 third:33 fourth:44 fifth:55
