Bash: count the occurrences of a substring in a string

bash shell awk sed grep

How can I count the occurrences of a substring in a string using Bash?

Example:

I would like to know how many times this substring...

Bluetooth
         Soft blocked: no
         Hard blocked: no

...occurs in this string:

0: asus-wlan: Wireless LAN
         Soft blocked: no
         Hard blocked: no
1: asus-bluetooth: Bluetooth
         Soft blocked: no
         Hard blocked: no
2: phy0: Wireless LAN
         Soft blocked: no
         Hard blocked: no
113: hci0: Bluetooth
         Soft blocked: no
         Hard blocked: no

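Note 1: I have already tried several approaches with sed, grep and awk. Nothing seems to work when the strings contain spaces and span multiple lines.

Note 2: I am a Linux user and I am looking for a solution that does not require installing anything beyond the tools commonly shipped with Linux distributions.

IMPORTANT:

In addition to my question, I would like something along the lines of the hypothetical example below. In this case we do not use files but two shell (Bash) variables.

EXAMPLE (based on @Ed Morton's contribution):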

STRING="0: asus-wlan: Wireless LAN
         Soft blocked: no
         Hard blocked: no
1: asus-bluetooth: Bluetooth
         Soft blocked: no
         Hard blocked: no
2: phy0: Wireless LAN
         Soft blocked: no
         Hard blocked: no
113: hci0: Bluetooth
         Soft blocked: no
         Hard blocked: no"

SUB_STRING="Bluetooth
         Soft blocked: no
         Hard blocked: no"

awk -v RS='\0' 'NR==FNR{str=$0; next} {print gsub(str,"")}' "$STRING" "$SUB_STRING"

Using GNU awk:

$ awk '
BEGIN { RS="[0-9]+:" }      # number followed by colon is the record separator
NR==1 {                     # read the substring to b
    b=$0
    next
}
$0~b { c++ }                # if b matches current record, increment counter
END { print c }             # print counter value
' substringfile stringfile
2

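This solution requires the amount of whitespace to match exactly, so your example will not work as posted, because the substring is indented by fewer spaces than the string. Note that, because of the chosen RS, a substring such as phy0: cannot be matched; in that case something like RS="^|\n[0-9]+:" might work.

Another one: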

$ awk '
BEGIN{ RS="^$" }                           # treat whole files as one record
NR==1 { b=$0; next }                       # buffer substringfile
{
    while(match($0,b)) {                   # count matches of b in stringfile
        $0=substr($0,RSTART+RLENGTH)       # resume scanning after this match
        c++
    }
}
END { print c }                            # output
' substringfile stringfile

Edit: Of course, you can drop the BEGIN section and use Bash's process substitution, like this:

$ awk '
NR==1 { 
    b=$0
    gsub(/^ +| +$/,"",b)                 # clean surrounding space from substring
    next 
}
{
    while(match($0,b)) {
        $0=substr($0,RSTART+RLENGTH)     # resume scanning after this match
        c++
    }
}
END { print c }
' <(echo $SUB_STRING) <(echo $STRING)    # feed it with process substitution
2

This should mitigate the whitespace problem.
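The process-substitution command above relies on a side effect of the unquoted `echo $SUB_STRING`: the shell splits the value on whitespace and `echo` joins the words with single spaces. The following is a minimal sketch of that behaviour, not part of the original answer. `normalize_ws` is a hypothetical helper name introduced here to do the same flattening explicitly with `tr`, which avoids the glob expansion that an unquoted variable is also subject to.

```shell
#!/bin/sh
SUB_STRING='Bluetooth
         Soft blocked: no
         Hard blocked: no'

# Unquoted: the value is split on spaces and newlines, and echo rejoins the
# words with single spaces, so line breaks and indentation are lost.
echo $SUB_STRING

# Quoted: the value is passed through unchanged, newlines included.
printf '%s\n' "$SUB_STRING"

# normalize_ws (hypothetical helper): turn every run of whitespace into one
# space. Unlike the unquoted echo, leading or trailing whitespace becomes a
# single space instead of disappearing.
normalize_ws() {
    printf '%s' "$1" | tr -s '[:space:]' ' '
}

normalize_ws "$SUB_STRING"; echo
```

Both the unquoted `echo` and `normalize_ws` print `Bluetooth Soft blocked: no Hard blocked: no` for this value, which is why the counts no longer depend on the indentation.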

Edit 2: Based on @EdMorton's eagle-eyed observation in the comments:

$ awk '
NR==1 { 
    b=$0
    gsub(/^ +| +$/,"",b)                 # clean surrounding space from substring
    next 
}
{ print gsub(b,"") }
' <(echo $SUB_STRING) <(echo $STRING)    # feed it with process substitution
2

You can try this with GNU grep; the whole pipeline prints the number of matches:

grep -zo -P ".*Bluetooth\n\s*Soft blocked: no\n\s*Hard blocked: no" <your_file> | grep -c "Bluetooth"

Using Python:

#! /usr/bin/env python

import sys
import re

with open(sys.argv[1], 'r') as i:
    print(len(re.findall(sys.argv[2], i.read(), re.MULTILINE)))

Invoked as:

$ ./search.py file.txt 'Bluetooth
 +Soft blocked: no
 +Hard blocked: no'

The " +" allows one or more spaces.

Edit: if the content is already held in Bash variables, it is even simpler:

#! /usr/bin/env python

import sys
import re

print(len(re.findall(sys.argv[2], sys.argv[1], re.MULTILINE)))

Invoked as:

$ ./search.py "$STRING" "$SUB_STRING"

This might work for you (GNU sed and wc):

sed -nr 'N;/^(\s*)Soft( blocked: no\s*)\n\1Hard\2$/P;D' file | wc -l

It prints one line for each occurrence of the multi-line match and counts the lines.

Another awk:

awk '
  NR==FNR{
    b[i++]=$0          # store each line of the substring in array b
    next}
  $0 ~ b[0]{           # current record matches the first line of the substring
    for(j=1;j<i;j++){
      getline
      if($0!~b[j])     # next record does not match: force the loop to end
        j+=i}
     if(j==i)          # every line of the substring matched
       k++}
  END{
    print k+0}         # k+0 prints 0 rather than an empty line when nothing matched
' stringfile infile

You call it like this:

./scriptbash.sh "$String" "$Sub_String"

Update based on your comments below. If the whitespace is the same in both strings:

awk 'BEGIN{print gsub(ARGV[2],"",ARGV[1])}' "$STRING" "$SUB_STRING"

Or, if the whitespace differs as in your example, where the STRING lines start with 9 blanks but the SUB_STRING lines start with 8:

$ awk 'BEGIN{gsub(/[[:space:]]+/,"[[:space:]]+",ARGV[2]); print gsub(ARGV[2],"",ARGV[1])}' "$STRING" "$SUB_STRING"

Original answer:

With GNU awk, if the whitespace matched between the file and the search string, and that string did not contain RE metacharacters, all you would need is:

awk -v RS='^$' 'NR==FNR{str=$0; next} {print gsub(str,"")}' str file

or, with any awk, if your input also does not contain NUL characters:

awk -v RS='\0' 'NR==FNR{str=$0; next} {print gsub(str,"")}' str file

But for a full solution with explanations, read on.

With any POSIX awk in any shell on any UNIX box:

$ cat str
Bluetooth
        Soft blocked: no
        Hard blocked: no

$ awk '
NR==FNR { str=(str=="" ? "" : str ORS) $0; next }
{ rec=(rec=="" ? "" : rec ORS) $0 }
END {
    gsub(/[^[:space:]]/,"[&]",str) # make sure each non-space char is treated as literal
    gsub(/[[:space:]]+/,"[[:space:]]+",str) # make sure space differences do not matter
    print gsub(str,"",rec)
}
' str file
2

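With a non-POSIX awk such as nawk, which may not support character classes, list the whitespace characters explicitly (for example a space, \t and \n) instead of [:space:]. If your search string can contain backslashes, one more gsub() is needed to handle them.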
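When an exact, character-for-character match is wanted, the escaping steps can be avoided by not using a regular expression at all. The following is a minimal sketch under that assumption, not the method of the answer above: `count_literal` is a hypothetical helper name, and it reads both strings from ARGV in a BEGIN-only program and counts non-overlapping occurrences with awk's `index()`, which compares plain characters. Because nothing is interpreted as a regex, backslashes and metacharacters need no special handling, but whitespace differences are no longer tolerated.

```shell
#!/bin/sh
# count_literal STRING SUB_STRING   (hypothetical helper name)
# Prints how many times SUB_STRING occurs in STRING as a literal,
# non-overlapping substring.
count_literal() {
    awk 'BEGIN {
        s = ARGV[1]                          # text to search
        t = ARGV[2]                          # literal substring to count
        n = 0
        if (t == "") { print 0; exit }       # an empty substring is never counted
        while ((i = index(s, t)) > 0) {      # index() does no regex matching
            n++
            s = substr(s, i + length(t))     # continue after this occurrence
        }
        print n
    }' "$1" "$2"
}

STRING='0: asus-wlan: Wireless LAN
         Soft blocked: no
         Hard blocked: no
1: asus-bluetooth: Bluetooth
         Soft blocked: no
         Hard blocked: no
2: phy0: Wireless LAN
         Soft blocked: no
         Hard blocked: no
113: hci0: Bluetooth
         Soft blocked: no
         Hard blocked: no'

SUB_STRING='Bluetooth
         Soft blocked: no
         Hard blocked: no'

count_literal "$STRING" "$SUB_STRING"    # prints 2
count_literal 'a\b a\b' 'a\b'            # prints 2; the backslash is plain text
```

With a SUB_STRING indented by 8 spaces instead of 9, this sketch prints 0, since the indentation is compared literally.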

Alternatively, with GNU awk for multi-char RS:

$ awk -v RS='^$' 'NR==FNR{gsub(/[^[:space:]]/,"[&]"); gsub(/[[:space:]]+/,"[[:space:]]+"); str=$0; next} {print gsub(str,"")}' str file
2

Or, with any awk, if your input cannot contain NUL characters:

$ awk -v RS='\0' 'NR==FNR{gsub(/[^[:space:]]/,"[&]"); gsub(/[[:space:]]+/,"[[:space:]]+"); str=$0; next} {print gsub(str,"")}' str file
2

Comments:

Thank you very much for your contribution. However, I would like something like this: awk -v RS='\0' 'NR==FNR{str=$0; next} {print gsub(str,"")}' "$STRING" "$SUB_STRING". See the edit I made to my question for more details.

Thank you very much, but the code awk 'BEGIN{print gsub(ARGV[2],"",ARGV[1])}' "$STRING" "$SUB_STRING" returns 0.

Right, that is because in the example you posted, each line of SUB_STRING starts with fewer blanks (8) than each line of STRING (9), which is why I posted the second script, which accommodates the whitespace difference.

Wait. What exactly are you saying? The example you provided has a different number of blanks in the substring than in the string. Your question implied that this should match, and now your comment implies that it should not. The scripts I gave you work in either case, and now you say neither does what you want. I am very confused about what you are trying to do. Please update your question to make it clearer, and state specifically why neither of the two scripts at the top of my answer works for whatever you are trying to do.

"The example you provided has a different number of blanks in the substring than in the string." -> You are right! I have corrected the number of blanks in SUB_STRING. Your approach awk 'BEGIN{print gsub(ARGV[2],"",ARGV[1])}' "$STRING" "$SUB_STRING" works very well! Your answer is correct! Sorry for the confusion! Thank you very much! =D

Thanks @Vivek Pabani, but I think there is a problem with your answer. We have to escape the string and adjust it for the amount of whitespace, and we would also need some function to automate that process. Besides that, I would like to be able to use shell (Bash) variables; see the edit to my question!

What do you mean by escaping spaces? Can you show the result you expect for your example?

I will try to explain better. For example, if my input is "Soft blocked: no" with or without leading whitespace, the value "\s*Soft blocked: no" matches either way, so the whitespace of the original input string is not respected. Thanks! =D

So, do you want to match the exact whitespace of the substring against the string?

If possible, yes. =D

I would like to be able to use shell (Bash) variables; see the edit to my question! Is there a way to do this? Thanks! =D

Great answer! However, in my case we need to count by the exact substring given as input. Besides that, I would like to be able to use shell (Bash) variables; see the edit to my question. If we could use this Python code inside the Bash script itself, it would be perfect. Thanks! =D

Nice! However, I would like to be able to use shell (Bash) variables; see the edit to my question. Is there a way to do this? Thanks! =D

Your answer is almost perfect, but I cannot accept it yet: if STRING and SUB_STRING differ in whitespace, then the strings are in fact different, and we get a wrong answer. For example, if STRING and SUB_STRING both contain "Soft blocked: no" but with different whitespace, the script returns 2. We need exact string matching. Thank you very much! =D

Just add gsub(/^ +| +$/,"",b) after b=$0 to clean the leading and trailing space from the substring. The echo removes the repeated whitespace, so the substring will match the string apart from the surrounding space. The next step would be approximate pattern matching.

FYI, {while(match($0,b)) {$0=substr($0,RSTART+RLENGTH-1); c++}} END{print c} is equivalent to using {print gsub(b,"")}. Also, I think the OP is saying the opposite of what you think: he does NOT want leading/trailing whitespace removed, which is what we assumed in the first place, so I do not know why it is coming up now! It is all very confusing...

@EdMorton Yes, it really is, haha! Thanks. I will add another part to the chain of solutions.