Linux: find the word between quotes

Tags: linux, unix, sed, awk, grep

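I have x lines like this:

Unable to find latest released revision of 'CONTRIB_046578'.

I need to extract the word between "revision of '" and the closing "'", in this example the word CONTRIB_046578, and, if possible, count the number of occurrences of that word using grep, sed, or any other command.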

Assumptions:

  • each word may appear more than once, and the OP wants a count for each word
  • there are no other lines in the file

Input file:

$ cat test.txt 
Unable to find latest released revision of 'CONTRIB_046578'.
Unable to find latest released revision of 'CONTRIB_046572'.
Unable to find latest released revision of 'CONTRIB_046579'.
Unable to find latest released revision of 'CONTRIB_046570'.
Unable to find latest released revision of 'CONTRIB_046572'.
Unable to find latest released revision of 'CONTRIB_046578'.

Shell pipeline to filter and count the words:

$ sed "s/.*'\(.*\)'.*/\1/" test.txt | sort | uniq -c
  1 CONTRIB_046570
  2 CONTRIB_046572
  2 CONTRIB_046578
  1 CONTRIB_046579

Here is an awk script you can use to extract each word in single quotes and count how often it occurs:

awk '{for (i=1; i<=NF; i++) {if ($i ~ /^'"'.*?'"'/ ) cnt[$i]++;}} 
      END {for (a in cnt) {b=a; gsub(/'"'"'/, "", b); print b, cnt[a]}}' infile
Output:

CONTRIB_046579 3
CONTRIB_046578 1
CONTRIB_046570 1
CONTRIB_046572 2

The cleanest solution is to use grep -Po "(?…

You only need a very simple awk script to count what occurs between the quotes:

awk -F\' '{c[$2]++} END{for (w in c) print w,c[w]}' file
Using @anubhava's test input file:

$ cat file
Unable to find latest released revision of 'CONTRIB_046572'
Unable to find latest released revision of 'CONTRIB_046578'
Unable to find latest released revision of 'CONTRIB_046579'
Unable to find latest released revision of 'CONTRIB_046570'
Unable to find latest released revision of 'CONTRIB_046579'
Unable to find latest released revision of 'CONTRIB_046572'
Unable to find latest released revision of 'CONTRIB_046579'
$
$ awk -F\' '{c[$2]++} END{for (w in c) print w,c[w]}' file
CONTRIB_046578 1
CONTRIB_046579 3
CONTRIB_046570 1
CONTRIB_046572 2
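
For comparison with the awk one-liner above, here is a grep-based sketch of the same quote-delimited counting. The grep -Po suggestion earlier in the thread is cut off, and this is not a reconstruction of it: it is a minimal alternative that assumes a grep supporting -o (GNU and BSD grep do) and assumes the quoted words contain no spaces. The function name count_quoted is hypothetical.

```shell
# Hypothetical helper, not from the thread: count every single-quoted
# token read from standard input and print "word count" pairs.
# Steps: grep -o prints each 'quoted' match on its own line, tr removes
# the quote characters, sort | uniq -c counts identical words, and the
# final awk swaps the columns so the output has no padding.
count_quoted() {
    grep -o "'[^']*'" |
    tr -d "'" |
    sort |
    uniq -c |
    awk '{ print $2, $1 }'
}

# Usage (file name as in the answer above):
#   count_quoted < file
```

Unlike the awk -F\' version, this counts every quoted token on a line rather than only the first, which may or may not be what you want.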

If the test file below is representative of the file in the actual question, the following may be useful.

Given that every line in the test file is homogeneous (that is, well formed and containing 8 columns, or fields), a simple solution using the cut command is as follows.

File:

Unable to find latest released revision of 'CONTRIB_046572'
Unable to find latest released revision of 'CONTRIB_046578'
Unable to find latest released revision of 'CONTRIB_046579'
Unable to find latest released revision of 'CONTRIB_046570'
Unable to find latest released revision of 'CONTRIB_046579'
Unable to find latest released revision of 'CONTRIB_046572'
Unable to find latest released revision of 'CONTRIB_046579'
Code:

cut -d ' ' -f 8 file | tr -d "'" | sort | uniq -c
Output:

1 CONTRIB_046570
2 CONTRIB_046572
1 CONTRIB_046578
3 CONTRIB_046579
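Code notes: the default delimiter that cut uses to separate fields is a tab, but since the fields here are separated by a single space, we specify the option -d ' '. The rest of the code is similar to the other answers, so I won't repeat what has already been said.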


General note: if the file does not follow the format described above, this code may not produce the desired output.
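
One way to guard against badly formed lines is to check the field count before extracting anything. The sketch below is an assumption-laden variant of the cut pipeline above, not part of the original answer: the function name count_field8 is hypothetical, and it uses awk's default field splitting (runs of blanks) rather than cut's one-space-per-field rule.

```shell
# Hypothetical helper, not from the thread: count the quoted word in
# field 8, but only for lines that really have 8 whitespace-separated
# fields and whose 8th field starts with a single quote.
# The quote character is passed in as the awk variable q to avoid
# nested shell quoting. split() on q leaves the quoted word in part[2].
count_field8() {
    awk -v q="'" '
        NF == 8 && index($8, q) == 1 {
            split($8, part, q)
            print part[2]
        }
    ' | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (file name as in the answer above):
#   count_field8 < file
```

Lines that fail the check are dropped silently; whether that is acceptable depends on how much you trust the input.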

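Have you made any attempt at this yourself? Can the same word occur more than once? Are there other lines in between that need to be discarded?

Instead of looking for the word between ' and ', how can I look for the word between "revision of '" and "'"?

There are lots of options; it depends on what is in your input that you are trying to avoid false matches on. One approach is
awk -F"(^.*revision of '|'[^']*$)" '{c[$2]++} END{for (w in c) print w,c[w]}' file
. If that doesn't work for you, tell us why and provide a more representative input file.

Segmentation fault. There are other lines in the file, by the way, and the search should be between "revision of '" and "'".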
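
The last comment asks for two things: skip unrelated lines, and match only between "revision of '" and the next "'". A minimal sed sketch of that, offered as one possible approach rather than the thread's accepted answer, is shown below. The function name count_revisions is hypothetical, and the sketch assumes each matching line contains the phrase once.

```shell
# Hypothetical helper, not from the thread: print "word count" for every
# word that follows the literal text "revision of '" up to the next quote.
# sed -n together with the p flag prints only lines where the
# substitution succeeded, so unrelated lines are discarded.
count_revisions() {
    sed -n "s/.*revision of '\([^']*\)'.*/\1/p" |
    sort |
    uniq -c |
    awk '{ print $2, $1 }'
}

# Usage (file name as in the question):
#   count_revisions < test.txt
```

Using [^']* instead of .* inside the capture keeps the match from running past the closing quote when a line contains more quoted text later on.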