
Extract the string between two regular expression "|" patterns


I want to extract all the strings between gi| and the next |. The position of the string is consistent across all lines.

I am trying this:

cat ERR594382_second_cat.test | sed -n '/gi\|/,/\|/p'
However, it does not work.

Here is the head of my file:

head ERR594382_second_cat.test 
ERR594382.28316455_3_6_1    gi|914605561|ref|WP_050599988.1|    22  54  67  99  4.03e-15    77.0    100.000 33  0   0   225971;1306953  Bacteria    Erythrobacter citreus;Erythrobacter citreus LAMA 915    ribonuclease D [Erythrobacter citreus]
ERR594382.28316455_65_2_3   gi|914605561|ref|WP_050599988.1|    13  46  11  44  2.15e-17    82.8    100.000 34  0   0   225971;1306953  Bacteria    Erythrobacter citreus;Erythrobacter citreus LAMA 915    ribonuclease D [Erythrobacter citreus]
ERR594382.28316459_1_1_2    gi|1270336953|gb|PHR32068.1|    8   53  863 903 6.98e-08    56.6    63.043  46  12  1   2024840 Bacteria    Methylophaga sp.    phosphohydrolase [Methylophaga sp.]
ERR594382.28316464_2_2_3    gi|705244733|gb|AIW56710.1| 2   33  145 176 5.76e-12    67.8    93.750  32  2   0   340016  Viruses uncultured virus    ribonucleotide reductase, partial [uncultured virus]
ERR594382.28316464_53_5_5   gi|1200458341|gb|OUV73944.1|    1   31  557 587 9.54e-11    64.3    80.645  31  6   0   1986721 Bacteria    Flavobacteriales bacterium TMED123  hypothetical protein CBC83_04720 [Flavobacteriales bacterium TMED123]
ERR594382.28316465_3_3_2    gi|787065740|dbj|BAR36435.1|    1   46  204 249 5.55e-10    63.2    58.696  46  19  0   1407671 Viruses uncultured Mediterranean phage uvMED    hypothetical protein [uncultured Mediterranean phage uvMED]
ERR594382.28316465_67_4_3   gi|787065740|dbj|BAR36435.1|    2   34  224 256 1.31e-07    55.1    66.667  33  11  0   1407671 Viruses uncultured Mediterranean phage uvMED    hypothetical protein [uncultured Mediterranean phage uvMED]
ERR594382.28316466_18_6_3   gi|1200295886|gb|OUU17830.1|    1   33  92  124 1.73e-12    70.1    100.000 33  0   0   1986638 Bacteria    Alphaproteobacteria bacterium TMED37    hypothetical protein CBB97_21775 [Candidatus Endolissoclinum sp. TMED37]
ERR594382.28316470_37_1_1   gi|787067413|dbj|BAR37857.1|    16  43  60  87  1.94e-09    58.9    96.429  28  1   0   1407671 Viruses uncultured Mediterranean phage uvMED    terminase large subunit [uncultured Mediterranean phage uvMED]
ERR594382.28316474_2_5_1    gi|1219813777|gb|ASN63501.1|    1   33  62  94  3.55e-12    64.3    81.818  33  6   0   340016  Viruses uncultured

You can use grep, or pcregrep if you are on macOS, for this:

pcregrep -o "gi\|\K.+?(?=\|)" file
or, with GNU grep:

grep -oP "gi\|\K.+?(?=\|)" file
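\K can be read as discarding everything matched to its left, so only the part to its right is returned. Then .+?(?=\|) matches any characters, as few as possible, up to the point where a | is found.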
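To make the effect of \K and the lookahead concrete, here is a minimal sketch run against a single line. It assumes GNU grep built with PCRE support (the -P flag); the variable names `line` and `gi_grep` are illustrative only, and the sample line is shortened from the file head in the question.

```shell
# Shortened sample line from the question's file head (columns trimmed).
line='ERR594382.28316455_3_6_1    gi|914605561|ref|WP_050599988.1|    22  54'

# gi\|      matches the literal prefix "gi|"
# \K        drops that prefix from the reported match
# .+?       lazily consumes characters...
# (?=\|)    ...until the next character is a pipe (which is not consumed)
gi_grep=$(printf '%s\n' "$line" | grep -oP 'gi\|\K.+?(?=\|)')

printf '%s\n' "$gi_grep"
```

Because .+? is lazy, the match stops at the first | after gi| rather than the last one on the line.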

If your delimiter is fixed, the simplest approach is to use cut:

cut -f2 -d"|" file
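As a small illustration of how the line splits on |, the sketch below runs cut on one sample line and shows an awk equivalent. The awk variant and the variable names (`line`, `gi_cut`, `gi_awk`, `acc`) are additions for illustration and are not part of the original answer.

```shell
# Shortened sample line from the question's file head (columns trimmed).
line='ERR594382.28316459_1_1_2    gi|1270336953|gb|PHR32068.1|    8   53'

# Splitting on "|": field 1 ends with "gi", field 2 is the numeric GI,
# field 3 is the database tag, field 4 is the accession.
gi_cut=$(printf '%s\n' "$line" | cut -d'|' -f2)

# awk with the same delimiter; it can pick other fields just as easily.
gi_awk=$(printf '%s\n' "$line" | awk -F'|' '{print $2}')
acc=$(printf '%s\n' "$line" | awk -F'|' '{print $4}')

printf '%s %s %s\n' "$gi_cut" "$gi_awk" "$acc"
```

This only works because every line has exactly one gi| prefix before the value; if | could appear earlier in a line, the field number would shift.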

sed 's/.*gi|\([^|]\+\)|.*/\1/g' ERR594382_second_cat.test
cut -f2 -d'|' ERR594382_second_cat.test
Dear Biffen, thanks for the quick reply. The first one does not work because it prints all the lines, but the second one seems to work fine.
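Two notes on sed for this task. First, a range address such as /start/,/end/ selects whole lines between two matching lines, so it cannot cut a substring out of a single line. Second, \+ is a GNU extension in basic regular expressions; that may be why a sed substitution prints every line unchanged on some systems, though this is an assumption and not a confirmed diagnosis. A sketch that sticks to POSIX basic regular expressions could look like this (the variable names are illustrative):

```shell
# Shortened sample line from the question's file head (columns trimmed).
line='ERR594382.28316464_2_2_3    gi|705244733|gb|AIW56710.1| 2   33  145 176'

# In a basic regular expression an unescaped "|" is a literal pipe.
# [^|]* avoids the GNU-only \+ quantifier.
# -n together with the p flag prints a line only if the substitution matched,
# so non-matching lines are dropped instead of being echoed unchanged.
gi_sed=$(printf '%s\n' "$line" | sed -n 's/.*gi|\([^|]*\)|.*/\1/p')

printf '%s\n' "$gi_sed"
```

The leading .* is greedy, so if a line contained more than one gi| this would capture the value after the last one.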