Awk: Delete lines with sed and a regexp
I have a file that looks like this:
rs994321 - chr6_ssto_hap7 712891 G A 0.011180599999999999 0.0058201 62357
rs994321 - chr6_mcf_hap5 675532 G A 0.011180599999999999 0.0058201 62357
rs994321 - chr6_mann_hap4 675338 G A 0.011180599999999999 0.0058201 62357
rs994321 - chr4_dbb_hap3 675681 G A 0.011180599999999999 0.0058201 62357
rs994321 - chr4_cox_hap2 891136 G A 0.011180599999999999 0.0058201 62357
rs994321 - chr6 29372356 G A 0.011180599999999999 0.0058201 62357
rs9943219 + chr1 238691947 A G 0.00700761 0.00727069 62357
rs9943217 + chr1 238691673 A G 0.00663929 0.00715566 62357
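I want to delete the lines that match the pattern chr*_*_hap*. In my example, only the last 3 lines should remain. I have tried the following commands, but none of them work: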
sed '/chr[0-9]_*_hap[0-9]/d' test.txt
sed '/*_hap[0-9]/d' test.txt
sed '/\*_hap[0-9]/d' test.txt
I'm not very good with regexps.
$ egrep -v '\bchr([^_]*_){2}hap[0-9]\b' data
rs994321 - chr6 29372356 G A 0.011180599999999999 0.0058201 62357
rs9943219 + chr1 238691947 A G 0.00700761 0.00727069 62357
rs9943217 + chr1 238691673 A G 0.00663929 0.00715566 62357
Or with sed:
$ sed -r '/\bchr([^_]*_){2}hap[0-9]\b/d' data
rs994321 - chr6 29372356 G A 0.011180599999999999 0.0058201 62357
rs9943219 + chr1 238691947 A G 0.00700761 0.00727069 62357
rs9943217 + chr1 238691673 A G 0.00663929 0.00715566 62357
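A note on portability: \b is not part of POSIX extended regular expressions, and -r is the GNU spelling of the option that both GNU and BSD sed accept as -E. The sketch below is one possible variant without \b. It assumes a sed that supports -E, and it uses a temporary file and the variable names sample and remaining purely for illustration.

```shell
# Hypothetical sample file holding a few of the rows from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
rs994321 - chr6_ssto_hap7 712891 G A 0.011180599999999999 0.0058201 62357
rs994321 - chr6_mcf_hap5 675532 G A 0.011180599999999999 0.0058201 62357
rs994321 - chr6 29372356 G A 0.011180599999999999 0.0058201 62357
rs9943217 + chr1 238691673 A G 0.00663929 0.00715566 62357
EOF

# -E selects extended regular expressions, so ( ) and {2} need no backslashes.
# The pattern reads: "chr", two runs of non-underscores each followed by "_",
# then "hap" and a digit. Matching lines are deleted (d).
remaining=$(sed -E '/chr([^_]*_){2}hap[0-9]/d' "$sample")
printf '%s\n' "$remaining"

rm -f "$sample"
```

Because [^_]* can also run across spaces, this relies on the other columns containing no underscores, which holds for the data shown.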
With awk:
$ awk '! /chr[^_]*_[^_]*_hap[0-9]/' data
rs994321 - chr6 29372356 G A 0.011180599999999999 0.0058201 62357
rs9943219 + chr1 238691947 A G 0.00700761 0.00727069 62357
rs9943217 + chr1 238691673 A G 0.00663929 0.00715566 62357
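Complete and nice :-) The *s at the end of the REs do nothing, so remove them. Likewise, drop the {print} from the awk statement. The regexp should really be chr[^_]*_[^_]*_hap to be more robust, and it can be shortened to chr([^_]*_){2}hap, which simplifies most of the commands. In general you should also add anchors and/or test one specific field, but for this input format it is probably not a problem.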
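The advice about anchors and testing a specific field could look something like the following. This is only a sketch: it assumes the chromosome name is always the third whitespace-separated field, and the temporary file and the variable names data and kept are made up for the example.

```shell
# Hypothetical sample file holding a few of the rows from the question.
data=$(mktemp)
cat > "$data" <<'EOF'
rs994321 - chr6_ssto_hap7 712891 G A 0.011180599999999999 0.0058201 62357
rs994321 - chr4_cox_hap2 891136 G A 0.011180599999999999 0.0058201 62357
rs994321 - chr6 29372356 G A 0.011180599999999999 0.0058201 62357
rs9943219 + chr1 238691947 A G 0.00700761 0.00727069 62357
EOF

# Test only field 3, and anchor the regexp with ^ and $ so the whole field
# must have the form chr<x>_<y>_hap<digits>. Rows whose third field does
# NOT match (!~) are printed, which is awk's default action.
kept=$(awk '$3 !~ /^chr[^_]*_[^_]*_hap[0-9]+$/' "$data")
printf '%s\n' "$kept"

rm -f "$data"
```

Restricting the match to one field means an underscore or the text hap appearing in another column cannot cause a row to be dropped by accident.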