Awk: deleting lines with sed and a regexp

I have a file that looks like this:

rs994321    -   chr6_ssto_hap7  712891  G   A   0.011180599999999999        0.0058201   62357
rs994321    -   chr6_mcf_hap5   675532  G   A   0.011180599999999999    0.0058201   62357
rs994321    -   chr6_mann_hap4  675338  G   A   0.011180599999999999    0.0058201   62357
rs994321    -   chr4_dbb_hap3   675681  G   A   0.011180599999999999    0.0058201   62357
rs994321    -   chr4_cox_hap2   891136  G   A   0.011180599999999999    0.0058201   62357
rs994321    -   chr6    29372356    G   A   0.011180599999999999    0.0058201   62357
rs9943219   +   chr1    238691947   A   G   0.00700761  0.00727069  62357
rs9943217   +   chr1    238691673   A   G   0.00663929  0.00715566  62357
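I want to delete the lines that match the pattern chr*_*_hap*. In my example, only the last 3 lines should remain. I tried the following commands, but none of them work: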

sed '/chr[0-9]_*_hap[0-9]/d' test.txt
sed '/*_hap[0-9]/d' test.txt
sed '/\*_hap[0-9]/d' test.txt
I am not very good with regexps.

$ egrep -v '\bchr([^_]*_){2}hap[0-9]\b' data
rs994321    -   chr6    29372356    G   A   0.011180599999999999    0.0058201   62357
rs9943219   +   chr1    238691947   A   G   0.00700761  0.00727069  62357
rs9943217   +   chr1    238691673   A   G   0.00663929  0.00715566  62357
Or using sed:

$ sed -r '/\bchr([^_]*_){2}hap[0-9]\b/d' data
rs994321    -   chr6    29372356    G   A   0.011180599999999999    0.0058201   62357
rs9943219   +   chr1    238691947   A   G   0.00700761  0.00727069  62357
rs9943217   +   chr1    238691673   A   G   0.00663929  0.00715566  62357
Using awk:

$ awk '! /chr[^_]*_[^_]*_hap[0-9]/' data
rs994321    -   chr6    29372356    G   A   0.011180599999999999    0.0058201   62357
rs9943219   +   chr1    238691947   A   G   0.00700761  0.00727069  62357
rs9943217   +   chr1    238691673   A   G   0.00663929  0.00715566  62357
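
The commands in the question fail because `*` in a regular expression is a quantifier, not a shell-style wildcard: `_*` means "zero or more underscores", so `chr[0-9]_*_hap[0-9]` requires `_hap` to follow directly after the chromosome digit and any underscores. A minimal sketch of the difference, assuming a two-line sample file named `sample.txt` (a hypothetical name, with the columns shortened for brevity):

```shell
# Build a two-line sample: one haplotype contig, one primary chromosome.
printf '%s\n' \
  'rs994321 - chr6_ssto_hap7 712891 G A' \
  'rs994321 - chr6 29372356 G A' > sample.txt

# Attempt from the question: "_*" matches only underscores, so no line is deleted.
sed '/chr[0-9]_*_hap[0-9]/d' sample.txt > glob_style.txt

# Regex form of the intended wildcard: "[^_]*" matches a run of non-underscore characters.
sed '/chr[^_]*_[^_]*_hap[0-9]/d' sample.txt > regex_style.txt

grep -c '' glob_style.txt    # counts the lines left by the first attempt
grep -c '' regex_style.txt   # counts the lines left by the corrected pattern
```

With this sample, the first command should leave both lines in place, while the second should drop the `chr6_ssto_hap7` line.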

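Complete and pretty :-) The trailing *s at the end of the REs do nothing; remove them. Likewise, drop {print} from the awk statement. To be more robust, the regexp should really be chr[^_]*_[^_]*_hap, which can be shortened to chr([^_]*_){2}hap in most of the commands. In general you should also add anchors and/or test a specific field, but for this input format it is probably fine.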
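
The remark about anchors and specific fields could be sketched as follows. This assumes the contig name is always the third whitespace-separated field and that haplotype contigs end in `_hap` followed by digits; `filter_haps` is a hypothetical helper name, not something from the answers above.

```shell
# Keep only rows whose third field is NOT a haplotype contig such as chr6_ssto_hap7.
# ^ and $ anchor the pattern to the whole field, so a "chr..._hap" string in any
# other column cannot trigger a false match.
filter_haps() {
  awk '$3 !~ /^chr[^_]*_[^_]*_hap[0-9]+$/' "$@"
}

# Usage: pass file names as arguments, or pipe data in on standard input.
printf '%s\n' \
  'rs994321 - chr6_mcf_hap5 675532 G A' \
  'rs994321 - chr6 29372356 G A' \
  'rs9943219 + chr1 238691947 A G' | filter_haps
```

Matching on `$3` rather than the whole line is slightly stricter than the answers above, which is only worthwhile if other columns could ever contain similar text.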