Sed: excluding emails whose domain matches a global domain


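Global domains are listed as "*@" entries. When an email address matches one of the global domains, I need to exclude it from the list.

For example: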

WF,*@stackoverflow.com
WF,*@superuser.com
WF,*@stackexchange.com
WF,test@superuser.com
WF,test@stackapps.com
WF,test@stackexchange.com

Output:

WF,*@stackoverflow.com
WF,*@superuser.com
WF,*@stackexchange.com
WF,test@stackapps.com

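You can do:

grep -o "\*@.*" file.txt | sed -e 's/\./\\./g' -e 's/^\*/[^*]/' -e 's/$/$/' > global.txt
grep -vf global.txt file.txt

This first extracts the global entries, escapes their dots, replaces the leading * with [^*], appends $, and saves the result to global.txt. That file is then used as grep's pattern file, where each line is treated as a regular expression of the form [^*]@global\.domain\.com$. The [^*] requires a character other than * in front of the @, so the global entries themselves never match. The -v option tells grep to print only the lines that do not match any of the patterns.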
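To make the intermediate pattern file visible, here is a minimal sketch that rebuilds the sample list in a throwaway directory and runs the grep approach on it. The temporary directory and the `result` variable are assumptions added for illustration; the file names follow the answer.

```shell
# Work in a throwaway directory so no existing file is overwritten.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

cat > file.txt <<'EOF'
WF,*@stackoverflow.com
WF,*@superuser.com
WF,*@stackexchange.com
WF,test@superuser.com
WF,test@stackapps.com
WF,test@stackexchange.com
EOF

# Turn every "*@domain" entry into an anchored regex:
# escape the dots, swap the leading * for [^*], append $.
grep -o '\*@.*' file.txt |
    sed -e 's/\./\\./g' -e 's/^\*/[^*]/' -e 's/$/$/' > global.txt

cat global.txt                          # one pattern per global domain

# Keep only the lines that match none of the patterns.
result=$(grep -vf global.txt file.txt)
printf '%s\n' "$result"
```

With this sample, global.txt should hold lines such as `[^*]@stackoverflow\.com$`, and only the `*@` entries plus `WF,test@stackapps.com` should be printed.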

Another, similar option that uses sed to edit the file in place:

grep -o "\*@.*" file.txt | sed -e 's/\./\\./g' -e 's/^\*\(.*\)$/\/[^*]\1$\/d/' > global.sed
sed -i -f global.sed file.txt
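Because an in-place edit overwrites file.txt, it can help to preview the result first. The sketch below is one way to do that under the same sample data: it generates a sed script with one delete command per global domain and applies it without `-i`. The temporary directory and the `preview` variable are assumptions added for illustration.

```shell
# Work in a throwaway directory so no existing file is overwritten.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

cat > file.txt <<'EOF'
WF,*@stackoverflow.com
WF,*@superuser.com
WF,*@stackexchange.com
WF,test@superuser.com
WF,test@stackapps.com
WF,test@stackexchange.com
EOF

# Build a sed script: each "*@domain" becomes "/[^*]@domain$/d" with dots escaped.
grep -o '\*@.*' file.txt |
    sed -e 's/\./\\./g' -e 's/^\*\(.*\)$/\/[^*]\1$\/d/' > global.sed

cat global.sed                          # inspect the generated delete commands

# No -i here: the filtered list goes to stdout and file.txt stays unchanged.
preview=$(sed -f global.sed file.txt)
printf '%s\n' "$preview"
```

Once the preview looks right, the same script can be applied with `sed -i -f global.sed file.txt`.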

There are two kinds of data in the same file, so the simplest way to handle it is to split it first (this needs a shell with process substitution, such as bash):

<infile tee >(grep '\*@' > global) >(grep -v '\*@' > addr) > /dev/null

Putting it all together:

<infile tee >(grep '\*@' > global) >(grep -v '\*@' > addr) > /dev/null
cat global <(grep -vf <(cut -d@ -f2 global) addr) > outfile

Clean up the temporary files with

rm global addr
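For shells without process substitution, the same split-then-filter idea can be sketched with plain redirections. This is a variant, not the answer's own command: the extra `domains` file and the `-F` flag (which makes grep treat each domain as a literal string rather than a regex) are assumptions added here, and the steps run strictly one after another.

```shell
# Work in a throwaway directory so no existing file is overwritten.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

cat > infile <<'EOF'
WF,*@stackoverflow.com
WF,*@superuser.com
WF,*@stackexchange.com
WF,test@superuser.com
WF,test@stackapps.com
WF,test@stackexchange.com
EOF

grep '\*@' infile > global              # global "*@domain" entries
grep -v '\*@' infile > addr             # ordinary addresses
cut -d@ -f2 global > domains            # just the domain part of each global entry

# Global entries first, then the addresses containing none of the domains.
{ cat global; grep -vFf domains addr; } > outfile

rm global addr domains                  # clean up the temporary files
cat outfile
```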

Here is one way using GNU awk. Run it like this:

awk -f script.awk file.txt{,}

Contents of script.awk:

BEGIN {
    FS=","
}

FNR==NR {
    if (substr($NF,1,1) == "*") {
        array[substr($NF,2)]++
    }
    next
}

substr($NF,1,1) == "*" || !(substr($NF,index($NF,"@")) in array)

Result:

WF,*@stackoverflow.com
WF,*@superuser.com
WF,*@stackexchange.com
WF,test@stackapps.com

Alternatively, here is the one-liner:

awk -F, 'FNR==NR { if (substr($NF,1,1) == "*") array[substr($NF,2)]++; next } substr($NF,1,1) == "*" || !(substr($NF,index($NF,"@")) in array)' file.txt{,}
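The `file.txt{,}` argument relies on bash brace expansion, which turns it into `file.txt file.txt` so that awk reads the file twice. As a hedged sketch for shells without brace expansion, the file can simply be named twice. The program below is the same logic as the one-liner with comments added; the temporary directory and the `result` variable are assumptions for illustration.

```shell
# Work in a throwaway directory so no existing file is overwritten.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

cat > file.txt <<'EOF'
WF,*@stackoverflow.com
WF,*@superuser.com
WF,*@stackexchange.com
WF,test@superuser.com
WF,test@stackapps.com
WF,test@stackexchange.com
EOF

result=$(awk -F, '
    # First read (FNR == NR): remember the "@domain" part of every global entry.
    FNR == NR {
        if (substr($NF, 1, 1) == "*") array[substr($NF, 2)]++
        next
    }
    # Second read: print global entries, and addresses whose "@domain" was not remembered.
    substr($NF, 1, 1) == "*" || !(substr($NF, index($NF, "@")) in array)
' file.txt file.txt)
printf '%s\n' "$result"
```

Only `substr`, `index`, and the `in` operator are used, so this should not depend on GNU-specific awk features.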

With a single pass over the file, and allowing global domains to be mixed in with the addresses:

$ cat file
WF,*@stackoverflow.com
WF,test@superuser.com
WF,*@superuser.com
WF,test@stackapps.com
WF,test@stackexchange.com
WF,*@stackexchange.com
WF,foo@stackapps.com
$
$ awk -F'[,@]' '
   $2=="*" { glbl[$3]; print; next }
   { addrs[$3] = addrs[$3] $0 ORS }
   END {
      for (dom in addrs)
         if (!(dom in glbl))
            printf "%s",addrs[dom]
   }
' file
WF,*@stackoverflow.com
WF,*@superuser.com
WF,*@stackexchange.com
WF,test@stackapps.com
WF,foo@stackapps.com

Or, if you don't mind a two-pass approach:

$ awk -F'[,@]' '(NR==FNR && $2=="*" && !glbl[$3]++) || (NR!=FNR && !($3 in glbl))' file file
WF,*@stackoverflow.com
WF,*@superuser.com
WF,*@stackexchange.com
WF,test@stackapps.com
WF,foo@stackapps.com

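I know the second one is a bit cryptic, but it is easy to translate into a form that does not rely on the default action, and doing so is a good exercise in awk idioms :-).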
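As one possible reading of that exercise, the two-pass one-liner can be written with explicit blocks and `print` statements. This is a sketch of an equivalent form, not the answer author's own expansion; the temporary directory and the `result` variable are assumptions for illustration.

```shell
# Work in a throwaway directory so no existing file is overwritten.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

cat > file <<'EOF'
WF,*@stackoverflow.com
WF,test@superuser.com
WF,*@superuser.com
WF,test@stackapps.com
WF,test@stackexchange.com
WF,*@stackexchange.com
WF,foo@stackapps.com
EOF

result=$(awk -F'[,@]' '
    # First pass: print each global entry once and record its domain.
    NR == FNR {
        if ($2 == "*" && !glbl[$3]++) print
        next
    }
    # Second pass: print only the lines whose domain was never recorded.
    # Global entries are skipped here because their domain is in glbl.
    !($3 in glbl) { print }
' file file)
printf '%s\n' "$result"
```

The field separator `[,@]` splits `WF,test@stackapps.com` into `WF`, `test`, and `stackapps.com`, which is why `$2` is the local part and `$3` the domain.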

This might work for you (GNU sed):

Do the global domains always come before the email addresses?
In this case yes, but not in the future.
Thanks for the great explanation.
A very useful and simple explanation for me.