How can I capture the following obfuscated email addresses in PHP?

php regex

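Consider the following script, which contains obfuscated email addresses along with a function that tries to replace them with *** using regular-expression pattern matching. My script tries to catch one of "at", "a t", "a.t" or "@", followed by some text (any domain name), followed by "dot", "." or "d.o.t", followed by a TLD.

Input: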

$str[] = 'dsfatasdfasdf asd dsfasdf dsfdsf@hotmail.com'; 
$str[] = 'I live at school where My address is dsfdsf@hotmail.com'; 
$str[] = 'I live at school. My address is dsfdsf@hotmail.com'; 
$str[] = 'at school my address is dsfdsf@hotmail.com'; 
$str[] = 'dsf a t asdfasdf asd dsfasdf dsfdsf@hotmail.com'; 
$str[] = 'd s f d s f a t h o t m a i l . c o m';

function clean_text($text){
    $pattern = '/(\ba[ \.\-_]*t\b|@)[ \.\-_]*(.+)[ \.\-_]*(d[ \.\-_]*o[ \.\-_]*t|\.)[ \.\-_]*(c[ \.\-_]*o[ \.\-_]*m|n[ \.\-_]*e[ \.\-_]*t|o[ \.\-_]*r[ \.\-_]*g|([a-z][ \.\-_]*){2,3}[a-z]?)/iU'; 
    return preg_replace($pattern, '***', $text); 
}

foreach($str as $email){ 
     echo clean_text($email); 
}

Output:

dsfatasdfasdf asd dsfasdf dsfdsf*** 
I live *** 
I live *** 
at school my address is dsfdsf****
dsf *** 
d s f d s f ***

Expected output:

dsfatasdfasdf asd dsfasdf dsfdsf*** 
I live at school where My address is dsfdsf@***
I live at school. My address is dsfdsf@***
*** 
dsf *** 
d s f d s f ***

Problem: it captures the first occurrence of "at" rather than the last, so the following happens:

input: 'at school my address is dsfdsf@hotmail.com'
produces: '****'
should produce: 'at school my address is dsfdsf****'

How can I fix this?
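
The behaviour described in the problem statement follows from how regex engines scan: a match is attempted at each position from left to right, and the first position that can be completed wins, so an early "at" is tried before the later "@". The following is a minimal sketch of that effect, using a deliberately simplified pattern rather than the full one from the question. The variable names and the `.com`-only ending are assumptions made for illustration.

```php
<?php
$text = 'at school my address is dsfdsf@hotmail.com';

// Simplified stand-in for the question's pattern: "at" or "@", any text, then ".com".
$naive = '/(?:\bat\b|@).+?\.com/i';
preg_match($naive, $text, $m, PREG_OFFSET_CAPTURE);
$naiveOffset = $m[0][1]; // byte offset where the match starts

// Same idea, but the text between the separator and ".com" may not contain
// another "at" or "@", so a match starting at the early "at" cannot be completed.
$tempered = '/(?:\bat\b|@)(?:(?!\bat\b|@).)+?\.com/i';
preg_match($tempered, $text, $m, PREG_OFFSET_CAPTURE);
$temperedOffset = $m[0][1];

echo $naiveOffset, "\n";    // the match begins at the leading "at"
echo $temperedOffset, "\n"; // the match begins at the "@"
echo preg_replace($tempered, '***', $text), "\n";
```

This only addresses the "earlier separator" problem. A sentence-ending dot followed by a word, as in "at school. My", still looks like "dot plus TLD" to a pattern of this kind and would need separate handling.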

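function clean_text($text){
    // Capture the "@domain.tld" part. The hyphen is placed last in the class so it is
    // read as a literal; "[\w-\.]" fails to compile under the PCRE2 used by PHP 7.3+.
    $pattern = '/\w+[\w.-]*(\@\w+((-\w+)|(\w*))\.[a-z]{2,3})/i';
    preg_match($pattern, $text, $matches);

    // Replace the captured domain part with asterisks, or return the text unchanged
    return (isset($matches[1])) ? str_replace($matches[1], "****", $text) : $text;
}

The only one it does not match is your last one, but you get the idea.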
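
The idea in this answer, matching first and then replacing only the captured domain part, can also be done by offset, so that only the matched occurrence is touched (`str_replace` replaces every occurrence of the captured text). A minimal sketch, assuming plain "@" addresses only; the function name `mask_domain_by_offset` and the simplified pattern are illustrative and not taken from the answer.

```php
<?php
// Hypothetical helper: mask only the captured "@domain.tld" span, located by its offset.
function mask_domain_by_offset(string $text): string
{
    // Local part, then a captured "@domain.tld" with optional hyphenated labels.
    $pattern = '/\w[\w.-]*(@\w+(?:-\w+)*\.[a-z]{2,3})/i';

    if (preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE) !== 1) {
        return $text; // nothing that looks like an address
    }

    // With PREG_OFFSET_CAPTURE each group is a [matched text, byte offset] pair.
    [$domain, $offset] = $m[1];

    return substr_replace($text, '****', $offset, strlen($domain));
}

echo mask_domain_by_offset('at school my address is dsfdsf@hotmail.com'), "\n";
```

Like the answer it is based on, this leaves the spaced-out "a t h o t m a i l . c o m" form unchanged.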

Here is a Perl script; it can be adapted to PHP:

my @l = (
'dsfatasdfasdf asd dsfasdf dsfdsf@hotmail.com',
'I live at school where My address is dsfdsf@hotmail.com',
'I live at school. My address is dsfdsf@hotmail.com',
'at school my address is dsfdsf@hotmail.com',
'dsf a t asdfasdf asd dsfasdf dsfdsf@hotmail.com',
'd s f d s f a t h o t m a i l . c o m'
);

foreach(@l) {
   s/(\@|a[_. -]*t)[\w .-]*?$/****/;
   print $_,"\n";
}

Output:

dsfatasdfasdf asd dsfasdf dsfdsf****
I live at school where My address is dsfdsf****
I live at school. My address is dsfdsf****
at school my address is dsfdsf****
dsf a t asdfasdf asd dsfasdf dsfdsf****
d s f d s f ****
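
A possible PHP adaptation of the Perl substitution, as a minimal sketch. The function name `mask_to_end` is hypothetical. The pattern is the one from the Perl script with the unnecessary escape on "@" dropped. It relies on the address being the last thing in the string, because everything from the separator to the end of the line is masked.

```php
<?php
// Hypothetical PHP port of the Perl substitution.
function mask_to_end(string $text): string
{
    // "@" or a/t with optional separators, followed by word characters,
    // spaces, dots and hyphens all the way to the end of the string.
    return preg_replace('/(@|a[_. -]*t)[\w .-]*?$/', '****', $text);
}

$inputs = [
    'dsfatasdfasdf asd dsfasdf dsfdsf@hotmail.com',
    'I live at school where My address is dsfdsf@hotmail.com',
    'I live at school. My address is dsfdsf@hotmail.com',
    'at school my address is dsfdsf@hotmail.com',
    'dsf a t asdfasdf asd dsfasdf dsfdsf@hotmail.com',
    'd s f d s f a t h o t m a i l . c o m',
];

foreach ($inputs as $line) {
    echo mask_to_end($line), "\n";
}
```

An early "at" is skipped here because the text after it contains "@", which is not in the character class, so the match cannot reach the end of the string from that position.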

Based on M42's regex:

Code:

Output:

Username: dsfatasdfasdf asd dsfasdf dsfdsf, Domain: @hotmail.com
Username: I live at school where My address is dsfdsf, Domain: @hotmail.com
Username: I live at school. My address is dsfdsf, Domain: @hotmail.com
Username: at school my address is dsfdsf, Domain: @hotmail.com
Username: dsf a t asdfasdf asd dsfasdf dsfdsf, Domain: @hotmail.com
Username: d s f d s f , Domain: a t h o t m a i l . c o m
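
One way to produce output of this shape from M42's pattern, as a minimal sketch. The helper name `split_address` and the array keys are hypothetical, and this is not necessarily the code the answer used.

```php
<?php
// Hypothetical helper: split a line into the text before the separator and
// the (possibly obfuscated) domain part matched by M42's pattern.
function split_address(string $text): ?array
{
    $pattern = '/(?:@|a[_. -]*t)[\w .-]*?$/';

    if (preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE) !== 1) {
        return null; // no separator found
    }

    // $m[0] is a [matched text, byte offset] pair.
    [$domain, $offset] = $m[0];

    return [
        'username' => substr($text, 0, $offset),
        'domain'   => $domain,
    ];
}

$parts = split_address('d s f d s f a t h o t m a i l . c o m');
printf("Username: %s, Domain: %s\n", $parts['username'], $parts['domain']);
```

Keeping the offset makes it possible to replace only the domain part afterwards, for example with `substr_replace`.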

I guess the most obvious question is why the addresses are obfuscated. If the user is doing it deliberately, then any scheme you come up with will be worked around.

@Douglas, it doesn't matter. Even if they can work around it, I still want to discourage them.

Trying to parse natural language with regexp is not the best idea; there are just too many nuances that will break anything you build with regexp. It's like parsing HTML with regexp.

@HolyVieR, I am not parsing natural language. I am trying to parse a very specific pattern. That is completely different from trying to parse HTML with regular expressions, which is wrong for entirely different reasons. Thanks for your input.

@Colin, I intend to use it permanently, because spammers use this on my site to scam old ladies. Can we focus on the problem at hand and leave your morals or imagined usability concerns aside?

This is good, but I really do need to catch the "at" case, generally of the form (a, any non-alphanumerics, t). Thanks for your efforts so far.

The point is not to use preg_replace directly, but to use preg_match and then replace the second part of the email using the matched index. So this should at least get you going in the right direction.
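
The "a, any non-alphanumerics, t" requirement from the comments can be written as a small pattern of its own. This is a minimal sketch; the lookarounds that forbid a letter or digit on either side are an assumption added here to avoid hits inside ordinary words, and the sample list is illustrative.

```php
<?php
// "a", then any run of non-alphanumeric characters, then "t" (case-insensitive).
// The lookbehind and lookahead reject a letter or digit directly before or after.
$at = '/(?<![a-z0-9])a[^a-z0-9]*t(?![a-z0-9])/i';

$samples = ['at', 'a t', 'a.t', 'a-t', 'a_t', 'a * t', 'cat', 'a1t'];

foreach ($samples as $s) {
    printf("%-6s %s\n", $s, preg_match($at, $s) === 1 ? 'match' : 'no match');
}
```

A pattern like this still matches the ordinary word "at", as in "at school", so it has to be combined with a check on what follows it, which is the ambiguity the question started from.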