How can I catch the following obfuscated email addresses in PHP?

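Consider the following script, which contains obfuscated email addresses, and a function that tries to replace those addresses with *** using regex pattern matching. My script tries to catch the words "at", "a t", "a.t" or "@", followed by some text (any domain name), followed by "dot", "." or "d.o.t", followed by the TLD.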

Input:

$str[] = 'dsfatasdfasdf asd dsfasdf dsfdsf@hotmail.com'; 
$str[] = 'I live at school where My address is dsfdsf@hotmail.com'; 
$str[] = 'I live at school. My address is dsfdsf@hotmail.com'; 
$str[] = 'at school my address is dsfdsf@hotmail.com'; 
$str[] = 'dsf a t asdfasdf asd dsfasdf dsfdsf@hotmail.com'; 
$str[] = 'd s f d s f a t h o t m a i l . c o m';

function clean_text($text){
    $pattern = '/(\ba[ \.\-_]*t\b|@)[ \.\-_]*(.+)[ \.\-_]*(d[ \.\-_]*o[ \.\-_]*t|\.)[ \.\-_]*(c[ \.\-_]*o[ \.\-_]*m|n[ \.\-_]*e[ \.\-_]*t|o[ \.\-_]*r[ \.\-_]*g|([a-z][ \.\-_]*){2,3}[a-z]?)/iU'; 
    return preg_replace($pattern, '***', $text); 
}

foreach($str as $email){ 
     echo clean_text($email), "\n"; 
}
Result:

dsfatasdfasdf asd dsfasdf dsfdsf***
I live ***
I live ***
at school my address is dsfdsf****
dsf ***
d s f d s f ***
Expected output:

dsfatasdfasdf asd dsfasdf dsfdsf*** 
I live at school where My address is dsfdsf@***
I live at school. My address is dsfdsf@***
*** 
dsf *** 
d s f d s f *** 
Problem: it catches the first occurrence of "at" instead of the last one, so the following happens:

input: 'at school my address is dsfdsf@hotmail.com'
produces: '****'
should produce: 'at school my address is dsfdsf****'
How can I fix this?
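A sketch of one possible fix, not taken from the thread: putting a greedy ^(.*) in front of the separator makes PCRE first consume the whole string and then backtrack from the end, so the separator it settles on is the last "at"/"@" that can still be followed by a domain and TLD. The function name clean_text_last, the 2 to 4 letter TLD rule and the $ anchor (which assumes the address ends the string, as in every sample input) are assumptions.

```php
<?php
// Hypothetical variant of clean_text(): the greedy ^(.*) prefix makes PCRE
// backtrack from the end of the string, so the separator it settles on is
// the LAST "at"/"a t"/"a.t"/"@" that can still be followed by a domain + TLD.
// Assumes the address ends the string (note the $ anchor).
function clean_text_last(string $text): string
{
    $sep = '(?:\ba[ ._-]*t\b|@)';          // "at", "a t", "a.t", "@"
    $dot = '(?:d[ ._-]*o[ ._-]*t|\.)';     // "dot", "d.o.t", "d o t", "."
    $tld = '[a-z](?:[ ._-]*[a-z]){1,3}';   // 2 to 4 letters, optionally spaced out
    $pattern = '/^(.*)' . $sep . '[ ._-]*\S.*?' . $dot . '[ ._-]*' . $tld . '$/i';

    // Keep everything before the separator ($1) and mask the rest.
    return preg_replace($pattern, '$1***', $text);
}

$str = [
    'dsfatasdfasdf asd dsfasdf dsfdsf@hotmail.com',
    'at school my address is dsfdsf@hotmail.com',
    'd s f d s f a t h o t m a i l . c o m',
];

foreach ($str as $email) {
    echo clean_text_last($email), "\n";
}
```

On these three samples it prints dsfatasdfasdf asd dsfasdf dsfdsf***, at school my address is dsfdsf*** and d s f d s f ***. Because it always prefers the rightmost separator, 'dsf a t asdfasdf asd dsfasdf dsfdsf@hotmail.com' is masked only from the @, which differs from the expected output listed above.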

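function clean_text($text){
    // The hyphen goes last in the class: "[\w-\.]" is rejected as an
    // invalid range by PCRE2 (PHP 7.3+).
    $pattern = '/\w+[\w.-]*(\@\w+((-\w+)|(\w*))\.[a-z]{2,3})/i';
    preg_match($pattern, $text, $matches);

    // $matches[1] is the "@domain.tld" part; mask it wherever it appears.
    return (isset($matches[1])) ? str_replace($matches[1], "****", $text) : $text;
}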
The only one it doesn't match is your last one, but you get the idea.

Here is a Perl script that can be adapted to PHP:

my @l = (
'dsfatasdfasdf asd dsfasdf dsfdsf@hotmail.com',
'I live at school where My address is dsfdsf@hotmail.com',
'I live at school. My address is dsfdsf@hotmail.com',
'at school my address is dsfdsf@hotmail.com',
'dsf a t asdfasdf asd dsfasdf dsfdsf@hotmail.com',
'd s f d s f a t h o t m a i l . c o m'
);

foreach(@l) {
   s/(\@|a[_. -]*t)[\w .-]*?$/****/;
   print $_,"\n";
}
Output:

dsfatasdfasdf asd dsfasdf dsfdsf****
I live at school where My address is dsfdsf****
I live at school. My address is dsfdsf****
at school my address is dsfdsf****
dsf a t asdfasdf asd dsfasdf dsfdsf****
d s f d s f ****
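Since preg_replace() is PCRE-based, the substitution ports to PHP nearly verbatim. A minimal sketch (the helper name mask_tail is hypothetical); passing a limit of 1 mirrors a Perl s/// without /g:

```php
<?php
// Port of the Perl substitution: starting at the first "@" or "a...t" whose
// remainder consists only of [\w .-] characters up to the end of the string,
// replace that whole tail with ****. Limit 1 = Perl's s/// without /g.
function mask_tail(string $line): string // hypothetical helper name
{
    return preg_replace('/(\@|a[_. -]*t)[\w .-]*?$/', '****', $line, 1);
}

$l = [
    'dsfatasdfasdf asd dsfasdf dsfdsf@hotmail.com',
    'I live at school where My address is dsfdsf@hotmail.com',
    'I live at school. My address is dsfdsf@hotmail.com',
    'at school my address is dsfdsf@hotmail.com',
    'dsf a t asdfasdf asd dsfasdf dsfdsf@hotmail.com',
    'd s f d s f a t h o t m a i l . c o m',
];

foreach ($l as $line) {
    echo mask_tail($line), "\n";
}
```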

Based on M42's regex:

Code:

Output:

Username: dsfatasdfasdf asd dsfasdf dsfdsf, Domain: @hotmail.com
Username: I live at school where My address is dsfdsf, Domain: @hotmail.com
Username: I live at school. My address is dsfdsf, Domain: @hotmail.com
Username: at school my address is dsfdsf, Domain: @hotmail.com
Username: dsf a t asdfasdf asd dsfasdf dsfdsf, Domain: @hotmail.com
Username: d s f d s f , Domain: a t h o t m a i l . c o m
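The code for this answer was not preserved. The following is only a hypothetical PHP sketch that yields the listed output using M42's alternation; the function name split_address and the lazy ^(.*?) prefix are assumptions, not the answerer's code.

```php
<?php
// Hypothetical reconstruction; the original answer's code is not available.
// The lazy ^(.*?) keeps the username as short as possible, so the split falls
// on the first "@" / "a...t" after which only [\w .-] characters remain.
function split_address(string $s): ?array
{
    if (preg_match('/^(.*?)((?:\@|a[_. -]*t)[\w .-]*)$/', $s, $m) !== 1) {
        return null; // no separator followed by an address-like tail
    }
    return ['username' => $m[1], 'domain' => $m[2]];
}

$l = [
    'dsfatasdfasdf asd dsfasdf dsfdsf@hotmail.com',
    'at school my address is dsfdsf@hotmail.com',
    'd s f d s f a t h o t m a i l . c o m',
];

foreach ($l as $line) {
    $parts = split_address($line);
    if ($parts !== null) {
        printf("Username: %s, Domain: %s\n", $parts['username'], $parts['domain']);
    }
}
```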

I guess the most obvious question is: why are the addresses obfuscated? If the users are doing it deliberately, they will work around whatever scheme you come up with.

@Douglas, that's fine. Even if they can work around it, I still want to discourage them.

Trying to parse natural language with a regexp isn't the best idea; there are just too many nuances that will break anything you build with one. It's like parsing HTML with a regexp.

@HolyVieR, I'm not parsing natural language. I'm trying to parse one very specific pattern. That is completely different from trying to parse HTML with a regex, which is wrong for entirely different reasons. Thanks for your input.

@Colin, I intend to use it for good, because spammers use it on my site to scam old ladies. Let's focus on the problem at hand and leave your morals or imagined usability issues aside, shall we?

This is nice, but I really need to catch the "at" cases specifically, usually the "a t" case (a, any non-alphanumeric character, t). Thanks for your effort so far.

The point is not to use preg_replace directly, but to use preg_match and then replace the second part of the email using the index of the match. So this should at least get you headed in the right direction.
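The preg_match-then-index approach from the last comment could look like this minimal sketch (not code from the thread; mask_domain is a hypothetical name). With PREG_OFFSET_CAPTURE each capture becomes a pair of matched text and byte offset, and substr_replace() then overwrites exactly that slice, whereas str_replace() would also hit any other copy of the same text.

```php
<?php
// Sketch of "preg_match, then replace by index": mask only the captured
// "@domain.tld" slice at the offset where the match was found.
function mask_domain(string $text): string
{
    // The first answer's pattern, with the character class written as [\w.-].
    $pattern = '/\w+[\w.-]*(\@\w+((-\w+)|(\w*))\.[a-z]{2,3})/i';
    if (preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE) !== 1) {
        return $text; // nothing address-like found
    }
    [$domain, $offset] = $m[1]; // matched text and its byte offset
    return substr_replace($text, '****', $offset, strlen($domain));
}

echo mask_domain('at school my address is dsfdsf@hotmail.com'), "\n";
```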