C# HTTPModule problem: replacing text in the rendered page
Tags: c#, obfuscation, httpmodule, spam-prevention

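I am writing an HTTPModule that searches a web page for all mailto links, obfuscates the email address and any trailing parameters, and then puts the obfuscated string back into the HTML document. A small piece of JavaScript then de-obfuscates the mailto link in the browser, so the link works normally when the user clicks it.

So far I have been able to obfuscate and de-obfuscate the information without any problems. The problem is putting the obfuscated strings back into the stream. If a mailto link appears only once in the document, the obfuscated string lands exactly where the mailto link was; if there is more than one mailto link, the placement of the strings seems random. I am fairly sure this has to do with the regex match indexes, because the function loops through the matches and each replacement increases the length of the HTML going through the stream. I am posting some strategically edited code here to see whether anyone knows how to place the obfuscated strings correctly.

I am also posting what I did to obfuscate the strings, in the hope that it helps anyone trying to do the same thing.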

public override void Write(byte[] buffer, int offset, int count)
{
    //--- Decode only the bytes handed to this call (offset .. offset + count),
    //--- not the whole buffer.
    string html = System.Text.Encoding.Default.GetString(buffer, offset, count);

    //--- Work on the HTML from the page. We want to pass it through the
    //--- obfuscation function before it is sent to the browser.
    html = obfuscate(html);

    byte[] outdata = System.Text.Encoding.Default.GetBytes(html);
    _strmHTML.Write(outdata, 0, outdata.Length);
}


protected string obfuscate(string input)
{
    //--- Declarations
    string email = string.Empty;
    string obsEmail = string.Empty;
    string matchedEMail = string.Empty;
    int matchIndex = 0;
    int matchLength = 0;

    //--- This is a REGEX to grab any "a href=mailto" tags in the document.
    MatchCollection matches = Regex.Matches(input, @"<a href=""mailto:[a-zA-Z0-9\.,|\-|_@?= &]*"">", RegexOptions.Singleline | RegexOptions.IgnoreCase);

    //--- Because of the nature of doing a match search with regex, we must now loop
    //--- through the results of the MatchCollection.
    foreach (Match match in matches)
    {
        //--- Get the match string
        matchedEMail = match.ToString();
        matchIndex = match.Index;
        matchLength = match.Length;

        //--- Obfuscate the matched string.
        obsEmail = obfusucateEmail(match.Value);

        //--- Re-form the entire HTML stream. This has to be added back in at the right point.
        input = input.Substring(0, matchIndex) + obsEmail + input.Substring(matchIndex + matchLength);
    }

    //--- Return the obfuscated result.
    return input;
}



protected string obfusucateEmail(string input)
{
    //--- Clean up the string. We need to get rid of the beginning of the tag
    //--- and the closing >. First, let's flush out all quotes.
    string email = input.Replace("\"", "");

    //--- Now, let's remove the beginning of the tag.
    email = email.Replace("<a href=mailto:", "");

    //--- Finally, let's get rid of the closing >.
    email = email.Replace(">", "");

    //--- Now we have a cleaned mailto string. Let's obfuscate it.
    string obsEmail = string.Empty;

    //--- Loop through the characters and encode each one.
    foreach (char letter in email)
    {
        // Convert each character of the address to its character code and add 42
        // to break the direct code-to-letter mapping. The JavaScript side
        // subtracts 42 from each number again.
        obsEmail += (letter + 42).ToString() + "~";
    }

    //--- Before we return the obfuscated value, we need to re-form the tag
    //--- that we stripped out above.
    return "<a href=\"mailto:" + obsEmail + "\">";
}

I'd appreciate any ideas.

Thanks,

Mike

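Depending on your performance needs (which depend on your document size, among other things), you might consider using an HTML parser rather than regular expressions to parse and manipulate your HTML; the types in the example below (HtmlDocument, HtmlNode, HtmlAttribute) appear to come from the HTML Agility Pack. You can use LINQ to Objects or XPath to identify all of the mailto tags.

You should be able to adapt the following example to find the mailto tags: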

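HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// SelectNodes returns null when nothing matches the XPath expression.
var links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        HtmlAttribute att = link.Attributes["href"];
        // EncryptValue stands in for your own obfuscation routine.
        if (att.Value.StartsWith("mailto:")) EncryptValue(att);
    }
}
doc.Save("file.htm");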
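
EncryptValue is not defined in the example above. As a rough sketch only, assuming it should apply the same "+42 and ~ separator" scheme as the code in the question, the encoding could look like the following. The class name MailtoCodec and both method names are made up for illustration; Decode is included only to show that the scheme round-trips, since in the question the decoding is done by JavaScript in the browser.

```csharp
using System;
using System.Text;

// Hypothetical helper; the names are illustrative and not part of any library.
public static class MailtoCodec
{
    // Offset taken from the code in the question.
    private const int Shift = 42;

    // Turns each character into (character code + 42) followed by '~'.
    public static string Encode(string address)
    {
        var sb = new StringBuilder();
        foreach (char c in address)
        {
            sb.Append((int)c + Shift).Append('~');
        }
        return sb.ToString();
    }

    // Reverses Encode: split on '~', subtract 42, convert back to a character.
    public static string Decode(string encoded)
    {
        var sb = new StringBuilder();
        string[] parts = encoded.Split(new[] { '~' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string part in parts)
        {
            sb.Append((char)(int.Parse(part) - Shift));
        }
        return sb.ToString();
    }
}
```

An EncryptValue implementation could then strip the leading "mailto:" from att.Value, pass the remainder through MailtoCodec.Encode, and assign the result back to the attribute. Using a StringBuilder instead of += avoids allocating a new string for every character.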

Another thing you could do is use a MatchEvaluator with your regex replace:

protected string ObfuscateUsingMatchEvaluator(string input)
{
    var re = new Regex(@"<a href=""mailto:[a-zA-Z0-9\.,|\-|_@?= &]*"">", RegexOptions.IgnoreCase | RegexOptions.Multiline);
    return re.Replace(input, DoObfuscation);
}

protected string DoObfuscation(Match match)
{
    return obfusucateEmail(match.Value);
}
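
The reason this helps: in the loop from the question, every Match.Index refers to the original string, but the first replacement already changes the length of input, so every later index points at the wrong place. The sketch below shows two ways around that, either replacing from the last match to the first or letting Regex.Replace track the positions. MailtoReplaceDemo and Mark are invented for the illustration; Mark is only a stand-in for obfusucateEmail that returns text of a different length than the tag it replaces.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative only; class and method names are not from the question.
public static class MailtoReplaceDemo
{
    // Same pattern as in the question.
    private static readonly Regex MailtoTag = new Regex(
        @"<a href=""mailto:[a-zA-Z0-9\.,|\-|_@?= &]*"">",
        RegexOptions.IgnoreCase);

    // Stand-in for obfusucateEmail: the result is shorter than the tag,
    // which is enough to make stale indexes visible.
    private static string Mark(string tag)
    {
        return "[" + tag.Length + " chars hidden]";
    }

    // Option 1: walk the matches from last to first, so the indexes of the
    // earlier matches are still valid when they are used.
    public static string ReplaceBackwards(string input)
    {
        MatchCollection matches = MailtoTag.Matches(input);
        for (int i = matches.Count - 1; i >= 0; i--)
        {
            Match m = matches[i];
            input = input.Substring(0, m.Index) + Mark(m.Value)
                  + input.Substring(m.Index + m.Length);
        }
        return input;
    }

    // Option 2: let Regex.Replace do the position bookkeeping.
    public static string ReplaceWithEvaluator(string input)
    {
        return MailtoTag.Replace(input, m => Mark(m.Value));
    }
}
```

Calling both methods on the same HTML string should give identical output, with every mailto tag replaced in place. The MatchEvaluator version is usually the simpler choice because no index arithmetic is left in your own code.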
