C#: Making a string checker more efficient


I am using the code below to check whether one string is contained in another:

foreach (string testrecord in testlist)
{
   foreach (string realrecord in reallist)
   {         
      if ((Regex.Replace(testrecord , "[^0-9a-zA-Z]+", "")
                .Contains((
                    Regex.Replace(realrecord, "[^0-9a-zA-Z]+", ""))) 
          && 
           ((Regex.Replace(realrecord, "[^0-9a-zA-Z]+", "") != "") 
          && 
           ((Regex.Replace(realrecord, "[^0-9a-zA-Z]+", "").Length >= 4)))))
      {

         matchTextBox.AppendText("Match: " + testrecord + " & " + realrecord + Environment.NewLine);

      }
   }

}
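However, this takes quite a long time to run. Since I added the regex removal of special characters, the run time has become even longer, but the regex is definitely required.

Is there a more efficient way to apply this regex? I tried applying it to the foreach string variables, but you cannot modify them because they are foreach loop variables.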

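I wonder why you are using a regex for this, when you can achieve the same thing using only the .Contains() method. That should make your code both simpler and faster than before: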

foreach (string testrecord in testlist)
{
   foreach (string realrecord in reallist)
   {         
      if(testrecord.Contains(realrecord))
         {
         matchTextBox.AppendText("Match: " + testrecord + " & " + realrecord + Environment.NewLine);
         }
   }

}
Optimized version:

// Do not put text into matchTextBox directly:
// it makes the control repaint each time you change the text.
// Instead, collect all the text into a StringBuilder
StringBuilder Sb = new StringBuilder(); 

// Pull out as much as you can from the inner loop,
// that's why I've changed the loops' order:
// first loop on reallist, then on testlist
foreach (string realrecord in reallist) {
  // Cache Regex.Replace result
  String realCleaned = Regex.Replace(realrecord, "[^0-9a-zA-Z]+", "");

  // Test as early as possible
  if (realCleaned.Length < 4)
    continue;

  // You don't need to test realCleaned != "";: realCleaned.Length < 4 is enough

  foreach (string testrecord in testlist) {
    // Cache Regex.Replace result: it's a bit of overkill here, but if some
    // more tests are added it'll be helpful
    String testCleaned = Regex.Replace(testrecord, "[^0-9a-zA-Z]+", "");

    if (testCleaned.Contains(realCleaned))
      Sb.AppendLine("Match: " + testrecord + " & " + realrecord);
  }  
}

// Finally, update matchTextBox once with all the collected text
matchTextBox.AppendText(Sb.ToString());
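
The optimized version above still cleans each testrecord once for every realrecord. As a hedged sketch of one way to go further (not part of the original answer), the test records can be cleaned a single time up front and the regex object reused. The names RecordMatcher, FindMatches and NonAlphanumeric are hypothetical, and the method returns the match lines instead of writing to matchTextBox so that it stays self-contained:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RecordMatcher
{
    // Built once and reused; the pattern is the one from the question.
    private static readonly Regex NonAlphanumeric =
        new Regex("[^0-9a-zA-Z]+", RegexOptions.Compiled);

    public static List<string> FindMatches(
        IEnumerable<string> testlist, IEnumerable<string> reallist)
    {
        // Clean every test record exactly once; Key keeps the original
        // text for the output, Value holds the cleaned text for matching.
        var cleanedTests = testlist
            .Select(t => new KeyValuePair<string, string>(
                t, NonAlphanumeric.Replace(t, "")))
            .ToList();

        var matches = new List<string>();
        foreach (string realrecord in reallist)
        {
            string realCleaned = NonAlphanumeric.Replace(realrecord, "");

            // Same early exit as above: skip short (or empty) records.
            if (realCleaned.Length < 4)
                continue;

            foreach (var test in cleanedTests)
            {
                if (test.Value.Contains(realCleaned))
                    matches.Add("Match: " + test.Key + " & " + realrecord);
            }
        }
        return matches;
    }
}
```

The result could then be shown with a single call such as matchTextBox.AppendText(string.Join(Environment.NewLine, RecordMatcher.FindMatches(testlist, reallist))). Whether RegexOptions.Compiled pays off depends on how many records are processed, so it is worth measuring on real data.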

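If you need performance, you could implement the string processing yourself. As far as I can tell, all you are doing is restricting the character set.

First of all, you may want to run Regex.Replace(realrecord, "[^0-9a-zA-Z]+", "") only once and cache the result in a variable, rather than calling it three times per iteration.

The != "" check seems to be redundant given Length >= 4.

@O.R.Mapper: He should do that even without considering performance. Right now it is copy-paste programming.

@usr: True. While we are at it, Regex.Replace(testrecord, "[^0-9a-zA-Z]+", "") is called once in every iteration of the inner loop, even though its result does not appear to change within the inner loop, so it could just as well be called once in the outer loop.

-1: The regex is evidently needed to remove characters that are not letters or digits before the comparison.

The first run took 30 minutes; after the change, 14 minutes. Gold star, Dmitry, the speed doubled!

This should be a bit faster (one regex operation per testrecord):
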
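// Strip every real record once, and drop those shorter than 4 characters
var strippedRealList = reallist.Select(s => Regex.Replace(s, "[^0-9a-zA-Z]+", ""))
                               .Where(s => s.Length >= 4)
                               .ToArray();

foreach (string testrecord in testlist)
{
   // One regex operation per testrecord
   string strippedTest = Regex.Replace(testrecord, "[^0-9a-zA-Z]+", "");

   // Note: s is the stripped real record, not the original text
   strippedRealList.Where(s => strippedTest.Contains(s))
                   .ToList()
                   .ForEach(s =>
                            matchTextBox.AppendText("Match: "
                                                  + testrecord
                                                  + " & "
                                                  + s
                                                  + Environment.NewLine));

}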
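
One of the comments above suggests implementing the string processing by hand, since the regex only restricts the character set. A minimal sketch of that idea follows, assuming the goal is to keep exactly the ASCII letters and digits that the pattern [^0-9a-zA-Z]+ leaves in place. The names AlphanumericFilter and Strip are hypothetical:

```csharp
using System.Text;

public static class AlphanumericFilter
{
    // Keeps only ASCII letters and digits, which is what
    // Regex.Replace(input, "[^0-9a-zA-Z]+", "") leaves behind.
    public static string Strip(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            bool isDigit = c >= '0' && c <= '9';
            bool isUpper = c >= 'A' && c <= 'Z';
            bool isLower = c >= 'a' && c <= 'z';
            if (isDigit || isUpper || isLower)
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

The explicit range checks are deliberate: char.IsLetterOrDigit also accepts non-ASCII letters and digits, so it would not behave the same as the original pattern. Whether this is actually faster than the regex on a given data set is something to measure rather than assume.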