Fuzzy matching of strings with a percentage threshold in C#


c# regex string


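My question: suppose I have a string:

"Quick Brown Fox Jumps over the lazy dog", which has 8 words in total. I also have some other strings that I must compare with the string above. These strings are:

This is un-match string with above string

Quick Brown fox Jumps

brown fox jumps over the lazy

quick brown fox over the dog

fox jumps over the lazy dog

jumps over the

lazy dog

For example, the user gives a threshold (the percentage of the string that must match) of 60%, that is:

= 8 * 60 / 100 (here 8 is the total number of words in the string above and 60 is the threshold)

= 4.8

This means that at least 4 words should match, so the result should be:

Quick Brown fox Jumps

quick brown fox over the dog

brown fox jumps over the lazy

fox jumps over the lazy dog

How can I do this fuzzy matching in C#? Please help me.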

The regex pattern should look like this:

(\bWord1\b|\bWord2\b|\bWord3\b|\betc\b)

Then you just count the matches and compare that count with the number of words.

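using System;
using System.Linq;
using System.Text.RegularExpressions;

string sentence = "Quick Brown Fox Jumps over the lazy dog";
string[] words = sentence.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries);
// Builds (\bQuick\b|\bBrown\b|...); Regex.Escape guards against words containing regex metacharacters.
Regex regex = new Regex("(" + string.Join("|", words.Select(x => @"\b" + Regex.Escape(x) + @"\b")) + ")", RegexOptions.IgnoreCase);

string input = "Quick Brown fox Jumps";
int threshold = 60;

var matches = regex.Matches(input);

// Integer arithmetic: 8 * 60 / 100 = 4, so the 4.8 is rounded down to 4 required words.
bool isMatch = words.Length * threshold / 100 <= matches.Count;

Console.WriteLine(isMatch);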
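
One caveat with counting raw regex hits is that a word repeated in the input is counted every time, so an input such as "fox fox fox fox" would pass the check. A minimal sketch of a variant that counts each distinct word only once could look like the following. The class `RegexFuzzyMatcher` and the methods `CountDistinctMatches` and `IsMatch` are hypothetical names introduced for illustration, not part of the answer, and the use of `Regex.Escape` assumes the words may contain regex metacharacters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexFuzzyMatcher
{
    // Hypothetical helper: number of distinct words of `sentence` that occur in `input`.
    public static int CountDistinctMatches(string sentence, string input)
    {
        string[] words = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length == 0) return 0;

        string pattern = "(" + string.Join("|", words.Select(w => @"\b" + Regex.Escape(w) + @"\b")) + ")";

        // The case-insensitive set collapses repeated hits of the same word.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.IgnoreCase))
            seen.Add(m.Value);
        return seen.Count;
    }

    // Hypothetical helper: true when enough distinct words match (required count rounded down).
    public static bool IsMatch(string sentence, string input, int thresholdPercent)
    {
        int total = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
        if (total == 0) return false;
        return total * thresholdPercent / 100 <= CountDistinctMatches(sentence, input);
    }
}
```

With the sample sentence and a threshold of 60, `IsMatch(sentence, "Quick Brown fox Jumps", 60)` is true (4 distinct words against a required 4), while "fox fox fox fox" no longer passes because it contributes only one distinct word.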

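I suggest comparing dictionaries, not strings:
What if the sentences contain the same words, e.g. "fox jumps over the dog"?
Punctuation: full stops, commas, etc.
Case: for example "Fox", "fox", "FOX", etc.
So the implementation is: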
public static Dictionary<String, int> WordsToCounts(String value) {
  if (String.IsNullOrEmpty(value))
    return new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

  return value
    .Split(' ', '\r', '\n', '\t')
    .Select(item => item.Trim(',', '.', '?', '!', ':', ';', '"'))
    .Where(item => !String.IsNullOrEmpty(item))
    .GroupBy(item => item, StringComparer.OrdinalIgnoreCase)
    .ToDictionary(chunk => chunk.Key, 
                  chunk => chunk.Count(), 
                  StringComparer.OrdinalIgnoreCase);
}

public static Double DictionaryPercentage(
  IDictionary<String, int> left,
  IDictionary<String, int> right) {

  if (null == left)
    if (null == right)
      return 1.0;
    else
      return 0.0;
  else if (null == right)
    return 0.0;

  int all = left.Sum(pair => pair.Value);

  if (all <= 0)
    return 0.0;

  double found = 0.0;

  foreach (var pair in left) {
    int count;

    if (!right.TryGetValue(pair.Key, out count))
      count = 0;

    // Credit each word at most as often as it occurs in the left sentence.
    found += Math.Min(count, pair.Value);
  }

  return found / all;
}

public static Double StringPercentage(String left, String right) {
  return DictionaryPercentage(WordsToCounts(left), WordsToCounts(right));
}

The report is:
  "This is un-match string with above string."   0.00%
  "Quick Brown fox Jumps."                      50.00%
  "brown fox jumps over the lazy."              75.00%
  "quick brown fox over the dog."               75.00%
  "fox jumps over the lazy dog."                75.00%
  "jumps over the."                             37.50%
  "lazy dog."                                   25.00%

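To get from these percentages to the list the question expects, each candidate's score can be compared against the threshold. The sketch below is self-contained, so it re-implements the word counting in compact form. `ThresholdFilter`, `MatchedWords`, and `Filter` are hypothetical names, and the punctuation set is the one used in the answer. It follows the asker's rule of rounding the required word count down (60% of 8 words is 4.8, taken as 4); using `Math.Ceiling` instead would be the stricter choice and require 5.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ThresholdFilter
{
    private static readonly char[] Separators = { ' ', '\r', '\n', '\t' };
    private static readonly char[] Punctuation = { ',', '.', '?', '!', ':', ';', '"' };

    // Splits text into words and counts them case-insensitively.
    private static Dictionary<string, int> Count(string text) =>
        (text ?? string.Empty)
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w.Trim(Punctuation))
            .Where(w => w.Length > 0)
            .GroupBy(w => w, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);

    // Number of words of `reference` that are found in `candidate`, respecting multiplicity.
    public static int MatchedWords(string reference, string candidate)
    {
        var right = Count(candidate);
        return Count(reference).Sum(p =>
            right.TryGetValue(p.Key, out int n) ? Math.Min(n, p.Value) : 0);
    }

    // Keeps the candidates that reach the threshold; the required count is rounded down.
    public static List<string> Filter(string reference, IEnumerable<string> candidates, int thresholdPercent)
    {
        int total = Count(reference).Values.Sum();
        int required = total * thresholdPercent / 100;
        return candidates.Where(c => MatchedWords(reference, c) >= required).ToList();
    }
}
```

Called with "Quick Brown Fox Jumps over the lazy dog", the seven strings from the report, and a threshold of 60, `Filter` keeps the four strings the question lists as the expected result. When thousands of stored strings are compared against one reference, it would be reasonable to count the reference words once outside the loop; the sketch recounts them for brevity.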
The threshold is 60 and only 4 of 8 words must match? That would lead to only 50%, below the required 60%. Perhaps you should round up even in the case of 4.1. But maybe I have not understood the threshold correctly.

Yes, we know that 60% of 8 words is 4.8; we round it down to 4 and match at least 4 words.

My input string is "Quick Brown Fox Jumps over the lazy dog". In my memory bank there are thousands of strings to compare against, and the result should be the strings from the memory bank (database) that match according to the threshold percentage.

Thank you very much, Mr. Dmitry Bychenko, this looks very useful to me :) :)

Dmitry Bychenko, is there a way to highlight the matched words?

@Dimitry Bychenko I wrote a similar method at …, maybe you are interested in seeing another implementation :) I wanted to post it, but the question was put on hold for some reason.

I really don't understand why my question was put on hold?? :(

THOMASJAWOWSKI.com: your solution will work on the question's sample; however, you do not take the number of occurrences of each word into account (e.g. "fox jumps over the dog"). Your solution is more concise than mine, but when working with texts (= large strings), it is usually better to convert them into bags of words (dictionaries), especially the original, and then analyze them. After all, your implementation is good; if I could, I would give you +1.
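
On the question in the comments about highlighting the matched words: one possible approach, sketched here and not taken from either answer, is to reuse the word-alternation pattern with `Regex.Replace` and wrap each hit in markers. The `Highlighter` class, the `Highlight` method, and the `[` / `]` markers are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class Highlighter
{
    // Wraps every word of `input` that also occurs in `sentence` in the given markers.
    public static string Highlight(string sentence, string input, string open = "[", string close = "]")
    {
        string[] words = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length == 0) return input;

        // Same alternation as in the regex answer, with the words escaped.
        string pattern = string.Join("|", words.Select(w => @"\b" + Regex.Escape(w) + @"\b"));

        // The evaluator keeps the original casing of each matched word.
        return Regex.Replace(input, pattern, m => open + m.Value + close, RegexOptions.IgnoreCase);
    }
}
```

For example, highlighting "the fox jumps over a tree" against the sample sentence gives "[the] [fox] [jumps] [over] a tree". For HTML output the markers could be replaced with tags such as `<b>` and `</b>`, in which case the input text should be HTML-encoded first.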