C#: Finding synonyms by Levenshtein distance


Here is my code:

public void SearchWordSynonymsByLevenstein()
{
    foreach (var eachWord in wordCounter)
    {
        foreach (var eachSecondWord in wordCounter)
        {
            if (eachWord.Key.Length > 3)
            {
                var score = LevenshteinDistance.Compute(eachWord.Key, eachSecondWord.Key);
                if (score < 2)
                {
                    if(!wordSynonymsByLevenstein.Any(x => x.Value.ContainsKey(eachSecondWord.Key)))
                    {
                        if (!wordSynonymsByLevenstein.ContainsKey(eachWord.Key))
                        {
                            wordSynonymsByLevenstein.Add(eachWord.Key, new Dictionary<string, int> { { eachSecondWord.Key, eachSecondWord.Value } });
                        }
                        else
                        {
                            wordSynonymsByLevenstein[eachWord.Key].Add(eachSecondWord.Key, eachSecondWord.Value);
                        }
                    }
                }
            }
        }
    }
}

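My `wordCounter` is a `Dictionary<string, int>`, where each key is one of my words and the value counts how many times that word occurs in the document, a kind of bag of words. For every word I have to search all the other entries (`eachSecondWord`) for synonyms. This method takes far too long: every word is compared with every other word, so the running time blows up as the dictionary grows. Is there another way to cut the time down?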

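First, I am assuming you don't want to associate a word with itself in the `wordSynonymsByLevenstein` collection. Second, by comparing the lengths of the two words you can skip the pairs that you already know cannot meet the score < 2 requirement.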

public void SearchWordSynonymsByLevenstein()
{
    foreach (var eachWord in wordCounter)
    {
        foreach (var eachSecondWord in wordCounter)
        {
            if (eachWord.Key == eachSecondWord.Key 
                || eachWord.Key.Length <= 3 
                || Math.Abs(eachWord.Key.Length - eachSecondWord.Key.Length) >= 2)
            {
                continue;
            }
            var score = LevenshteinDistance.Compute(eachWord.Key, eachSecondWord.Key);
            if (score >= 2)
            {
                continue;
            }

            if(!wordSynonymsByLevenstein.Any(x => x.Value.ContainsKey(eachSecondWord.Key)))
            {
                if (!wordSynonymsByLevenstein.ContainsKey(eachWord.Key))
                {
                    wordSynonymsByLevenstein.Add(eachWord.Key, new Dictionary<string, int> { { eachSecondWord.Key, eachSecondWord.Value } });
                }
                else
                {
                    wordSynonymsByLevenstein[eachWord.Key].Add(eachSecondWord.Key, eachSecondWord.Value);
                }
            }

        }
    }
}
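The length check works because the Levenshtein distance between two strings is never smaller than the difference of their lengths: each missing character costs at least one insertion or deletion. Below is a minimal sketch of that idea, written as a top-level C# program. The `Levenshtein` function is a hypothetical stand-in for `LevenshteinDistance.Compute` (whose implementation is not shown in the question), and the sample words are made up.

```csharp
using System;

// Hypothetical stand-in for the LevenshteinDistance.Compute helper used above,
// written as a two-row dynamic-programming edit distance.
static int Levenshtein(string a, string b)
{
    var previous = new int[b.Length + 1];
    var current = new int[b.Length + 1];
    for (int j = 0; j <= b.Length; j++) previous[j] = j;

    for (int i = 1; i <= a.Length; i++)
    {
        current[0] = i;
        for (int j = 1; j <= b.Length; j++)
        {
            int substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            int insertion = current[j - 1] + 1;
            int deletion = previous[j] + 1;
            current[j] = Math.Min(substitution, Math.Min(insertion, deletion));
        }
        (previous, current) = (current, previous); // reuse the two rows
    }
    return previous[b.Length];
}

// Made-up sample words, only for illustration.
string[] words = { "foobar", "foobak", "foobars", "foo", "food" };
int skipped = 0, computed = 0;

foreach (var first in words)
{
    foreach (var second in words)
    {
        if (first == second) continue;

        int lengthGap = Math.Abs(first.Length - second.Length);
        if (lengthGap >= 2)
        {
            skipped++; // the distance is at least lengthGap, so it cannot be < 2
            continue;
        }

        computed++;
        Console.WriteLine($"{first} / {second}: {Levenshtein(first, second)}");
    }
}

Console.WriteLine($"skipped {skipped} pairs, computed {computed}");
```

The cheap integer comparison runs for every pair, while the comparatively expensive distance computation only runs for pairs whose lengths are close enough to matter.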

The `wordSynonymsByLevenstein.Any(...)` check scans every existing entry each time a match is found. You can replace it with a `HashSet<string>` that remembers which words have already been stored as a synonym:

public void SearchWordSynonymsByLevenstein()
{
    var used = new HashSet<string>();
    foreach (var eachWord in wordCounter)
    {
        foreach (var eachSecondWord in wordCounter)
        {
            if (eachWord.Key == eachSecondWord.Key 
                || eachWord.Key.Length <= 3 
                || Math.Abs(eachWord.Key.Length - eachSecondWord.Key.Length) >= 2)
            {
                continue;
            }
            var score = LevenshteinDistance.Compute(eachWord.Key, eachSecondWord.Key);
            if (score >= 2)
            {
                continue;
            }

            if (used.Add(eachSecondWord.Key))
            {
                if (!wordSynonymsByLevenstein.ContainsKey(eachWord.Key))
                {
                    wordSynonymsByLevenstein.Add(eachWord.Key, new Dictionary<string, int> { { eachSecondWord.Key, eachSecondWord.Value } });
                }
                else
                {
                    wordSynonymsByLevenstein[eachWord.Key].Add(eachSecondWord.Key, eachSecondWord.Value);
                }
            }

        }
    }
}

Here I used `if (used.Add(eachSecondWord.Key))` because `Add` returns `true` if the word was added and `false` if it was already in the `HashSet`.

Does `wordSynonymsByLevenstein` really need to be a dictionary of dictionaries? Why not a simpler dictionary that only holds the synonym words? You can use it to look up the "synonyms" and then go to `wordCounter` for the counts.

Thanks. Later on I do this:
if (wordSynonymsByLevenstein.TryGetValue(eachMainWord, out isThisWord)) { foreach (var eachWw in isThisWord) { mainWordWithSynonyms.Add(eachWw.Key); fullCounted = fullCounted + eachWw.Value; } var distinctedWord = mainWordWithSynonyms.DistinctBy(x => x).ToList(); if (mainFoundWords.Any(x => distinctedWord.Any(y => y == x)) & compFoundWords.Any(x => distinctedWord.Any(y => y == x))) { relationScore = relationScore + ((double)1/(double)fullcountedequalword++}
So `wordSynonymsByLevenstein` has to be this `Dictionary`.

What I am saying is that if `wordSynonymsByLevenstein` is a dictionary of dictionaries, then I wonder why you have `if (!wordSynonymsByLevenstein.Any(x => x.Value.ContainsKey(eachSecondWord.Key)))`. Are you saying you don't want to associate a word with another word once it is already stored as someone's synonym? So if you have "foobar, foobak, foobal", you want to associate foobar->foobak, foobar->foobal, foobak->foobar, but not foobak->foobal, foobal->foobar, foobal->foobak? That is assuming you don't actually want foobar->foobar, foobar->foobak, foobar->foobal, which is what your code currently does.

Thanks for the great tips :) The `Math.Abs` check really helps and reduces the time. I changed the dictionary to what you suggested and now take the count values from `wordCounter`. Thanks :)
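The suggestion from the comments, keeping only the synonym words and reading the counts from `wordCounter` when they are needed, could look roughly like the sketch below. The exact collection type proposed in the comment did not survive, so `Dictionary<string, List<string>>` is an assumption here, as are the sample data and the hypothetical helper `CountWithSynonyms`.

```csharp
using System;
using System.Collections.Generic;

// Made-up sample data standing in for the asker's wordCounter.
var wordCounter = new Dictionary<string, int>
{
    ["foobar"] = 4,
    ["foobak"] = 2,
    ["foobal"] = 1,
};

// Assumed shape: main word -> list of synonym words, with no counts stored.
var synonyms = new Dictionary<string, List<string>>
{
    ["foobar"] = new List<string> { "foobak", "foobal" },
};

// Hypothetical helper: occurrences of a word plus those of its synonyms,
// with every count looked up in wordCounter instead of being duplicated.
int CountWithSynonyms(string word)
{
    int total = wordCounter.TryGetValue(word, out var own) ? own : 0;
    if (synonyms.TryGetValue(word, out var related))
    {
        foreach (var synonym in related)
        {
            total += wordCounter.TryGetValue(synonym, out var count) ? count : 0;
        }
    }
    return total;
}

Console.WriteLine(CountWithSynonyms("foobar")); // 4 + 2 + 1 = 7
```

Storing the counts in one place means they cannot drift out of sync between `wordCounter` and the synonym map, at the cost of one extra dictionary lookup per synonym.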