Algorithm: check whether a given word has a single typo (algorithm, hashmap)


Given a dictionary list and an input word, return true if the input word is exactly one typo away from a dictionary word of the same length.

dictionary = ["apple", "testing", "computer"];
singleType(dictionary, "adple") // true
singleType(dictionary, "addle") // false
singleType(dictionary, "apple") // false
singleType(dictionary, "apples") // false

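I came up with a solution that runs in linear time, O(k*26) => O(k), where k = the length of the input word, if we ignore the preprocessing time needed to build the hashmap.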

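My linear solution is to convert the dictionary list into a hashmap where the key is the word and the value is a boolean, then loop over every character of the input word, replace that character with each of the 26 letters of the alphabet in turn, and check whether the result maps to an entry in the hashmap.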
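The approach described above might look roughly like this in Python. It is only a sketch: the name `single_typo_bruteforce` is made up for illustration, a set stands in for the word-to-boolean hashmap, and lowercase ASCII words are assumed.

```python
import string

def single_typo_bruteforce(dictionary, word):
    """Return True if changing exactly one letter of `word` yields a dictionary word."""
    # Preprocessing step: a set plays the role of the word -> boolean hashmap.
    words = set(dictionary)
    for i, original in enumerate(word):
        for letter in string.ascii_lowercase:
            if letter == original:
                continue  # the unchanged word is not a typo
            candidate = word[:i] + letter + word[i + 1:]
            if candidate in words:
                return True
    return False

dictionary = ["apple", "testing", "computer"]
print(single_typo_bruteforce(dictionary, "adple"))   # True
print(single_typo_bruteforce(dictionary, "addle"))   # False
print(single_typo_bruteforce(dictionary, "apple"))   # False
print(single_typo_bruteforce(dictionary, "apples"))  # False
```

For each of the k positions this tries up to 25 replacement letters, which is where the k*26 factor comes from.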

But they said I could do better than O(k*26). How?

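You could extend the dictionary with all the variants of each word that contain one typo, but instead of the actual typo you just put a "wildcard" there, such as ?. Then you can check whether (a) the word is not in the set of correctly spelled words and (b) replacing any one letter of the word with the same wildcard gives a string that can be found in the set of one-typo words.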

Example in Python:

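>>> dictionary = ["apple", "testing", "computer"]
>>> wildcard = lambda w: [w[:i] + "?" + w[i+1:] for i in range(len(w))]
>>> onetypo = {x for w in dictionary for x in wildcard(w)}
>>> correct = {w for w in dictionary}
>>> word = "apxle"
>>> word not in correct and any(w in onetypo for w in wildcard(word))
True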

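This reduces the complexity of the lookup to O(k), i.e. it is still linear in the number of letters, but without the high constant factor. However, it does blow up the dictionary by a factor equal to the average number of letters per word.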
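To make the trade-off concrete, the wildcard idea can be wrapped into two functions and run against the examples from the question. This is a sketch under assumptions: the names `build_index` and `single_typo` are hypothetical, and it assumes the character "?" never occurs in a real word.

```python
def build_index(dictionary):
    """Precompute the correct words and all of their one-wildcard variants."""
    correct = set(dictionary)
    one_typo = {w[:i] + "?" + w[i + 1:] for w in correct for i in range(len(w))}
    return correct, one_typo

def single_typo(index, word):
    """True if `word` is not a dictionary word but matches one with a single letter changed."""
    correct, one_typo = index
    if word in correct:
        return False  # spelled correctly, so no typo at all
    # k lookups, one per wildcard position
    return any(word[:i] + "?" + word[i + 1:] in one_typo for i in range(len(word)))

index = build_index(["apple", "testing", "computer"])
print(single_typo(index, "adple"))   # True
print(single_typo(index, "addle"))   # False
print(single_typo(index, "apple"))   # False
print(single_typo(index, "apples"))  # False
print(len(index[1]))                 # 20 wildcard entries for 3 words (5 + 7 + 8)
```

The last line shows the blow-up: each word contributes one wildcard entry per letter.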

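It sounds like you are looking for patterns with an edit distance of 1 to a dictionary entry. For example, if the pattern is "adple" and the dictionary entry is "apple", the edit distance is 1. You have the additional constraint that the pattern must have the same length as the dictionary entry, but that is easy to enforce.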

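For a single lookup, I would filter the dictionary by word length, then iterate over the remaining words, counting the mismatches and bailing out of a word as soon as its error count exceeds 1.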

val dictionary = List ("affen", "ample", "apple", "appse", "ipple", "appl", "pple", "mapple", "apples")

@annotation.tailrec
def oneError (w1: String, w2:String, err: Int) : Boolean = w1.length match {
    case 0 => err == 1
    case _ => if (err > 1) false else {
        if (w1(0) == w2(0)) oneError (w1.substring (1),  w2.substring (1), err) else
        oneError (w1.substring (1),  w2.substring (1), err + 1)
    }
}

scala> dictionary.filter (_.length == 5).filter (s => oneError ("appxe", s, 0))
res5: List[String] = List(apple, appse)

To process longer texts, I would preprocess the dictionary and split it into a Map (word.length -> List (words)).

For highly redundant natural language, I would build a set of the unique words in the text, so that each word is looked up only once.
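The two preprocessing ideas above could be combined along these lines. This is a Python sketch rather than the Scala used in this answer, and `group_by_length`, `one_error` and `check_text` are names invented for illustration; splitting the text on whitespace is also an assumption.

```python
from collections import defaultdict

def group_by_length(dictionary):
    """Map each word length to the list of dictionary words of that length."""
    by_length = defaultdict(list)
    for w in dictionary:
        by_length[len(w)].append(w)
    return by_length

def one_error(w1, w2):
    """True if two equal-length strings differ in exactly one position."""
    errors = 0
    for a, b in zip(w1, w2):
        if a != b:
            errors += 1
            if errors > 1:
                return False  # bail out early on the second mismatch
    return errors == 1

def check_text(by_length, text):
    """Look up each distinct word of `text` once, only among same-length entries."""
    unique_words = set(text.split())
    return {w: [e for e in by_length.get(len(w), []) if one_error(w, e)]
            for w in unique_words}

dictionary = ["affen", "ample", "apple", "appse", "ipple",
              "appl", "pple", "mapple", "apples"]
result = check_text(group_by_length(dictionary), "appxe adple appxe")
print(result["appxe"])  # ['apple', 'appse']
print(result["adple"])  # ['ample', 'apple']
```

The repeated "appxe" in the text is checked only once because the words are deduplicated first.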

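For a word lookup, the worst case is n calls of the initial function, where n = max (dictionary.groupBy (w.length)), i.e. the size of the largest group of same-length words.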

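Each word comparison (for word length > 1) needs at least 2 steps before it can fail, and most words (assuming no pathological input or dictionary) need only those 2 steps. Of the remainder, most are ruled out after 3 steps, and so on.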

Here is a version that shows how deep it looks into each word:

import scala.annotation.tailrec

// dict is assumed to be the preprocessed dictionary described above:
// a Map from word length to the words of that length, e.g. Map[Int, Array[String]]
def oneError (word: String) : Array[String] = {

    // steps counts the characters compared so far, err counts the mismatches
    @tailrec
    def oneError (w1: String, w2:String, steps: Int, err: Int) : Boolean = w1.length match {
        case 0 => {print (s"($steps) "); err == 1}
        case _ => if (err > 1) {print (s"$steps "); false } else {
            if (w1(0) == w2(0)) oneError (w1.substring (1),  w2.substring (1), steps +1, err) else
            oneError (w1.substring (1),  w2.substring (1), steps + 1, err + 1)
        }
    }

    val d = dict (word.length)
    println (s"Info: ${d.length} words of same length")
    d.filter (entry => oneError (word, entry, 0, 0))
}

Sample output, edited:

scala> oneError ("fuck") 
Info: 3352 words of same length
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2  
2 2 2 2 2 2 2 2 (4) 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 (4) (4) 3 3 3 3 
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 
2 2 2 2 2 2 2 2 2 2 2 2 3 3 (4) (4) 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 
3 3 3 3 3 3 (4) 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 
3 (4) (4) (4) (4) (4) (4) (4) (4) (4) (4) (4) (4) (4) (4) 3 2 2 2 2 2 
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 
2 2 2 2 2 2 2 2 2 2 2 2 (4) (4) 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 
3 3 3 (4) 3 3 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 
2 2 2 2 2 2 (4) 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 
res53: Array[String] = Array(Buck, Huck, Puck, buck, duck, funk, luck, muck, puck, suck, tuck, yuck)

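Maybe you could put all of ?pple, a?ple, ap?le, app?e, appl? into the dict; then you only need k lookups in the dict, but the dict would be much bigger.

Doesn't your time complexity have to be multiplied by the size of the dictionary? How do you know it is the first string? You would have to run k*26 for all strings, right?

@tobias_k I think your solution works, Yeeee!!! Very sneaky.

Wait, @tobias_k, if the input word is "apple", it will return true even though there is no typo. But I guess I could also add the word "apple" itself to the hashmap, alongside all of its wildcard variants, and then check whether the word "apple" already exists.

You could split the dictionary into one dictionary per word length and search only the one whose length matches.

So you suggest computing the edit distance to every word in the dictionary? Do you think that would be faster than OP's O(k*26)?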