Java: find the longest word that can be built from a given set of letters

java algorithm data-structures

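This is a Google interview question. Most of the answers I found online use a HashMap or a similar data structure. I am trying to find a solution that uses a Trie, if possible. Can anyone give me some hints?

The problem: you are given a dictionary in the form of a file with one word per line. For example: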

abacus 
deltoid 
gaff 
giraffe 
microphone 
reef 
qar

You are also given a collection of letters. For example:

{a, e, f, f, g, i, r, q}.

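The task is to find the longest word in the dictionary that can be spelled with the given letters. For example, the correct answer for the sample values above is "giraffe". (Note that "reef" is not a possible answer, because the set of letters contains only one "e".)

A Java implementation is preferred.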

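Groovy (almost Java):

The choice of collection type used to hold the dictionary is irrelevant to the algorithm. If you are supposed to implement the trie yourself, that is one thing. Otherwise, just create one from an appropriate library to hold the data. As far as I know, neither Java nor Groovy has a trie in its standard library.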

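I suspect a Trie-based implementation would not be very space-efficient, but it parallelizes well: you can descend into all branches of the tree in parallel and collect, for each top-level branch, the deepest nodes you can reach with the given set of letters. At the end you just gather all the deepest nodes and pick the longest one.

I would start with this algorithm (sorry, pseudocode only), which makes no attempt at parallelizing and just uses plain recursion (and backtracking) to find the longest match: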

TrieNode visitNode( TrieNode n, LetterCollection c )
{
    // n is the best match so far; a full implementation should only accept
    // nodes that mark the end of a dictionary word
    TrieNode deepestNode = n;
    for each Letter l in c:
        TrieNode childNode = n.getChildFor( l );

        if childNode:
            TrieNode deepestSubNode = visitNode( childNode, c.without( l ) );
            if deepestSubNode.stringLength > deepestNode.stringLength:
                deepestNode = deepestSubNode;
    return deepestNode;
}

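That is, the function is meant to be started at the root node of the trie with the entire given letter collection. For each letter in the collection, you try to find a child node. If there is one, you recurse and remove that letter from the collection. At some point your letter collection will be empty (the best case, all letters consumed; you could actually bail out right away without traversing the trie any further), or there will be no children for any of the remaining letters. In that case you return the node itself, because that is your "longest match".

If you changed the recursion step so that all children are visited in parallel, the results are collected, and the longest result is selected and returned, this could parallelize nicely.

No Java code. You can figure that out yourself.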
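As a starting point only, the parallel variant described above might be sketched in Java roughly as follows. The class name `ParallelTrieSearch`, the `Node` type and the restriction to lowercase letters a-z are assumptions made for this illustration; they do not come from the answer. The sketch fans out over the top-level branches with a parallel stream and runs the plain backtracking recursion inside each branch.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ParallelTrieSearch {

    /** Hypothetical trie node; word is non-null when a dictionary word ends here. */
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        String word;
    }

    private final Node root = new Node();

    ParallelTrieSearch(List<String> dictionary) {
        for (String w : dictionary) {
            Node n = root;
            for (char ch : w.toCharArray()) {
                n = n.children.computeIfAbsent(ch, k -> new Node());
            }
            n.word = w;
        }
    }

    /** Longest word spellable from the letters (assumed a-z), or "" if none. */
    String longest(String letters) {
        int[] counts = new int[26];
        for (char ch : letters.toCharArray()) {
            counts[ch - 'a']++;
        }
        // Each top-level branch works on its own copy of the counts,
        // so the branches share no mutable state.
        return root.children.entrySet().parallelStream()
                .filter(e -> counts[e.getKey() - 'a'] > 0)
                .map(e -> {
                    int[] local = counts.clone();
                    local[e.getKey() - 'a']--;
                    return visit(e.getValue(), local);
                })
                .max(Comparator.comparingInt(String::length))
                .orElse("");
    }

    /** Sequential recursion with backtracking inside one branch. */
    private static String visit(Node n, int[] counts) {
        String best = n.word == null ? "" : n.word;
        for (Map.Entry<Character, Node> e : n.children.entrySet()) {
            int i = e.getKey() - 'a';
            if (counts[i] > 0) {
                counts[i]--;                              // use the letter
                String candidate = visit(e.getValue(), counts);
                counts[i]++;                              // backtrack
                if (candidate.length() > best.length()) {
                    best = candidate;
                }
            }
        }
        return best;
    }
}
```

With a dictionary this small the parallel stream is unlikely to pay for itself; the point is only that the per-branch work is independent.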

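Assuming we need to do this many times, here is what I would do:

First, I would create a "signature" for every word in the dictionary. Each signature consists of 26 bits, where bit[letter] is set if the word contains one (or more) instances of that letter. These signatures can be encoded as a Java `int`.
Then create a map from each signature to the list of words that have that signature.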

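To search using the precomputed map:

Create a signature for the set of letters you want to find words for.
Then iterate over the keys of the map, looking for keys where `(key & (~signature)) == 0`. This gives you a short list of "possibles" that contain no letter outside the required set.
Iterate over the short list looking for words that use no more of each letter than the set provides, recording the longest hit.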
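The steps above could be sketched as follows. This is a minimal illustration, not the answerer's code: the class name `SignatureIndex`, the helper names and the assumption that all input consists of lowercase a-z are mine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SignatureIndex {

    // signature -> dictionary words whose distinct letters produce that signature
    private final Map<Integer, List<String>> bySignature = new HashMap<>();

    SignatureIndex(List<String> dictionary) {
        for (String w : dictionary) {
            bySignature.computeIfAbsent(signature(w), k -> new ArrayList<>()).add(w);
        }
    }

    /** Bit i is set when the letter ('a' + i) occurs at least once (a-z assumed). */
    static int signature(String s) {
        int sig = 0;
        for (char ch : s.toCharArray()) {
            sig |= 1 << (ch - 'a');
        }
        return sig;
    }

    /** Longest word spellable from the letters, or null when nothing matches. */
    String longest(String letters) {
        int available = signature(letters);
        int[] counts = letterCounts(letters);
        String best = null;
        for (Map.Entry<Integer, List<String>> e : bySignature.entrySet()) {
            if ((e.getKey() & ~available) != 0) {
                continue; // the words here need a letter that is not in the set
            }
            for (String w : e.getValue()) {
                if ((best == null || w.length() > best.length()) && fits(w, counts)) {
                    best = w;
                }
            }
        }
        return best;
    }

    private static int[] letterCounts(String s) {
        int[] counts = new int[26];
        for (char ch : s.toCharArray()) {
            counts[ch - 'a']++;
        }
        return counts;
    }

    /** True when the word uses no letter more often than it is available. */
    private static boolean fits(String word, int[] available) {
        int[] used = new int[26];
        for (char ch : word.toCharArray()) {
            if (++used[ch - 'a'] > available[ch - 'a']) {
                return false;
            }
        }
        return true;
    }
}
```

For the sample data, "reef" passes the cheap signature test (r, e and f are all available) and is only rejected by the per-letter count check, which is why the second pass is needed.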

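Notes:

Although the main search is roughly O(N) in the number of words in the dictionary, the test is very cheap.

This approach has the advantage of needing a relatively small in-memory data structure that (most likely) has good locality. That probably helps make searches faster.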

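Here is an idea for speeding up the O(N) search step above.

Starting from the signature map above, create (precomputed) derived maps for all words that contain a particular pair of letters; i.e. one for words containing AB, one for AC, one for BC, ... and one for YZ. Then, if you are looking for words that contain (say) P and Q, you can scan just the PQ derived map. This reduces the O(N) step by a factor of roughly 26^2 ... at the cost of memory for the extra maps.

This can be extended to 3 or more letters, but the downside is that memory use explodes.

Another possible tweak is to (somehow) bias the choice of the initial letter pair towards letters / pairs that occur infrequently. But that adds an up-front overhead that could be larger than the (average) saving from searching a shorter list.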

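Disclaimer: this is not a trie solution, but I still think it is an idea worth exploring.

Create some kind of hash function that only takes the letters of a word into account and not their order (no collisions should be possible except in the case of permutations). For example, ABCD and DCBA both produce the same hash (but ABCDD does not). Build such a hash table containing every word in the dictionary, using chaining to link collisions (on the other hand, unless you have a strict requirement to find "all" longest words rather than just one, you can simply drop collisions, which are only permutations, and forgo chaining altogether).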

Now, if your search set is 4 characters long, for example A, B, C, D, your search checks the following hashes to see whether they are contained in the dictionary:

hash(A), hash(B), hash(C), hash(D) // 1-combinations
hash(AB), hash(AC), hash(AD), hash(BC), hash(BD), hash(CD) // 2-combinations
hash(ABC), hash(ABD), hash(ACD), hash(BCD) // 3-combinations
hash(ABCD) // 4-combinations

If you search the hashes in that order, the last match found will be the longest one.

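This ends up with a running time that depends on the length of the search set rather than the length of the dictionary. If M is the number of characters in the search set, the number of hash lookups is the sum M choose 1 + M choose 2 + M choose 3 + ... + M choose M, which is also the size of the power set of the search set (less the empty set), so it is O(2^M).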
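A minimal Java sketch of this lookup scheme follows. Instead of a numeric hash it uses the word's letters in sorted order as the `HashMap` key, which is one simple way to get a key that collides only for permutations; that substitution, the class name `SubsetLookup` and the cap of 30 letters are assumptions of this sketch, not part of the answer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SubsetLookup {

    // canonical key (letters sorted) -> one dictionary word made of exactly those letters
    private final Map<String, String> byKey = new HashMap<>();

    SubsetLookup(List<String> dictionary) {
        for (String w : dictionary) {
            byKey.putIfAbsent(key(w), w); // permutations collide; keep the first one
        }
    }

    /** Order-independent key: "dcba" and "abcd" map to "abcd", "abcdd" does not. */
    static String key(String s) {
        char[] cs = s.toCharArray();
        Arrays.sort(cs);
        return new String(cs);
    }

    /** Tries every non-empty subset of the letters: 2^M - 1 candidates for M letters. */
    String longest(String letters) {
        char[] sorted = letters.toCharArray();
        Arrays.sort(sorted); // picking from a sorted array yields keys in canonical order
        int m = sorted.length;
        if (m > 30) {
            throw new IllegalArgumentException("too many letters for an int bitmask");
        }
        String best = null;
        StringBuilder sb = new StringBuilder(m);
        for (int mask = 1; mask < (1 << m); mask++) {
            if (best != null && Integer.bitCount(mask) <= best.length()) {
                continue; // this subset cannot beat the current best
            }
            sb.setLength(0);
            for (int i = 0; i < m; i++) {
                if ((mask & (1 << i)) != 0) {
                    sb.append(sorted[i]);
                }
            }
            String hit = byKey.get(sb.toString());
            if (hit != null) {
                best = hit;
            }
        }
        return best;
    }
}
```

The sketch enumerates subsets in bitmask order and keeps the longest hit rather than walking the combinations strictly by size; the number of candidate subsets is the same.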

*
  a
    bacus
  d
    deltoid
  g
    a
      gaff
    i
      giraffe
  m
    microphone
  r
    reef
  q
    qar

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LongestWord {

  class TrieNode {
    char value;
    List<TrieNode> children = new ArrayList<>();
    String word;

    public TrieNode() {
    }

    public TrieNode(char val) {
      this.value = val;
    }

    public void add(char[] array) {
      add(array, 0);
    }

    public void add(char[] array, int offset) {
      if (offset == array.length) {
        // all characters consumed: this node terminates the word
        // (also covers a word that is a prefix of an existing one)
        word = new String(array);
        return;
      }
      for (TrieNode child : children) {
        if (child.value == array[offset]) {
          child.add(array, offset + 1);
          return;
        }
      }
      TrieNode trieNode = new TrieNode(array[offset]);
      children.add(trieNode);
      trieNode.add(array, offset + 1);
    }
  }

  private TrieNode root = new TrieNode();

  public LongestWord() {
    List<String> asList = Arrays.asList("abacus", "deltoid", "gaff", "giraffe",
        "microphone", "reef", "qar");
    for (String word : asList) {
      root.add(word.toCharArray());
    }
  }

  public String search(char[] cs) {
    return visit(root, cs);
  }

  public String visit(TrieNode n, char[] allowedCharacters) {
    // any node that terminates a word is a candidate, not only leaves
    // (word is null for nodes where no dictionary word ends)
    String bestMatch = n.word;

    for (TrieNode child : n.children) {
      if (contains(allowedCharacters, child.value)) {
        // remove this child's value and descend into the trie
        String result = visit(child, remove(allowedCharacters, child.value));
        // keep the result only if it exists and is longer than the best so far
        if (result != null
            && (bestMatch == null || bestMatch.length() < result.length())) {
          bestMatch = result;
        }
      }
    }
    // always return the best known match thus far
    return bestMatch;
  }

  private char[] remove(char[] allowedCharacters, char value) {
    char[] newDict = new char[allowedCharacters.length - 1];
    int index = 0;
    for (char x : allowedCharacters) {
      if (x != value) {
        newDict[index++] = x;
      } else {
        // we removed the first hit, now copy the rest
        break;
      }
    }
    System.arraycopy(allowedCharacters, index + 1, newDict, index,
        allowedCharacters.length - (index + 1));

    return newDict;
  }

  private boolean contains(char[] allowedCharacters, char value) {
    for (char x : allowedCharacters) {
      if (value == x) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    LongestWord lw = new LongestWord();
    String longestWord = lw.search(new char[] { 'a', 'e', 'f', 'f', 'g', 'i',
        'r', 'q' });
    // yields giraffe
    System.out.println(longestWord);
  }

}

#include <iostream>
#include <string>
#include <cstdlib>   // malloc

using namespace std;

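// Order-independent hash: the sum of the character codes.
// Note: a plain sum also collides for non-permutations (e.g. "ad" and "bc"),
// so a hit is only a candidate and would need verifying in real use.
int hash_f(string s){
        int key=0;
        for(unsigned int i=0;i<s.size();i++){
           key += s[i];
        }
        return key;
}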

class collection{

int key[100];
string str[10000];

public: 
collection(){
    str[hash_f( "abacus")] = "abacus"; 
    str[hash_f( "deltoid")] = "deltoid"; 
    str[hash_f( "gaff")] = "gaff"; 
    str[hash_f( "giraffe")] = "giraffe"; 
    str[hash_f( "microphone")] = "microphone"; 
    str[hash_f( "reef")] = "reef"; 
    str[hash_f( "qar")] = "qar"; 
}

string  find(int _key){
    return str[_key];
}
};

string sub_str(string s,int* indexes,int n ){
    char c[20];
    int i=0;
    for(;i<n;i++){
        c[i] = s[indexes[i]];
    }
    c[i] = 0;
    return string(c);
}

string* combination_m_n(string str , int m,int n , int& num){

    string* result = new string[100];
    int index = 0;

    int * indexes = (int*)malloc(sizeof(int)*n);

    for(int i=0;i<n;i++){
        indexes[i] = i; 
    }

    while(1){
            result[index++] = sub_str(str , indexes,n);
            bool reset = true;
            for(int i=n-1;i>0;i--)
            {
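                // the last index is bounded by m-1, every other index by its right neighbour;
                // the ternary avoids reading indexes[n] when i == n-1
                if( (i==n-1) ? (indexes[i]<m-1) : (indexes[i]<indexes[i+1]-1) )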
                {
                    indexes[i]++;
                    for(int j=i+1;j<n;j++) 
                        indexes[j] = indexes[j-1] + 1;
                    reset = false;
                    break;
                }
            }
            if(reset){
                indexes[0]++;
                if(indexes[0] + n > m) 
                    break;
                for(int i=1;i<n;i++)
                    indexes[i] = indexes[0]+i;
            }
    }
    free(indexes);   // release the scratch index array
    num = index;
    return result;   // caller owns the array and must delete[] it
}


int main(int argc, char* argv[])
{
    string str = "aeffgirq";
    string* r;
    int num;

    collection c;
    for(int i=8;i>0;i--){
        r = combination_m_n(str, str.size(),i ,num);
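        for(int j=0;j<num;j++){          // j, so the outer loop's i is not shadowed
            int key = hash_f(r[j]);
            string temp = c.find(key);
            if(  temp != "" ){
                  cout << temp << endl;  // one match per line, longest first
            }
        }
        delete[] r;                      // allocated by combination_m_n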
    }
}

[3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[6, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
[7, 0, 0, 0, 2, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
[4, 1, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[7, 1, 0, 0, 0, 1, 2, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
[10, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 2, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
[4, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
[3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
[26, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

aba
b
ad
da
la
ma

aab
b
ad
ad
al
am

     root
     /  \
    a    b
 /-/|\-\
a b d l m
|
b