Counting occurrences of a word in an incoming character stream

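I was asked this question in an interview, and although I am good at data structures and algorithms, I could not solve it. It is an interesting problem anyway, so I am posting it here.

Problem: You have an incoming stream of characters and need to count the occurrences of a word in it. There is only one API for reading from the stream, stream.next_char(), which returns "\0" when nothing is left.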

int count_occurrences(Stream stream, String word) {
// you have only one function provided from Stream class that you can use to 
// read one char at a time, no length/size etc.
// stream.next_char() - return "\0" if end
}
Input: "aabckjhabcc" Word: "abc"
Output: 2

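The simplest solution is to use a buffer that holds at most word.length() characters.

The time complexity is O(N*M) and the memory is O(M), where N is the length of the stream and M is the length of the word.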
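
The buffer idea can be sketched as follows. This is only an illustration under stated assumptions: StringStream is a hypothetical stand-in for the interviewer's Stream class, count_occurrences_buffered is a made-up name, and overlapping occurrences are counted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for the Stream class from the question.
struct StringStream {
    std::string data;
    std::size_t pos = 0;
    explicit StringStream(std::string s) : data(std::move(s)) {}
    // Returns the next character, or '\0' once the input is exhausted.
    char next_char() { return pos < data.size() ? data[pos++] : '\0'; }
};

// Keeps the last word.size() characters and compares them with the word
// after every read: O(N*M) time, O(M) memory.
int count_occurrences_buffered(StringStream& stream, const std::string& word) {
    if (word.empty()) return 0;
    std::deque<char> window;
    int count = 0;
    for (char ch = stream.next_char(); ch != '\0'; ch = stream.next_char()) {
        window.push_back(ch);
        if (window.size() > word.size()) window.pop_front();  // drop the oldest character
        if (window.size() == word.size() &&
            std::equal(window.begin(), window.end(), word.begin())) {
            ++count;
        }
    }
    return count;
}
```

Each character triggers a comparison of up to M characters, which is where the O(N*M) bound comes from.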

Something like this, perhaps:

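// Java; requires java.util.List and java.util.ArrayList.
int count_occurrences(Stream stream, String word) {
    // you have only one function provided from Stream class that you can use to
    // read one char at a time, no length/size etc.
    // stream.next_char() - return "\0" if end

    if (word.isEmpty()) return 0;

    // Each entry is the number of characters of word already matched by one
    // candidate occurrence that is still alive.
    List<Integer> positions = new ArrayList<>();

    int counter = 0;
    while (true) {
        char ch = stream.next_char();
        if (ch == '\0') return counter;

        // A new candidate occurrence may start at this character.
        if (ch == word.charAt(0)) {
            positions.add(0);
        }

        int i = 0;
        while (i < positions.size()) {
            int pos = positions.get(i);

            // Candidate does not survive this character: drop it.
            if (word.charAt(pos) != ch) {
                positions.remove(i);
                continue;
            }

            pos++;
            // Candidate matched the whole word: count it and drop it.
            if (pos == word.length()) {
                positions.remove(i);
                counter++;
                continue;
            }

            positions.set(i, pos);
            i++;
        }
    }
}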
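What they were (probably) looking for is either Rabin-Karp or Knuth-Morris-Pratt. Both need only a single pass over the stream, with very little overhead. If the pattern is large, they are a clear win in speed, because the complexity is O(stream length).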

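Rabin-Karp relies on a hash that can be updated in O(1) for each new character. If the hash is not very good, or the stream is very long, hash collisions may give you false positives.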
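
A rough sketch of the rolling-hash idea is shown below. The assumptions are mine, not the answer's: StringStream is a hypothetical stand-in for the Stream class, the base 257 and modulus 1000000007 are arbitrary choices, and a ring buffer of the last M characters is kept so that the outgoing character can be removed from the hash and a hash hit can be verified to rule out false positives.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Stream class from the question.
struct StringStream {
    std::string data;
    std::size_t pos = 0;
    explicit StringStream(std::string s) : data(std::move(s)) {}
    char next_char() { return pos < data.size() ? data[pos++] : '\0'; }
};

int count_occurrences_rabin_karp(StringStream& stream, const std::string& word) {
    const std::size_t m = word.size();
    if (m == 0) return 0;
    const std::uint64_t base = 257, mod = 1000000007ULL;  // arbitrary parameters

    // target = hash of the word, high = base^(m-1) mod mod.
    std::uint64_t target = 0, high = 1;
    for (std::size_t i = 0; i < m; ++i) {
        target = (target * base + static_cast<unsigned char>(word[i])) % mod;
        if (i + 1 < m) high = (high * base) % mod;
    }

    std::vector<unsigned char> ring(m);  // the last m characters read
    std::size_t seen = 0;                // number of characters read so far
    std::uint64_t hash = 0;
    int count = 0;
    for (char ch = stream.next_char(); ch != '\0'; ch = stream.next_char()) {
        const unsigned char in = static_cast<unsigned char>(ch);
        const std::size_t slot = seen % m;
        if (seen >= m) {
            // Remove the contribution of the character leaving the window.
            hash = (hash + mod - (ring[slot] * high) % mod) % mod;
        }
        hash = (hash * base + in) % mod;  // O(1) update for the new character
        ring[slot] = in;
        ++seen;
        if (seen >= m && hash == target) {
            // Verify against the buffered window to reject hash collisions.
            bool same = true;
            for (std::size_t i = 0; i < m && same; ++i) {
                same = ring[(seen + i) % m] == static_cast<unsigned char>(word[i]);
            }
            if (same) ++count;
        }
    }
    return count;
}
```

Dropping the verification loop gives the pure hash comparison, at the cost of the false positives mentioned above.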

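Knuth-Morris-Pratt precomputes, for each position in the pattern, the length of the longest prefix that is also a suffix. This needs O(n) memory to store the results, where n is the length of the pattern, but that is all.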


For more details and implementations, look up string pattern matching on Wikipedia.

I think this problem is related to matching strings using a finite-state model of computation.
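
To make the finite-state view concrete, here is one possible sketch that precomputes a transition table and then feeds the stream through it. The names build_dfa, count_occurrences_dfa and StringStream are hypothetical, and the table assumes 8-bit characters; state k means "the last k characters read equal the first k characters of the word".

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Stream class from the question.
struct StringStream {
    std::string data;
    std::size_t pos = 0;
    explicit StringStream(std::string s) : data(std::move(s)) {}
    char next_char() { return pos < data.size() ? data[pos++] : '\0'; }
};

using Row = std::array<int, 256>;  // next state for every possible byte

// dfa[state][c] is the state reached from `state` after reading byte c.
std::vector<Row> build_dfa(const std::string& word) {
    const std::size_t m = word.size();
    std::vector<Row> dfa(m + 1, Row{});  // all transitions start at state 0
    if (m == 0) return dfa;
    dfa[0][static_cast<unsigned char>(word[0])] = 1;
    std::size_t x = 0;  // state the automaton falls back to on a mismatch
    for (std::size_t j = 1; j <= m; ++j) {
        dfa[j] = dfa[x];  // on a mismatch, behave like the fallback state
        if (j < m) {
            const unsigned char c = static_cast<unsigned char>(word[j]);
            dfa[j][c] = static_cast<int>(j + 1);       // on a match, advance
            x = static_cast<std::size_t>(dfa[x][c]);   // update the fallback state
        }
    }
    return dfa;
}

int count_occurrences_dfa(StringStream& stream, const std::string& word) {
    if (word.empty()) return 0;
    const std::vector<Row> dfa = build_dfa(word);
    const int accept = static_cast<int>(word.size());
    int state = 0, count = 0;
    for (char ch = stream.next_char(); ch != '\0'; ch = stream.next_char()) {
        state = dfa[state][static_cast<unsigned char>(ch)];
        if (state == accept) ++count;  // the whole word has just been read
    }
    return count;
}
```

The table costs O(256 * M) memory, which buys exactly one table lookup per character of the stream.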

This problem can be solved using the KMP string matching algorithm.

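The KMP algorithm looks for matches of the pattern string in the text string, taking into account how much of the pattern's prefix still matches even when we find a mismatch at some point.

To determine "how much of the prefix can still be matched" when we hit a mismatch after having matched the pattern up to index i, a failure function is built beforehand. (See the code below for how the failure function values are built.)

For each index i of the pattern, this failure function tells us how much of the pattern's prefix can still be matched even if we encounter a mismatch right after index i.

The idea is to find, for each substring of the pattern spanning indices 1 to i, the length of its longest proper prefix that is also its suffix, where i ranges from 1 to n.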

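I am using string indices that start from 1.

Thus, the failure function value for the first character of any pattern is 0 (i.e. no character has been matched so far).

For the subsequent characters, for each index i = 2 to n, we find the length of the longest proper prefix of the substring pattern[1...i] that is also a suffix of pattern[1...i].

Suppose our pattern is "aac". Then the failure function value for index 1 is 0 (nothing matched yet), and the failure function value for index 2 is 1 (the longest proper prefix of "aa" that is also a suffix of "aa" has length 1).

For the pattern "ababac", the failure function value is 0 for index 1, 0 for index 2, 1 for index 3 (since the "a" at index 3 is the same as the "a" at index 1), 2 for index 4 (since the "ab" at indices 1 and 2 is the same as the "ab" at indices 3 and 4), and 3 for index 5 (the "aba" at indices [1...3] is the same as the "aba" at indices [3...5]). For index 6, the failure function value is 0.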
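
The values in these examples can be checked with a small helper. This is a sketch with a hypothetical name, failure_function, and it stores the values 0-based, so f[i - 1] holds the value the text gives for index i.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// f[i] is the length of the longest proper prefix of pattern[0..i]
// that is also a suffix of pattern[0..i].
std::vector<int> failure_function(const std::string& pattern) {
    std::vector<int> f(pattern.size(), 0);
    int t = 0;  // length of the border carried over from the previous index
    for (std::size_t i = 1; i < pattern.size(); ++i) {
        // Shrink the border until it can be extended by pattern[i].
        while (t > 0 && pattern[t] != pattern[i]) t = f[t - 1];
        if (pattern[t] == pattern[i]) ++t;
        f[i] = t;
    }
    return f;
}
```

For "ababac" it returns {0, 0, 1, 2, 3, 0}, and for "aac" it returns {0, 1, 0}.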

Here is the code (C++) for building the failure function and for matching the text (or stream) with it:

/* Assuming that string indices start from 1 for both pattern and text. */
/* Firstly build the failure function. */
int s = 1;
int t = 0;  

/* n denotes the length of the pattern */
int *f = new int[n+1];
f[1] = 0;   

for (s = 1; s < n; s++) {
    while (t > 0 && pattern[t + 1] != pattern[s + 1]) {
        t = f[t];
    }
    if (pattern[t + 1] == pattern[s + 1]) {
        t++;
        f[s + 1] = t;
    }
    else {
        f[s + 1] = 0;           
    }
}

/* Now start reading characters from the stream */
int count = 0;
char current_char = stream.next_char();

/* s denotes the index of pattern up to which we have found a match in text; */
/* initially it is 0, i.e. no character has been matched yet. */
s = 0; 
while (current_char != '\0') {

    /* IF previously, we had matched upto a certain number of
       characters, and then failed, we return s to the point
       which is the longest prefix that still might be matched.

       (spaces between string are just for clarity)
       For e.g., if pattern is              "a  b  a  b  a  a" 
       & its failure returning index is     "0  0  1  2  3  1"

       and we encounter 
       the text like :      "x  y  z  a  b  a  b  a  b  a  a" 
              indices :      1  2  3  4  5  6  7  8  9  10 11

       after matching the substring "a  b  a  b  a", starting at
       index 4 of text, we are successful up to index 8 but we fail
       at index 9: the character at index 9 of text is 'b', whereas
       the pattern expects 'a' there. Thus, the index in pattern
       which has been matched till now is 5 ( a  b  a  b  a)
                                              1  2  3  4  5
       Now, we see that the failure returning index at index 5 of
       pattern is 3, which means that the text is still matched up to
       index 3 of pattern (a  b  a), not from the initial starting
       index 4 of text, but starting from index 6 of text.

       Thus we continue searching assuming that our next starting index
       in text is 6, eventually finding the match starting from index 6
       up to index 11.

       */
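    while (s > 0 && current_char != pattern[s + 1]) {
        s = f[s];
    }
    /* We test the next character after the currently matched portion of
       pattern against the current character of text; if it matches, we
       increase the size of our matched portion by 1. */
    if (current_char == pattern[s + 1]) s++;
    if (s == n) {
        count++;
        /* Fall back to the longest border of the whole pattern, so that
           overlapping occurrences are counted and pattern[n + 1] is
           never read on the next iteration. */
        s = f[s];
    }
    current_char = stream.next_char();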
}

printf("Count is %d\n", count);
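
For readers who prefer 0-based indexing, the same approach might be packaged as a single function along the following lines. This is a sketch, not the answer's exact code: StringStream is a hypothetical stand-in for the Stream class, count_occurrences_kmp is a made-up name, and after a full match the state falls back so that overlapping occurrences are counted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Stream class from the question.
struct StringStream {
    std::string data;
    std::size_t pos = 0;
    explicit StringStream(std::string s) : data(std::move(s)) {}
    char next_char() { return pos < data.size() ? data[pos++] : '\0'; }
};

int count_occurrences_kmp(StringStream& stream, const std::string& pattern) {
    const std::size_t n = pattern.size();
    if (n == 0) return 0;

    // f[i]: length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    std::vector<std::size_t> f(n, 0);
    for (std::size_t i = 1, t = 0; i < n; ++i) {
        while (t > 0 && pattern[t] != pattern[i]) t = f[t - 1];
        if (pattern[t] == pattern[i]) ++t;
        f[i] = t;
    }

    int count = 0;
    std::size_t s = 0;  // number of pattern characters currently matched
    for (char ch = stream.next_char(); ch != '\0'; ch = stream.next_char()) {
        while (s > 0 && ch != pattern[s]) s = f[s - 1];  // fall back on a mismatch
        if (ch == pattern[s]) ++s;
        if (s == n) {
            ++count;
            s = f[n - 1];  // keep the longest border for overlapping matches
        }
    }
    return count;
}
```

The stream is read exactly once and only the failure table is stored, so the memory use stays at O(n) for a pattern of length n.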