C: find the shortest substring of string s1 that contains all the letters of s2


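The user enters two strings, s1 and s2. Find the smallest substring of s1 that contains all the letters of s2. (If several substrings have the same size, find the first one that occurs.)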

Example input:

it is raining today
dot

Output:

tod

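Note: I wrote code that works, but it took me far too long to figure out and write, and since problems like this will come up in my test, that is not good. Is there a simpler way?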

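How I did it: I wrote two functions, one that returns the substring of a given string from index i to index j, and another that checks whether a substring contains all the letters. Then I used nested for loops to find the shortest substring that contains all the letters.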

My code (working):

#include
#include
#include
#include
const int LEN = 100;
char *substring(int beg, int end, int n, char str[n])
{
    int resultLen = abs(end - beg), i;
    char *result = malloc(sizeof(char) * resultLen);
    for (i = 0; i

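The approach described in the question (try every start index and length, and test each candidate) can be sketched roughly as follows. This is an illustrative reconstruction, not the asker's code: the names `contains_all` and `shortest_brute` are hypothetical, and it assumes repeated letters in s2 must be matched as many times as they occur. It works on a pointer and a length, so no substring has to be allocated.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the len bytes starting at sub contain every character
   of set, counting repeated characters; otherwise returns 0. */
static int contains_all(const char *sub, size_t len, const char *set) {
    int need[UCHAR_MAX + 1] = { 0 };
    size_t missing = 0, i;

    for (i = 0; set[i] != '\0'; i++) {
        need[(unsigned char)set[i]]++;
        missing++;
    }
    for (i = 0; i < len && missing > 0; i++) {
        unsigned char c = (unsigned char)sub[i];
        if (need[c] > 0) {
            need[c]--;
            missing--;
        }
    }
    return missing == 0;
}

/* Tries the shortest lengths first and, for each length, the leftmost
   start first, so the first hit is the shortest and earliest match.
   Returns a pointer into s1 and stores the length in *plen,
   or returns NULL (with *plen set to 0) if there is no match. */
const char *shortest_brute(const char *s1, const char *s2, size_t *plen) {
    size_t n = strlen(s1), m = strlen(s2), len, start;

    for (len = m; len <= n; len++) {
        for (start = 0; start + len <= n; start++) {
            if (contains_all(s1 + start, len, s2)) {
                *plen = len;
                return s1 + start;
            }
        }
    }
    *plen = 0;
    return NULL;
}
```

For s1 = "it is raining today" and s2 = "dot", this returns a pointer to "tod" with a length of 3. The cost is roughly cubic in the length of s1, which is fine for 100-character inputs but is the reason the answers look for something faster.
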
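Your code has several shortcomings:

  • You should not use gets().
  • You allocate a lot of memory but never free it.
  • If s2 contains one or more + characters, it will not work as expected.

There is no function in the C library that does what you need.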

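This problem is not that easy to solve. It took me 40 minutes to arrive at an implementation that I believe is solid, although it is not very fast. It does not allocate memory, but it assumes that bytes have 8 bits.

Here is my code: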

#include <stdio.h>
#include <string.h>

const char *minsubstr(const char *str, const char *set, size_t *plen) {
    size_t count[256] = { 0 }, cc[256];
    size_t i, n, set_len, best_len = -1;
    const char *best = NULL;
    for (i = 0; set[i]; i++) {
        unsigned char c = set[i];
        count[c]++;
    }
    set_len = i;
    if (set_len == 0) {
        best_len = 0;
        best = str;
    } else {
        for (; *str; str++) {
            if (count[(unsigned char)*str] == 0)
                continue;
            memcpy(cc, count, sizeof cc);
            for (i = n = 0; i < best_len && str[i]; i++) {
                unsigned char c = str[i];
                if (cc[c]) {
                    cc[c]--;
                    if (++n == set_len) {
                        if (best_len > i + 1) {
                            best_len = i + 1;
                            best = str;
                        }
                        break;
                    }
                }
            }
            if (!str[i]) {
                // no more matches
                break;
            }
        }
    }
    *plen = best_len;
    return best;
}

int main() {
    char s1[100], s2[100];
    const char *p;
    size_t len;

    if (fgets(s1, sizeof s1, stdin) && fgets(s2, sizeof s2, stdin)) {
        s1[strcspn(s1, "\n")] = '\0';  // strip the trailing newline
        s2[strcspn(s2, "\n")] = '\0';  // strip the trailing newline if any
        p = minsubstr(s1, s2, &len);
        if (p) {
            printf("%.*s\n", (int)len, p);
        } else {
            printf("no match\n");
        }
    }
    return 0;
}

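It may be possible to avoid the nested for loops, and the O(n^2) time they take, by using sorted strings:

  • Sort s2 lexicographically (call this sorted string s2S).
  • Tokenize s1 by splitting on spaces.
  • Loop through the tokens in s1.
  • Keep the original token (s1T) and a lexicographically sorted version of the same token (s1S).
  • Look for a copy of s2S in s1S.
  • If there is a match, then s1T contains a substring with the characters of s2.
  • Loop through each letter of s1T until you find a starting character that occurs in s2S. Compare the next character, and so on, until the substring is complete or there is a mismatch. If all the characters match, you have a candidate hit.

With sorted strings this reduces the time from O(n^2) to O(n log n). Looping through the characters of a string is O(n).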
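The sort-and-compare steps for a single token might look like the sketch below. It is an illustration under assumptions rather than the answer's code: `cmp_char`, `sorted_contains` and `token_is_candidate` are hypothetical names, tokens and needles are assumed to be shorter than 100 bytes, and the check is a merge-style scan over the two sorted strings rather than a search for a contiguous copy, because other letters of the token can sort in between (for example "depot" sorts to "deopt", which contains d, o and t but not "dot" as a block). It only tells you whether a token is worth scanning; finding the shortest substring inside the token is a separate step.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator that orders single bytes as unsigned char values. */
static int cmp_char(const void *a, const void *b) {
    unsigned char x = *(const unsigned char *)a;
    unsigned char y = *(const unsigned char *)b;
    return (x > y) - (x < y);
}

/* Both arguments must already be sorted. Returns 1 if every character
   of sorted_needle, counting repeats, also occurs in sorted_token. */
static int sorted_contains(const char *sorted_token, const char *sorted_needle) {
    while (*sorted_needle != '\0') {
        /* skip token characters that sort before the one we need */
        while (*sorted_token != '\0' &&
               (unsigned char)*sorted_token < (unsigned char)*sorted_needle)
            sorted_token++;
        if (*sorted_token != *sorted_needle)
            return 0;
        sorted_token++;
        sorted_needle++;
    }
    return 1;
}

/* Sorts copies of token (s1T) and needle (s2) into s1S and s2S and
   reports whether the token holds all the needle's letters.
   Inputs of 100 bytes or more are rejected (returns 0). */
int token_is_candidate(const char *token, const char *needle) {
    char s1S[100], s2S[100];
    size_t tlen = strlen(token), nlen = strlen(needle);

    if (tlen >= sizeof s1S || nlen >= sizeof s2S)
        return 0;
    memcpy(s1S, token, tlen + 1);
    memcpy(s2S, needle, nlen + 1);
    qsort(s1S, tlen, 1, cmp_char);
    qsort(s2S, nlen, 1, cmp_char);
    return sorted_contains(s1S, s2S);
}
```

With the tokens of "it is raining today" and the needle "dot", only "today" is reported as a candidate. Note that a per-token check like this cannot find a match that spans a space.
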

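Here is a solution that uses histograms and a sliding window to find the best match. It assumes that only lowercase letters are of interest; the histograms can be extended to cover a different character set if needed. It performs no memory allocation and runs in O(n) time. The first draft, which correctly identified "tod" as the output for the needle "dot", took me 31 minutes to write, including debugging time.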

    #include <stdio.h>
    #include <ctype.h>
    #include <string.h>
    
    char *findMinSubstring(char *haystack, char *needle, int *bestLength)
    {
        int needleHistogram[26] = {0};
        int haystackHistogram[26] = {0};
    
        // create a histogram from the needle, keeping track of the number of non-zero entries in the histogram
        int count = 0;
        for (int i = 0; needle[i] != '\0'; i++)
            if (islower((unsigned char)needle[i]))
            {
                int c = needle[i] - 'a';
                needleHistogram[c]++;
                if (needleHistogram[c] == 1)
                    count++;
            }
    
        // now look for the best substring using a sliding window
        int start = 0;
        int end = 0;
        int length = (int)strlen(haystack);
        int bestStart = -1;
        int bestEnd = length+1;
        for (;;)
        {
            if (end < length && count != 0)
            {
                // if the window doesn't contain all of the necessary letters, enlarge it by advancing the end
                if (islower((unsigned char)haystack[end]))
                {
                    int c = haystack[end] - 'a';
                    haystackHistogram[c]++;
                    if (needleHistogram[c] > 0 && haystackHistogram[c] == needleHistogram[c])
                        count--;
                }
                end++;
            }
            else if (start < end && count == 0)
            {
                // if the window contains all of the necessary letters, shrink it by advancing the start
                if (islower((unsigned char)haystack[start]))
                {
                    int c = haystack[start] - 'a';
                    haystackHistogram[c]--;
                    if (needleHistogram[c] > 0 && haystackHistogram[c] == needleHistogram[c]-1)
                        count++;
                }
                start++;
            }
            else
            {
                // if expanding or shrinking the window isn't an option, then we're done
                break;
            }
    
            // if the window contains all the necessary letters, and is smaller than the previous best, update the best
            if (count == 0 && (end - start) < (bestEnd - bestStart))
            {
                bestStart = start;
                bestEnd = end;
            }
        }
    
        if (bestStart >= 0 && bestEnd <= length)
        {
            // if a matching substring exists, return the length and a pointer to the beginning of the substring
            *bestLength = bestEnd - bestStart;
            return haystack + bestStart;
        }
        else
        {
            // failed, return NULL
            *bestLength = 0;
            return NULL;
        }
    }
    
    int main(void)
    {
        char haystack[] = "it is raining today";
        char *needle[] = { "dot", "dott", "dotti", "it", "today", "i", "ii", "iii", "iiii", "iiiii", "y", "yy", "end", NULL };
        for (int i = 0; needle[i] != NULL; i++)
        {
            int bestLength = 0;
            char *bestString = findMinSubstring(haystack, needle[i], &bestLength);
            printf("%-5s ", needle[i]);
            if (bestString != NULL)
                printf("'%.*s'\n", bestLength, bestString);
            else
                printf(" No matching substring\n");
        }
    }
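
The histogram idea is not tied to lowercase letters. The sketch below is one way to extend it to every possible `unsigned char` value by sizing the tables with `UCHAR_MAX + 1`. It is an illustration and not the answer's code: the name `min_window` and its `size_t` length parameter are assumptions, and like the code above it returns NULL for an empty needle.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Sliding-window search over all byte values.
   Returns a pointer to the shortest (and, among ties, first) window of
   haystack that contains every character of needle, counting repeats,
   and stores its length in *best_len. Returns NULL if there is none. */
const char *min_window(const char *haystack, const char *needle, size_t *best_len) {
    size_t need[UCHAR_MAX + 1] = { 0 };  /* required count per character */
    size_t have[UCHAR_MAX + 1] = { 0 };  /* count inside current window  */
    size_t missing = 0;  /* distinct characters still below their quota */
    size_t start = 0, end, n = strlen(haystack);
    const char *best = NULL;

    for (size_t i = 0; needle[i] != '\0'; i++)
        if (need[(unsigned char)needle[i]]++ == 0)
            missing++;

    *best_len = 0;
    if (missing == 0)
        return NULL;  /* empty needle: reported as no match */

    for (end = 0; end < n; end++) {
        /* grow the window by one character on the right */
        unsigned char c = (unsigned char)haystack[end];
        if (++have[c] == need[c])
            missing--;

        /* while the window is complete, record it and shrink from the left */
        while (missing == 0) {
            if (best == NULL || end + 1 - start < *best_len) {
                best = haystack + start;
                *best_len = end + 1 - start;
            }
            unsigned char d = (unsigned char)haystack[start++];
            if (have[d]-- == need[d])
                missing++;
        }
    }
    return best;
}
```

With the haystack "it is raining today" and the needle "dot" this yields a pointer to "tod" with a length of 3. Each character enters and leaves the window at most once, so the running time stays linear in the length of the haystack.
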
    

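From an experienced programmer: this is not a trivial task, and depending on your level it could take hours. In fact, I know some "senior programmers" who could not get it done without bugs. How long did it take you?

It took me an hour and one minute. Edit: the code I wrote is very inefficient, but I don't care about that for the test. My question is more along the lines of "is there a function in C that does most of the work for me?" or something like that.

Improving working code is very much on topic on Stack Exchange.

If the needle is "dott", does the haystack need one t or two?

Thank you. I will analyze your code and learn from it. I am aware of some of those shortcomings, but given the current circumstances I have neither the time nor the need to care about them.

Nicely done. Rather than a lowercase-only histogram, consider using histogram[UCHAR_MAX + 1].
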
    #include <stdlib.h>
    #include <string.h>
    
    const char *minsubstr(const char *str, const char *set, size_t *plen) {
        size_t i, len, set_len = strlen(set), best_len = -1;
        const char *best = NULL;
        if (set_len == 0) {
            best_len = 0;
            best = str;
        } else {
            char *buf = malloc(set_len);
            if (buf == NULL) {
                // allocation failed: report no match
                *plen = best_len;
                return NULL;
            }
            for (; *str; str++) {
                if (!memchr(set, *str, set_len))
                    continue;
                memcpy(buf, set, len = set_len);
                for (i = 0; i < best_len && str[i]; i++) {
                    char *p = memchr(buf, str[i], len);
                    if (p != NULL) {
                        *p = buf[--len];
                        if (len == 0) {
                            if (best_len > i + 1) {
                                best_len = i + 1;
                                best = str;
                            }
                            break;
                        }
                    }
                }
                if (!str[i]) {
                    // no more matches
                    break;
                }
            }
            free(buf);
        }
        *plen = best_len;
        return best;
    }
    