C: Find the shortest substring of string s1 that contains all the letters of s2
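The user enters two strings, s1 and s2. Find the smallest substring of s1 that contains all the letters of s2. (If several substrings have the same size, find the one that occurs first.)
Example input:
it is raining today
dot
Output: tod
Note: I wrote code that works, but it took me far too long to work out and write, and since problems like this will come up in my test, that is not good. Is there a simpler way?
How I did it: I wrote two functions, one that returns the substring of a given string from index i to index j, and one that checks whether a substring contains all the letters. Then I used nested for loops to find the shortest substring that contains all the letters.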
My code (working):
#include
#include
#include
#include
const int LEN = 100;
char *substring(int beg, int end, int n, char str[n])
{
    int resultLen = abs(end - beg), i;
    char *result = malloc(sizeof(char) * resultLen);
    for (i = 0; i
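The approach described in the question (a containment check plus nested loops over every candidate substring) can be sketched roughly as follows. This is not the asker's code, which is cut off above; the names `contains_all` and `shortest_window` are hypothetical, and the sketch assumes that a letter repeated in s2 must appear that many times in the substring.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if s[beg..end) contains every character of set,
   counting duplicates (an assumption, see the note above). */
int contains_all(const char *s, size_t beg, size_t end, const char *set)
{
    int need[256] = { 0 };
    size_t missing = 0, i;

    for (i = 0; set[i] != '\0'; i++) {
        need[(unsigned char)set[i]]++;
        missing++;
    }
    for (i = beg; i < end && missing > 0; i++) {
        unsigned char c = (unsigned char)s[i];
        if (need[c] > 0) {
            need[c]--;
            missing--;
        }
    }
    return missing == 0;
}

/* Tries every length from shortest to longest and every start from left
   to right, so the first hit is the shortest and leftmost substring.
   Returns 1 and fills *beg / *len on success, 0 if nothing matches. */
int shortest_window(const char *s1, const char *s2, size_t *beg, size_t *len)
{
    size_t n = strlen(s1), m = strlen(s2), l, b;

    for (l = m; l <= n; l++) {
        for (b = 0; b + l <= n; b++) {
            if (contains_all(s1, b, b + l, s2)) {
                *beg = b;
                *len = l;
                return 1;
            }
        }
    }
    return 0;
}
```

For s1 = "it is raining today" and s2 = "dot", `shortest_window` reports start 14 and length 3, which is "tod". Working on indices avoids allocating a copy of each candidate substring, so there is nothing to free.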
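Your code has some shortcomings:
- You should not use gets()
- You allocate a lot of memory but never free it
- It does not work as expected if [identifier missing in the source] contains one or more s2 characters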
#include <stdio.h>
#include <string.h>

const char *minsubstr(const char *str, const char *set, size_t *plen) {
    size_t count[256] = { 0 }, cc[256];
    size_t i, n, set_len, best_len = -1;
    const char *best = NULL;

    for (i = 0; set[i]; i++) {
        unsigned char c = set[i];
        count[c]++;
    }
    set_len = i;
    if (set_len == 0) {
        best_len = 0;
        best = str;
    } else {
        for (; *str; str++) {
            if (count[(unsigned char)*str] == 0)
                continue;
            memcpy(cc, count, sizeof cc);
            for (i = n = 0; i < best_len && str[i]; i++) {
                unsigned char c = str[i];
                if (cc[c]) {
                    cc[c]--;
                    if (++n == set_len) {
                        if (best_len > i + 1) {
                            best_len = i + 1;
                            best = str;
                        }
                        break;
                    }
                }
            }
            if (!str[i]) {
                // no more matches
                break;
            }
        }
    }
    *plen = best_len;
    return best;
}

int main() {
    char s1[100], s2[100];
    const char *p;
    size_t len;

    if (fgets(s1, sizeof s1, stdin) && fgets(s2, sizeof s2, stdin)) {
        s1[strcspn(s1, "\n")] = '\0'; // strip the trailing newline
        s2[strcspn(s2, "\n")] = '\0'; // strip the trailing newline if any
        p = minsubstr(s1, s2, &len);
        if (p) {
            printf("%.*s\n", (int)len, p);
        } else {
            printf("no match\n");
        }
    }
    return 0;
}
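You may be able to avoid the nested for loops, and the O(n^2) time they take, by working with sorted strings:
- Sort s2 (call the sorted string s2S).
- Split s1 into tokens; for each token in s1 (s1T), also build a lexicographically sorted version of the same token (s1S).
- Look in s1S for a copy of s2S.
- If one is found, s1T has a substring that contains the characters of s2.
- Walk through each letter of s1T until you find a starting character that is in s2S. Compare the next character, and so on, until a substring emerges or there is a mismatch. If all the characters match, you have a candidate hit.
Using sorted strings would reduce the time from O(n^2) to O(n log n). Looping through the characters of a string is O(n).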
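One way to read the "look in s1S for a copy of s2S" step is as a multiset test on two sorted strings, which takes a single linear pass. The sketch below covers only that step and is an interpretation rather than the answerer's code; `cmp_char`, `sort_chars` and `sorted_contains` are hypothetical names, and locating the shortest substring still needs the walk over s1T described above.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for single characters, compared as unsigned char */
int cmp_char(const void *a, const void *b)
{
    unsigned char x = *(const unsigned char *)a;
    unsigned char y = *(const unsigned char *)b;
    return (x > y) - (x < y);
}

/* Sorts the characters of a string in place. */
void sort_chars(char *s)
{
    qsort(s, strlen(s), sizeof *s, cmp_char);
}

/* Both arguments must already be sorted. Returns 1 if every character of
   s2S, duplicates included, also occurs in s1S, and 0 otherwise. */
int sorted_contains(const char *s1S, const char *s2S)
{
    while (*s2S != '\0') {
        /* skip characters of s1S that sort before the one we need */
        while (*s1S != '\0' && (unsigned char)*s1S < (unsigned char)*s2S)
            s1S++;
        if (*s1S != *s2S)
            return 0;   /* the needed character is missing */
        s1S++;
        s2S++;
    }
    return 1;
}
```

With the token "today" sorted to "adoty" and s2 = "dot" (already sorted), `sorted_contains` returns 1; for the token "raining" it returns 0.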
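Here is a solution that uses histograms and a sliding window to find the best match. It assumes that only lowercase letters are of interest; the histograms can be extended to cover a different character set if needed. It allocates no memory and runs in O(n) time. The first draft correctly identified "tod" as the right output for the needle "dot". It took me 31 minutes to write, including debugging time.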
#include <stdio.h>
#include <ctype.h>
#include <string.h>

char *findMinSubstring(char *haystack, char *needle, int *bestLength)
{
    int needleHistogram[26] = {0};
    int haystackHistogram[26] = {0};

    // create a histogram from the needle, keeping track of the number of non-zero entries in the histogram
    // (the unsigned char casts keep islower() defined for bytes outside the ASCII range)
    int count = 0;
    for (int i = 0; needle[i] != '\0'; i++)
        if (islower((unsigned char)needle[i]))
        {
            int c = needle[i] - 'a';
            needleHistogram[c]++;
            if (needleHistogram[c] == 1)
                count++;
        }

    // now look for the best substring using a sliding window
    int start = 0;
    int end = 0;
    int length = (int)strlen(haystack);
    int bestStart = -1;
    int bestEnd = length+1;
    for (;;)
    {
        if (end < length && count != 0)
        {
            // if the window doesn't contain all of the necessary letters, enlarge it by advancing the end
            if (islower((unsigned char)haystack[end]))
            {
                int c = haystack[end] - 'a';
                haystackHistogram[c]++;
                if (needleHistogram[c] > 0 && haystackHistogram[c] == needleHistogram[c])
                    count--;
            }
            end++;
        }
        else if (start < end && count == 0)
        {
            // if the window contains all of the necessary letters, shrink it by advancing the start
            if (islower((unsigned char)haystack[start]))
            {
                int c = haystack[start] - 'a';
                haystackHistogram[c]--;
                if (needleHistogram[c] > 0 && haystackHistogram[c] == needleHistogram[c]-1)
                    count++;
            }
            start++;
        }
        else
        {
            // if expanding or shrinking the window isn't an option, then we're done
            break;
        }

        // if the window contains all the necessary letters, and is smaller than the previous best, update the best
        if (count == 0 && (end - start) < (bestEnd - bestStart))
        {
            bestStart = start;
            bestEnd = end;
        }
    }

    if (bestStart >= 0 && bestEnd <= length)
    {
        // if a matching substring exists, return the length and a pointer to the beginning of the substring
        *bestLength = bestEnd - bestStart;
        return haystack + bestStart;
    }
    else
    {
        // failed, return NULL
        *bestLength = 0;
        return NULL;
    }
}

int main(void)
{
    char haystack[] = "it is raining today";
    char *needle[] = { "dot", "dott", "dotti", "it", "today", "i", "ii", "iii", "iiii", "iiiii", "y", "yy", "end", NULL };

    for (int i = 0; needle[i] != NULL; i++)
    {
        int bestLength = 0;
        char *bestString = findMinSubstring(haystack, needle[i], &bestLength);
        printf("%-5s ", needle[i]);
        if (bestString != NULL)
            printf("'%.*s'\n", bestLength, bestString);
        else
            printf(" No matching substring\n");
    }
}
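From an experienced programmer: this is not a trivial task, and depending on your level it could take several hours. In fact, I know some "senior programmers" who could not complete it without bugs. How long did it take you?
It took me an hour and a minute. Edit: the code I wrote is very inefficient, but I did not care about that for the test; I just wrote it.
My question is more "Is there a function in C that does most of the work for me?" or something along those lines.
Improvements to working code matter a great deal on Stack Exchange.
If the needle is "dott", does the haystack need 1 t or 2?
Thank you. I will analyze your code and learn from it. I am aware of some of the shortcomings, but given the circumstances I have neither the time nor the interest to deal with them.
Nicely done. Rather than a lowercase-only histogram, consider a histogram of size [UCHAR_MAX+1].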
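The suggestion in the last comment, a histogram with UCHAR_MAX + 1 entries instead of 26, might look like the sketch below. It is a fresh sliding-window variant written for illustration, not either answerer's code, and `min_window` is a hypothetical name. It assumes that every byte of the needle counts (spaces and uppercase included) and that repeated letters must be matched that many times, so "dott" needs two t's.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the shortest (leftmost on ties) window of haystack
   that contains every byte of needle, duplicates included, and stores its
   length in *len. Returns NULL with *len == 0 if no such window exists. */
const char *min_window(const char *haystack, const char *needle, size_t *len)
{
    size_t need[UCHAR_MAX + 1] = { 0 };  /* required count per byte value  */
    size_t have[UCHAR_MAX + 1] = { 0 };  /* count inside the current window */
    size_t missing = strlen(needle);     /* required bytes not yet covered  */
    size_t n = strlen(haystack);
    size_t start = 0, end, best_start = 0, best_len = (size_t)-1;

    if (missing == 0) {                  /* empty needle: empty match */
        *len = 0;
        return haystack;
    }
    for (end = 0; needle[end] != '\0'; end++)
        need[(unsigned char)needle[end]]++;

    for (end = 0; end < n; end++) {
        unsigned char c = (unsigned char)haystack[end];
        /* grow the window on the right; the byte helps only if still needed */
        if (++have[c] <= need[c])
            missing--;
        /* while the window is complete, record it and shrink from the left */
        while (missing == 0 && start <= end) {
            unsigned char d = (unsigned char)haystack[start];
            if (end - start + 1 < best_len) {
                best_len = end - start + 1;
                best_start = start;
            }
            if (have[d]-- <= need[d])    /* dropping d breaks the window */
                missing++;
            start++;
        }
    }
    if (best_len == (size_t)-1) {
        *len = 0;
        return NULL;
    }
    *len = best_len;
    return haystack + best_start;
}
```

On the haystack "it is raining today", the needle "dot" yields "tod" (length 3) and "dott" yields the 16-character window "t is raining tod", because the only two t's are 13 positions apart. Each index moves forward at most n times, so the scan stays O(n).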
#include <stdlib.h>
#include <string.h>

const char *minsubstr(const char *str, const char *set, size_t *plen) {
    size_t i, len, set_len = strlen(set), best_len = -1;
    const char *best = NULL;

    if (set_len == 0) {
        best_len = 0;
        best = str;
    } else {
        char *buf = malloc(set_len);
        for (; *str; str++) {
            if (!memchr(set, *str, set_len))
                continue;
            memcpy(buf, set, len = set_len);
            for (i = 0; i < best_len && str[i]; i++) {
                char *p = memchr(buf, str[i], len);
                if (p != NULL) {
                    *p = buf[--len];
                    if (len == 0) {
                        if (best_len > i + 1) {
                            best_len = i + 1;
                            best = str;
                        }
                        break;
                    }
                }
            }
            if (!str[i]) {
                // no more matches
                break;
            }
        }
        free(buf);
    }
    *plen = best_len;
    return best;
}