Best way to implement an (include) filter in C

c algorithm

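Here is the scenario: I have a large list of events that I need to parse, passing along only the events I need. What is the best way to achieve this? So far my code looks like this: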

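#include <stdio.h>
#include <string.h>

int main (int argc, char **argv) {

    size_t i, j;
    char events[4][15]; /* for the sake of the test only */

    /* probably loaded from a config file */
    const char *filter[] = {"first_event", "third_event"};

    /* element counts derived from the arrays, so no event is skipped */
    const size_t event_count = sizeof events / sizeof *events;
    const size_t filter_count = sizeof filter / sizeof *filter;

    (void)argc; /* command-line arguments are unused in this test */
    (void)argv;

    /* just preparing data so far */
    strcpy(events[0], "first_event");
    strcpy(events[1], "second_event");
    strcpy(events[2], "third_event");
    strcpy(events[3], "foo");

    /* my approach is to have two iterations: every filter against every event */
    for (j = 0; j < filter_count; j++) {
        for (i = 0; i < event_count; i++) {
            if (strcmp(filter[j], events[i]) == 0) {
                printf("pass -> %s\n", events[i]);
            }
        }
    }
    return 0;
}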

You cannot use the STL map in C; otherwise that would be the simplest way to reach an overall complexity of m*log(n), where m = number of events and n = maximum length of all filters. The next simplest way to achieve m*log(n) is to use a trie. A ready-to-use trie implementation is available. A possible use of that implementation looks like this (I have not tried to compile it):

Trie *ttree = Trie_new();
for (i = 0; i
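Since the snippet above is cut off, here is a minimal sketch of the trie idea, written from scratch. It is not the API of the implementation the answer refers to: the names `TrieNode`, `trie_node_new`, `trie_add`, `trie_contains` and `trie_free` are hypothetical, and the node layout (one child slot per byte value) trades memory for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical minimal trie, for illustration only. */
typedef struct TrieNode {
    struct TrieNode *child[256]; /* one slot per possible byte value */
    bool is_key;                 /* true if a filter string ends here */
} TrieNode;

/* Allocate an empty node; returns NULL if memory runs out. */
TrieNode *trie_node_new(void) {
    TrieNode *node = malloc(sizeof *node);
    if (node != NULL) {
        for (size_t i = 0; i < 256; i++) {
            node->child[i] = NULL;
        }
        node->is_key = false;
    }
    return node;
}

/* Insert a filter string; returns false if an allocation fails. */
bool trie_add(TrieNode *root, const char *key) {
    assert(root != NULL && key != NULL);
    TrieNode *node = root;
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        if (node->child[*p] == NULL) {
            node->child[*p] = trie_node_new();
            if (node->child[*p] == NULL) {
                return false;
            }
        }
        node = node->child[*p];
    }
    node->is_key = true;
    return true;
}

/* Exact-match lookup; the cost is proportional to the length of key. */
bool trie_contains(const TrieNode *root, const char *key) {
    const TrieNode *node = root;
    for (const unsigned char *p = (const unsigned char *)key;
         node != NULL && *p != '\0'; p++) {
        node = node->child[*p];
    }
    return node != NULL && node->is_key;
}

/* Release a node and everything below it. */
void trie_free(TrieNode *node) {
    if (node == NULL) {
        return;
    }
    for (size_t i = 0; i < 256; i++) {
        trie_free(node->child[i]);
    }
    free(node);
}
```

To filter, one would add every filter string with `trie_add` once, then call `trie_contains(root, events[i])` for each event and pass along those for which it returns true. Each lookup touches at most one node per character of the event name, regardless of how many filters were added.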
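"Best" is not very well defined. If you have a large number of items, you will see a significant performance improvement from using standard C's qsort and bsearch, with minimal added complexity or change to your code. This is concise, but I don't know whether it fits your definition of "best". For definitions of "best" that rule this code out, see the other answers: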
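#include <stddef.h>
#include <stdio.h>
#include <stdlib.h> /* qsort, bsearch */
#include <string.h>

/* qsort and bsearch pass const void *; here each one points at a
   NUL-terminated string, so strcmp can compare them directly */
static int compare_events(const void *a, const void *b) {
    return strcmp(a, b);
}

int main (void) {

    size_t j;
    char events[4][15]; /* for the sake of the test only */
    size_t item_count = 0;

    /* probably loaded from a config file */
    const char *filter[] = {"first_event", "third_event"};
    const size_t filter_count = sizeof filter / sizeof *filter;

    /* just preparing data so far */
    strcpy(events[item_count++], "first_event");
    strcpy(events[item_count++], "second_event");
    strcpy(events[item_count++], "third_event");
    strcpy(events[item_count++], "foo");

    /* sort once so that every filter lookup is a binary search;
       note that this reorders the events */
    qsort(events, item_count, sizeof *events, compare_events);

    for (j = 0; j < filter_count; j++) {
        const char *pass = bsearch(filter[j], events, item_count, sizeof *events, compare_events);
        if (pass != NULL) {
            printf("pass -> %s\n", pass);
        }
    }
    return 0;
}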

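It depends on the sizes of n (number of items) and m (number of accepted values), but this is O(n*m). That is "really bad". If m is fixed/bounded, it is m*O(n), which may be acceptable. Other approaches could use an O(lg m) lookup (i.e. binary search) or an O(1) lookup (i.e. hashing), which bring the bound down to O(n log m) and O(n) respectively. Of course, for small values of n*m it usually does not matter.

Why not use "grep" or "awk", if you are on Linux?

@pst Which aspect of a hash table are you describing as O(1)? Worst-case constant memory for the representation? Best-case constant time for sequential traversal? If you are going to bring analysis into this discussion, at least be specific about what you are analysing...

@pst Before inserting into or fetching from a hash table, the hash has to be computed, and computing the hash depends on the length of the key. Does that look like an O(1) operation under worst-case constant-time analysis? If so, a PATRICIA trie can probably also insert and fetch in O(1) worst-case constant time.

@ModifiableValue "To find the item", time complexity. The hash does not have to be computed over the full length of the key, as long as the hash/equality relationship is maintained, and in most cases the hash/compare function (for a single item) is itself treated as constant time (i.e. the length is bounded), which is what I did here. If you want to analyse or correct my comment further, please consider posting an answer, as it would be more useful and visible.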
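The O(n log m) bound mentioned in the first comment can be sketched by sorting the filter list once and binary-searching it for each event, which also leaves the events in their original order. This is only an illustration under stated assumptions: `compare_names`, `filter_prepare` and `filter_accepts` are hypothetical names, and the filter is assumed to be an array of `const char *` as in the question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Comparator for an array of const char *: qsort and bsearch hand us
   pointers to the array elements, i.e. pointers to string pointers. */
int compare_names(const void *a, const void *b) {
    const char *const *lhs = a;
    const char *const *rhs = b;
    return strcmp(*lhs, *rhs);
}

/* Sort the filter once, O(m log m), so lookups can use binary search. */
void filter_prepare(const char **filter, size_t filter_count) {
    assert(filter != NULL);
    qsort(filter, filter_count, sizeof *filter, compare_names);
}

/* O(log m) membership test against a filter sorted by filter_prepare(). */
bool filter_accepts(const char *const *filter, size_t filter_count,
                    const char *event) {
    /* the key must have the same shape as an element: a pointer to a string pointer */
    return bsearch(&event, filter, filter_count, sizeof *filter,
                   compare_names) != NULL;
}
```

With the question's data, calling `filter_prepare(filter, 2)` once and then testing each of the four events with `filter_accepts` accepts `first_event` and `third_event` and rejects `second_event` and `foo`. Because only the small filter array is sorted, the event list can be processed as a stream without being reordered or held in memory.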