Faster substring processing in C++

c string


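I have a program that processes every possible substring of a given length. I am trying to make it as fast as possible. What can I change in the program below to make it faster?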

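char str[] = "abcdcddcdcdcdcd....................."; // large string
int n = strlen(str), m = 20;
for (int i = 0; i + m <= n; i++) {  // stop at the last full-length window
  char *substr = (char*) malloc(sizeof(char) * (m + 1));  // +1 for the terminator
  strncpy(substr, str + i, m);
  substr[m] = '\0';  // strncpy does not terminate when it copies m chars
  // do some processing
  int h = hd(substr, X); // X is another string of same length
  free(substr);
}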

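#include <functional>
#include <numeric>
#include <string>

// Hamming distance: counts the positions where s1 and s2 differ.
// s2 must be at least as long as s1.
unsigned int hd(const std::string& s1, const std::string& s2)
{
    return std::inner_product(
        s1.begin(), s1.end(), s2.begin(),
        0u, std::plus<unsigned int>(),
        // std::not2 was deprecated in C++17 and removed in C++20
        std::not_equal_to<std::string::value_type>()
    );
}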

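Maybe something like this. It avoids the repeated string handling by passing a pointer to the current substring together with the length to compare.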

#include <stdio.h>
#include <string.h>

int hd(char *str, char *cmp, int len)
// find hamming distance between substring *str and *cmp of length len
{
    int ind, hamming = 0;
    for(ind=0; ind<len; ind++) {
        if(str[ind] != cmp[ind]) {
            hamming++;
        }
    }
    return hamming;
}

int main(void)
// find hamming distance
{
    char str[] = "abcdcddcdcdcdcd";
    char cmp[] = "abc";
    int lens = strlen(str);
    int lenc = strlen(cmp);
    int ind, max;
    max = lens - lenc;
    // analyse each possible substring
    for(ind=0; ind<=max; ind++) {
        printf("%d\n", hd(str + ind, cmp, lenc));
    }
}

Moving malloc and free outside the loop makes the program faster.

char str[] = "abcdcddcdcdcdcd....................."; // large string
int n = strlen(str), m = 20;
char *substr = (char*) malloc(sizeof(char) * (m + 1));  // allocate once, +1 for the terminator
substr[m] = '\0';  // strncpy below never writes this byte
for (int i = 0; i + m <= n; i++) {  // stop at the last full-length window
  //char *substr = (char*) malloc(sizeof(char)*m);
  strncpy(substr, str + i, m);
  // do some processing
  int h = hd(substr, X); // X is another string of same length
  //free(substr);
}
free(substr);

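// Hamming distance: counts the positions where s1 and s2 differ.
// s2 must be at least as long as s1.
unsigned int hd(const std::string& s1, const std::string& s2)
{
    return std::inner_product(
        s1.begin(), s1.end(), s2.begin(),
        0u, std::plus<unsigned int>(),
        // std::not2 was deprecated in C++17 and removed in C++20
        std::not_equal_to<std::string::value_type>()
    );
}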

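This isn't C++ code. It actually looks more like plain C.

You can avoid the malloc call and create the buffer outside the loop.

Yes, I use char* instead of string. I believe string's substr function creates an object, so it would be slower for large strings.

By analysing the substring in place instead of copying it?

@Jarod42 Yes, I want to get rid of the malloc. I also tried putting it before the loop and freeing it after the loop.

In my original implementation I used my own Hamming distance function, like the one you wrote, but I passed two char* to it. Using static inline int hd(unsigned x, unsigned y) made it faster.

I tried your approach, and I also tried my program with the malloc before the for loop, but the version using the hd function was faster.

You can inline this hd function too.

How do I do that? I believe it would make it faster.

You already know how, because your question does it. I was only demonstrating how to do it without creating and destroying substrings.

Got it, thanks!! Looks like I read your comment too quickly: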