Removing duplicates from an array in C

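This problem is a little tricky. The task is to eliminate duplicates and save the unique elements of an array into another array, in their original order.

For example, if the input is b a c a d t, the result should be b a c d t, exactly in the order of the input.

So sorting the array and then checking neighbours does not work, because I lose the original order. I was advised to use an array of indices, but I don't know how to do that. What do you suggest?

For those willing to answer, I would like to add some specifics: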

char** finduni(char *words[100],int limit)
{
//
//Methods here
//
}

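This is my function. The array whose duplicates should be removed, with the result stored in a different array, is words[100], so the whole process happens at this point. I first thought of copying all elements of words into another array and sorting that array, but after some tests that does not work. Just a heads-up for solvers :)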

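Iterate over the items in the array: an O(n) operation.

For each item, add it to another, sorted array.

Before adding it to the sorted array, check whether the entry already exists: an O(log n) operation with binary search.

Overall this is O(n log n) in comparisons (inserting into a plain sorted array also shifts elements, so the worst-case total cost can be higher).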

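I think in C you can create a second array. Then copy an element from the original array only if it is not already in the second array. This also preserves the order of the elements.

If you read the elements one at a time, you can discard a duplicate before it is ever inserted into the original array, which may speed things up.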
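
The second-array approach described above could look roughly like the following for an array of strings. This is only a minimal sketch: the name copy_unique and the caller-supplied out array are assumptions made for illustration, not part of the question's finduni function. It compares every word against the words already kept, so it is O(n^2), which should be fine for an array of about 100 words.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies each distinct string from words[0..limit-1]
   into out[], keeping first-occurrence order, and returns how many
   strings were written. Only the pointers are copied; the strings
   themselves are shared with the input array.
   out[] must have room for at least limit pointers. */
int copy_unique(char *words[], int limit, char *out[])
{
    int count = 0;

    assert(words != NULL && out != NULL);

    for (int i = 0; i < limit; i++) {
        int seen = 0;

        /* Linear search through the strings kept so far. */
        for (int j = 0; j < count; j++) {
            if (strcmp(words[i], out[j]) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            out[count++] = words[i];   /* first occurrence: keep it */
    }
    return count;
}
```

Called with the words b a c a d t and an output array of six pointers, this returns 5 and leaves b a c d t in the output array.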

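As Thomas suggested in a comment, if every element in the array is guaranteed to come from a limited set of values (for example char), this can be done in O(n) time:
1) Keep an array of 256 bool (or int, if your compiler does not support bool), or of however many distinct discrete values can appear in the array. Initialize all values to false.
2) Scan the input array one element at a time.
3) For each element, if the corresponding value in the bool array is false, add the element to the output array and set the bool array value to true. Otherwise, do nothing.
OK, here is a version for the char type. Note that it does not scale to other types.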
#include <stdio.h>

void removeDuplicates(char *string)
{
   unsigned char allCharacters[256] = { 0 };  // one "seen" flag per possible char value
   int lookAt;
   int writeTo = 0;
   // stop at the terminator instead of calling strlen on every iteration
   for(lookAt = 0; string[lookAt] != '\0'; lookAt++)
   {
      unsigned char c = (unsigned char)string[lookAt];  // keep the index non-negative
      if(allCharacters[c] == 0)
      {
         allCharacters[c] = 1;                 // mark it seen
         string[writeTo++] = string[lookAt];   // copy it
      }
   }
   string[writeTo] = '\0';
}

int main()
{
   char word[] = "abbbcdefbbbghasdddaiouasdf";
   removeDuplicates(word);
   printf("Word is now [%s]\n", word);
   return 0;
}

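Is this what you wanted? The method can be adapted if there are spaces between the letters, but it will not scale at all if you use int, float, double, or char * as the type.
EDIT
I posted this and then saw your clarification that it is an array of char *. I will update the method.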

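I hope this is not too much code. I adapted it and basically added index memory. The algorithm is O(n log n), because the 3 steps below are additive and that is the complexity of 2 of them (on average, since both sorts are quicksorts).
1) Sort the string array, mirroring every swap in the index array. After this stage, the i-th element of originalIndices holds the original index of the i-th element of the sorted array.
2) Remove duplicates by setting the duplicate elements in the sorted array to NULL and setting their index values to elements, which is higher than any real index.
3) Sort the array of original indices, making sure every swap is mirrored in the string array. This gives back the original string array, except that the duplicates are now at the end and are all NULL.
For good measure, I return the new element count.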
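
For comparison, the index-array idea in the steps above can also be sketched with the standard qsort. This is not the answer's code but a shorter variant under stated assumptions: the names unique_by_index, cmp_by_string and g_words are hypothetical, the comparator reaches the strings through a file-scope pointer (so it is not reentrant), and the third step (sorting back by index) is swapped for a flag array that marks repeats.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* File-scope pointer so the qsort comparator can see the strings.
   (Hypothetical helper; this makes the function non-reentrant.) */
static char **g_words;

/* Order indices by the string they refer to; ties are broken by the
   original position, so the first occurrence leads its group. */
static int cmp_by_string(const void *a, const void *b)
{
    int ia = *(const int *)a, ib = *(const int *)b;
    int c = strcmp(g_words[ia], g_words[ib]);
    return c != 0 ? c : (ia > ib) - (ia < ib);
}

/* Compacts words[] in place, keeping first occurrences in their original
   order. Returns the new count, or -1 if memory could not be allocated. */
int unique_by_index(char *words[], int limit)
{
    assert(words != NULL || limit <= 0);
    if (limit <= 0)
        return 0;

    int *idx = malloc(limit * sizeof *idx);
    char *repeat = calloc(limit, 1);   /* repeat[i] != 0: words[i] is a duplicate */
    if (idx == NULL || repeat == NULL) {
        free(idx);
        free(repeat);
        return -1;
    }

    for (int i = 0; i < limit; i++)
        idx[i] = i;
    g_words = words;
    qsort(idx, limit, sizeof *idx, cmp_by_string);

    /* Equal strings are now adjacent; all but the first are repeats. */
    for (int i = 1; i < limit; i++)
        if (strcmp(words[idx[i]], words[idx[i - 1]]) == 0)
            repeat[idx[i]] = 1;

    /* Walk the original order and keep only the non-repeats. */
    int count = 0;
    for (int i = 0; i < limit; i++)
        if (!repeat[i])
            words[count++] = words[i];

    free(idx);
    free(repeat);
    return count;
}
```

The tie-break on the original index is what guarantees that the kept copy of each string is its first occurrence, since qsort itself is not required to be stable.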
Code: the complete listing appears at the end of this page.

You know how to do this for the char type, right?
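You can do the same thing with strings, but instead of a boolean array (which is technically an implementation of a "set" object), you have to simulate the set with a linear array of the strings you have already encountered. That is, you keep an array of strings already seen, and for each new string you check whether it is in that "seen" array. If it is, you ignore it (it is not unique); if it is not, you add it both to the array of seen strings and to the output. If there is only a small number of distinct strings (fewer than 1000), you can ignore performance optimizations and simply compare each new string with all the strings seen before.
For a large number of strings (several thousand), however, you need to optimize a little:
1) Each time you add a new string to the array of seen strings, keep the array sorted using the insertion sort algorithm. Do not use quicksort, because insertion sort tends to be faster when the data is almost sorted.
2) When checking whether a string is in the array, use binary search.
If the number of distinct strings is reasonable (that is, you do not have billions of unique strings), this approach should be fast enough.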
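
The two optimizations above might be combined along these lines. This is a sketch only: find_slot and unique_sorted_seen are hypothetical names, and the caller is assumed to provide both the out array and the seen scratch array, each with room for limit pointers.

```c
#include <assert.h>
#include <string.h>

/* Binary search in the sorted array seen[0..n-1].
   Returns 1 if key is present. *pos receives the index where key is,
   or where it must be inserted to keep the array sorted. */
static int find_slot(char *seen[], int n, const char *key, int *pos)
{
    int lo = 0, hi = n;

    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        int c = strcmp(seen[mid], key);

        if (c == 0) {
            *pos = mid;
            return 1;
        }
        if (c < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    *pos = lo;
    return 0;
}

/* Writes the distinct strings of words[] to out[] in first-occurrence
   order and returns their count. seen[] is kept sorted as scratch space. */
int unique_sorted_seen(char *words[], int limit, char *out[], char *seen[])
{
    int count = 0;

    assert(words != NULL && out != NULL && seen != NULL);

    for (int i = 0; i < limit; i++) {
        int pos;

        if (find_slot(seen, count, words[i], &pos))
            continue;                          /* already seen: skip it */

        /* One insertion-sort step: shift the tail right, drop the word in. */
        memmove(&seen[pos + 1], &seen[pos], (count - pos) * sizeof seen[0]);
        seen[pos] = words[i];
        out[count++] = words[i];               /* output keeps input order */
    }
    return count;
}
```

Keeping seen separate from out is what preserves the original order: only the scratch array is ever reordered.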
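Is this an array of chars? If so, just keep an array of 256 booleans indicating which characters you have seen before.
But it has to stay in order...
I have a few questions: does the input arrive one item at a time, or all at once? Is this an array of char, or some other type with a larger range?
@thomas It does indeed.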
Output of the char version (removeDuplicates):
Word is now [abcdefghsiou]

Complete listing for the char * version:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void sortArrayAndSetCriteria(char **arr, int elements, int *originalIndices)
{
   #define  MAX_LEVELS  1000
   char *piv;
   int  beg[MAX_LEVELS], end[MAX_LEVELS], i=0, L, R;
   int idx, cidx;
   for(idx = 0; idx < elements; idx++)
      originalIndices[idx] = idx;
   beg[0] = 0;
   end[0] = elements;
   while (i>=0)
   {
      L = beg[i];
      R = end[i] - 1;
      if (L<R)
      {
         piv = arr[L];
         cidx = originalIndices[L];
         if (i==MAX_LEVELS-1)
            return;
         while (L < R)
         {
            while (strcmp(arr[R], piv) >= 0 && L < R) R--;
            if (L < R)
            {
               arr[L] = arr[R];
               originalIndices[L++] = originalIndices[R];
            }
            while (strcmp(arr[L], piv) <= 0 && L < R) L++;
            if (L < R)
            {
               arr[R] = arr[L];
               originalIndices[R--] = originalIndices[L];
            }
         }
         arr[L] = piv;
         originalIndices[L] = cidx;
         beg[i + 1] = L + 1;
         end[i + 1] = end[i];
         end[i++] = L;
      }
      else
      {
         i--;
      }
   }
}

int removeDuplicatesFromBoth(char **arr, int elements, int *originalIndices)
{
   // now remove duplicates
   int i = 1, newLimit = 1;
   char *curr = arr[0];
   while (i < elements)
   {
      if(strcmp(curr, arr[i]) == 0)
      {
         arr[i] = NULL;   // free this if it was malloc'd
         originalIndices[i] = elements;  // place it at the end
      }
      else
      {
         curr = arr[i];
         newLimit++;
      }
      i++;
   }
   return newLimit;
}

void sortArrayBasedOnCriteria(char **arr, int elements, int *originalIndices)
{
   #define  MAX_LEVELS  1000
   int piv;
   int beg[MAX_LEVELS], end[MAX_LEVELS], i=0, L, R;
   int idx;
   char *cidx;
   beg[0] = 0;
   end[0] = elements;
   while (i>=0)
   {
      L = beg[i];
      R = end[i] - 1;
      if (L<R)
      {
         piv = originalIndices[L];
         cidx = arr[L];
         if (i==MAX_LEVELS-1)
            return;
         while (L < R)
         {
            while (originalIndices[R] >= piv && L < R) R--;
            if (L < R)
            {
               arr[L] = arr[R];
               originalIndices[L++] = originalIndices[R];
            }
            while (originalIndices[L] <= piv && L < R) L++;
            if (L < R)
            {
               arr[R] = arr[L];
               originalIndices[R--] = originalIndices[L];
            }
         }
         arr[L] = cidx;
         originalIndices[L] = piv;
         beg[i + 1] = L + 1;
         end[i + 1] = end[i];
         end[i++] = L;
      }
      else
      {
         i--;
      }
   }
}

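int removeDuplicateStrings(char *words[], int limit)
{
   int *indices;
   int newLimit;
   if (limit <= 0)
      return 0;      // nothing to do; also avoids reading words[0] of an empty array
   indices = (int *)malloc(limit * sizeof(int));
   if (indices == NULL)
      return limit;  // allocation failed: leave the array untouched
   sortArrayAndSetCriteria(words, limit, indices);              // step 1: sort strings, track indices
   newLimit = removeDuplicatesFromBoth(words, limit, indices);  // step 2: NULL out duplicates
   sortArrayBasedOnCriteria(words, limit, indices);             // step 3: restore original order
   free(indices);
   return newLimit;
}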

int main()
{
   char *words[] = { "abc", "def", "bad", "hello", "captain", "def", "abc", "goodbye" };
   int newLimit = removeDuplicateStrings(words, 8);
   int i = 0;
   for(i = 0; i < newLimit; i++) printf(" Word @ %d = %s\n", i, words[i]);
   return 0;
}