C# integer list matching algorithm

c# .net algorithm

We have about 50,000 instances of a data structure per day (and this may eventually grow much larger), each containing the following:

DateTime AsOfDate;
int key;
List<int> values; // list of distinct integers

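We then hash the signature to an integer, sort the resulting lists of hash codes (one list per day), walk through the two lists looking for matches, and then check whether the associated keys differ. (We also check the associated lists to make sure there was no hash collision.)

Is there a better way?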

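You could probably just hash the list itself instead of going through a string.

Apart from that, I think your algorithm is nearly optimal. Assuming no hash collisions, it is O(n log n + m log m), where n and m are the numbers of entries for the two days you are comparing. (The sorting is the bottleneck.)

If you use an array of buckets into which you insert the hashes (essentially a hash table), you can do this in O(n + m). You can compare the two bucket arrays in O(max(n, m)), assuming their length depends on the number of entries (to get a reasonable load factor).

It should be possible to let the library do this work for you (it looks like you are using .NET) by using HashSet.IntersectWith() and writing a suitable comparison function.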
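A minimal sketch of the HashSet.IntersectWith() idea might look like the following. The names ListComparer, DayMatcher and CommonLists are made up for illustration, and the comparer assumes that two lists count as equal only when they hold the same integers in the same order (sort them first if order should not matter).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: equal means same integers in the same order.
public sealed class ListComparer : IEqualityComparer<List<int>>
{
    public bool Equals(List<int> a, List<int> b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        return a.SequenceEqual(b);
    }

    public int GetHashCode(List<int> list)
    {
        // Order-dependent combination of the element values;
        // overflow is intentional and simply wraps around.
        unchecked
        {
            int hash = 17;
            foreach (int v in list) hash = hash * 31 + v;
            return hash;
        }
    }
}

public static class DayMatcher
{
    // Returns the value lists that occur on both days.
    // Building the set and intersecting are each expected linear time.
    public static HashSet<List<int>> CommonLists(
        IEnumerable<List<int>> day1, IEnumerable<List<int>> day2)
    {
        var common = new HashSet<List<int>>(day1, new ListComparer());
        common.IntersectWith(day2);
        return common;
    }
}
```

This only tells you which lists appear on both days; checking whether the associated keys differ would still need a lookup from list to key.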

You cannot do better than O(n + m), because every entry needs to be visited at least once.

Edit: misread, fixed.

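Does the order matter? I.e., are [1,2] on day 1 and [2,1] on day 2 equal? If they are, then hashing might not work all that well. You could use sorted arrays/vectors instead to help with the comparison.

Also, what kind of keys are these? Do they have a definite range (e.g. 0-63)? You might be able to concatenate them into a large integer (which may require more than 64 bits of precision) and hash that, instead of converting to a string, since that might take a while.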
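Both suggestions can be sketched roughly as follows. ListSignature, SameValues and ToBitmask are hypothetical names. Note that ToBitmask is a bitmask rather than a literal concatenation: it assumes every value lies in 0..63 and relies on the values being distinct, in which case 64 bits are enough and the result does not depend on element order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListSignature
{
    // Order-insensitive comparison: compare sorted views element by element.
    public static bool SameValues(List<int> a, List<int> b)
    {
        if (a.Count != b.Count) return false;
        return a.OrderBy(v => v).SequenceEqual(b.OrderBy(v => v));
    }

    // Assumption: values are distinct and lie in 0..63, so each list maps
    // to exactly one 64-bit mask, whatever the order of its elements.
    public static ulong ToBitmask(List<int> values)
    {
        ulong mask = 0;
        foreach (int v in values)
        {
            if (v < 0 || v > 63)
                throw new ArgumentOutOfRangeException(nameof(values), "value outside 0..63");
            mask |= 1UL << v;   // set the bit that represents this value
        }
        return mask;
    }
}
```

With these assumptions, [0, 3] and [3, 0] both map to the mask 9, so the mask itself can serve as a collision-free dictionary key. Outside that range, the sorted comparison is the safer fallback.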

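Building on the other answers, you could speed up the process by creating a low-cost hash, made by simply XORing together all the elements of each list. You would not have to sort the list, and all you would get is an int, which is easier and faster to store than a string.

Then you only need to use the resulting XORed number as the key into a hash table, and check whether the key already exists before inserting it. Only if the key already exists do you sort the corresponding lists and compare them.

If you find a match you still need to compare the lists, because a simple XOR can produce collisions.
I think the result would be faster, and have a lower memory footprint, than re-ordering the arrays and converting them to strings.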
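To make the collision risk concrete, here is a small sketch of the XOR key on its own (XorHash and Compute are illustrative names, not part of any library):

```csharp
using System.Collections.Generic;

public static class XorHash
{
    // XOR of all elements: independent of element order, no sorting needed.
    public static int Compute(IEnumerable<int> values)
    {
        int key = 0;
        foreach (int v in values) key ^= v;
        return key;
    }
}
```

The lists [1, 2] and [2, 1] both give 3, which is the desired order-independence. But [4, 7] and [3] also give 3, which is why a matching key must always be followed by a real comparison of the lists.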

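If you have your own implementation of List, you could build the generation of the XOR key into it, so that the key is recalculated on every operation on the list.
This would make the process of checking for duplicate lists even faster.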
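One way to picture such a list is the wrapper below. XorTrackedList is a hypothetical class, not something from .NET, and it only covers Add and Remove; any other mutating operation would need the same treatment.

```csharp
using System.Collections.Generic;

// Hypothetical wrapper: keeps the XOR of its elements up to date,
// so the key never has to be recomputed from scratch.
public sealed class XorTrackedList
{
    private readonly List<int> items = new List<int>();

    public int XorKey { get; private set; }
    public int Count { get { return items.Count; } }
    public IReadOnlyList<int> Items { get { return items; } }

    public void Add(int value)
    {
        items.Add(value);
        XorKey ^= value;   // fold the new value into the key
    }

    public bool Remove(int value)
    {
        if (!items.Remove(value)) return false;
        XorKey ^= value;   // XORing the same value again cancels it out
        return true;
    }
}
```

Because XOR is its own inverse, each update costs constant time regardless of the list length.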

Code

Here is a first attempt at implementing this:

// Maps an XOR key to every distinct list seen so far that produces that key.
Dictionary<int, List<List<int>>> checkHash = new Dictionary<int, List<List<int>>>();

public bool CheckDuplicate(List<int> theList) {
    bool isIdentical = false;
    int xorkey = 0;
    foreach (int v in theList) xorkey ^= v;

    List<List<int>> existingLists;
    if (checkHash.TryGetValue(xorkey, out existingLists)) {
        // Already in the dictionary. Check each stored list
        foreach (List<int> li in existingLists) {
            isIdentical = (theList.Count == li.Count);
            if (isIdentical) {
                // Check all elements (the lists hold distinct integers,
                // so equal counts plus containment means equal contents)
                foreach (int v in theList) {
                    if (!li.Contains(v)) {
                        isIdentical = false;
                        break;
                    }
                }
            }
            if (isIdentical) break;
        }
    }
    if (!isIdentical) {
        // Never seen this list before, add it. On a key collision, append to
        // the existing bucket: calling Add with a key that is already present
        // would throw an ArgumentException.
        if (existingLists == null) {
            existingLists = new List<List<int>>();
            checkHash.Add(xorkey, existingLists);
        }
        existingLists.Add(theList);
    }
    return isIdentical;
}

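It is not the most elegant, nor the easiest to read at a glance. It is rather "hacky", and I am not even sure it performs better than Guffa's more elegant version.
What it does, though, is deal with collisions in the XOR key by storing a List of lists in the Dictionary.

If a duplicate key is found, we loop through each previously stored list, dropping a candidate at its first mismatching element, until an identical list turns up or the candidates run out.

The good point about this code is that it should be about as fast as you can get in most cases, and still faster than building strings when there is a collision.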

Implement an IEqualityComparer for List<int>, then you can use the list as a key in a dictionary.

If the lists are sorted, it can be as simple as this:

class IntListEqualityComparer : IEqualityComparer<List<int>> {

   // XOR of all values: the same list always yields the same code.
   public int GetHashCode(List<int> list) {
      int code = 0;
      foreach (int value in list) code ^= value;
      return code;
   }

   // Element-by-element comparison; relies on both lists being sorted.
   public bool Equals(List<int> list1, List<int> list2) {
      if (list1.Count != list2.Count) return false;
      for (int i = 0; i < list1.Count; i++) {
        if (list1[i] != list2[i]) return false;
      }
      return true;
   }

}


Now you can create a dictionary that uses the IEqualityComparer:
Dictionary<List<int>, YourClass> day1 = new Dictionary<List<int>, YourClass>(new IntListEqualityComparer());

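Add all the items from the first day to the dictionary, then loop through the items from the second day and check whether the key exists in the dictionary. As the IEqualityComparer handles both the hash code and the comparison, you will not get any false matches.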
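The two-pass lookup could be sketched like this. Entry is a hypothetical stand-in for the question's data structure, SortedListComparer and DayComparison are illustrative names, and the value lists are assumed to be sorted already.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's data structure.
public sealed class Entry
{
    public int Key;
    public List<int> Values;   // assumed to be sorted already
}

public sealed class SortedListComparer : IEqualityComparer<List<int>>
{
    public bool Equals(List<int> a, List<int> b) { return a.SequenceEqual(b); }

    public int GetHashCode(List<int> list)
    {
        int code = 0;
        foreach (int v in list) code ^= v;
        return code;
    }
}

public static class DayComparison
{
    // Returns (day 1, day 2) pairs with identical value lists but different keys.
    public static List<Tuple<Entry, Entry>> FindMatches(
        IEnumerable<Entry> day1, IEnumerable<Entry> day2)
    {
        var byValues = new Dictionary<List<int>, Entry>(new SortedListComparer());
        foreach (Entry e in day1) byValues[e.Values] = e;   // last entry wins on duplicates

        var matches = new List<Tuple<Entry, Entry>>();
        foreach (Entry e in day2)
        {
            Entry other;
            if (byValues.TryGetValue(e.Values, out other) && other.Key != e.Key)
                matches.Add(Tuple.Create(other, e));
        }
        return matches;
    }
}
```

If several day 1 entries can share the same value list, the dictionary value would have to become a list of entries rather than a single one.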
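You may want to test some different ways of calculating the hash code. The one in the example works, but may not give the best efficiency for your specific data. The only requirement on the hash code for the dictionary to work is that the same list always gets the same hash code, so you can calculate it pretty much any way you like.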
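A rough way to compare candidate hash functions is to count how many distinct codes each one produces on a sample of your own lists. The sketch below uses hypothetical names (HashQuality, MultiplyAdd, DistinctCodes), and the multiply-and-add function is just one common alternative, not a recommendation from this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashQuality
{
    // Order-dependent alternative to XOR, so only suitable for sorted lists.
    // Overflow is intentional and simply wraps around.
    public static int MultiplyAdd(List<int> list)
    {
        unchecked
        {
            int hash = 17;
            foreach (int v in list) hash = hash * 31 + v;
            return hash;
        }
    }

    // Number of distinct codes the given function yields for the sample;
    // the closer this is to the number of distinct lists, the fewer collisions.
    public static int DistinctCodes(
        IEnumerable<List<int>> sample, Func<List<int>, int> hash)
    {
        return sample.Select(hash).Distinct().Count();
    }
}
```

For the sample [1, 2], [4, 7], [3], a plain XOR function passed as the second argument yields a single code for all three lists, while MultiplyAdd yields three different ones. Real data may behave quite differently, so measure on a representative day.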