C#: Finding the differences between two lists


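I am looking for a good way to find the differences between two lists.

The problem:

Both lists contain strings in which the first 3 numbers/characters (separated by *) form a unique key, followed by text: "key1*key2*key3*text".

Here is an example string:

AA1*1D*4*The quick brown fox*****CC*3456321234543~
where "*AA1*1D*4*" is the unique key.

List 1: "index1*index2*index3", "index2*index2*index3", "index3*index2*index3"

List 2: "index2*index2*index3", "index1*index2*index3", "index3*index2*index3", "index4*index2*index3"

I need to match the indexes across the two lists and compare the entries:

  • If all 3 indexes of an entry in one list match the 3 indexes of an entry in the other list, I need to keep track of both string items in a new list.

  • If a set of indexes in one list does not appear in the other list, I need to keep track of the side that has it and leave an empty entry for the other side (index4 in the example above).

  • Return the list.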

This is what I have so far, but I'm struggling a bit here:

            List<String> Base = baseListCopy.Except(resultListCopy, StringComparer.InvariantCultureIgnoreCase).ToList(); //Keep unique values(keep differences in lists)
            List<String> Result = resultListCopy.Except(baseListCopy, StringComparer.InvariantCultureIgnoreCase).ToList(); //Keep unique values (keep differences in lists)
    
            List<String[]> blocksComparison = new List<String[]>(); //container for non-matching blocks, so we can output them later
    
            //if both reports have same amount of blocks
            if ((Result.Count > 0 || Base.Count > 0) && (Result.Count == Base.Count))
            {
                foreach (String S in Result)
                {
                    String[] sArr = S.Split('*');
                    foreach (String B in Base)
                    {
                        String[] bArr = B.Split('*');
    
                        if (sArr[0].Equals(bArr[0]) && sArr[1].Equals(bArr[1]) && sArr[2].Equals(bArr[2]) && sArr[3].Equals(bArr[3]))
                        {
                            String[] NA = new String[2]; //keep results
                            NA[0] = B; //[0] for base
                            NA[1] = S; //[1] for result
                            blocksComparison.Add(NA);
                            break;
                        }
                    }
                }
            }
    
    
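Can you suggest a good algorithm for this process?

Thanks

You can use a HashSet.

Create a HashSet for list 1. Remember that index1*index2*index3 is not the same as index3*index2*index1.

Now iterate over the second list: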

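    // Build a set from list 1 (list1 and list2 are the two List<string> inputs).
    var set = new HashSet<string>(list1);
    var matches = new List<string>();

    foreach (string s in list2)
    {
        if (set.Contains(s))
            matches.Add(s); // present in both lists
    }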
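
Note that a set built from the full strings only matches entries that are identical in every character. If the goal is to match on the three-part key instead, one option is to store the key prefix in the set. A minimal sketch under that assumption; `KeyMatch`, `GetKey` and `InBoth` are hypothetical names, not part of the original answer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class KeyMatch
{
    // Hypothetical helper: returns the first three '*'-separated fields as one key string.
    public static string GetKey(string line)
    {
        // Split into at most 4 parts so the free text stays in the last part.
        return string.Join("*", line.Split(new[] { '*' }, 4).Take(3));
    }

    // Returns the entries of list2 whose key also occurs in list1.
    public static List<string> InBoth(IEnumerable<string> list1, IEnumerable<string> list2)
    {
        var keys = new HashSet<string>(list1.Select(GetKey));
        return list2.Where(s => keys.Contains(GetKey(s))).ToList();
    }
}
```

The set here compares keys case-sensitively. Passing a `StringComparer` such as `StringComparer.OrdinalIgnoreCase` to the `HashSet<string>` constructor would relax that, if case-insensitive matching is what the question intends.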
    
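    var one = new List<string>();
    var two = new List<string>();
    var three = new List<string>();
    var intersect = new Dictionary<string, int>();

    // Count each entry of the first list.
    foreach (string index in one)
    {
        intersect.TryGetValue(index, out int count);
        intersect[index] = count + 1;
    }

    // Keep the entries of the second list that also occur in the first.
    foreach (string index in two)
    {
        if (intersect.ContainsKey(index))
        {
            three.Add(index);
        }
    }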
    
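If I understand your question correctly, you want to compare elements by their "key" prefix rather than by the whole string content. If so, implementing a custom equality comparer lets you use the LINQ set operations directly.

This program: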

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class EqCmp : IEqualityComparer<string> {
    
        public bool Equals(string x, string y) {
            return GetKey(x).SequenceEqual(GetKey(y));
        }
    
        public int GetHashCode(string obj) {
            // Using Sum could cause OverflowException.
            return GetKey(obj).Aggregate(0, (sum, subkey) => sum + subkey.GetHashCode());
        }
    
        static IEnumerable<string> GetKey(string line) {
            // If we just split to 3 strings, the last one could exceed the key, so we split to 4.
            // This is not the most efficient way, but is simple.
            return line.Split(new[] { '*' }, 4).Take(3);
        }
    
    }
    
    class Program {
    
        static void Main(string[] args) {
    
            var l1 = new List<string> {
                "index1*index1*index1*some text",
                "index1*index1*index2*some text ** test test test",
                "index1*index2*index1*some text",
                "index1*index2*index2*some text",
                "index2*index1*index1*some text"
            };
    
            var l2 = new List<string> {
                "index1*index1*index2*some text ** test test test",
                "index2*index1*index1*some text",
                "index2*index1*index2*some text"
            };
    
            var eq = new EqCmp();
    
            Console.WriteLine("Elements that are both in l1 and l2:");
            foreach (var line in l1.Intersect(l2, eq))
                Console.WriteLine(line);
    
            Console.WriteLine("\nElements that are in l1 but not in l2:");
            foreach (var line in l1.Except(l2, eq))
                Console.WriteLine(line);
    
            // Etc...
    
        }
    
    }
    

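I'd say that using composite string keys for the different indexes, instead of a custom class, is the root of the problem. If I'm right, can this problem be reduced to finding the intersection of two lists? And does the order of the indexes matter?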
    
The EqCmp program prints:

    Elements that are both in l1 and l2:
    index1*index1*index2*some text ** test test test
    index2*index1*index1*some text
    
    Elements that are in l1 but not in l2:
    index1*index1*index1*some text
    index1*index2*index1*some text
    index1*index2*index2*some text
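
The comparer covers the set operations, but the question also asks for pairs: both items when the keys match, and an empty slot when a key appears in only one list. A rough sketch of that pairing, assuming each key occurs at most once per list and that `null` is an acceptable "empty entry"; `Pairing`, `GetKey` and `Pair` are hypothetical names introduced here:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Pairing
{
    // Hypothetical helper: the first three '*'-separated fields form the key.
    static string GetKey(string line)
    {
        return string.Join("*", line.Split(new[] { '*' }, 4).Take(3));
    }

    // Returns [baseItem, resultItem] pairs; the missing side is null.
    // Assumes each key occurs at most once per list (ToDictionary throws otherwise).
    public static List<string[]> Pair(IEnumerable<string> baseList, IEnumerable<string> resultList)
    {
        var resultByKey = resultList.ToDictionary(s => GetKey(s), StringComparer.OrdinalIgnoreCase);
        var pairs = new List<string[]>();

        foreach (string b in baseList)
        {
            string key = GetKey(b);
            string r;
            if (resultByKey.TryGetValue(key, out r))
                resultByKey.Remove(key);   // matched, so it is not a leftover
            pairs.Add(new[] { b, r });     // r stays null when there is no match
        }

        // Whatever remains exists only in the result list.
        foreach (string r in resultByKey.Values)
            pairs.Add(new[] { null, r });

        return pairs;
    }
}
```

Each base entry costs one dictionary lookup, so the nested loops from the question are not needed. If identical entries should be skipped, as the `Except` calls in the question do, the caller could filter out pairs whose two sides are equal.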