C#: Finding the differences between two lists

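I'm looking for a good way to find the differences between two lists.

The problem:

Both lists contain strings in which the first three fields (separated by "*") form a unique key, followed by text: "key1*key2*key3*text".

Here is an example string:

AA1*1D*4*The quick brown fox*****CC*3456321234543~

where "AA1*1D*4" is the unique key.

List 1: "index1*index2*index3", "index2*index2*index3", "index3*index2*index3"

List 2: "index2*index2*index3", "index1*index2*index3", "index3*index2*index3", "index4*index2*index3"

I need to match the indexes across the two lists and compare the entries.

If all three indexes of an entry in one list match the three indexes of an entry in the other list, I need to keep both string items together in a new list.

If a set of indexes in one list does not appear in the other, I need to keep the entry from the side that has it and leave an empty entry for the other side (index4 in the example above).

Finally, return that list.

This is what I have so far, but I'm struggling a bit: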

List<String> Base = baseListCopy.Except(resultListCopy, StringComparer.InvariantCultureIgnoreCase).ToList(); // keep only the entries that differ between the lists
List<String> Result = resultListCopy.Except(baseListCopy, StringComparer.InvariantCultureIgnoreCase).ToList(); // keep only the entries that differ between the lists

List<String[]> blocksComparison = new List<String[]>(); // container for non-matching blocks, so we can output them later

// if both reports have the same number of blocks
if ((Result.Count > 0 || Base.Count > 0) && (Result.Count == Base.Count))
{
    foreach (String S in Result)
    {
        String[] sArr = S.Split('*');
        foreach (String B in Base)
        {
            String[] bArr = B.Split('*');

            if (sArr[0].Equals(bArr[0]) && sArr[1].Equals(bArr[1]) && sArr[2].Equals(bArr[2]) && sArr[3].Equals(bArr[3]))
            {
                String[] NA = new String[2]; // keep results
                NA[0] = B; // [0] for base
                NA[1] = S; // [1] for result
                blocksComparison.Add(NA);
                break;
            }
        }
    }
}

Can you suggest a good algorithm for this?

Thanks.

You can use a HashSet.

Create a HashSet for List1. Keep in mind that index1*index2*index3 is not the same as index3*index2*index1.

Now iterate over the second list:

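// list1 and list2 are the two input lists.
var set = new HashSet<string>(list1);
var common = new List<string>();
foreach (string s in list2)
{
    if (set.Contains(s))
        common.Add(s); // add it to the new list
}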
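The loop above matches whole strings, while the question matches on the first three fields only. A minimal sketch of the same HashSet idea applied to the key prefix follows. The names KeyMatch, KeyOf and InBoth are made up for illustration, and the ordinal, case-sensitive comparison is an assumption; swap the StringComparer if case should be ignored.

```csharp
using System;
using System.Collections.Generic;

static class KeyMatch
{
    // First three '*'-separated fields, e.g. "AA1*1D*4" for "AA1*1D*4*The quick brown fox".
    // Lines with fewer than three fields are used whole (an assumption, not from the question).
    public static string KeyOf(string line)
    {
        string[] parts = line.Split(new[] { '*' }, 4);
        return string.Join("*", parts, 0, Math.Min(3, parts.Length));
    }

    // Returns the lines of list2 whose key also occurs in list1.
    public static List<string> InBoth(IEnumerable<string> list1, IEnumerable<string> list2)
    {
        var keys = new HashSet<string>(StringComparer.Ordinal);
        foreach (string line in list1)
            keys.Add(KeyOf(line));

        var matches = new List<string>();
        foreach (string line in list2)
            if (keys.Contains(KeyOf(line)))
                matches.Add(line);
        return matches;
    }
}
```

Building the set takes one pass over list1 and each lookup is a hash probe, so the nested loop from the question is not needed.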

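var one = new List<string>();
var two = new List<string>();
var three = new List<string>();
var intersect = new Dictionary<string, int>();
foreach (string index in one)
{
    // count how often each entry of the first list occurs
    intersect.TryGetValue(index, out int count);
    intersect[index] = count + 1;
}
foreach (string index in two)
{
    if (intersect.ContainsKey(index))
    {
        three.Add(index);
    }
}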

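If I understand your question correctly, you want to compare elements by their "key" prefix rather than by the whole string content. If so, implementing a custom equality comparer lets you use the LINQ set operations directly.

This program: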

using System;
using System.Collections.Generic;
using System.Linq;

class EqCmp : IEqualityComparer<string> {

    public bool Equals(string x, string y) {
        return GetKey(x).SequenceEqual(GetKey(y));
    }

    public int GetHashCode(string obj) {
        // Using Sum could cause OverflowException.
        return GetKey(obj).Aggregate(0, (sum, subkey) => sum + subkey.GetHashCode());
    }

    static IEnumerable<string> GetKey(string line) {
        // If we just split to 3 strings, the last one could exceed the key, so we split to 4.
        // This is not the most efficient way, but is simple.
        return line.Split(new[] { '*' }, 4).Take(3);
    }

}

class Program {

    static void Main(string[] args) {

        var l1 = new List<string> {
            "index1*index1*index1*some text",
            "index1*index1*index2*some text ** test test test",
            "index1*index2*index1*some text",
            "index1*index2*index2*some text",
            "index2*index1*index1*some text"
        };

        var l2 = new List<string> {
            "index1*index1*index2*some text ** test test test",
            "index2*index1*index1*some text",
            "index2*index1*index2*some text"
        };

        var eq = new EqCmp();

        Console.WriteLine("Elements that are both in l1 and l2:");
        foreach (var line in l1.Intersect(l2, eq))
            Console.WriteLine(line);

        Console.WriteLine("\nElements that are in l1 but not in l2:");
        foreach (var line in l1.Except(l2, eq))
            Console.WriteLine(line);

        // Etc...

    }

}

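I'd say that using a composite string key for the different indexes, instead of a custom class, is the root of the problem. If I'm right, can this problem be reduced to finding the intersection of the two lists? And does the order of the indexes matter?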
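To illustrate the custom-class suggestion, here is a minimal sketch of a key type that holds the three indexes as separate fields. LineKey and Parse are hypothetical names, the comparison is assumed to be ordinal and order-sensitive (so index1*index2*index3 differs from index3*index2*index1), and the readonly struct syntax assumes C# 7.2 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: the three leading fields of a line.
public readonly struct LineKey : IEquatable<LineKey>
{
    public string K1 { get; }
    public string K2 { get; }
    public string K3 { get; }

    public LineKey(string k1, string k2, string k3) { K1 = k1; K2 = k2; K3 = k3; }

    // Splits into at most four parts so that '*' characters inside the text are left alone.
    public static LineKey Parse(string line)
    {
        string[] parts = line.Split(new[] { '*' }, 4);
        if (parts.Length < 3)
            throw new FormatException("Expected at least three '*'-separated fields: " + line);
        return new LineKey(parts[0], parts[1], parts[2]);
    }

    // Field-by-field ordinal comparison, so the order of the indexes matters.
    public bool Equals(LineKey other) =>
        K1 == other.K1 && K2 == other.K2 && K3 == other.K3;

    public override bool Equals(object obj) => obj is LineKey other && Equals(other);

    public override int GetHashCode()
    {
        unchecked // let the arithmetic wrap instead of throwing
        {
            int h = 17;
            h = h * 31 + (K1?.GetHashCode() ?? 0);
            h = h * 31 + (K2?.GetHashCode() ?? 0);
            h = h * 31 + (K3?.GetHashCode() ?? 0);
            return h;
        }
    }
}
```

Because the type implements IEquatable<LineKey> and overrides GetHashCode, it can be used directly as a Dictionary or HashSet key without passing a comparer.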

The EqCmp example program prints:

Elements that are both in l1 and l2:
index1*index1*index2*some text ** test test test
index2*index1*index1*some text

Elements that are in l1 but not in l2:
index1*index1*index1*some text
index1*index2*index1*some text
index1*index2*index2*some text
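Intersect and Except report the two sides separately, whereas the question asks for a single list of pairs with an empty slot where a key is missing on one side. A minimal sketch of that pairing follows. It assumes keys are unique within each list and compared ordinally; KeyPrefixComparer and ListDiff.Pair are names made up for illustration and are not part of the answer above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: two lines are equal when their first three fields match.
sealed class KeyPrefixComparer : IEqualityComparer<string>
{
    static string KeyOf(string line)
    {
        string[] parts = line.Split(new[] { '*' }, 4);
        return string.Join("*", parts, 0, Math.Min(3, parts.Length));
    }

    public bool Equals(string x, string y) => StringComparer.Ordinal.Equals(KeyOf(x), KeyOf(y));

    public int GetHashCode(string line) => StringComparer.Ordinal.GetHashCode(KeyOf(line));
}

static class ListDiff
{
    // Returns { baseLine, resultLine } pairs; a missing side is an empty string.
    public static List<string[]> Pair(IReadOnlyList<string> baseList, IReadOnlyList<string> resultList)
    {
        var cmp = new KeyPrefixComparer();

        // Index the result lines by key prefix; a later duplicate key overwrites the earlier value.
        var resultByKey = new Dictionary<string, string>(cmp);
        foreach (string line in resultList)
            resultByKey[line] = line;

        var baseKeys = new HashSet<string>(baseList, cmp);
        var pairs = new List<string[]>();

        // Every base line, with its counterpart or an empty entry.
        foreach (string b in baseList)
            pairs.Add(new[] { b, resultByKey.TryGetValue(b, out string match) ? match : "" });

        // Result lines whose key never occurred in the base list.
        foreach (string r in resultList)
            if (!baseKeys.Contains(r))
                pairs.Add(new[] { "", r });

        return pairs;
    }
}
```

With the two lists from the question, this would give three matched pairs followed by one pair whose base side is empty, for the index4 entry. Pairs come out in base-list order, then in result-list order for the unmatched entries.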