C#: Comparing two lists with custom elements

c# list for-loop iterator

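I have two lists. One contains the search terms and the other contains the data. I need to loop over every element of list2 that contains any of the strings in list1 ("Cat" or "Dog"). For example: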

List<string> list1 = new List<string>();
list1.Add("Cat");
list1.Add("Dog");
// ... ~1000 items

List<string> list2 = new List<string>();
list2.Add("Gray Cat");
list2.Add("Black Cat");
list2.Add("Green Duck");
list2.Add("White Horse");
list2.Add("Yellow Dog Tasmania");
list2.Add("White Horse");
// ... ~1 million items

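This is an excellent use case for parallelization.

A LINQ approach without parallelization is internally equivalent to your approach, except that the inner loop breaks as soon as one match is found, whereas your approach keeps searching for further matches.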
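The LINQ query itself is not shown in this answer, so the following is only a minimal sketch of what such a query could look like. The class and method names (ListFilterSketch, Filter, FilterParallel) are hypothetical, and the second method is a PLINQ variant built on AsParallel() for comparison.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ListFilterSketch
{
    // Sequential LINQ: Any() stops at the first search term found in the item,
    // so each item of list2 is added at most once.
    public static List<string> Filter(List<string> list1, List<string> list2)
    {
        return list2
            .Where(item => list1.Any(term => item.Contains(term)))
            .ToList();
    }

    // PLINQ variant: the same predicate, evaluated on several threads.
    // Without AsOrdered() the order of the results is not guaranteed.
    public static List<string> FilterParallel(List<string> list1, List<string> list2)
    {
        return list2
            .AsParallel()
            .Where(item => list1.Any(term => item.Contains(term)))
            .ToList();
    }
}
```

Calling ListFilterSketch.Filter(list1, list2) with the lists from the question would return the items that contain "Cat" or "Dog". Whether the parallel variant pays off depends on the list sizes and the number of cores.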

The runtime comparison was run on a 4-core system, with 1,000 items in list1 and 1,000,000 items in list2.

Another approach is to use Parallel.ForEach with a thread-safe result list:

System.Collections.Concurrent.ConcurrentBag<string> listResult = new System.Collections.Concurrent.ConcurrentBag<string>();
System.Threading.Tasks.Parallel.ForEach<string>(list2, str2 =>
{
    foreach (string str1 in list1)
    {
        if (str2.Contains(str1))
        {
            listResult.Add(str2);
            //break the loop if one match was found to avoid duplicates and improve performance
            break;
        }
    }
});


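Side note: you have to iterate over list2 in the outer loop and break after a match. Otherwise an item that matches more than one search term is added twice.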
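A List<string> is not a suitable data structure for solving this problem efficiently.

What you are looking for is a trie or a related structure such as a DAWG, built from every word in your original dictionary, list1.

The goal is that, for each letter of each word in list2, you only have 0-26 checks to make.

With this data structure, instead of reading through a long list of words until you find the right one, you look words up the way you would in a paper dictionary. That should be faster. Applications that look up every word of a language in a text use this principle.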
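As an illustration of the idea, here is a minimal trie sketch. It is an assumption about how the suggestion could be applied, not the answerer's code: the SubstringTrie class and its members are hypothetical names, matching is case-sensitive, and a dictionary per node is used instead of a fixed 26-letter array so that spaces and other characters also work.

```csharp
using System.Collections.Generic;

public sealed class SubstringTrie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsTerminal; // true if a search term ends at this node
    }

    private readonly Node _root = new Node();

    // Builds the trie from the search terms (list1 in the question).
    public SubstringTrie(IEnumerable<string> terms)
    {
        foreach (string term in terms)
        {
            if (string.IsNullOrEmpty(term)) continue; // ignore empty terms
            Node node = _root;
            foreach (char c in term)
            {
                if (!node.Children.TryGetValue(c, out Node next))
                {
                    next = new Node();
                    node.Children[c] = next;
                }
                node = next;
            }
            node.IsTerminal = true;
        }
    }

    // Returns true if any search term occurs somewhere inside text.
    public bool ContainsAnyTerm(string text)
    {
        for (int start = 0; start < text.Length; start++)
        {
            Node node = _root;
            for (int i = start; i < text.Length; i++)
            {
                // Stop this attempt as soon as no term continues with text[i].
                if (!node.Children.TryGetValue(text[i], out node)) break;
                if (node.IsTerminal) return true;
            }
        }
        return false;
    }
}
```

With this sketch, the trie is built once from list1, and list2.Where(trie.ContainsAnyTerm) then walks each data string one character at a time instead of comparing it against all ~1000 search terms.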
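Contains uses a "naive approach" to string searching. You can improve on that by looking into string-searching algorithms.

One way would be to build a generalized search structure, such as a generalized suffix tree, for all of your search terms. Then iterate over all the items in list2 and check whether they match.

That may be overkill, though. You can first try some of the simple optimizations proposed by fubo and see whether that is fast enough.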
Since you seem to want to match whole words, you can use a HashSet for a more efficient search that avoids iterating over list1 and list2 repeatedly.

HashSet<string> species =
    new HashSet<string>(list1);

List<string> result = new List<string>();
foreach (string animal in list2)
{
    if (animal.Split(' ').Any(species.Contains))
        result.Add(animal);
}

With one million items in list2, this algorithm takes about one second.

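Now, while this approach does work, it can produce incorrect results. If list1 contains Lion, a Sea lion in list2 will be added to the results even though there is no Sea lion in list1. (That is, if you use a case-insensitive StringComparer in the HashSet, as shown below.)

To solve this, you need some way to parse the strings in list2 into a more complex object, Animal. If you control your input, that may be a trivial task, but in general it is hard. If you do have a way to do it, you can use a solution like the following: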
public class Animal
{
    public string Color { get; set; }
    public string Species { get; set; }
    public string Breed { get; set; }
}

Then look up the species in the HashSet:
HashSet<string> species = new HashSet<string>
{
    "Cat",
    "Dog",
    // etc.
};

List<Animal> animals = new List<Animal>
{
    new Animal {Color = "Gray", Species = "Cat"},
    new Animal {Color = "Green", Species = "Duck"},
    new Animal {Color = "White", Species = "Horse"},
    new Animal {Color = "Yellow", Species = "Dog", Breed = "Tasmania"}
    // etc.
};

var result = animals.Where(a => species.Contains(a.Species));
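How the strings get parsed is left open above. As a rough sketch only, assuming every input string has the fixed layout "Color Species [Breed]" separated by spaces, a parser could look like this. The ParsedAnimal and AnimalParserSketch names are hypothetical, and the assumption breaks for multi-word species such as "Sea lion", which is exactly why the general case is hard.

```csharp
using System;

// Hypothetical stand-in for the Animal class, kept separate so the sketch is self-contained.
public sealed class ParsedAnimal
{
    public string Color { get; set; }
    public string Species { get; set; }
    public string Breed { get; set; }
}

public static class AnimalParserSketch
{
    // Assumption: the first word is the color, the second word is the species,
    // and anything after that is the breed.
    public static ParsedAnimal Parse(string text)
    {
        string[] parts = text.Split(new[] { ' ' }, 3, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 2)
            throw new FormatException("Expected at least a color and a species: " + text);

        return new ParsedAnimal
        {
            Color = parts[0],
            Species = parts[1],
            Breed = parts.Length > 2 ? parts[2] : null // breed is optional
        };
    }
}
```

Under that assumption, list2.Select(AnimalParserSketch.Parse) would produce the objects whose Species can then be looked up in the HashSet.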

Note that string lookups in a HashSet are case-sensitive. If you do not want that, you can pass a StringComparer as a constructor argument:
new HashSet<string>(StringComparer.CurrentCultureIgnoreCase)
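To make the effect of the comparer concrete, here is a small sketch that wraps the whole-word lookup in a method taking the comparer as a parameter. The WholeWordFilterSketch class and its Filter method are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WholeWordFilterSketch
{
    // Keeps every data item in which at least one space-separated word
    // is a search term, using the given comparer for the lookup.
    public static List<string> Filter(
        IEnumerable<string> searchTerms,
        IEnumerable<string> data,
        StringComparer comparer)
    {
        var species = new HashSet<string>(searchTerms, comparer);
        return data
            .Where(item => item.Split(' ').Any(word => species.Contains(word)))
            .ToList();
    }
}
```

With the search term "Lion" and the data item "Sea lion", a case-sensitive comparer such as StringComparer.Ordinal finds nothing, while StringComparer.CurrentCultureIgnoreCase matches "lion" and so returns "Sea lion", which is the false positive described above.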
With one million items in list2, these are the measured run times:
The algorithm in the question:    37    seconds
The algorithm using AsParallel:    7    seconds
This algorithm:                    0.17 seconds
