C#: slow identification of duplicates using LINQ

c# performance linq

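I have two lists of users: BigList contains 20,000 usernames with email addresses, and SmallList contains 1,500 usernames. BigList contains duplicate users, i.e. users who share the same email address, although every username is unique. The usernames in SmallList are unique. For each duplicated user in BigList (as determined by email address), I need to return the shortest username, provided that user also exists in SmallList.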

I have solved this with LINQ, but it takes more than 30 seconds, which is too slow:

return BigList.Where(u => SmallList.Contains(u.UserName))
                    .OrderBy(u => u.UserName.Length)
                    .GroupBy(u => u.EmailAddress)
                    .Select(g => g.FirstOrDefault())
                    .Select(u => u.UserName).ToList();

Is there anything I can do to improve the performance of this query? Thanks, everyone!

List.Contains does not scale well, because for every item in BigList it has to scan up to n entries of the list (where n is the number of items in that list). Consider using a HashSet instead of a List for SmallList.
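As a minimal sketch of that suggestion, the query from the question can stay as it is, with only the lookup changed. The `User` record and the `DuplicateFinder.ShortestNames` method are hypothetical names introduced for illustration (the question only shows the `UserName` and `EmailAddress` properties), and the record syntax assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical user type; the question only shows UserName and EmailAddress.
public record User(string UserName, string EmailAddress);

public static class DuplicateFinder
{
    public static List<string> ShortestNames(
        IEnumerable<User> bigList, IEnumerable<string> smallList)
    {
        // Build the set once; each Contains call is then O(1) on average
        // instead of a linear scan over the 1,500 names.
        var smallSet = new HashSet<string>(smallList);

        return bigList
            .Where(u => smallSet.Contains(u.UserName))
            .OrderBy(u => u.UserName.Length)   // stable sort, shortest names first
            .GroupBy(u => u.EmailAddress)      // groups keep the sorted order
            .Select(g => g.First().UserName)   // so the first element is the shortest
            .ToList();
    }
}
```

With 20,000 users and 1,500 names, this replaces up to 30 million string comparisons in the `Where` step with 20,000 hash lookups; whether that accounts for the whole 30 seconds would have to be measured.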
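If you are going to throw away 18,500 users, why sort all 20,000 of them? Wouldn't it be more efficient to first select the items you want, and only then sort by ascending username length?
First I would convert BigList into groups of users that share the same email address. From all elements in each group I keep the shortest username. Apparently you are not interested in the email address in the final result.
From the remaining usernames I keep only those that are also in SmallList.
I use the GroupBy overload that takes a resultSelector, so I can shape the result: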
var result = BigList.GroupBy(

   // keySelector: make groups of users with the same EmailAddress:
   user => user.EmailAddress,

   // resultSelector: from each emailAddress and all Users that have this emailAddress
   // make one new Object, the one that contains the smallest UserName
   (emailAddress, usersWithThisEmailAddress) => usersWithThisEmailAddress
       .Select(user => user.UserName)
       .OrderBy(userName => userName.Length)
       .FirstOrDefault())

// You don't want to keep all UserNames, keep only those that are also in SmallList:
.Where(userName => SmallList.Contains(userName));

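To get the shortest username in each group, you can sort by ascending username length and take the first one. But if you only use the first element of the sorted sequence, why sort the second, the third, and the 54th?
A way that enumerates the sequence only once is the lesser-known Aggregate method: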
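   (emailAddress, usersWithThisEmailAddress) => usersWithThisEmailAddress
       .Select(user => user.UserName)
       // keep whichever of the two user names is shorter
       .Aggregate((shortestUserName, nextUserName) =>
           (nextUserName.Length < shortestUserName.Length) ? nextUserName : shortestUserName)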
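Put together, the group-then-aggregate approach could look like the sketch below. The `User` record and the `ShortestPerEmail.Find` method are hypothetical names for illustration, and using a `HashSet<string>` for the final filter is an assumption borrowed from the other suggestion in this thread, not part of this answer's original query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical user type; only the two properties used in the question.
public record User(string UserName, string EmailAddress);

public static class ShortestPerEmail
{
    public static List<string> Find(
        IEnumerable<User> bigList, IEnumerable<string> smallList)
    {
        // Assumption: hash lookup instead of List.Contains for the final filter.
        var smallSet = new HashSet<string>(smallList);

        return bigList
            .GroupBy(
                // keySelector: group users by email address
                user => user.EmailAddress,
                // resultSelector: reduce each group to its shortest user name
                (emailAddress, usersWithThisEmailAddress) => usersWithThisEmailAddress
                    .Select(user => user.UserName)
                    // one pass per group; groups are never empty, so the
                    // seedless Aggregate overload does not throw here
                    .Aggregate((shortest, next) =>
                        next.Length < shortest.Length ? next : shortest))
            .Where(userName => smallSet.Contains(userName))
            .ToList();
    }
}
```

Note that this filters after choosing the shortest name. An email address whose shortest username is not in SmallList yields nothing, even if a longer username for that address is in SmallList, whereas the query in the question filters first. Which behaviour is wanted depends on how the requirement is read.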

Aggregate does the following:
IEnumerable<string> userNames = ...
string shortestUserName = userNames.First();
foreach (string nextUserName in userNames.Skip(1))
{
    shortestUserName = (nextUserName.Length < shortestUserName.Length) ?
        nextUserName : shortestUserName;
}
return shortestUserName;

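In fact, Aggregate is even slightly more efficient, because it uses GetEnumerator and MoveNext. This requires some knowledge of how enumeration works at the lowest level. Don't worry if you know nothing about it: you rarely need it, usually only when you want to improve performance: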
IEnumerable<string> userNames = ...
IEnumerator<string> enumerator = userNames.GetEnumerator();
if (enumerator.MoveNext())
{
    // there is at least one user name in the sequence, it is the shortest until now
    string shortestUserName = enumerator.Current;

    // while there are more userNames, check if the next one is shorter:
    while (enumerator.MoveNext())
    {
        // There is a next user name. Is it shorter?
        shortestUserName = (enumerator.Current.Length < shortestUserName.Length) ?
            enumerator.Current : shortestUserName;
    }
}
// else: there are no elements at all, decide what to do.

If you want to squeeze out the last possible optimization:
while (enumerator.MoveNext())
{
    if (enumerator.Current.Length < shortestUserName.Length)
    {
        shortestUserName = enumerator.Current;
    }
}
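The enumerator loop above can be wrapped in a small reusable method. This is only a sketch: `SequenceHelpers.ShortestOrDefault` is a hypothetical name, returning null for an empty sequence is one possible answer to "decide what to do", and the `using` declaration assumes C# 8 or later.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class SequenceHelpers
{
    // Hypothetical helper: returns the shortest string,
    // or null if the sequence has no elements.
    public static string? ShortestOrDefault(IEnumerable<string> userNames)
    {
        // IEnumerator<T> is IDisposable, so dispose it when the method exits.
        using IEnumerator<string> enumerator = userNames.GetEnumerator();

        if (!enumerator.MoveNext())
        {
            return null; // no elements at all
        }

        // The first element is the shortest seen so far.
        string shortestUserName = enumerator.Current;
        while (enumerator.MoveNext())
        {
            // Read Current once per element.
            string candidate = enumerator.Current;
            if (candidate.Length < shortestUserName.Length)
            {
                shortestUserName = candidate;
            }
        }
        return shortestUserName;
    }
}
```

Because the comparison is strict, the first of several equally short names is kept. For groups of only a handful of users the gain over Aggregate is likely to be small, so it is worth measuring before adopting it.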
