C#: Find sequence in IEnumerable<T> using Linq

c# .net linq

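What is the most efficient way to find a sequence within an IEnumerable<T> using LINQ?

I want to be able to create an extension method which allows the following call:

int startIndex = largeSequence.FindSequence(subSequence);

The match must be contiguous and in order.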

Update: Given the clarification of the question, my answer below isn't as applicable. I'm leaving it here for history's sake.

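You probably want to use mySequence.Where(). Then the key is to optimize the predicate so that it works well in your environment. This can vary quite a bit depending on your requirements and typical usage patterns.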
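Even under the clarified (contiguous) requirement, Where() can still do the job if the predicate is applied to candidate start indices rather than to individual elements. The following is a minimal sketch, assuming the input can be buffered into a List<T> and that EqualityComparer<T>.Default is the right notion of equality; FindSequenceWithWhere and WhereSearch are hypothetical names. The predicate does the cheap first-element test before the full comparison, which is one simple form of the predicate tuning described above. Worst case is O(n·m).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container class for the sketch.
public static class WhereSearch
{
    // Hypothetical helper: first start index of `pattern` in `source`, or -1.
    public static int FindSequenceWithWhere<T>(this IEnumerable<T> source, IEnumerable<T> pattern)
    {
        var list = source.ToList();   // buffer once so indexing is O(1)
        var sub = pattern.ToList();
        var cmp = EqualityComparer<T>.Default;

        if (sub.Count == 0) return 0; // an empty pattern matches at the start
        int candidates = Math.Max(0, list.Count - sub.Count + 1);

        return Enumerable.Range(0, candidates)
            // Cheap test first: only candidates whose first element matches
            // pay for the element-by-element comparison of the whole pattern.
            .Where(i => cmp.Equals(list[i], sub[0])
                        && Enumerable.Range(0, sub.Count).All(j => cmp.Equals(list[i + j], sub[j])))
            .DefaultIfEmpty(-1)
            .First();
    }
}
```

For example, new[] { 1, 2, 3, 4, 5 }.FindSequenceWithWhere(new[] { 3, 4 }) returns 2. Because Where is lazy, First() stops at the first matching index.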

It's quite possible that what works well for small collections doesn't scale to much larger ones, depending on what type T is.
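As one illustration of how the best approach depends on T: if the data already sits in an array and T implements IEquatable<T> (byte, char, string, ...), the span overload MemoryExtensions.IndexOf(ReadOnlySpan<T>, ReadOnlySpan<T>), which searches for a whole sub-span, can be used instead of a LINQ pipeline, and the framework may specialize it for primitive types. This sketch assumes .NET Core 2.1 or later (or the System.Memory package on .NET Framework); IndexOfInArray and SpanSearch are hypothetical names.

```csharp
using System;

// Hypothetical container class for the sketch.
public static class SpanSearch
{
    // Hypothetical wrapper: contiguous search over arrays via the span API.
    // Returns the start index of the first match, or -1 if there is none.
    public static int IndexOfInArray<T>(T[] source, T[] pattern) where T : IEquatable<T>
    {
        var haystack = new ReadOnlySpan<T>(source);
        var needle = new ReadOnlySpan<T>(pattern);
        // This overload looks for the whole `needle` sequence, not a single element.
        return haystack.IndexOf(needle);
    }
}
```

This only helps when the data is already materialized; for a lazy IEnumerable<T> you would have to buffer it first.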

Of course, if 90% of the usage is on small collections, then optimizing for the outlier large collection seems a bit YAGNI.

The code you say you want to be able to use isn't LINQ, so I don't see why it would need to be implemented with LINQ.

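This is essentially the same problem as substring search (indeed, an enumeration in which order matters is a generalization of a "string").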
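To make the analogy concrete: the textbook brute-force substring search carries over unchanged to any IReadOnlyList<T>, and for T = char it gives the same answers as string.IndexOf with ordinal comparison. This is a minimal, deliberately LINQ-free sketch with O(n·m) worst case; BruteForce and BruteForceIndexOf are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container class for the sketch.
public static class BruteForce
{
    // Hypothetical helper: naive contiguous search, no LINQ involved.
    public static int BruteForceIndexOf<T>(IReadOnlyList<T> source, IReadOnlyList<T> pattern)
    {
        var cmp = EqualityComparer<T>.Default;
        // Try every start position at which the whole pattern still fits.
        for (int start = 0; start + pattern.Count <= source.Count; start++)
        {
            int j = 0;
            while (j < pattern.Count && cmp.Equals(source[start + j], pattern[j]))
                j++;
            if (j == pattern.Count)
                return start; // every pattern element matched contiguously
        }
        return -1;
    }
}
```

For example, BruteForce.BruteForceIndexOf("hello world".ToCharArray(), "o w".ToCharArray()) returns 4, the same as "hello world".IndexOf("o w", StringComparison.Ordinal).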

Since computer science has studied this problem extensively for a long time, you get to stand on the shoulders of giants.

Some reasonable starting points are:

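Even just the pseudocode in the Wikipedia articles is enough to port to C# quite easily. Look at the descriptions of performance in different cases and decide which cases your code is most likely to encounter.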
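As one example of such a port, here is a sketch of the Knuth-Morris-Pratt algorithm, chosen here purely as an illustration of a classic substring-search algorithm. It precomputes a failure table for the pattern and then reads the source exactly once, so it works on a lazy IEnumerable<T> with O(n + m) comparisons. It assumes the pattern is small enough to buffer into an array; KmpSearch and IndexOfSequenceKmp are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container class for the sketch.
public static class KmpSearch
{
    // Hypothetical name. Returns the start index of the first contiguous match, or -1.
    public static int IndexOfSequenceKmp<T>(this IEnumerable<T> source, IEnumerable<T> pattern,
                                            IEqualityComparer<T> comparer = null)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (pattern == null) throw new ArgumentNullException(nameof(pattern));
        comparer = comparer ?? EqualityComparer<T>.Default;

        T[] pat = pattern.ToArray();
        if (pat.Length == 0) return 0; // an empty pattern matches at position 0

        // failure[j] = length of the longest proper prefix of pat[0..j]
        // that is also a suffix of pat[0..j].
        var failure = new int[pat.Length];
        for (int j = 1, k = 0; j < pat.Length; j++)
        {
            while (k > 0 && !comparer.Equals(pat[j], pat[k])) k = failure[k - 1];
            if (comparer.Equals(pat[j], pat[k])) k++;
            failure[j] = k;
        }

        // Single forward pass over the source; `matched` = pattern elements matched so far.
        int matched = 0, position = 0;
        foreach (T item in source)
        {
            // On a mismatch, fall back along the failure table instead of re-reading the source.
            while (matched > 0 && !comparer.Equals(item, pat[matched]))
                matched = failure[matched - 1];
            if (comparer.Equals(item, pat[matched])) matched++;
            if (matched == pat.Length)
                return position - pat.Length + 1;
            position++;
        }
        return -1;
    }
}
```

Because the source is enumerated only once and the method returns on the first match, Enumerable.Range(0, int.MaxValue).IndexOfSequenceKmp(new[] { 5, 6, 7 }) returns 5 after reading just eight elements.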

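Here is an implementation of an algorithm that finds a sub-sequence in a sequence. I called the method IndexOfSequence, because it makes the intent more explicit and is similar to the existing IndexOf method: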

using System;
using System.Collections.Generic;
using System.Linq;

public static class ExtensionMethods
{
    public static int IndexOfSequence<T>(this IEnumerable<T> source, IEnumerable<T> sequence)
    {
        return source.IndexOfSequence(sequence, EqualityComparer<T>.Default);
    }

    public static int IndexOfSequence<T>(this IEnumerable<T> source, IEnumerable<T> sequence, IEqualityComparer<T> comparer)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (sequence == null) throw new ArgumentNullException(nameof(sequence));
        comparer = comparer ?? EqualityComparer<T>.Default;

        var seq = sequence.ToArray();

        // An empty searched sequence matches at position 0 (and seq[0] below would throw)
        if (seq.Length == 0)
            return 0;

        int p = 0; // current position in source sequence
        int i = 0; // current position in searched sequence
        var prospects = new List<int>(); // start positions of prospective matches
        foreach (var item in source)
        {
            // Remove prospective matches that the current item breaks
            prospects.RemoveAll(k => !comparer.Equals(item, seq[p - k]));

            // Is it the start of a prospective match?
            if (comparer.Equals(item, seq[0]))
            {
                prospects.Add(p);
            }

            // Does the current item continue the partial match?
            if (comparer.Equals(item, seq[i]))
            {
                i++;
                // Do we have a complete match?
                if (i == seq.Length)
                {
                    // Bingo!
                    return p - seq.Length + 1;
                }
            }
            else // Mismatch
            {
                // Do we have prospective matches to fall back to?
                if (prospects.Count > 0)
                {
                    // Yes, use the earliest one; it has already matched the current item
                    int k = prospects[0];
                    i = p - k + 1;
                }
                else
                {
                    // No, start from the beginning of the searched sequence
                    i = 0;
                }
            }
            p++;
        }
        // No match
        return -1;
    }
}

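I haven't fully tested it, so it may still contain bugs. I just did a few tests on well-known corner cases to make sure I wasn't falling into obvious traps. It seems OK so far...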

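I think the complexity is close to O(n), but I'm not an expert in Big O notation, so I could be wrong... At least it enumerates the source sequence only once, without ever going back, so it should be reasonably efficient.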
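One way to check the complexity empirically is to count comparisons through the three-argument IndexOfSequence overload. A rough bound from the code: each source item is compared with seq[0], with seq[i], and once per live prospect in RemoveAll, and up to m prospects can be alive at once, so the worst case looks like O(n·m) rather than O(n). A long run of one value searched for a pattern of that value followed by a different one triggers it. The sketch below is a counting comparer plus a generator for that input shape; CountingComparer and WorstCase are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: wraps a comparer and counts Equals calls.
public sealed class CountingComparer<T> : IEqualityComparer<T>
{
    private readonly IEqualityComparer<T> inner;
    public long Count { get; private set; }

    public CountingComparer(IEqualityComparer<T> inner = null)
    {
        this.inner = inner ?? EqualityComparer<T>.Default;
    }

    public bool Equals(T x, T y)
    {
        Count++; // one comparison performed
        return inner.Equals(x, y);
    }

    public int GetHashCode(T obj) => inner.GetHashCode(obj);
}

// Hypothetical generator for an input shape that keeps many prospects alive.
public static class WorstCase
{
    // n zeros: the source never contains the pattern, so the whole source is scanned.
    public static int[] Source(int n) => new int[n];

    // (m - 1) zeros followed by a single 1: every zero in the source starts
    // a prospect that survives for up to m - 1 steps.
    public static int[] Pattern(int m) =>
        Enumerable.Repeat(0, m - 1).Concat(new[] { 1 }).ToArray();
}
```

Usage: create var counter = new CountingComparer<int>(), call WorstCase.Source(n).IndexOfSequence(WorstCase.Pattern(m), counter), and see how counter.Count grows as you vary n and m.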

I know this is an old question, but I needed this exact method and I wrote it like this:

using System;
using System.Collections.Generic;
using System.Linq;

// Extension methods must live in a non-nested static class; the class name is arbitrary
public static class SubsequenceExtensions
{
    public static int ContainsSubsequence<T>(this IEnumerable<T> elements, IEnumerable<T> subSequence) where T : IEquatable<T>
    {
        return ContainsSubsequence(elements, 0, subSequence);
    }

    private static int ContainsSubsequence<T>(IEnumerable<T> elements, int index, IEnumerable<T> subSequence) where T : IEquatable<T>
    {
        // Does the whole sub-sequence match starting at the current element?
        if (StartsWith(elements, subSequence))
            return index; // Matched! index is where the match starts

        // No elements left, so the sub-sequence was not found
        if (!elements.Any())
            return -1; // Nope, didn't match

        // Try again one element further along, restarting the sub-sequence
        // from its first element so that only contiguous runs can match
        return ContainsSubsequence(elements.Skip(1), index + 1, subSequence);
    }

    private static bool StartsWith<T>(IEnumerable<T> elements, IEnumerable<T> subSequence) where T : IEquatable<T>
    {
        // Sub-sequence fully consumed: every element matched
        if (!subSequence.Any())
            return true;

        // Elements ran out before the sub-sequence was fully matched
        if (!elements.Any())
            return false;

        // Compare the first elements, then consume (skip) one element from each and continue
        return EqualityComparer<T>.Default.Equals(elements.First(), subSequence.First())
            && StartsWith(elements.Skip(1), subSequence.Skip(1));
    }
}

Updated to return not just whether the sub-sequence matched, but also where it starts in the original sequence.

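This is an extension method on IEnumerable, fully lazy, terminates early, and is far more explicit than the currently top-voted answer. Be warned, however (as @wai-ha-lee pointed out), that it is recursive and creates a lot of enumerators. Use it where applicable (performance/memory). It was fine for my needs, but YMMV.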
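If the recursion and the number of enumerators are a concern, the same contiguous check can be done with a single enumerator over the source plus a buffer holding the last m elements. A sketch under that assumption follows; WindowSearch and IndexOfSequenceWindowed are hypothetical names. It is still lazy over the source and stops at the first match, but each step compares up to m elements, so the worst case is O(n·m).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container class for the sketch.
public static class WindowSearch
{
    // Hypothetical name. One pass over `source`, no recursion.
    public static int IndexOfSequenceWindowed<T>(this IEnumerable<T> source, IEnumerable<T> subSequence)
    {
        var cmp = EqualityComparer<T>.Default;
        T[] pattern = subSequence.ToArray();
        if (pattern.Length == 0) return 0; // empty sub-sequence matches at the start

        var window = new Queue<T>(pattern.Length); // the last pattern.Length items seen, oldest first
        int consumed = 0;                          // number of source items read so far
        foreach (T item in source)
        {
            window.Enqueue(item);
            if (window.Count > pattern.Length) window.Dequeue();
            consumed++;

            // Compare the full window with the pattern, element by element.
            if (window.Count == pattern.Length &&
                window.Zip(pattern, (a, b) => cmp.Equals(a, b)).All(eq => eq))
                return consumed - pattern.Length;
        }
        return -1;
    }
}
```

For example, new[] { 1, 1, 2 }.IndexOfSequenceWindowed(new[] { 1, 2 }) returns 1.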

You can then write:

int startIndex = largeSequence.AsSequence().IndexOfSlice(subSequence);