C#: How do I compare two string arrays, find all consecutive matches, and save their indices?
For example, if I have the following two arrays:
string[] userSelect = new string[] {"the", "quick", "brown", "dog", "jumps", "over"};
string[] original = new string[] {"the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"};
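I'm trying to compare the userSelect array with the original array and get all consecutive matches, by index. The userSelect array will always consist of strings from the original array. So the output would look like this: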
int[] match0 = new int[] {0, 1, 2}; // indices for "the quick brown"
int[] match2 = new int[] {4, 5}; // indices for "jumps over"
int[] match1 = new int[] {3}; // index for "dog"
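The userSelect array will never be longer than the original array, but it can be shorter, and the words can be in any order. How would I do this?
This doesn't do exactly what you're asking for, but it is a very clean, simple way to get a new array containing all the strings the two arrays have in common (i.e., their intersection). After it runs, results will contain every string that appears in both array1 and array2 (ignoring case).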
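A minimal sketch of that intersection, assuming LINQ's Intersect with StringComparer.OrdinalIgnoreCase supplies the case-insensitive comparison; CommonWords and Intersection are hypothetical names:

```csharp
using System;
using System.Linq;

static class CommonWords
{
    // Set intersection: the distinct strings present in both arrays,
    // compared case-insensitively, in the order of their first appearance in array1.
    public static string[] Intersection(string[] array1, string[] array2) =>
        array1.Intersect(array2, StringComparer.OrdinalIgnoreCase).ToArray();
}
```

Intersect returns distinct elements in the order of the first array, so Intersection(userSelect, original) gives the, quick, brown, dog, jumps, over. Positions are discarded along the way, which is why this does not answer the question about consecutive matches by index.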
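If you want some theory, the Intersect method is based on the intersection operation on sets from set theory. LINQ provides all the common set operations (Intersect, Union, Except, Distinct) for C# collections, so it's worth getting familiar with them. Here is a link to the Wikipedia article.
This is what I came up with: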
var matches =
(from l in userSelect.Select((s, i) => new { s, i })
join r in original.Select((s, i) => new { s, i })
on l.s equals r.s
group l by r.i - l.i into g
from m in g.Select((l, j) => new { l.i, j = l.i - j, k = g.Key })
group m by new { m.j, m.k } into h
select h.Select(t => t.i).ToArray())
.ToArray();
This outputs:
matches[0] // { 0, 1, 2 } the quick brown
matches[1] // { 4, 5 } jumps over
matches[2] // { 0 } the
matches[3] // { 3 } dog
With the input {"the", "quick", "brown", "the", "lazy", "dog"}, it produces:
matches[0] // { 0, 1, 2 } the quick brown
matches[1] // { 0 } the
matches[2] // { 3 } the
matches[3] // { 3, 4, 5 } the lazy dog
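Note that the calls to ToArray are optional. If you don't actually need the results as arrays, you can leave them out and save a little processing time.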
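A sketch of the same query wrapped in a reusable method, with comments on why the grouping works; JoinGroupMatches and Find are hypothetical names, and the query body is unchanged. Every join pair lies on a "diagonal" with offset r.i - l.i, and within one diagonal the user indices of a single run all share the same value of l.i minus their position in the group, so grouping on both values separates the runs:

```csharp
using System.Linq;

static class JoinGroupMatches
{
    // Returns runs of consecutive userSelect indices that also appear consecutively in original.
    public static int[][] Find(string[] userSelect, string[] original)
    {
        return (from l in userSelect.Select((s, i) => new { s, i })
                join r in original.Select((s, i) => new { s, i }) on l.s equals r.s
                // Pairs that belong to the same aligned stretch share the offset r.i - l.i.
                group l by r.i - l.i into g
                // Inside one offset group, l.i - (position in group) is constant
                // exactly as long as the user indices stay consecutive.
                from m in g.Select((l, j) => new { l.i, j = l.i - j, k = g.Key })
                group m by new { m.j, m.k } into h
                select h.Select(t => t.i).ToArray())
               .ToArray();
    }
}
```

For the question's example, Find returns { 0, 1, 2 }, { 4, 5 }, { 0 } and { 3 }, matching the output listed above.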
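To filter out any sequence that is entirely contained within another, larger sequence, you can run the following code (note the orderby in the modified query):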
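As an alternative to modifying the query, a post-processing filter over the matches array could look like this (a sketch, not the modified query referred to above; MatchFilter and RemoveContained are hypothetical names). It assumes, as the query guarantees, that every match is a non-empty run of consecutive userSelect indices, so containment reduces to comparing first and last elements:

```csharp
using System.Linq;

static class MatchFilter
{
    // Keeps only the matches that are not fully covered by some strictly longer match.
    // Assumes each match is a non-empty run of consecutive indices.
    public static int[][] RemoveContained(int[][] matches)
    {
        return matches
            .Where(m => !matches.Any(other =>
                other.Length > m.Length &&
                other[0] <= m[0] &&
                other[other.Length - 1] >= m[m.Length - 1]))
            .ToArray();
    }
}
```

Applied to the second output above it keeps { 0, 1, 2 } and { 3, 4, 5 }; applied to the first it keeps { 0, 1, 2 }, { 4, 5 } and { 3 }, which is the grouping the question asks for.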
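This would be easier if words couldn't repeat. The general idea is to create a dictionary from the original word list, mapping each word to the list of positions where it appears. It tells you which words are at which positions. For your example, the dictionary would look like this: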
key="the", value={0, 6}
key="quick", value={1}
key="brown", value={2}
... etc
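Now, when you get the user's input, you can step through it in order, looking up each word in the dictionary to get its list of positions.
So you look up a word and find it in the dictionary. Save the positions the dictionary returned. Look up the next word. There are three conditions you need to handle.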
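A minimal sketch of this dictionary-based approach. How it treats each word (extend the current run if the word occurs right after a position where the run could end, otherwise close the run and start a new one, skipping words absent from original) is an assumption, not necessarily the answer's three conditions; DictionaryWalk and FindRuns are hypothetical names. Keeping a set of candidate end positions, rather than a single one, is what handles repeated words such as "the":

```csharp
using System.Collections.Generic;
using System.Linq;

static class DictionaryWalk
{
    // Greedy left-to-right scan (an assumed reading of the dictionary approach).
    // Returns runs of consecutive userSelect indices whose words also appear
    // consecutively somewhere in original.
    public static List<int[]> FindRuns(string[] userSelect, string[] original)
    {
        // word -> every position where it occurs in original, e.g. "the" -> {0, 6}
        var positions = new Dictionary<string, List<int>>();
        for (int p = 0; p < original.Length; p++)
        {
            if (!positions.TryGetValue(original[p], out List<int> list))
            {
                list = new List<int>();
                positions[original[p]] = list;
            }
            list.Add(p);
        }

        var runs = new List<int[]>();
        int runStart = -1;               // start of the current run in userSelect; -1 = none
        var active = new HashSet<int>(); // positions in original where the current run can end

        for (int i = 0; i < userSelect.Length; i++)
        {
            if (!positions.TryGetValue(userSelect[i], out List<int> found))
                found = new List<int>(); // word does not occur in original

            // Positions that continue the run: directly after a position where it could end.
            var next = new HashSet<int>(found.Where(p => active.Contains(p - 1)));
            if (runStart >= 0 && next.Count > 0)
            {
                active = next; // extend the current run
                continue;
            }

            // Otherwise close the current run (if any) and start a new one at i,
            // provided the word occurs in original at all.
            if (runStart >= 0)
                runs.Add(Enumerable.Range(runStart, i - runStart).ToArray());
            runStart = found.Count > 0 ? i : -1;
            active = new HashSet<int>(found);
        }

        if (runStart >= 0)
            runs.Add(Enumerable.Range(runStart, userSelect.Length - runStart).ToArray());
        return runs;
    }
}
```

For the question's example it returns [0, 1, 2], [3], [4, 5], and for { "the", "quick", "brown", "the", "lazy", "dog" } it returns [0, 1, 2], [3, 4, 5]. Because each run is extended as far as possible before a new one starts, earlier runs are preferred when several partitions are possible.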
var index = 0;
var lookup = original.ToLookup(s => s, s => index++);
The monster query (its full text appears further below) is used like this:
foreach (var occurrence in occurrences) {
Console.WriteLine(
"Maximal match starting with '{0}': [{1}]",
original[occurrence[0]], // the indices are positions in original, not in userSelect
string.Join(", ", occurrence)
);
}
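Obviously, you wouldn't want to use this code in production; the other (procedural) solution is by far preferable. What sets this solution apart, however, is that apart from the lookup it is purely functional. Of course, the lookup can be written functionally too: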
var lookup = original.Select((s, i) => Tuple.Create(s, i))
.ToLookup(t => t.Item1, t => t.Item2);
How it works
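As a warm-up, it builds a dictionary-like structure (the lookup) that associates each word in original with the indices at which it appears in that same collection. This is later used to create as many match sequences as possible from each word in userSelect (e.g., "the" produces two match sequences, because it appears twice in original).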
Then the query starts by removing every word in userSelect that does not appear in original, which is easy.
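The warm-up lookup and the filtering step can be sketched in isolation. This version builds the lookup with a position-carrying Select instead of the index++ side effect, and the Where(lookup.Contains) filter is an assumption about how the step is written; LookupDemo, BuildLookup and KnownWords are hypothetical names. Note that an ILookup indexer returns an empty sequence for a missing key, so lookup[s] never throws:

```csharp
using System.Linq;

static class LookupDemo
{
    // Word -> every position where it occurs in original, e.g. "the" -> 0, 6.
    public static ILookup<string, int> BuildLookup(string[] original) =>
        original.Select((word, pos) => (Word: word, Pos: pos))
                .ToLookup(t => t.Word, t => t.Pos);

    // Hypothetical form of the "drop unknown words" step; it removes nothing when
    // every word of userSelect comes from original, as the question guarantees.
    public static string[] KnownWords(string[] userSelect, ILookup<string, int> lookup) =>
        userSelect.Where(w => lookup.Contains(w)).ToArray();
}
```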
// For each place where the word s appears in original...
.SelectMany((s, i) => lookup[s]
// Define the two subsequences of userSelect and original to work on.
// We are trying to find the number of identical elements until first mismatch.
.Select(j => new { User = userSelect.Skip(i), Original = original.Skip(j), Skipped = j })
// Use .Zip to find this subsequence
.Select(t => t.User.Zip(t.Original, (u, v) => Tuple.Create(u, v, t.Skipped)).TakeWhile(tuple => tuple.Item1 == tuple.Item2))
// Note the index in original where the subsequence started and its length
.Select(u => new { Word = s, Start = u.Select(v => v.Item3).Min(), Length = u.Count() })
)
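At this point, we have projected each matching word in userSelect onto an anonymous object with Start and Length properties. However, a matching sequence of length N also produces matches of length N-1, N-2, ..., 1 (one starting at each of its later words).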
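To see these redundant sub-matches concretely, this sketch reproduces only the first stage of the query (the SelectMany/Zip/TakeWhile part), using value tuples instead of anonymous objects; CandidateRuns and Candidates are hypothetical names. As in the query, Start is a position in original:

```csharp
using System.Collections.Generic;
using System.Linq;

static class CandidateRuns
{
    // For each word of userSelect (index i) and each position j of that word in original,
    // record how many words match from there until the first mismatch.
    public static List<(string Word, int Start, int Length)> Candidates(string[] userSelect, string[] original)
    {
        var lookup = original.Select((word, pos) => (Word: word, Pos: pos))
                             .ToLookup(t => t.Word, t => t.Pos);

        return userSelect
            .SelectMany((s, i) => lookup[s].Select(j => (
                Word: s,
                Start: j, // position in original
                Length: userSelect.Skip(i)
                                  .Zip(original.Skip(j), (u, v) => u == v)
                                  .TakeWhile(equal => equal)
                                  .Count())))
            .ToList();
    }
}
```

For the question's example it yields seven candidates, among them ("the", 0, 3), ("quick", 1, 2) and ("brown", 2, 1), which all have Start + Length = 3.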
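The key is to realize that Start + Length is the same for all the subsequences in such a set, while subsequences from different sets have different Start + Length sums (unless two runs happen to end at the same position in original). So let's use that to reduce the results: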
// Obvious from the above
.GroupBy(v => v.Start + v.Length)
// We want to keep the longest subsequence. Since Start + Length is constant for
// all, it follows the one with the largest Length has the smallest Start:
.Select(g => g.OrderBy(u => u.Start).First())
This still leaves each word in userSelect with as many matches as there are occurrences of that word in original. So let's reduce them to the longest match:
.GroupBy(v => v.Word)
.Select(g => g.OrderByDescending(u => u.Length).First())
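We now have objects such as {Word = "the", Start = 0, Length = 3}. Let's turn each one into an array of indices; note that Start, and therefore these indices, are positions in original, not in userSelect: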
.Select(w => Enumerable.Range(w.Start, w.Length).ToArray())
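Finally, put all of these arrays into one collection and you're done.
This isn't very elegant, but it's efficient; when indices are involved, LINQ is often more complicated and less efficient than a simple loop. Note that this loop compares userSelect[i] only with original[i], i.e. words at the same position: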
string[] userSelect = new string[] { "the", "quick", "brown", "dog", "jumps", "over" };
string[] original = new string[] { "the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog" };
var consecutiveGroups = new Dictionary<int, IList<string>>();
IList<Tuple<int, string>> uniques = new List<Tuple<int, string>>();
int maxIndex = Math.Min(userSelect.Length, original.Length);
if (maxIndex > 0)
{
int minIndex = 0;
int lastMatch = int.MinValue;
for (int i = 0; i < maxIndex; i++)
{
var us = userSelect[i];
var o = original[i];
if (us == o)
{
if (lastMatch == i - 1)
consecutiveGroups[minIndex].Add(us);
else
{
minIndex = i;
consecutiveGroups.Add(minIndex, new List<string>() { us });
}
lastMatch = i;
}
else
uniques.Add(Tuple.Create(i, us));
}
}
var consecutiveGroupsIndices = consecutiveGroups
.OrderByDescending(kv => kv.Value.Count)
.Select(kv => Enumerable.Range(kv.Key, kv.Value.Count).ToArray())
.ToArray();
foreach (var consIndexGroup in consecutiveGroupsIndices)
Console.WriteLine(string.Join(",", consIndexGroup));
Console.WriteLine(string.Join(",", uniques.Select(u => u.Item1)));
Have you tried anything? This doesn't seem too complicated.
I tried a bit, but it isn't as simple as I thought, because some words can be used more than once and I'm looking for the longest consecutive match. For example, in the sentence above, "the" appears twice, and both occurrences have to be checked.
You could convert the arrays to delimited strings and use any algorithm that solves that problem.
I infer from your comment that you really just want to find the longest one, not every combination.
// For each place where the word s appears in original...
.SelectMany((s, i) => lookup[s]
// Define the two subsequences of userSelect and original to work on.
// We are trying to find the number of identical elements until first mismatch.
.Select(j => new { User = userSelect.Skip(i), Original = original.Skip(j), Skipped = j })
// Use .Zip to find this subsequence
.Select(t => t.User.Zip(t.Original, (u, v) => Tuple.Create(u, v, t.Skipped)).TakeWhile(tuple => tuple.Item1 == tuple.Item2))
// Note the index in original where the subsequence started and its length
.Select(u => new { Word = s, Start = u.Select(v => v.Item3).Min(), Length = u.Count() })
)
// Obvious from the above
.GroupBy(v => v.Start + v.Length)
// We want to keep the longest subsequence. Since Start + Length is constant for
// all, it follows the one with the largest Length has the smallest Start:
.Select(g => g.OrderBy(u => u.Start).First())
.GroupBy(v => v.Word)
.Select(g => g.OrderByDescending(u => u.Length).First())
.Select(w => Enumerable.Range(w.Start, w.Length).ToArray())
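For reference, a self-contained assembly of the lookup-based pipeline walked through under "How it works", as a sketch (LookupPipeline and MaximalMatches are hypothetical names). Two simplifications are ours: Start is taken directly from the position j, which is what Min() over Item3 computes anyway, and no filtering step is needed because lookup[s] is empty for words missing from original. The returned arrays contain positions in original:

```csharp
using System;
using System.Linq;

static class LookupPipeline
{
    public static int[][] MaximalMatches(string[] userSelect, string[] original)
    {
        // Word -> positions in original.
        var lookup = original.Select((s, i) => Tuple.Create(s, i))
                             .ToLookup(t => t.Item1, t => t.Item2);

        return userSelect
            // For each word s at index i, try every position j where s occurs in original.
            .SelectMany((s, i) => lookup[s].Select(j => new
            {
                Word = s,
                Start = j,
                // Count equal pairs until the first mismatch.
                Length = userSelect.Skip(i).Zip(original.Skip(j), (u, v) => u == v)
                                   .TakeWhile(equal => equal).Count()
            }))
            // Sub-runs of one maximal run share Start + Length; keep the one with the smallest Start.
            .GroupBy(m => m.Start + m.Length)
            .Select(g => g.OrderBy(m => m.Start).First())
            // Keep only the longest run per starting word.
            .GroupBy(m => m.Word)
            .Select(g => g.OrderByDescending(m => m.Length).First())
            .Select(m => Enumerable.Range(m.Start, m.Length).ToArray())
            .ToArray();
    }
}
```

For the question's example this returns [0, 1, 2], [8] and [4, 5]; [8] is the position of "dog" in original, not its index 3 in userSelect.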