VB.NET: Removing duplicates from a dictionary


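Hi, I have a dictionary that is populated with entities matched by a regular expression. It extracts all the data correctly, but it also pulls in duplicates. How can I prevent duplicate entries from being added?

Here is my code: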

    Dim largeFilePath As String = newMasterFilePath
    Dim lines1 = File.ReadLines(largeFilePath).ToList 'don't use ReadAllLines
    Dim reg = New Regex("\<\!NOTATION.*$|\<\!ENTITY.*$", RegexOptions.IgnoreCase)
    Dim entities = From line In lines1
                   Where reg.IsMatch(line)

    Dim dictionary As New Dictionary(Of Integer, String)
    Dim idx = -1
    For Each s In entities
        idx = lines1.IndexOf(s, idx + 1)
        dictionary.Add(idx, s)
    Next

    Dim deletedItems = 0
    For Each itm In dictionary
        lines1.RemoveAt(itm.Key - deletedItems)
        deletedItems += 1
    Next

    For Each s In dictionary.Values
        lines1.Insert(1, s)
    Next

Change the line dictionary.Add(idx, s) to dictionary.Add(idx, s.Trim), so that lines differing only in leading or trailing whitespace compare as equal.

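Then:
' Keep only the first dictionary entry for each distinct (trimmed) line.
Dim uniqueDict = dictionary.GroupBy(Function(itm) itm.Value).
                   Select(Function(group) group.First()).
                   ToDictionary(Function(itm) itm.Key, Function(itm) itm.Value)

' Re-insert the unique lines right after the first line of the file.
For Each s In uniqueDict.Values
     lines1.Insert(1, s)
Next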
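
Since the thread asks where this goes in the question's code, here is a minimal sketch of how the pieces might fit together. The helper name MoveUniqueEntities and the sample lines are hypothetical. The sketch also swaps the per-item Insert(1, s) loop for a single InsertRange, which keeps the matched lines in their original relative order (inserting one at a time at index 1 reverses them). Treat it as one possible arrangement, not the answerer's exact code.

```vbnet
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module GroupByPlacementSketch
    ' Hypothetical helper: moves matched lines to just below the first line,
    ' keeping one copy of each (compared after trimming).
    Sub MoveUniqueEntities(lines1 As List(Of String))
        Dim reg = New Regex("\<\!NOTATION.*$|\<\!ENTITY.*$", RegexOptions.IgnoreCase)
        ' ToList() snapshots the matches so later edits to lines1 cannot affect them.
        Dim entities = lines1.Where(Function(line) reg.IsMatch(line)).ToList()

        ' 1. Record the index of every match, duplicates included.
        Dim dictionary As New Dictionary(Of Integer, String)
        Dim idx = -1
        For Each s In entities
            idx = lines1.IndexOf(s, idx + 1)
            dictionary.Add(idx, s.Trim())
        Next

        ' 2. Remove every matched line; ascending key order keeps the offset valid.
        Dim deletedItems = 0
        For Each itm In dictionary.OrderBy(Function(kv) kv.Key)
            lines1.RemoveAt(itm.Key - deletedItems)
            deletedItems += 1
        Next

        ' 3. The GroupBy step goes here: keep the first entry per distinct value.
        Dim uniqueLines = dictionary.OrderBy(Function(kv) kv.Key).
                              GroupBy(Function(kv) kv.Value).
                              Select(Function(g) g.First().Value).
                              ToList()

        ' 4. Put the unique lines back after the first remaining line.
        lines1.InsertRange(Math.Min(1, lines1.Count), uniqueLines)
    End Sub

    Sub Main()
        ' Made-up sample input; the second ENTITY line differs only by indentation.
        Dim lines1 As New List(Of String) From {
            "<!DOCTYPE DOC [",
            "<doc>",
            "<!ENTITY a SYSTEM ""a.wmf"" NDATA wmf>",
            "<para>text</para>",
            "  <!ENTITY a SYSTEM ""a.wmf"" NDATA wmf>",
            "<!NOTATION png SYSTEM ""png"">",
            "</doc>"
        }
        MoveUniqueEntities(lines1)
        For Each line In lines1
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

With the sample input above, the list ends up with a single ENTITY line and the NOTATION line directly after the DOCTYPE line, followed by the remaining document lines.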

The result shows that all duplicates have been removed:
<!DOCTYPE DOC PUBLIC "-//USA-DOD//DTD 38784STD-BV7//EN"[
<!ENTITY cdcs_2-24.wmf SYSTEM "graphics\CDCS_2-24.wmf" NDATA wmf>
<!ENTITY cdcs_2-5.wmf SYSTEM "graphics\CDCS_2-5.wmf" NDATA wmf>
<!ENTITY cdcs_4-48.wmf SYSTEM "graphics\CDCS_4-48.wmf" NDATA wmf>
<!ENTITY cdcs_3-5.wmf SYSTEM "graphics\CDCS_3-5.wmf" NDATA wmf>
<!ENTITY cdcs_2-19.wmf SYSTEM "graphics\CDCS_2-19.wmf" NDATA wmf>
<!NOTATION png SYSTEM "png">
<!NOTATION svg SYSTEM "svg">
<!NOTATION bmp SYSTEM "bmp">
<!ENTITY cdcs_2-2a.wmf SYSTEM "graphics\CDCS_2-2A.wmf" NDATA wmf>
<!ENTITY cdcs_5-35.wmf SYSTEM "graphics\CDCS_5-35.wmf" NDATA wmf>
<doc service="xs" docid="BKw46" docstat="formal" verstatpg="ver" cycle="1" chglevel="1">
<front numcols="1">
<idinfo>
<?Pub Lcl _divid="100" _parentid="0">
<tmidno>Life with Pets</tmidno>
<chgnum>Change 1</chgnum>
<chgdate>2 August 2018</chgdate>
<chghistory>
<chginfo>
<chgtxt>Change 1</chgtxt>
<date>2 August 2018</date>
</front>
<body numcols="1">
<chapter>
<title>This is chapter 1</title>
<para0>
<title>Climb the ladder immediately</title>
<para>Retrieve the cat.</para></para0></chapter>
<chapter>
<title>Don't forget to feed the dog</title>
<para0>
<title>Prep for puppies</title>
<para>Puppies are cute</para></para0>
</chapter>
</body>
</doc>


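Alternatively, check for duplicates before adding:
For Each s In entities
    ' Always advance idx so the next search starts after the current match.
    idx = lines1.IndexOf(s, idx + 1)
    ' Add the line only if its trimmed text has not been stored yet.
    ' Skipped duplicates are not in the dictionary, so a removal loop
    ' driven by the dictionary will leave them in lines1.
    If Not dictionary.ContainsValue(s.Trim) Then
        dictionary.Add(idx, s.Trim)
    End If
Next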
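
The check-before-adding idea can also be sketched with a HashSet(Of String), whose Add method returns False when the value is already present. The helper name UniqueTrimmed and the sample strings are hypothetical, and the sketch only covers the de-duplication step, not the removal of lines from the file.

```vbnet
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic

Module HashSetSketch
    ' Hypothetical helper: returns the distinct trimmed lines in first-seen order.
    Function UniqueTrimmed(source As IEnumerable(Of String)) As List(Of String)
        Dim seen As New HashSet(Of String)(StringComparer.Ordinal)
        Dim result As New List(Of String)
        For Each s In source
            Dim trimmed = s.Trim()
            ' HashSet.Add returns False if the value is already in the set.
            If seen.Add(trimmed) Then
                result.Add(trimmed)
            End If
        Next
        Return result
    End Function

    Sub Main()
        ' Made-up sample: the second entry differs only by leading spaces.
        Dim matches = {
            "<!NOTATION png SYSTEM ""png"">",
            "  <!NOTATION png SYSTEM ""png"">",
            "<!NOTATION svg SYSTEM ""svg"">"
        }
        For Each line In UniqueTrimmed(matches)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

The sample prints two lines, one for png and one for svg. A set lookup also stays fast when the file contains many entity lines.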

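This should help you.

Do you intend to delete the matching lines first and then move the unique values to the top?

How about modifying the LINQ query? From there, all the other code could be removed…

@MuhammadAlnahrawy I think it puts the values at the top and then removes the duplicates. Or whatever you suggest; I'm open to anything.

@MaxineHammett It works for me and I get the expected result. Did you try it with the sample text here?

I don't think I have it in the right place in my code. Can you update your code to show where it goes in mine?

Please check my result.

I believe your result. I just don't know where to put the code. A little help, please?

I get the error "An item with the same key has already been added".

That is because the lines1.IndexOf(s, idx + 1) part is incorrect. I didn't know how to place it correctly without breaking the code, so I guessed a bit. The lookup must be the same when checking for duplicates and when adding the item to the collection.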
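
One of the comments above suggests changing the LINQ query so that the index bookkeeping is not needed at all. A rough sketch of that idea follows. The function name MoveUniqueToTop and the sample lines are hypothetical. It builds a new list instead of editing the original in place, and it assumes the first non-matching line (the DOCTYPE line in the sample) should stay on top.

```vbnet
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module LinqSketch
    ' Hypothetical helper: returns a new list with the unique matched lines
    ' placed right after the first non-matching line.
    Function MoveUniqueToTop(lines As IEnumerable(Of String)) As List(Of String)
        Dim reg = New Regex("\<\!NOTATION.*$|\<\!ENTITY.*$", RegexOptions.IgnoreCase)
        Dim source = lines.ToList()

        ' GroupBy yields groups in the order their keys first appear in the source.
        Dim unique = source.Where(Function(line) reg.IsMatch(line)).
                            GroupBy(Function(line) line.Trim()).
                            Select(Function(g) g.Key)

        ' Everything that is not an ENTITY/NOTATION line, in original order.
        Dim rest = source.Where(Function(line) Not reg.IsMatch(line)).ToList()

        Dim result As New List(Of String)(rest.Take(1))
        result.AddRange(unique)
        result.AddRange(rest.Skip(1))
        Return result
    End Function

    Sub Main()
        ' Made-up sample input with one duplicated ENTITY line.
        Dim input As New List(Of String) From {
            "<!DOCTYPE DOC [",
            "<doc>",
            "<!ENTITY a SYSTEM ""a.wmf"" NDATA wmf>",
            "<!ENTITY a SYSTEM ""a.wmf"" NDATA wmf>",
            "</doc>"
        }
        For Each line In MoveUniqueToTop(input)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

The returned list could then be written back with File.WriteAllLines if that is the goal.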