Regex pattern matching for special characters in XML in C#

c# xml

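I am trying to collect all values from an XML string that contains special characters. XmlDocument and XDocument both throw an exception when reading XML with these special characters in C#.

I have this XML string: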

<root>\n\t<childone>\n\t\t<attributeone name=\"aa\">aa</attributeone>\n\t\t<attributetwo adds=\"ba\">ab&\"'<</attributetwo>\n\t\t<attributeone name=\"aa\">&</attributeone>\n\t</childone>\n</root>

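string pat = @"(>)([&\""\'<]+)(<)(/)";
Match match = Regex.Match(input, pat, RegexOptions.IgnoreCase);

As many will probably point out, parsing XML with regular expressions is usually a bad idea: it gets out of hand quickly, it is hard to maintain, and it is hard to foresee what can go wrong, especially when the input changes. There are plenty of well-written XML parsers; it is better to pick one and use it than to do this job with regex.

Agree with Nit that you should try to avoid regex. You said "because XmlDocument and XDocument throw exceptions when reading the xml" - I suggest fixing the application that produces the clearly invalid XML input string, for example by making it escape special characters correctly. XML is of no use unless an ordinary XML parser can read it.

@Astrotrain I understand the situation you describe, but unfortunately I do not have much access to the source of the XML. I only receive it as an input string. I do not know where it is created, or by whom.

I reformatted your XML snippet to make it more readable. It is clear that the XML is invalid (which we already knew, since XmlDocument cannot parse it). Clearly, the content of attributetwo should be: ab&"'<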

<root>\n
\t<childone>\n
\t\t<attributeone name=\"aa\">aa</attributeone>\n
\t\t<attributetwo adds=\"ba\">ab&\"'<</attributetwo>\n
\t\t<attributeone name=\"aa\">&</attributeone>\n
\t</childone>\n
</root>
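
To check the claim that the snippet is not well-formed, here is a minimal sketch that feeds the string to XDocument.Parse and reports the parser's complaint. The class name XmlCheck and the method IsWellFormed are hypothetical names introduced only for this illustration; XDocument.Parse and XmlException are the standard System.Xml.Linq and System.Xml types.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class XmlCheck
{
    // Hypothetical helper: returns true if the text parses as XML.
    // Otherwise returns false and hands back the parser's message.
    public static bool IsWellFormed(string xml, out string error)
    {
        try
        {
            XDocument.Parse(xml);
            error = null;
            return true;
        }
        catch (XmlException ex)
        {
            error = ex.Message;
            return false;
        }
    }

    public static void Main()
    {
        // The snippet from the question, with its unescaped & and < characters.
        const string broken =
            "<root>\n" +
            "\t<childone>\n" +
            "\t\t<attributeone name=\"aa\">aa</attributeone>\n" +
            "\t\t<attributetwo adds=\"ba\">ab&\"'<</attributetwo>\n" +
            "\t\t<attributeone name=\"aa\">&</attributeone>\n" +
            "\t</childone>\n" +
            "</root>";

        if (!IsWellFormed(broken, out string error))
        {
            Console.WriteLine("Not well-formed: " + error);
        }
    }
}
```

The parser stops at the first offending character, so the message only describes the first problem, not all of them.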

using System;
using System.Text.RegularExpressions;

class Program
{
    private const string BrokenXml = 
        "<root>\n" +
        "\t<childone>\n" +
        "\t\t<attributeone name=\"aa\">aa</attributeone>\n" +
        "\t\t<attributetwo adds=\"ba\">ab&\"'<</attributetwo>\n" +
        "\t\t<attributeone name=\"aa\">&</attributeone>\n" +
        "\t<empty />\n" +
        "\t</childone>\n" +
        "</root>";

    // Matches an opening tag with 0 or more attributes, and captures everything within "<...>" as Groups[1].
    // The attribute group is repeated with * so that tags with several attributes are matched too.
    // Unescaped regex looks like: <(\w+(?:\s+\w+="[^"]*")*)>
    private static readonly Regex OpenTagRegex = new Regex("<(\\w+(?:\\s+\\w+=\"[^\"]*\")*)>");

    // Matches a close tag and captures everything within "<...>" as Groups[1].
    private static readonly Regex CloseTagRegex = new Regex("<(/\\w+)>");

    // Matches an empty tag without attributes and captures everything within "<...>" as Groups[1].
    private static readonly Regex EmptyTagRegex = new Regex("<(\\w+\\s*/)>");

    public static void Main(string[] args)
    {
        //Replace the angle brackets (<>) of all valid xml elements with curly brackets ({})
        string step1 = OpenTagRegex.Replace(BrokenXml, ReplaceMatch);
        string step2 = CloseTagRegex.Replace(step1, ReplaceMatch);
        string step3 = EmptyTagRegex.Replace(step2, ReplaceMatch);

        //Fix the remaining special characters with their xml entity counterparts.
        //Note: input that already contains entities such as &amp; would be escaped a second time here.
        string step4 = step3.Replace("&", "&amp;");
        string step5 = step4.Replace("<", "&lt;");
        string step6 = step5.Replace(">", "&gt;");

        //Convert from curly braces xml back to regular xml.
        //This assumes the original input contains no literal { or } characters.
        string result = step6.Replace("{", "<").Replace("}", ">");

        Console.WriteLine(result);

        Console.WriteLine("Press enter to exit...");
        Console.ReadLine();
    }

    /// <summary>
    /// Matches the MatchEvaluator signature.
    /// </summary>
    private static string ReplaceMatch(Match match)
    {
        string contentWithoutAngularBrackets = match.Groups[1].Value;
        return "{" + contentWithoutAngularBrackets + "}";
    }
}
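
Since the original goal was to collect the values, here is a hedged sketch of how the repaired string could then be handed to XDocument. The class RepairedXmlReader and the methods Repair and LeafValues are hypothetical names for this illustration. The tag patterns are modeled on the program above, and the same assumptions apply: no literal curly braces and no already-escaped entities in the input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class RepairedXmlReader
{
    // Tag patterns modeled on the program above; the attribute group uses *
    // so that tags with several attributes are recognized as well.
    private static readonly Regex OpenTag = new Regex("<(\\w+(?:\\s+\\w+=\"[^\"]*\")*)>");
    private static readonly Regex CloseTag = new Regex("<(/\\w+)>");
    private static readonly Regex EmptyTag = new Regex("<(\\w+\\s*/)>");

    // Protects recognizable tags with braces, escapes the leftover markup
    // characters, then turns the braces back into angle brackets.
    // Assumption: the input has no literal '{' or '}' and no existing entities.
    public static string Repair(string brokenXml)
    {
        MatchEvaluator toBraces = m => "{" + m.Groups[1].Value + "}";
        string s = OpenTag.Replace(brokenXml, toBraces);
        s = CloseTag.Replace(s, toBraces);
        s = EmptyTag.Replace(s, toBraces);
        s = s.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;");
        return s.Replace("{", "<").Replace("}", ">");
    }

    // Parses the repaired text and returns the text of every leaf element
    // (elements without child elements, skipping self-closing ones) in document order.
    public static List<string> LeafValues(string brokenXml)
    {
        return XDocument.Parse(Repair(brokenXml))
            .Descendants()
            .Where(e => !e.HasElements && !e.IsEmpty)
            .Select(e => e.Value)
            .ToList();
    }

    public static void Main()
    {
        const string broken = "<root><a n=\"1\">ab&\"'<</a><b>&</b></root>";
        foreach (string value in LeafValues(broken))
        {
            Console.WriteLine(value);
        }
    }
}
```

For the small input in Main, the two values come back unescaped as ab&"'< and &, because XDocument decodes the entities again when the element text is read.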