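C#: Parsing almost well-formed XML fragments: how to skip multiple XML headers

I need to write a tool that processes the XML fragment below, which is not well formed because it contains XML declarations in the middle of the stream.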

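The company has been using files of this kind for a long time, so changing the format is not an option.

No source code is available for the existing parser. The platform of choice for the new tool is .NET 4 or later, preferably C#.

Here is what the fragment looks like: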

<Header>
  <Version>1</Version>
</Header>
<Entry><?xml version="1.0"?><Detail>...snip...</Detail></Entry>
<Entry><?xml version="1.0"?><Detail>...snip...</Detail></Entry>
<Entry><?xml version="1.0"?><Detail>...snip...</Detail></Entry>
<Entry><?xml version="1.0"?><Detail>...snip...</Detail></Entry>

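I don't think the built-in classes will help; you will probably have to do some preprocessing and remove the extra headers. If your sample is accurate, you can simply call

    badXml.Replace("<?xml version=\"1.0\"?>", "")

and be on your way.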
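As a minimal sketch of that suggestion, assuming every embedded declaration is exactly the string shown in the sample, the replacement could be wrapped as follows. The names `DeclarationStripper` and `Strip` are hypothetical and only serve the illustration.

```csharp
using System;

static class DeclarationStripper
{
    // Assumption: every embedded declaration is exactly this text, as in the sample.
    private const string Declaration = "<?xml version=\"1.0\"?>";

    // Returns the input with every occurrence of the fixed declaration removed.
    public static string Strip(string badXml)
    {
        return badXml.Replace(Declaration, "");
    }
}
```

Calling `DeclarationStripper.Strip` on one line of the sample leaves `<Entry><Detail>...snip...</Detail></Entry>`. For files around 100 MB it is probably better to apply it line by line while streaming than to load the whole file into a single string.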

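If you are not sure that the declaration always stays the same, turn each `<?xml ... ?>` into an ordinary element that carries the same information as attributes, and then use a regular parser ;)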
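When the declaration text may vary, one option is to rewrite each embedded declaration as an ordinary empty element so that a regular parser accepts it. The sketch below uses a regular expression for this; it assumes a declaration never spans more than one line, and the element name `XmlDecl` as well as the names `DeclarationRewriter` and `Rewrite` are made up for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class DeclarationRewriter
{
    // Matches "<?xml ...?>" within a single line; group 1 captures the
    // pseudo-attributes (version, encoding, standalone) with their leading space.
    private static readonly Regex Declaration =
        new Regex(@"<\?xml(\s.*?)\?>", RegexOptions.Compiled);

    // Rewrites every declaration as an empty element. "XmlDecl" is a
    // hypothetical element name, not something the original format defines.
    public static string Rewrite(string line)
    {
        return Declaration.Replace(line, "<XmlDecl$1/>");
    }
}
```

For one line of the sample this turns `<?xml version="1.0"?>` into `<XmlDecl version="1.0"/>`, which keeps the declaration's information available to the parser instead of discarding it.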

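Also, have you tried passing the file through an XML Tidy style program?

There may also be an SGML library that could be used to preprocess the data and output proper XML.

I added this as an answer because that preserves the syntax highlighting.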

    // Requires: using System; using System.IO; using System.Text;
    private void ProcessFile(string inputFileName, string outputFileName)
    {
        const string xmlDeclarationStart = "<?xml";
        const string xmlDeclarationFinish = "?>";

        // Strict UTF-8 decoding: invalid bytes throw instead of being silently replaced.
        using (StreamReader reader = new StreamReader(inputFileName, new UTF8Encoding(false, true)))
        using (StreamWriter writer = new StreamWriter(outputFileName, false, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Only the first declaration on each line is handled, which matches the sample.
                int startPosition = line.IndexOf(xmlDeclarationStart, StringComparison.Ordinal);
                if (startPosition != -1)
                {
                    int endPosition = line.IndexOf(xmlDeclarationFinish, startPosition, StringComparison.Ordinal);
                    if (endPosition == -1)
                    {
                        throw new NotImplementedException(string.Format("Implementation assumption is wrong. {0} .. {1} spans multiple lines (or input file is severely malformed)", xmlDeclarationStart, xmlDeclarationFinish));
                    }
                    // the code completely strips the <?xml ... ?> part
                    // an alternative would be to make this a new XML element containing
                    // the information inside the <?xml ... ?> part as attributes
                    // just like Daren Thomas suggested
                    line = line.Substring(0, startPosition)
                         + line.Substring(endPosition + xmlDeclarationFinish.Length);
                }
                writer.WriteLine(line);
            }
        }
    }

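Have you tried the classes in the System.Xml.Linq namespace?

Not yet; which one is the best starting point for parsing fragments? How large are LINQ's memory requirements? These files easily reach 100 MB apiece. Thanks.

This is similar to what I have been using so far, but I'm not sure the XML declaration always stays the same. Nice to see that we think alike.

I can't accept both answers, and I added my own code as a separate answer so that the syntax highlighting is preserved. Your answer is accepted because it is closest to what I already had.

Thanks for the answer. I know I could use regular expressions, and I will if no better alternative turns up.

XML Tidy in TextPad chokes because the files are too large. Any pointers to such an SGML library?

Thanks a lot. I can't accept both answers, and I added my own code as a separate answer so that the syntax highlighting is preserved. Your answer is not accepted because it is further from what I already had. Sorry (: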
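On the memory question for files around 100 MB, one possible approach is to combine `XmlReader` in fragment mode with `XNode.ReadFrom`, so that only one element is held in memory at a time. The sketch below assumes the embedded declarations have already been stripped and that the elements of interest are named `Entry`, as in the sample; `FragmentReader` and `ReadEntries` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class FragmentReader
{
    // Lazily yields each <Entry> element from cleaned input that has
    // several top-level elements (Header, Entry, Entry, ...).
    public static IEnumerable<XElement> ReadEntries(TextReader cleanedInput)
    {
        var settings = new XmlReaderSettings
        {
            // Fragment mode accepts more than one root element.
            ConformanceLevel = ConformanceLevel.Fragment
        };

        using (XmlReader reader = XmlReader.Create(cleanedInput, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Entry")
                {
                    // ReadFrom consumes the whole element and leaves the
                    // reader positioned on the node that follows it.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

A caller could pass a `StreamReader` over the cleaned output file and iterate with `foreach`, processing and discarding each `Entry` before the next one is read.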
