C#: Large XML file, XmlDocument not feasible, but need to be able to search

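I'm struggling to come up with a sensible loop for pulling a node out of an XML file that is too large to use with the .NET classes that support XPath.

I'm trying to replace a single line of code (a call to SelectNodes with an XPath query string) with code that does the same thing using XmlTextReader.

As the XPath query I was using before shows (included for reference), I have to go down several levels:

Tedious, I thought, but simple. However, I can't seem to get the loop right.

I need to get a node, check a node under it to see whether its value matches a target string, and then descend further if it matches or skip that branch if it doesn't.

In fact, I think my problem is that I don't know how to ignore a branch I'm not interested in. I can't let the reader walk through irrelevant branches, because the element names are not unique (as the XPath query shows).

I thought I could maintain some booleans, e.g. bool expectingProfileName, which is set to true when I hit a Profile node. But if it isn't the particular Profile node I want, I have no way of getting out of that branch.

So... hopefully this makes sense to someone. I've been staring at the problem for a couple of hours and may just be missing something obvious.

I can't post a portion of the file, but its structure is roughly:

ConfigRelease > Profiles > Profile > Name > Screens > Screen > Settings > Setting > Name
I will know the ProfileName, ScreenName and SettingName, and I need the Setting node.

I'm trying to avoid reading the whole file in one hit, e.g. at application start-up, because half of its contents will never be used. I also have no control over what generates the XML file, so I can't change it to produce several smaller files.

Any tips would be greatly appreciated.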

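Update

I have reopened this. One poster suggested XPathDocument, which should have been perfect. Unfortunately, I hadn't mentioned that this is a mobile application, and XPathDocument is not supported there.

The file is not large by most standards, which is why the system was originally coded to use XmlDocument. It is currently 4 MB, which is evidently large enough to crash a mobile application when it is loaded into an XmlDocument. It is probably just as well that this came up now, since the file is expected to get bigger. Anyway, I am now trying the DataSet suggestion, but I'm still open to other ideas.

Update 2

I got a bit suspicious because quite a few people said they wouldn't expect a file of this size to crash the system. Further experimentation has shown that it is an intermittent crash. Yesterday it crashed every time, but this morning, after I reset the device, I can't reproduce it. I'm now trying to work out a reliable set of reproduction steps, and also to decide on the best way to handle the problem, which I'm sure still exists. I can't just leave it, because the application is useless if it can't access this file, and I don't think I can tell my users that they can't run anything else on the device while my application is running...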

Try loading the file into a DataSet:

DataSet ds = new DataSet();
ds.ReadXml(@"C:\MyXmlFile.xml");
You can then use LINQ to search it.

Take a look at XPathDocument. It is lighter than XmlDocument and is optimized for read-only XPath queries.

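OK, I was intrigued by this, so I knocked some code together. It isn't pretty and it really only supports this one use case, but I think it does the job you're looking for and works as a decent starting platform. I haven't tested it thoroughly either. Finally, you will need to modify the code to make it return the content (see the method called Output()).

Here is the code: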

using System;

using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

namespace XPathInCE
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                if (args.Length != 2)
                {
                    ShowUsage();
                }
                else
                {
                    Extract(args[0], args[1]);
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine("{0} was thrown", ex.GetType());
                Console.WriteLine(ex.Message);
                Console.WriteLine(ex.StackTrace);
            }

            Console.WriteLine("Press ENTER to exit");
            Console.ReadLine();
        }

        private static void Extract(string filePath, string queryString)
        {
            if (!File.Exists(filePath))
            {
                Console.WriteLine("File not found! Path: {0}", filePath);
                return;
            }

            XmlReaderSettings settings = new XmlReaderSettings { IgnoreComments = true, IgnoreWhitespace = true };
            using (XmlReader reader = XmlReader.Create(filePath, settings))
            {
                XPathQuery query = new XPathQuery(queryString);
                query.Find(reader);
            }
        }

        static void ShowUsage()
        {
            Console.WriteLine("No file specified or incorrect number of parameters");
            Console.WriteLine("Args must be: Filename XPath");
            Console.WriteLine();
            Console.WriteLine("Sample usage:");
            Console.WriteLine("XPathInCE someXmlFile.xml ConfigurationRelease/Profiles/Profile[Name='MyProfileName']/Screens/Screen[Id='MyScreenId']/Settings/Setting[Name='MySettingName']");
        }

        class XPathQuery
        {
            private readonly LinkedList<ElementOfInterest> list = new LinkedList<ElementOfInterest>();
            private LinkedListNode<ElementOfInterest> currentNode;

            internal XPathQuery(string query)
            {
                Parse(query);
                currentNode = list.First;
            }

            internal void Find(XmlReader reader)
            {
                bool skip = false;
                while (true)
                {
                    if (skip)
                    {
                        reader.Skip();
                        skip = false;
                    }
                    else
                    {
                        if (!reader.Read())
                        {
                            break;
                        }
                    }
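                    // Guard against a null Previous node (currentNode is the first query step)
                    // before dereferencing it; otherwise this throws a NullReferenceException.
                    if (reader.NodeType == XmlNodeType.EndElement
                            && currentNode.Previous != null
                            && String.Compare(reader.Name, currentNode.Previous.Value.ElementName, StringComparison.CurrentCultureIgnoreCase) == 0)
                    {
                        // Leaving the parent element: step the query back up one level.
                        currentNode = currentNode.Previous;
                        continue;
                    }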
                    if (reader.NodeType == XmlNodeType.Element)
                    {
                        string currentElementName = reader.Name;
                        Console.WriteLine("Considering element: {0}", reader.Name);

                        if (String.Compare(reader.Name, currentNode.Value.ElementName, StringComparison.CurrentCultureIgnoreCase) != 0)
                        {
                            // don't want
                            Console.WriteLine("Skipping");
                            skip = true;
                            continue;
                        }
                        if (!FindAttributes(reader))
                        {
                            // don't want
                            Console.WriteLine("Skipping");
                            skip = true;
                            continue;
                        }

                        // is there more?
                        if (currentNode.Next != null)
                        {
                            currentNode = currentNode.Next;
                            continue;
                        }

                        // we're at the end, this is a match! :D
                        Console.WriteLine("XPath match found!");
                        Output(reader, currentElementName);
                    }
                }
            }

            private bool FindAttributes(XmlReader reader)
            {
                foreach (AttributeOfInterest attributeOfInterest in currentNode.Value.Attributes)
                {
                    if (String.Compare(reader.GetAttribute(attributeOfInterest.Name), attributeOfInterest.Value,
                                       StringComparison.CurrentCultureIgnoreCase) != 0)
                    {
                        return false;
                    }
                }
                return true;
            }

            private static void Output(XmlReader reader, string name)
            {
                while (reader.Read())
                {
                    // break condition
                    if (reader.NodeType == XmlNodeType.EndElement
                        && String.Compare(reader.Name, name, StringComparison.CurrentCultureIgnoreCase) == 0)
                    {
                        return;
                    }

                    if (reader.NodeType == XmlNodeType.Element)
                    {
                        Console.WriteLine("Element {0}", reader.Name);
                        Console.WriteLine("Attributes");
                        for (int i = 0; i < reader.AttributeCount; i++)
                        {
                            reader.MoveToAttribute(i);
                            Console.WriteLine("Attribute: {0} Value: {1}", reader.Name, reader.Value);
                        }
                    }

                    if (reader.NodeType == XmlNodeType.Text)
                    {
                        Console.WriteLine("Element value: {0}", reader.Value);
                    }
                }
            }

            private void Parse(string query)
            {
                IList<string> elements = query.Split('/');
                foreach (string element in elements)
                {
                    ElementOfInterest interestingElement = null;
                    string elementName = element;
                    int attributeQueryStartIndex = element.IndexOf('[');
                    if (attributeQueryStartIndex != -1)
                    {
                        int attributeQueryEndIndex = element.IndexOf(']');
                        if (attributeQueryEndIndex == -1)
                        {
                            throw new ArgumentException(String.Format("Argument: {0} has a [ without a corresponding ]", query));
                        }
                        elementName = elementName.Substring(0, attributeQueryStartIndex);
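                        // (end - start - 1) characters lie between '[' and ']'; subtracting 2
                        // would drop the last character of the predicate.
                        string attributeQuery = element.Substring(attributeQueryStartIndex + 1,
                                    attributeQueryEndIndex - attributeQueryStartIndex - 1);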
                        string[] keyValPair = attributeQuery.Split('=');
                        if (keyValPair.Length != 2)
                        {
                            throw new ArgumentException(String.Format("Argument: {0} has an attribute query that either has too many or insufficient = marks. We currently only support one", query));
                        }
                        interestingElement = new ElementOfInterest(elementName);
                        interestingElement.Add(new AttributeOfInterest(keyValPair[0].Trim().Replace("'", ""),
                            keyValPair[1].Trim().Replace("'", "")));
                    }
                    else
                    {
                        interestingElement = new ElementOfInterest(elementName);
                    }

                    list.AddLast(interestingElement);
                }
            }

            class ElementOfInterest
            {
                private readonly string elementName;
                private readonly List<AttributeOfInterest> attributes = new List<AttributeOfInterest>();

                public ElementOfInterest(string elementName)
                {
                    this.elementName = elementName;
                }

                public string ElementName
                {
                    get { return elementName; }
                }

                public List<AttributeOfInterest> Attributes
                {
                    get { return attributes; }
                }

                public void Add(AttributeOfInterest attribute)
                {
                    Attributes.Add(attribute);
                }
            }

            class AttributeOfInterest
            {
                private readonly string name;
                private readonly string value;

                public AttributeOfInterest(string name, string value)
                {
                    this.name = name;
                    this.value = value;
                }

                public string Value
                {
                    get { return value; }
                }

                public string Name
                {
                    get { return name; }
                }
            }
        }
    }
}
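I ran it on the desktop, but it is a CF 2.0 .exe that I built, so it should work fine on CE. As you can see, it skips an element as soon as it fails to match, so it does not walk the entire file.

Feedback from anyone is appreciated, especially pointers for making the code more concise.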
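
The program above matches on attributes (Profile Name="..."), whereas the structure sketched in the question suggests that Name may be a child element. The following is only a minimal sketch of the same skip-on-mismatch idea for that case. It assumes that each Name element appears before its Screens or Settings sibling, that the element names and depths are exactly those of the question's outline, and that ReadElementContentAsString and ReadOuterXml are available on the target framework. SettingLookup, MoveToChild, HasName and Enter are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

static class SettingLookup
{
    // Advances to the next element called `name` at exactly `depth`. Any other
    // element met on the way is passed over with Skip(), so this code never
    // looks inside it. Returns false once the reader leaves the enclosing
    // element (Depth drops below `depth`) or reaches the end of the file.
    static bool MoveToChild(XmlReader reader, int depth, string name)
    {
        while (!reader.EOF && reader.Depth >= depth)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Depth == depth && reader.Name == name)
                return true;
            reader.Skip(); // on a start tag: jump past the matching end tag; otherwise: like Read()
        }
        return false;
    }

    // Reader is on a start tag at depth - 1. Steps inside and compares the text
    // of its <Name> child with `expected`. Assumption: <Name> comes first.
    static bool HasName(XmlReader reader, int depth, string expected)
    {
        reader.Read();
        return MoveToChild(reader, depth, "Name")
            && reader.ReadElementContentAsString() == expected;
    }

    // Steps inside the container child (e.g. <Screens>) if there is one.
    static void Enter(XmlReader reader, int depth, string container)
    {
        if (MoveToChild(reader, depth, container)) reader.Read();
    }

    // Returns the matching <Setting> as a small standalone element, or null.
    public static XmlElement Find(XmlReader reader, string profile, string screen, string setting)
    {
        reader.MoveToContent(); // root element, depth 0
        reader.Read();
        Enter(reader, 1, "Profiles");
        while (MoveToChild(reader, 2, "Profile"))
        {
            if (!HasName(reader, 3, profile)) continue; // rest of this Profile gets skipped
            Enter(reader, 3, "Screens");
            while (MoveToChild(reader, 4, "Screen"))
            {
                if (!HasName(reader, 5, screen)) continue; // rest of this Screen gets skipped
                Enter(reader, 5, "Settings");
                while (MoveToChild(reader, 6, "Setting"))
                {
                    // One <Setting> is small, so only this fragment is loaded into a DOM.
                    XmlDocument fragment = new XmlDocument();
                    fragment.LoadXml(reader.ReadOuterXml());
                    XmlElement name = fragment.DocumentElement["Name"];
                    if (name != null && name.InnerText == setting)
                        return fragment.DocumentElement;
                }
            }
        }
        return null;
    }
}
```

A caller would open the reader with XmlReader.Create on the file and pass the three known names to SettingLookup.Find. Skip() still has to parse the text it passes over, so the saving is in memory and in application logic rather than in I/O.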

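I'm adding this because the problem has now been resolved, but the chosen solution doesn't match anything listed so far.

Our technical architect took this problem on and decided that we shouldn't be using XML at all. The decision was partly due to this problem, but also due to some complaints about the level of data-transfer charges.

His conclusion was that we should implement a custom file format (with indexing), optimized for size and query speed.

So the issue is on hold until that work has been approved and properly specified.

That's all for now.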

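Loading it into a DataSet won't work; that will take up even more memory.

When I ran into a similar problem, I used XmlReader and built an in-memory index at load time. I present the index to the user, and when the user clicks a link or activates a search, I re-read the XML document with XmlReader and load the corresponding subset.

This sounds laborious, and I suppose in some ways it is. It trades CPU cycles for memory. But it works, and the application is responsive enough. The data was only 2 MB, which isn't that big, yet I got an OOM with DataSet. I then moved to XmlSerializer, which worked for a while, but I hit an OOM again. So I eventually went back to this custom index approach.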
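
A rough sketch of that index-then-reload idea is shown here; it is not the poster's actual code. It assumes profiles are identified by a Name attribute on Profile elements, as in the sample file used elsewhere in this thread, and that the caller opens a fresh TextReader over the file for each pass. ProfileIndex, Build and Load are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class ProfileIndex
{
    // Pass 1: record the Name attribute of each <Profile>, in document order.
    // Only the names are kept in memory; every Profile subtree is skipped.
    public static List<string> Build(TextReader source)
    {
        List<string> names = new List<string>();
        using (XmlReader reader = XmlReader.Create(source))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Profile")
                {
                    names.Add(reader.GetAttribute("Name"));
                    reader.Skip(); // lands on the node after </Profile>
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return names;
    }

    // Pass 2: re-read the document and return the markup of the Profile at
    // position `ordinal` in the index, or null if there is no such Profile.
    public static string Load(TextReader source, int ordinal)
    {
        using (XmlReader reader = XmlReader.Create(source))
        {
            int seen = 0;
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Profile")
                {
                    if (seen == ordinal) return reader.ReadOuterXml();
                    seen++;
                    reader.Skip();
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return null;
    }
}
```

The list returned by Build is what would be shown to the user; the position of the chosen entry is then passed to Load. The second pass still scans from the start of the file up to the requested profile, which is the CPU-for-memory trade described above.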

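You could implement a SAX-based parser, so that while parsing the XML you only deal with the branches you're interested in. This is the best approach, because it doesn't load the entire XML as a document.

Ideally you would design a custom parser for your needs and do all the parsing in a single pass. For example, if you will be interested in particular nodes later, save references to them so that you can start from there later instead of re-parsing or re-traversing.

The downside is that it takes a bit of custom programming.

The upside is that you only read what you're interested in.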
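
As far as I know, System.Xml does not include a SAX parser; XmlReader is a pull parser. A SAX-like, callback-driven layer can be put on top of it, though. The following is only a sketch under that assumption: IXmlHandler, MiniSax and PaydirtCollector are hypothetical names, and the element and attribute names are taken from the sample file used elsewhere in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical callback interface, loosely modelled on SAX's ContentHandler.
interface IXmlHandler
{
    // Called on every start tag; return false to skip the element's whole subtree.
    bool StartElement(string path, XmlReader reader);
    // Called for every text node inside an element that was not skipped.
    void Text(string path, string value);
}

static class MiniSax
{
    public static void Parse(XmlReader reader, IXmlHandler handler)
    {
        List<string> open = new List<string>(); // names of the currently open elements
        reader.Read();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                bool isEmpty = reader.IsEmptyElement;
                open.Add(reader.Name);
                bool descend = handler.StartElement(string.Join("/", open.ToArray()), reader);
                if (!descend || isEmpty)
                    open.RemoveAt(open.Count - 1); // no end tag will be seen for it
                if (!descend)
                {
                    reader.Skip(); // lands on the node after the subtree
                    continue;      // that node has not been handled yet
                }
            }
            else if (reader.NodeType == XmlNodeType.Text)
            {
                handler.Text(string.Join("/", open.ToArray()), reader.Value);
            }
            else if (reader.NodeType == XmlNodeType.EndElement)
            {
                open.RemoveAt(open.Count - 1);
            }
            reader.Read();
        }
    }
}

// Example handler: keeps the text of <Paydirt> inside one named profile.
class PaydirtCollector : IXmlHandler
{
    private readonly string profileName;
    public readonly List<string> Found = new List<string>();

    public PaydirtCollector(string profileName) { this.profileName = profileName; }

    public bool StartElement(string path, XmlReader reader)
    {
        if (path.EndsWith("/Profile", StringComparison.Ordinal))
            return reader.GetAttribute("Name") == profileName; // false => branch skipped
        return true;
    }

    public void Text(string path, string value)
    {
        if (path.EndsWith("/Paydirt", StringComparison.Ordinal)) Found.Add(value);
    }
}
```

The handler decides per element whether the parser descends, so everything is handled in one pass and only the values of interest are retained.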
Sample XML file used to test the XPathInCE program:

<?xml version="1.0" encoding="utf-8" ?>
<ConfigurationRelease>
  <Profiles>
    <Profile Name ="MyProfileName">
      <Screens>
        <Screen Id="MyScreenId">
          <Settings>
            <Setting Name="MySettingName">
              <Paydirt>Good stuff</Paydirt>
            </Setting>
          </Settings>
        </Screen>
      </Screens>
    </Profile>
    <Profile Name ="SomeProfile">
      <Screens>
        <Screen Id="MyScreenId">
          <Settings>
            <Setting Name="Boring">
              <Paydirt>NOES you should not find this!!!</Paydirt>
            </Setting>
          </Settings>
        </Screen>
      </Screens>
    </Profile>
    <Profile Name ="SomeProfile">
      <Screens>
        <Screen Id="Boring">
          <Settings>
            <Setting Name="MySettingName">
              <Paydirt>NOES you should not find this!!!</Paydirt>
            </Setting>
          </Settings>
        </Screen>
      </Screens>
    </Profile>
    <Profile Name ="Boring">
      <Screens>
        <Screen Id="MyScreenId">
          <Settings>
            <Setting Name="MySettingName">
              <Paydirt>NOES you should not find this!!!</Paydirt>
            </Setting>
          </Settings>
        </Screen>
      </Screens>
    </Profile>
  </Profiles>
</ConfigurationRelease>

Console output from running XPathInCE against MyXmlFile.xml:

C:\Sandbox\XPathInCE\XPathInCE\bin\Debug>XPathInCE MyXmlFile.xml ConfigurationRelease/Profiles/Profile[Name='MyProfileName']/Screens/Screen[Id='MyScreenId']/Settings/Setting[Name='MySettingName']
Considering element: ConfigurationRelease
Considering element: Profiles
Considering element: Profile
Considering element: Screens
Considering element: Screen
Considering element: Settings
Considering element: Setting
XPath match found!
Element Paydirt
Attributes
Element value: Good stuff
Considering element: Profile
Skipping
Considering element: Profile
Skipping
Considering element: Profile
Skipping
Press ENTER to exit