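C#: Large XML file, XmlDocument not feasible, but need to be able to search
I am struggling to come up with a sensible loop for pulling nodes out of an XML file that is too large to use with the .NET classes that support XPath.
I am trying to replace the single line of code I had (a call to SelectNodes with an XPath query string) with code that does the same thing using XmlTextReader.
I have to go down several levels, as the XPath query I used previously (for reference only) illustrated.
I thought this would be annoying but simple. However, I cannot seem to get the loop right.
I need to get a node, check a node under it to see whether its value matches a target string, then walk further down if it matches or skip that branch if it does not.
In fact, I think my problem is that I do not know how to ignore a branch I am not interested in. I cannot let the code walk irrelevant branches, because the element names are not unique (as the XPath query shows).
I thought I could maintain some booleans, for example bool expectingProfileName, which is set to true when I hit a Profile node. But if it is not the particular Profile node I want, I cannot get out of that branch.
So... hopefully this makes sense to someone. I have been staring at the problem for a couple of hours and may just be missing something obvious.
I would like to post part of the file but cannot work out how, so its structure is roughly: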
ConfigRelease > Profiles > Profile > Name > Screens > Screen > Settings > Setting > Name
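I will know the ProfileName, ScreenName and SettingName, and I need to get the Setting node.
I am trying to avoid reading the whole file in one go, for example at application start-up, because half of its contents will never be used. I also have no control over what generates the XML file, so I cannot change it to produce several smaller files.
Any tips would be greatly appreciated.
Update
I have re-opened this. One poster suggested XPathDocument, which should have been perfect. Unfortunately, I had not mentioned that this is a mobile application, and XPathDocument is not supported there.
The file is not large by most standards, which is why the system was originally coded to use XmlDocument. It is currently 4 MB, which is apparently large enough to crash the mobile application when it is loaded into an XmlDocument. It is probably just as well that this came up now, since the file is expected to get much bigger. Anyway, I am now trying the DataSet suggestion, but I am still open to other ideas.
Update 2
I got a bit suspicious because quite a few people said they would not expect a file of this size to crash the system. Further experimentation has shown that the crash is intermittent. Yesterday it crashed every time, but this morning, after I reset the device, I could not reproduce it. I am now trying to work out a reliable set of reproduction steps, and also to decide on the best way to handle the problem, which I am sure still exists. I cannot just leave it, because the application is useless if it cannot access this file, and I do not think I can tell my users that they cannot run anything else on the device while my app is running...
Try loading the file into a DataSet: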
DataSet ds = new DataSet();
ds.ReadXml(@"C:\MyXmlFile.xml");
You can then use LINQ to search it.
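As a rough illustration of that suggestion, the sketch below loads a small XML string into a DataSet and filters the rows with LINQ. The `DataSetSearch` class, the `FindValues` method and the flat `Item`/`Name`/`Value` layout are hypothetical and chosen only to keep the example short; a nested file such as the one in the question would be inferred as several related tables. Note that this approach still holds the whole document in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;

static class DataSetSearch
{
    // Hypothetical helper: returns the Value column of every Item row
    // whose Name column equals the requested name.
    public static List<string> FindValues(string xml, string name)
    {
        DataSet ds = new DataSet();
        // No schema is supplied, so ReadXml infers one: each <Item> becomes a row,
        // its Name attribute and its <Value> child become string columns.
        ds.ReadXml(new StringReader(xml));
        DataTable items = ds.Tables["Item"];

        // Rows.Cast<DataRow>() lets plain LINQ to Objects query the table.
        return items.Rows.Cast<DataRow>()
            .Where(row => (string)row["Name"] == name)
            .Select(row => (string)row["Value"])
            .ToList();
    }
}
```

Calling `DataSetSearch.FindValues(xml, "b")` on a document of the form `<Items><Item Name="b"><Value>2</Value></Item></Items>` would return the values of the matching rows.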
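XPathDocument is lighter than XmlDocument and is optimized for read-only XPath queries.
OK, I was interested in this, so I knocked together some code. It is not pretty and only really supports this one use case, but I think it does the job you are looking for and makes a decent platform to start from. I have not tested it thoroughly either.
Finally, you will need to modify the code to make it return the content (see the method named Output()).
The code is as follows: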
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
namespace XPathInCE
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                if (args.Length != 2)
                {
                    ShowUsage();
                }
                else
                {
                    Extract(args[0], args[1]);
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine("{0} was thrown", ex.GetType());
                Console.WriteLine(ex.Message);
                Console.WriteLine(ex.StackTrace);
            }
            Console.WriteLine("Press ENTER to exit");
            Console.ReadLine();
        }
        private static void Extract(string filePath, string queryString)
        {
            if (!File.Exists(filePath))
            {
                Console.WriteLine("File not found! Path: {0}", filePath);
                return;
            }
            XmlReaderSettings settings = new XmlReaderSettings { IgnoreComments = true, IgnoreWhitespace = true };
            using (XmlReader reader = XmlReader.Create(filePath, settings))
            {
                XPathQuery query = new XPathQuery(queryString);
                query.Find(reader);
            }
        }
        static void ShowUsage()
        {
            Console.WriteLine("No file specified or incorrect number of parameters");
            Console.WriteLine("Args must be: Filename XPath");
            Console.WriteLine();
            Console.WriteLine("Sample usage:");
            Console.WriteLine("XPathInCE someXmlFile.xml ConfigurationRelease/Profiles/Profile[Name='MyProfileName']/Screens/Screen[Id='MyScreenId']/Settings/Setting[Name='MySettingName']");
        }
        class XPathQuery
        {
            // The query is held as a linked list of path steps;
            // currentNode is the step we expect to match next.
            private readonly LinkedList<ElementOfInterest> list = new LinkedList<ElementOfInterest>();
            private LinkedListNode<ElementOfInterest> currentNode;
            internal XPathQuery(string query)
            {
                Parse(query);
                currentNode = list.First;
            }
            internal void Find(XmlReader reader)
            {
                bool skip = false;
                while (true)
                {
                    if (skip)
                    {
                        // Skip() jumps over the current element's subtree and
                        // leaves the reader positioned on the node after it.
                        reader.Skip();
                        skip = false;
                    }
                    else
                    {
                        if (!reader.Read())
                        {
                            break;
                        }
                    }
                    // Closing tag of the parent step: move one step back up the path.
                    // The null check avoids a NullReferenceException on the first step.
                    if (reader.NodeType == XmlNodeType.EndElement
                        && currentNode.Previous != null
                        && String.Compare(reader.Name, currentNode.Previous.Value.ElementName, StringComparison.CurrentCultureIgnoreCase) == 0)
                    {
                        currentNode = currentNode.Previous;
                        continue;
                    }
                    if (reader.NodeType == XmlNodeType.Element)
                    {
                        string currentElementName = reader.Name;
                        Console.WriteLine("Considering element: {0}", reader.Name);
                        if (String.Compare(reader.Name, currentNode.Value.ElementName, StringComparison.CurrentCultureIgnoreCase) != 0)
                        {
                            // don't want
                            Console.WriteLine("Skipping");
                            skip = true;
                            continue;
                        }
                        if (!FindAttributes(reader))
                        {
                            // don't want
                            Console.WriteLine("Skipping");
                            skip = true;
                            continue;
                        }
                        // is there more?
                        if (currentNode.Next != null)
                        {
                            // An empty element such as <Screens/> has no children and
                            // no end tag, so there is nothing to descend into.
                            if (!reader.IsEmptyElement)
                            {
                                currentNode = currentNode.Next;
                            }
                            continue;
                        }
                        // we're at the end, this is a match! :D
                        Console.WriteLine("XPath match found!");
                        // An empty match has no content; reading on would consume its siblings.
                        if (!reader.IsEmptyElement)
                        {
                            Output(reader, currentElementName);
                        }
                    }
                }
            }
            // Returns true only if every attribute test of the current step matches.
            private bool FindAttributes(XmlReader reader)
            {
                foreach (AttributeOfInterest attributeOfInterest in currentNode.Value.Attributes)
                {
                    if (String.Compare(reader.GetAttribute(attributeOfInterest.Name), attributeOfInterest.Value,
                        StringComparison.CurrentCultureIgnoreCase) != 0)
                    {
                        return false;
                    }
                }
                return true;
            }
            // Prints everything below the matched element until its end tag is reached.
            private static void Output(XmlReader reader, string name)
            {
                while (reader.Read())
                {
                    // break condition
                    if (reader.NodeType == XmlNodeType.EndElement
                        && String.Compare(reader.Name, name, StringComparison.CurrentCultureIgnoreCase) == 0)
                    {
                        return;
                    }
                    if (reader.NodeType == XmlNodeType.Element)
                    {
                        Console.WriteLine("Element {0}", reader.Name);
                        Console.WriteLine("Attributes");
                        for (int i = 0; i < reader.AttributeCount; i++)
                        {
                            reader.MoveToAttribute(i);
                            Console.WriteLine("Attribute: {0} Value: {1}", reader.Name, reader.Value);
                        }
                        // Return from the attribute nodes to the owning element.
                        reader.MoveToElement();
                    }
                    if (reader.NodeType == XmlNodeType.Text)
                    {
                        Console.WriteLine("Element value: {0}", reader.Value);
                    }
                }
            }
            // Splits the query on '/' and records, for each step, the element name
            // and an optional single [Name='Value'] attribute test.
            private void Parse(string query)
            {
                IList<string> elements = query.Split('/');
                foreach (string element in elements)
                {
                    ElementOfInterest interestingElement = null;
                    string elementName = element;
                    int attributeQueryStartIndex = element.IndexOf('[');
                    if (attributeQueryStartIndex != -1)
                    {
                        int attributeQueryEndIndex = element.IndexOf(']');
                        if (attributeQueryEndIndex == -1)
                        {
                            throw new ArgumentException(String.Format("Argument: {0} has a [ without a corresponding ]", query));
                        }
                        elementName = elementName.Substring(0, attributeQueryStartIndex);
                        // Text strictly between '[' and ']'. The length is end - start - 1;
                        // the original "- 2" dropped the last character, which only went
                        // unnoticed because that character was a quote that is stripped anyway.
                        string attributeQuery = element.Substring(attributeQueryStartIndex + 1,
                            (attributeQueryEndIndex - attributeQueryStartIndex) - 1);
                        string[] keyValPair = attributeQuery.Split('=');
                        if (keyValPair.Length != 2)
                        {
                            throw new ArgumentException(String.Format("Argument: {0} has an attribute query that either has too many or insufficient = marks. We currently only support one", query));
                        }
                        interestingElement = new ElementOfInterest(elementName);
                        interestingElement.Add(new AttributeOfInterest(keyValPair[0].Trim().Replace("'", ""),
                            keyValPair[1].Trim().Replace("'", "")));
                    }
                    else
                    {
                        interestingElement = new ElementOfInterest(elementName);
                    }
                    list.AddLast(interestingElement);
                }
            }
            class ElementOfInterest
            {
                private readonly string elementName;
                private readonly List<AttributeOfInterest> attributes = new List<AttributeOfInterest>();
                public ElementOfInterest(string elementName)
                {
                    this.elementName = elementName;
                }
                public string ElementName
                {
                    get { return elementName; }
                }
                public List<AttributeOfInterest> Attributes
                {
                    get { return attributes; }
                }
                public void Add(AttributeOfInterest attribute)
                {
                    Attributes.Add(attribute);
                }
            }
            class AttributeOfInterest
            {
                private readonly string name;
                private readonly string value;
                public AttributeOfInterest(string name, string value)
                {
                    this.name = name;
                    this.value = value;
                }
                public string Value
                {
                    get { return value; }
                }
                public string Name
                {
                    get { return name; }
                }
            }
        }
    }
}
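I ran it on the desktop, but it is a CF 2.0 .exe that I built, so it should work fine on CE.
As you can see, it skips whenever there is no match, so it does not walk the whole file.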
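One possible way to make such a search return content instead of printing it is sketched below. This is not the code above but a smaller variant written under a few assumptions: the `StreamingQuery` and `Step` names are hypothetical, the path steps are supplied already parsed, name and attribute comparisons are case-sensitive, and the step to test is chosen from `reader.Depth` rather than from a linked list. Non-matching branches are still passed over with `Skip()`, and each match is captured with `ReadOuterXml()`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class StreamingQuery
{
    // One step of the path: an element name plus an optional attribute test.
    public sealed class Step
    {
        public readonly string Element, AttrName, AttrValue;
        public Step(string element, string attrName, string attrValue)
        {
            Element = element; AttrName = attrName; AttrValue = attrValue;
        }
    }

    // Returns the outer XML of every element that matches the full path.
    public static List<string> FindOuterXml(TextReader input, IList<Step> steps)
    {
        List<string> matches = new List<string>();
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreComments = true;
        settings.IgnoreWhitespace = true;
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            bool more = reader.Read();
            while (more)
            {
                if (reader.NodeType != XmlNodeType.Element)
                {
                    more = reader.Read();
                    continue;
                }
                // The root element has depth 0, so the depth selects the step to test.
                int depth = reader.Depth;
                Step step = depth < steps.Count ? steps[depth] : null;
                bool wanted = step != null
                    && reader.Name == step.Element
                    && (step.AttrName == null || reader.GetAttribute(step.AttrName) == step.AttrValue);
                if (!wanted)
                {
                    reader.Skip();              // jump over the whole branch
                    more = !reader.EOF;         // reader already sits on the next node
                }
                else if (depth == steps.Count - 1)
                {
                    matches.Add(reader.ReadOuterXml()); // also advances past the element
                    more = !reader.EOF;
                }
                else
                {
                    more = reader.Read();       // descend into the matching branch
                }
            }
        }
        return matches;
    }
}
```

Because only matching parents are ever entered, the depth alone identifies which step applies, and empty elements need no special handling. Both `Skip()` and `ReadOuterXml()` leave the reader on the following node, which is why the loop does not call `Read()` again after them.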
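Any feedback is appreciated, especially if people have pointers for making the code more concise.
I am adding this because the question is now resolved, but the chosen solution does not match anything listed so far.
Our technical architect took the problem on and decided that we should not be using XML at all. The decision was partly due to this problem, but also due to some complaints about the level of data transfer charges.
His conclusion was that we should implement a custom file format (with an index), optimized for size and for query speed.
So the issue is shelved until that work has been approved and properly specified.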
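That is where it stands for now.
Loading it into a DataSet will not work; that will take even more memory.
When I ran into a similar problem, I used XmlReader and built an in-memory index at load time. I present the index, and then when the user clicks a link or activates a search, I re-read the XML document with XmlReader and load the corresponding subset.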
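A minimal sketch of that two-pass idea follows. It is not the answerer's actual code: the `XmlIndex` class, the `BuildIndex` and `LoadSubset` names and the choice of an attribute as the index key are all assumptions made for illustration. The `open` delegate stands for whatever reopens the file; each pass streams the document from the start.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class XmlIndex
{
    private static XmlReader CreateReader(Func<TextReader> open)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.CloseInput = true; // close the TextReader together with the XmlReader
        return XmlReader.Create(open(), settings);
    }

    // Pass 1: collect the key attribute of every <elementName>, skipping each subtree.
    public static List<string> BuildIndex(Func<TextReader> open, string elementName, string keyAttribute)
    {
        List<string> keys = new List<string>();
        using (XmlReader reader = CreateReader(open))
        {
            bool more = reader.Read();
            while (more)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    string key = reader.GetAttribute(keyAttribute);
                    if (key != null) keys.Add(key);
                    reader.Skip();          // the content is not needed for the index
                    more = !reader.EOF;     // Skip() already moved to the next node
                }
                else
                {
                    more = reader.Read();
                }
            }
        }
        return keys;
    }

    // Pass 2: re-read the document and return only the subtree for one key,
    // or null when no element carries that key.
    public static string LoadSubset(Func<TextReader> open, string elementName, string keyAttribute, string key)
    {
        using (XmlReader reader = CreateReader(open))
        {
            bool more = reader.Read();
            while (more)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    if (reader.GetAttribute(keyAttribute) == key)
                    {
                        return reader.ReadOuterXml();
                    }
                    reader.Skip();
                    more = !reader.EOF;
                }
                else
                {
                    more = reader.Read();
                }
            }
        }
        return null;
    }
}
```

Only the list of keys and, later, one subtree are held in memory at a time; the cost is that the file is parsed again for every lookup.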
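This sounds laborious, and I guess in some ways it is. It trades CPU cycles for memory. But it works, and the application is responsive enough. The data size is only 2 MB, which is not that big, but I got an OOM with DataSet. I then moved to XmlSerializer, which worked for a while, but I hit an OOM again. So in the end I went back to this custom index approach.
You could implement a SAX-based parser, so that while the XML is being parsed you only act on the branches you are interested in. This is the best approach, because it does not load the whole XML as a document.
Ideally, you would design a custom parser around your needs and do all the parsing of everything in a single pass. For example, if you will be interested in particular nodes later, save references to them so that you can start from there later instead of parsing or traversing again.
The drawback is that it takes a bit of custom programming.
The benefit is that you only read what you are interested in.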
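.NET does not ship a SAX API, so the sketch below imitates the push style on top of XmlReader. The `ISaxHandler` interface, the `SaxDriver` class and the `PaydirtCollector` handler are hypothetical names introduced for this illustration, and the element names are taken from the sample file in this thread. The driver fires callbacks as nodes stream past, and the handler keeps only the text it cares about.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical SAX-style callback interface (not part of .NET).
interface ISaxHandler
{
    void StartElement(string name, IDictionary<string, string> attributes);
    void EndElement(string name);
    void Characters(string text);
}

static class SaxDriver
{
    // Streams the document once and pushes events to the handler.
    public static void Parse(TextReader input, ISaxHandler handler)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                    {
                        string name = reader.Name;
                        bool isEmpty = reader.IsEmptyElement;
                        Dictionary<string, string> attributes = new Dictionary<string, string>();
                        // From an element, MoveToNextAttribute() starts at the first attribute.
                        while (reader.MoveToNextAttribute())
                        {
                            attributes[reader.Name] = reader.Value;
                        }
                        handler.StartElement(name, attributes);
                        // <x/> produces no EndElement node, so report the end here.
                        if (isEmpty) handler.EndElement(name);
                        break;
                    }
                    case XmlNodeType.EndElement:
                        handler.EndElement(reader.Name);
                        break;
                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                        handler.Characters(reader.Value);
                        break;
                }
            }
        }
    }
}

// Collects the text of <Paydirt> elements, but only inside the wanted <Profile>.
sealed class PaydirtCollector : ISaxHandler
{
    private readonly string profileName;
    private bool inWantedProfile, inPaydirt;
    public readonly List<string> Values = new List<string>();

    public PaydirtCollector(string profileName) { this.profileName = profileName; }

    public void StartElement(string name, IDictionary<string, string> attributes)
    {
        if (name == "Profile")
        {
            string value;
            inWantedProfile = attributes.TryGetValue("Name", out value) && value == profileName;
        }
        else if (name == "Paydirt")
        {
            inPaydirt = inWantedProfile;
        }
    }

    public void EndElement(string name)
    {
        if (name == "Profile") inWantedProfile = false;
        else if (name == "Paydirt") inPaydirt = false;
    }

    public void Characters(string text)
    {
        if (inPaydirt) Values.Add(text);
    }
}
```

A caller would create a `PaydirtCollector`, pass it to `SaxDriver.Parse`, and read `Values` afterwards. Unlike a pull loop that calls `Skip()`, this style still tokenizes every node; the saving is that nothing except the wanted values is kept in memory.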
Sample XML file (MyXmlFile.xml) used to test the XPathInCE code:
<?xml version="1.0" encoding="utf-8" ?>
<ConfigurationRelease>
  <Profiles>
    <Profile Name="MyProfileName">
      <Screens>
        <Screen Id="MyScreenId">
          <Settings>
            <Setting Name="MySettingName">
              <Paydirt>Good stuff</Paydirt>
            </Setting>
          </Settings>
        </Screen>
      </Screens>
    </Profile>
    <Profile Name="SomeProfile">
      <Screens>
        <Screen Id="MyScreenId">
          <Settings>
            <Setting Name="Boring">
              <Paydirt>NOES you should not find this!!!</Paydirt>
            </Setting>
          </Settings>
        </Screen>
      </Screens>
    </Profile>
    <Profile Name="SomeProfile">
      <Screens>
        <Screen Id="Boring">
          <Settings>
            <Setting Name="MySettingName">
              <Paydirt>NOES you should not find this!!!</Paydirt>
            </Setting>
          </Settings>
        </Screen>
      </Screens>
    </Profile>
    <Profile Name="Boring">
      <Screens>
        <Screen Id="MyScreenId">
          <Settings>
            <Setting Name="MySettingName">
              <Paydirt>NOES you should not find this!!!</Paydirt>
            </Setting>
          </Settings>
        </Screen>
      </Screens>
    </Profile>
  </Profiles>
</ConfigurationRelease>
Output:
C:\Sandbox\XPathInCE\XPathInCE\bin\Debug>XPathInCE MyXmlFile.xml ConfigurationRelease/Profiles/Profile[Name='MyProfileName']/Screens/Screen[Id='MyScreenId']/Settings/Setting[Name='MySettingName']
Considering element: ConfigurationRelease
Considering element: Profiles
Considering element: Profile
Considering element: Screens
Considering element: Screen
Considering element: Settings
Considering element: Setting
XPath match found!
Element Paydirt
Attributes
Element value: Good stuff
Considering element: Profile
Skipping
Considering element: Profile
Skipping
Considering element: Profile
Skipping
Press ENTER to exit