C# XmlDocument cannot load an XHTML string because of the error "Reference to undeclared entity 'nbsp'"
I use the following code to convert an HTTP response stream into an XmlDocument:
// Requires: using System; using System.IO; using System.Net; using System.Xml;
HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
HttpWebResponse response = request.GetResponse() as HttpWebResponse;
Stream responseStream = response.GetResponseStream();
StreamReader responseReader = new StreamReader(responseStream);
String responseString = responseReader.ReadToEnd();
Console.WriteLine(responseString);
// Locate the start of the <html> element so everything before it can be dropped.
Int32 htmlTagIndex = responseString.IndexOf("<html",
    StringComparison.OrdinalIgnoreCase);
XmlDocument responseXhtml = new XmlDocument();
responseString = responseString.Substring(htmlTagIndex); // MARK 1
responseString = responseString.Replace("&nbsp;", " "); // MARK 2
responseXhtml.LoadXml(responseString);
return responseXhtml;
I would use HTML Agility Pack directly to parse the HTML. Even if you have to convert the HTML to XML, you can use it:
using (WebClient wc = new WebClient())
{
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(wc.DownloadString("http://www.google.com"));
    // Ask HTML Agility Pack to serialize the parsed HTML as XML.
    doc.OptionOutputAsXml = true;
    StringWriter writer = new StringWriter();
    doc.Save(writer);
    // The saved output can now be loaded by an XML parser.
    var xDoc = XDocument.Load(new StringReader(writer.ToString()));
}
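Thanks, but HTML Agility Pack seems like overkill. Is there any simpler code?
(X)HTML is generally not XML. &nbsp; is an entity defined in HTML. Do you really need to load it as XML?
Hmm, good point. Maybe I don't need to convert it to XML after all.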
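On the request for simpler code: the following is only a minimal sketch, assuming the response really is well-formed XHTML and that &nbsp; is the only named entity it uses. Instead of replacing &nbsp; with a plain space, it swaps in the numeric character reference &#160;, which an XML parser accepts without a DTD, so the non-breaking space is preserved. The class name XhtmlLoader and the method LoadXhtml are hypothetical names introduced for illustration; they are not from the question.

```csharp
using System;
using System.Xml;

// Hypothetical helper for illustration; not part of the original question.
public static class XhtmlLoader
{
    public static XmlDocument LoadXhtml(string xhtml)
    {
        // Drop anything before the <html> element (for example a DOCTYPE), as MARK 1 does.
        int htmlTagIndex = xhtml.IndexOf("<html", StringComparison.OrdinalIgnoreCase);
        if (htmlTagIndex < 0)
        {
            throw new ArgumentException("No <html> element found.", nameof(xhtml));
        }
        xhtml = xhtml.Substring(htmlTagIndex);

        // &nbsp; is declared only by the (X)HTML DTDs, which are no longer attached here.
        // &#160; is a numeric character reference and needs no declaration.
        xhtml = xhtml.Replace("&nbsp;", "&#160;");

        var doc = new XmlDocument();
        doc.LoadXml(xhtml);
        return doc;
    }
}
```

A call such as XhtmlLoader.LoadXhtml("<html><body><p>a&nbsp;b</p></body></html>") should then load without the undeclared-entity error. Other named entities (for example &copy;) or markup that is not well-formed would still make LoadXml throw, which is the case where a real HTML parser such as HTML Agility Pack is the safer choice.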