C#: Can HtmlAgilityPack process an XML file with an attached XSL file to render HTML?

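I would like to know the best way for HtmlAgilityPack to read an XML file that references an XSL file in order to render HTML. Is there any setting on the HtmlDocument class that helps with this, or do I have to find a way to run the transformation before loading the result with HtmlAgilityPack? If the latter, does anyone know a good library or method for such a transformation? Below is an example of a website that returns XML with an XSL file, along with the code I would like to use.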

var uri = new Uri("http://www.skechers.com/");
var request = (HttpWebRequest)WebRequest.Create(uri);
var cookieContainer = new CookieContainer();

request.CookieContainer = cookieContainer;
request.UserAgent = @"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5";
request.Method = "GET";
request.AllowAutoRedirect = true;
request.Timeout = 15000;

var response = (HttpWebResponse)request.GetResponse();
var page = new HtmlDocument();
page.OptionReadEncoding = false;
var stream = response.GetResponseStream();
page.Load(stream); 

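This code does not throw any errors, but what gets parsed is the raw XML rather than the transformed output, and the transformed output is what I want.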

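You need to render the output of applying the XSLT to the XML. To do that, you first download the XML, which you have already done. Next, parse the XML to identify the XSL reference. Then download the XSL and apply it to the XML document.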
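The last step (applying the stylesheet) can be sketched with the framework's own `XslCompiledTransform`. This is a minimal illustration, assuming the XML and the XSL have already been downloaded as strings; the class and method names `XsltSketch.Transform` are made up for this example and are not part of any library.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper for illustration only.
public static class XsltSketch
{
    // Applies an XSLT stylesheet (given as text) to an XML document (given as text)
    // and returns the transformed output as a string.
    public static string Transform(string xml, string xsl)
    {
        var transform = new XslCompiledTransform();
        using (var xslReader = XmlReader.Create(new StringReader(xsl)))
        {
            transform.Load(xslReader);
        }

        using (var xmlReader = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            // Writing to a TextWriter lets the stylesheet's own xsl:output
            // settings (for example method="html") control serialization.
            transform.Transform(xmlReader, null, output);
            return output.ToString();
        }
    }
}
```

The returned string is ordinary HTML text, so it could then be handed to an HTML parser. A stylesheet that pulls in further documents (for example through `document()`) would additionally need an `XmlResolver` and suitable `XsltSettings`, which this sketch leaves out.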

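    • This is the additional code I use after receiving the response. Note that it only applies when the response is "application/xml", and that you have to check for null object instances throughout. Also, FormAssetSrc is a private function that takes the href value, determines whether it is protocol-, root-, or document-relative, and creates a fully qualified URI.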

      // Requires: System.Text.RegularExpressions, System.Xml.XPath, System.Xml.Xsl
      var xmlStream = response.GetResponseStream();
      var xmlDocument = new XPathDocument(xmlStream);
      // Locate the <?xml-stylesheet ...?> processing instruction; null if the document has none.
      var styleNode = xmlDocument.CreateNavigator().SelectSingleNode("processing-instruction('xml-stylesheet')");
      var hrefValue = Regex.Match(styleNode == null ? "" : styleNode.Value, "href=(\"|')(?<url>.*?)(\"|')");
      if (hrefValue.Success)
      {
          // Resolve the stylesheet href against the URL the XML was actually served from.
          var xslHref = FormAssetSrc(hrefValue.Groups["url"].Value, response.ResponseUri);
          var xslUri = new Uri(xslHref);
          var xslRequest = CreateWebRequest(xslUri);
          var xslResponse = (HttpWebResponse)xslRequest.GetResponse();
          var xslDocument = new XPathDocument(xslResponse.GetResponseStream());
          // XslTransform is marked obsolete; XslCompiledTransform is its replacement in newer code.
          var xslTransform = new XslTransform();
          var sw = new System.IO.StringWriter();
          xslTransform.Load(xslDocument);
          xslTransform.Transform(xmlDocument.CreateNavigator(), null, sw);
          page.Html.LoadHtml(sw.ToString());
      }
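The FormAssetSrc helper itself is not shown in the post. One plausible way to get the behaviour described (protocol-, root-, and document-relative hrefs all turned into a fully qualified URI) is to let `System.Uri` do the resolution. The sketch below is an assumption about what such a helper could look like, not the author's implementation; `AssetUrlSketch.ResolveAssetUrl` is a hypothetical name.

```csharp
using System;

// Hypothetical stand-in for the private FormAssetSrc helper mentioned above.
public static class AssetUrlSketch
{
    // Resolves an href taken from the xml-stylesheet processing instruction
    // against the URI the XML response came from.
    // The Uri(Uri, string) constructor handles absolute hrefs,
    // protocol-relative hrefs ("//host/path"), root-relative hrefs ("/path"),
    // and document-relative hrefs ("file.xsl").
    public static string ResolveAssetUrl(string href, Uri responseUri)
    {
        return new Uri(responseUri, href).AbsoluteUri;
    }
}
```

Resolving against `response.ResponseUri` rather than the originally requested address matters when the request was redirected, because relative hrefs are relative to the final location.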
      
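      Html Agility Pack can help you on two points here:

      1) It makes it easier to get the XML processing instruction, because it parses the PI data as HTML and therefore turns it into attributes.

      2) HtmlDocument implements IXPathNavigable, so it can be transformed directly by the .NET XSLT engine.

      Here is a working piece of code. I had to add a specific XmlResolver to handle the XSLT transformation properly, but I think that is specific to the skechers case.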

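      using System;
      using System.Net;
      using System.Text;
      using System.Xml;
      using System.Xml.Xsl;
      using HtmlAgilityPack;

      public static void DownloadAndProcessXml(string url, string userAgent, string outputFilePath)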
      {
          using (XmlTextWriter writer = new XmlTextWriter(outputFilePath, Encoding.UTF8))
          {
              DownloadAndProcessXml(url, userAgent, writer);
          }
      }
      
      public static void DownloadAndProcessXml(string url, string userAgent, XmlWriter output)
      {
          UserAgentXmlUrlResolver resolver = new UserAgentXmlUrlResolver(url, userAgent);
      
          // WebClient is an easy to use class.
          using (WebClient client = new WebClient())
          {
              // download Xml doc. set User-Agent header or the site won't answer us...
              client.Headers[HttpRequestHeader.UserAgent] = resolver.UserAgent;
              HtmlDocument xmlDoc = new HtmlDocument();
              xmlDoc.Load(client.OpenRead(url));
      
              // determine xslt (note the xpath trick as Html Agility Pack does not support xml processing instructions)
              string xsltUrl = xmlDoc.DocumentNode.SelectSingleNode("//*[name()='?xml-stylesheet']").GetAttributeValue("href", null);
      
              // download Xslt doc
              client.Headers[HttpRequestHeader.UserAgent] = resolver.UserAgent;
              XslCompiledTransform xslt = new XslCompiledTransform();
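              // resolve the stylesheet href against the page URL instead of concatenating strings
              xslt.Load(new XmlTextReader(client.OpenRead(new Uri(new Uri(url), xsltUrl))), new XsltSettings(true, false), null);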
      
              // transform Html/Xml doc into new Xml doc, easy as HtmlDocument implements IXPathNavigable
              // note the custom resolver: it handles the extra requests this Xslt makes during the transformation
              xslt.Transform(xmlDoc, null, output, resolver);
          }
      }
      
      // This class is needed during transformation otherwise there are errors.
      // This is probably due to this very specific Xslt file that needs to go back to the root document itself.
      public class UserAgentXmlUrlResolver : XmlUrlResolver
      {
          public UserAgentXmlUrlResolver(string rootUrl, string userAgent)
          {
              RootUrl = rootUrl;
              UserAgent = userAgent;
          }
      
          public string RootUrl { get; set; }
          public string UserAgent { get; set; }
      
          public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
          {
              WebClient client = new WebClient();
              if (!string.IsNullOrEmpty(UserAgent))
              {
                  client.Headers[HttpRequestHeader.UserAgent] = UserAgent;
              }
              return client.OpenRead(absoluteUri);
          }
      
          public override Uri ResolveUri(Uri baseUri, string relativeUri)
          {
              if ((relativeUri == "/") && (!string.IsNullOrEmpty(RootUrl)))
                  return new Uri(RootUrl);
      
              return base.ResolveUri(baseUri, relativeUri);
          }
      }
      
      You call it like this:

          string url = "http://www.skechers.com/";
          string ua = @"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5";
          DownloadAndProcessXml(url, ua, "skechers.html");
      

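      If you have well-formed XML, why use HtmlAgilityPack at all?

      I am trying to get a page summary, i.e. the page title and meta description, plus a list of the img src values on the page. I allow any valid URL from the web as input. So, to answer your question: I don't always have well-formed XML, and even when I do, the format of the document title and description is inconsistent.

      CreateWebRequest is also a private function; it creates a request like the one in the first code snippet of the original question.

      Thanks, I ended up doing this, but I did not mark it as the answer because it has no implementation.

      Thanks again. I think the code I have will work better for my purposes, but as a general guide on how to do this I would recommend your code. By the way, HtmlAgilityPack is great. I would also add that it would be very cool to be able to pass an HTML string into the HtmlDocument.Load method without having to create a stream manually. I do see that it already has 12 overloads.

      @Adrian Adkison - there is a LoadHtml overload for this purpose.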
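To tie the last two comments together, here is a rough sketch of loading an HTML string with `LoadHtml` and pulling out the page summary described above (title, meta description, img src values). It assumes the HtmlAgilityPack package is referenced; `PageSummary` and `PageSummarySketch` are hypothetical names invented for this example, and the XPath for the meta tag assumes the attribute is written as lowercase `name="description"`.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical container for the summary fields mentioned in the comments.
public sealed class PageSummary
{
    public string Title { get; set; }
    public string Description { get; set; }
    public List<string> ImageSources { get; set; }
}

public static class PageSummarySketch
{
    // Parses an HTML string (for example the output of the XSLT step)
    // and collects the title, meta description, and img src values.
    public static PageSummary FromHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // loads from a string, no stream needed

        var titleNode = doc.DocumentNode.SelectSingleNode("//title");
        var metaNode = doc.DocumentNode.SelectSingleNode("//meta[@name='description']");
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var imgNodes = doc.DocumentNode.SelectNodes("//img[@src]");

        var summary = new PageSummary
        {
            Title = titleNode == null ? null : titleNode.InnerText.Trim(),
            Description = metaNode == null ? null : metaNode.GetAttributeValue("content", null),
            ImageSources = new List<string>()
        };

        if (imgNodes != null)
        {
            foreach (var img in imgNodes)
            {
                summary.ImageSources.Add(img.GetAttributeValue("src", ""));
            }
        }

        return summary;
    }
}
```

The null checks are deliberate: arbitrary pages from the web may lack a title, a description, or images entirely.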