Converting a string or HTML file to a C# HtmlDocument without using WebBrowser or HAP


c# browser dom


The only solution I have been able to find is to use:

            mshtml.HTMLDocument htmldocu = new mshtml.HTMLDocument();
            htmldocu.createDocumentFromUrl(url, "");

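I am not sure about its performance, but it should be better than loading the HTML file into a WebBrowser and then getting the HtmlDocument from there. In any case, that code does not work on my machine: the application crashes when it tries to execute the second line.

Has anyone managed to do this efficiently, or in some other way?

Note: please understand that I need an HtmlDocument object for DOM processing. I do not need the HTML string.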

Use the DownloadString method of the WebClient object, e.g.:

WebClient client = new WebClient();
string reply = client.DownloadString("http://www.google.com");

In the example above, after execution, reply will contain the HTML markup of the endpoint http://www.google.com.

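To answer your actual question from four years ago (as of when I am posting this answer), here is a working solution. I would not be surprised if you have found another way by now, so this is mostly for others looking for a similar solution. Keep in mind, however, that this approach is considered:

- somewhat obsolete (actually using HtmlDocument)
- not the best way to handle HTML DOM parsing (the preferred solution is to use HtmlAgilityPack or CsQuery, or something else that does actual parsing rather than regular expressions)
- extremely hacky, and therefore not the safest or most compatible way
- something you really should not do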

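Also keep in mind that HtmlDocument is really just a wrapper around mshtml.HTMLDocument2, so it is technically slower than using the COM wrapper directly, but I completely understand wanting it purely for ease of coding.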
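For comparison, the direct COM route mentioned here might look like the minimal sketch below. It is not the approach this answer goes on to show. It assumes the project has a reference to the Microsoft.mshtml interop assembly and runs on Windows on an STA thread; the Program class and the sample markup are only illustrative.

using System;
using mshtml; // assumption: a reference to the Microsoft.mshtml interop assembly

static class Program
{
    [STAThread] // assumption: the MSHTML COM objects are used from an STA thread
    static void Main()
    {
        // Create the MSHTML document object and talk to it through IHTMLDocument2.
        var doc = (IHTMLDocument2)new HTMLDocument();

        // write() takes an object array; close() ends the write stream
        // so the document is finalized.
        doc.write(new object[] { "<html><body><div>Hello, world!</div></body></html>" });
        doc.close();

        Console.WriteLine(doc.body.innerText);
    }
}

This gives DOM access through the mshtml interfaces (IHTMLDocument2, IHTMLElement and so on) rather than through the friendlier System.Windows.Forms.HtmlDocument type, which is the trade-off described above.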

If you are fine with all of the caveats above, here is how to achieve what you want:

using System;
using System.Reflection;
using System.Windows.Forms;

public class HtmlDocumentFactory
{
  private static Type htmlDocType = typeof(System.Windows.Forms.HtmlDocument);
  private static Type htmlShimManagerType = null;
  private static object htmlShimSingleton = null;
  private static ConstructorInfo docCtor = null;

  public static HtmlDocument Create()
  {
    if (htmlShimManagerType == null)
    {
      // get a type reference to HtmlShimManager
      htmlShimManagerType = htmlDocType.Assembly.GetType(
        "System.Windows.Forms.HtmlShimManager"
        );
      // locate the necessary private constructor for HtmlShimManager
      var shimCtor = htmlShimManagerType.GetConstructor(
        BindingFlags.NonPublic | BindingFlags.Instance, null, new Type[0], null
        );
      // create a new HtmlShimManager object and keep it for the rest of the
      // assembly instance
      htmlShimSingleton = shimCtor.Invoke(null);
    }

    if (docCtor == null)
    {
      // get the only constructor for HtmlDocument (which is marked as private)
      docCtor = htmlDocType.GetConstructors(
        BindingFlags.NonPublic | BindingFlags.Instance
        )[0];
    }

    // create an MSHTML document COM object from the HTMLDocument class ID;
    // it is handed to the HtmlDocument constructor as its underlying document
    object htmlDoc2Inst = Activator.CreateInstance(Type.GetTypeFromCLSID(
      new Guid("25336920-03F9-11CF-8FD0-00AA00686F13")
      ));
    var argValues = new object[] { htmlShimSingleton, htmlDoc2Inst };
    // create a new HtmlDocument without involving WebBrowser
    return (HtmlDocument)docCtor.Invoke(argValues);
  }
}

To use it:

var htmlDoc = HtmlDocumentFactory.Create();
htmlDoc.Write("<html><body><div>Hello, world!</div></body></html>");
Console.WriteLine(htmlDoc.Body.InnerText);
// output:
// Hello, world!


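I have not tested this code directly. I translated it from an old PowerShell script that needed the same functionality you are asking for. If it fails, let me know. The functionality is there, but the code may need very minor tweaks to get it working.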
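To tie this back to the original question (a string or an HTML file as input), here is a minimal sketch of how the factory might be wrapped. It assumes the HtmlDocumentFactory class above is compiled into the same project and behaves as described, which, as noted, has not been verified. HtmlDocumentLoader, FromString and FromFile are hypothetical names introduced only for illustration.

using System;
using System.IO;
using System.Windows.Forms;

// Hypothetical helper; not part of the factory above.
static class HtmlDocumentLoader
{
    // Builds an HtmlDocument from an in-memory HTML string.
    public static HtmlDocument FromString(string html)
    {
        HtmlDocument doc = HtmlDocumentFactory.Create();
        doc.Write(html);
        return doc;
    }

    // Builds an HtmlDocument from an HTML file on disk.
    public static HtmlDocument FromFile(string path)
    {
        return FromString(File.ReadAllText(path));
    }
}

static class Program
{
    [STAThread] // assumption: the underlying MSHTML COM object needs an STA thread
    static void Main()
    {
        HtmlDocument doc = HtmlDocumentLoader.FromString(
            "<html><body><a href=\"http://www.google.com\">Google</a></body></html>");

        // Walk the DOM: print the text of every anchor element.
        foreach (HtmlElement link in doc.GetElementsByTagName("a"))
        {
            Console.WriteLine(link.InnerText);
        }
    }
}

The result is an HtmlDocument usable for DOM processing rather than a raw string, without a WebBrowser control being created.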

The idea is to get an HtmlDocument object for DOM parsing, not an HTML string. WebClient only returns the HTML string, not an HtmlDocument. Did you find a solution?