C#: Is it possible to search for multiple tag types simultaneously when screen scraping with HtmlAgilityPack?

Although it is still in a malleable state, this code works:

public List<string> GetParagraphsListFromHtml(string sourceHtml)
{
    var pars = new List<string>();

    // Parses the HTML passed in; note that the loop below reads from
    // "document" (fetched from the URL), not from "doc".
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(sourceHtml);

    // Download and parse the page directly from the web.
    var getHtmlWeb = new HtmlWeb();
    var document = getHtmlWeb.Load("http://www.montereycountyweekly.com/opinion/letters/article_e333a222-942d-11e3-ba9c-001a4bcf6878.html");

    // SelectNodes returns null when no node matches the XPath expression.
    var pTags = document.DocumentNode.SelectNodes("//p");
    if (pTags != null)
    {
        foreach (var pTag in pTags)
        {
            pars.Add(pTag.InnerText);
            MessageBox.Show(pTag.InnerText);
        }
    }
    MessageBox.Show("done!");
    return pars;
}
...or a LINQified version, something like:

    // "hp" in the original is not an HTML tag; "p" is assumed here.
    foreach (var par in doc.DocumentNode
        .DescendantNodes()
        .Single(x => x.Id == "body")
        .DescendantNodes()
        .Where(x => x.Name == "h1" || x.Name == "h2" || x.Name == "h3" || x.Name == "p"))
    {
        pars.Add(par.InnerText);
    }
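
For illustration, here is a minimal sketch of two ways to collect several tag types in a single pass: an XPath union passed to `SelectNodes`, and a LINQ filter over `Descendants()`. The class name `MultiTagScraper`, the method names, and the `WantedTags` set are hypothetical and not from the question. It assumes the HtmlAgilityPack package is referenced, and that the tags of interest are h1, h2, h3 and p.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class MultiTagScraper
{
    // Hypothetical set of tag names to collect.
    // HtmlAgilityPack reports element names in lower case.
    private static readonly HashSet<string> WantedTags =
        new HashSet<string> { "h1", "h2", "h3", "p" };

    // XPath variant: the union operator "|" combines several node tests in one query.
    public static List<string> GetTextViaXPath(string sourceHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(sourceHtml);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//h1 | //h2 | //h3 | //p");
        if (nodes == null)
        {
            return new List<string>();
        }
        return nodes.Select(n => n.InnerText.Trim()).ToList();
    }

    // LINQ variant: walk every descendant and keep those whose name is in the set.
    public static List<string> GetTextViaLinq(string sourceHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(sourceHtml);

        return doc.DocumentNode
            .Descendants()
            .Where(n => WantedTags.Contains(n.Name))
            .Select(n => n.InnerText.Trim())
            .ToList();
    }
}
```

Keeping the tag names in a set means adding another tag type is a one-line change, rather than another `||` clause in the predicate.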

I think this might work for you:

doc.DocumentNode.ChildNodes.Where(x => (x.NodeType == HtmlNodeType.Text));

This will capture all text elements.

Unfortunately, no. With the example page shown above, the message box shows an empty string (nothing) a few times, and that's it.
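
A likely reason for the empty strings is that `ChildNodes` holds only the direct children of the document node. For a full page, the text nodes at that level are usually just the whitespace between top-level elements. The following is a rough sketch, not the answerer's code, that walks all descendants instead and drops whitespace-only text. The class name `TextNodeScraper` and the method `GetVisibleText` are hypothetical, and skipping text under `script` and `style` is an added assumption about what counts as useful text.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TextNodeScraper
{
    public static List<string> GetVisibleText(string sourceHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(sourceHtml);

        return doc.DocumentNode
            // Descendants() visits every node below the root, not just direct children.
            .Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Text)
            // Assumption: text inside script and style elements is not wanted.
            .Where(n => n.ParentNode.Name != "script" && n.ParentNode.Name != "style")
            // Decode entities such as &amp; and trim surrounding whitespace.
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            // Drop nodes that contained only whitespace.
            .Where(text => text.Length > 0)
            .ToList();
    }
}
```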