C# Select href from <a> nodes using HtmlAgilityPack

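I'm trying to learn web scraping and am using HtmlAgilityPack in C# to get the href values from "a" nodes. The grid view contains multiple GridCells, each holding an article with a smallerCell inside it, and I want the href value of the "a" node in every one of those articles.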

<div class="Tabpanel">
    <div class="GridW">
        <div class="GridCell">
            <article>
                <div class="smallerCell">
                    <a href=".........."></a>
                </div>
            </article>
        </div>
    </div>
    <div class="random">
    </div>
    <div class="random">
    </div>
</div>

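This is what I've come up with so far, and it feels like I'm making things more complicated than they need to be. What should I do from here, or is there a simpler way?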

var httpclient = new HttpClient();
var html = await httpclient.GetStringAsync(Url);

var htmldoc = new HtmlDocument();
htmldoc.LoadHtml(html);

var ReceptLista = new List<HtmlNode>();
ReceptLista = htmldoc.DocumentNode.Descendants("div")
    .Where(node => node.GetAttributeValue("class", "")
        .Equals("GridW")).ToList();

var finalList = new List<HtmlNode>();
finalList = ReceptLista[0].Descendants("article").ToList();

var finalList2 = new List<List<HtmlNode>>();
for (int i = 0; i < finalList.Count; i++) {
    finalList2.Add(finalList[i].DescendantNodes().Where(node => node.GetAttributeValue("class", "").Equals("RecipeTeaser-content")).ToList());
}

var finalList3 = new List<List<HtmlNode>>();

for (int i = 0; i < finalList2.Count; i++) {
    finalList3.Add(finalList2[i].Where(node => node.GetAttributeValue("class", "").Equals("RecipeTeaser-link js-searchRecipeLink")).ToList());
}

You can make things much simpler by using XPath.

If you need all the links inside the article tags, you can do the following:

// SelectNodes lives on DocumentNode and returns null when nothing matches
var anchors = htmldoc.DocumentNode.SelectNodes("//article//a");
var links = anchors.Select(a => a.Attributes["href"].Value).ToList();

I think the property is Value; check it against the documentation.

If you only need the anchor tags that sit under an article, directly inside a div with the class smallerCell, you can change the XPath to

//article/div[@class='smallerCell']/a

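You get the idea. I think you're just missing some XPath knowledge. Also note that HtmlAgilityPack has add-ons that provide CSS selectors, so that's an option too if you'd rather not use XPath.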
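
The XPath approach above can be pulled together into a small helper. This is only a sketch under some assumptions: the class names GridW and smallerCell come from the simplified markup in the question (the asker's own code suggests the real page uses names like RecipeTeaser-content), and LinkExtractor and ExtractArticleLinks are made-up names for illustration. SelectNodes returns null rather than an empty collection when nothing matches, so the helper checks for that.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class LinkExtractor
{
    // Hypothetical helper: returns the href of every <a> that sits directly
    // inside a div.smallerCell, within an <article>, within div.GridW.
    public static List<string> ExtractArticleLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // [@href] skips anchors that have no href attribute at all.
        // @class='...' compares the whole attribute string, so it will not
        // match elements that carry additional classes.
        var anchors = doc.DocumentNode.SelectNodes(
            "//div[@class='GridW']//article//div[@class='smallerCell']/a[@href]");

        // SelectNodes returns null (not an empty collection) when there is no match.
        if (anchors == null)
        {
            return new List<string>();
        }

        return anchors
            .Select(a => a.GetAttributeValue("href", ""))
            .ToList();
    }
}
```

Passing the string returned by GetStringAsync to LinkExtractor.ExtractArticleLinks would give a List<string> of raw href values; relative links would still need to be resolved against the page URL.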

The simplest way is

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(text);
    var nodesWithARef = doc.DocumentNode.Descendants("a");

    foreach (HtmlNode node in nodesWithARef)
    {
        Console.WriteLine(node.GetAttributeValue("href", ""));
    }

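Reasoning: Descendants("a") gives you an enumerable of every link in the whole HTML document. You can loop over the nodes and do whatever you need; here I simply print each href.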
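
Since Descendants("a") on the document root returns every link on the page, it may help to narrow it to the articles. A minimal sketch of that idea follows, assuming the links of interest all sit somewhere inside article elements; ArticleLinks and FromHtml are hypothetical names, not part of HtmlAgilityPack.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ArticleLinks
{
    // Hypothetical helper: collects the distinct, non-empty href values of
    // all <a> elements found anywhere inside <article> elements.
    public static List<string> FromHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("article")
            .SelectMany(article => article.Descendants("a"))
            // GetAttributeValue falls back to "" when href is missing.
            .Select(a => a.GetAttributeValue("href", ""))
            .Where(href => href.Length > 0)
            // Distinct also guards against duplicates from nested articles.
            .Distinct()
            .ToList();
    }
}
```

Unlike SelectNodes, Descendants never returns null, so no null check is needed here.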

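Another approach is to find all nodes with the class "smallerCell". Then, for each of those nodes, look for an href (if one exists under that node) and print it (or do something else with it).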

Does this answer your question?

    var nodesWithSmallerCells = doc.DocumentNode.SelectNodes("//div[@class='smallerCell']");
    if (nodesWithSmallerCells != null)
        foreach (HtmlNode node in nodesWithSmallerCells)
        {
            HtmlNodeCollection children = node.SelectNodes(".//a");
            if (children != null)
                foreach (HtmlNode child in children)
                    Console.WriteLine(child.GetAttributeValue("href", ""));
        }