VB.NET: get all links containing "/product/"

vb.net web-scraping web-crawler

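I want to get all the links that contain /product/. There are 17 links containing /product/. How can I do that?

There seems to be something wrong with this line: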

Dim srcs = From iframeNode In htmlDoc.DocumentNode.SelectNodes("//a[@href]")
           Select iframeNode.Attributes("href").Value

How do I add a condition to this query so that it filters by /product/?

Here is what I have so far:

Imports HtmlAgilityPack

Module Module1

    Sub Main()
        Dim mainUrl As String = "https://www.nordicwater.com/products/waste-water/"
        Dim htmlDoc As New HtmlAgilityPack.HtmlDocument

        htmlDoc.LoadHtml(mainUrl)

        Dim srcs = From iframeNode In htmlDoc.DocumentNode.SelectNodes("//a[@href]")
                   Select iframeNode.Attributes("href").Value

        'print all the src you got
        For Each src In srcs
            Console.WriteLine(src)
        Next
    End Sub

End Module

Edit:

Working solution:

    Imports HtmlAgilityPack

    Module Module1

        Sub Main()
            Dim mainUrl As String = "https://www.nordicwater.com/products/waste-water/"
            Dim htmlDoc As HtmlDocument = New HtmlWeb().Load(mainUrl) ' Load the web page into an HtmlDocument

            Dim srcs As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//ul[@class='products-list-page']//a") ' Select the link nodes
            For Each src As HtmlNode In srcs
                Console.WriteLine(src.Attributes("href").Value) ' Print each URL
            Next

            Console.Read() ' Keep the console window open until a key is pressed

        End Sub

    End Module

You have to load the web page first, then select the nodes and the attribute you want to print.

Here is one way to do it:

    Dim mainUrl As String = "https://www.nordicwater.com/products/waste-water/"
    Dim htmlDoc As HtmlDocument = New HtmlWeb().Load(mainUrl) ' Load the web page into an HtmlDocument

    Dim srcs As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//ul[@class='products-list-page']//a") ' Select the link nodes
    For Each src As HtmlNode In srcs
        Console.WriteLine(src.Attributes("href").Value) ' Print each URL
    Next
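
The snippet above selects links by the list they sit in rather than by the text of the URL. If the goal is to filter on "/product/" itself, the query from the question could be extended with a Where clause. This is only a sketch: it assumes the page is reachable and still exposes its product links as ordinary anchors, and the names anchors and productLinks are made up for illustration.

```vb
Imports System
Imports System.Linq
Imports HtmlAgilityPack

Module Module1

    Sub Main()
        Dim mainUrl As String = "https://www.nordicwater.com/products/waste-water/"
        Dim htmlDoc As HtmlDocument = New HtmlWeb().Load(mainUrl) ' Download and parse the page

        ' With default settings SelectNodes returns Nothing when nothing matches
        Dim anchors As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//a[@href]")
        If anchors Is Nothing Then
            Console.WriteLine("No links found.")
            Return
        End If

        ' Same query as in the question, plus a Where clause on the href text
        Dim productLinks = From anchor In anchors
                           Let href = anchor.GetAttributeValue("href", String.Empty)
                           Where href.Contains("/product/")
                           Select href

        For Each link In productLinks
            Console.WriteLine(link)
        Next

        Console.Read() ' Wait once, after the loop has finished
    End Sub

End Module
```

Using GetAttributeValue with a default avoids an exception on anchors that have no href, which is why it replaces Attributes("href").Value here.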

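You need to learn how to debug. If you had stepped through your code, you would have seen that you were setting the htmlDoc HTML to the URL string instead of loading the actual HTML of the web page.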
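
To make that difference concrete, here is a small sketch that needs no network access. It assumes default Html Agility Pack settings, and the markup in sampleHtml is invented for illustration, not taken from the real page. The first half reproduces the mistake from the question; the second half parses real markup and filters it with an XPath contains() test.

```vb
Imports System
Imports HtmlAgilityPack

Module LoadHtmlVersusLoad

    Sub Main()
        Dim mainUrl As String = "https://www.nordicwater.com/products/waste-water/"

        ' LoadHtml parses its argument as markup; it never downloads anything,
        ' so the URL ends up as a plain text node with no <a> elements.
        Dim wrongDoc As New HtmlDocument()
        wrongDoc.LoadHtml(mainUrl)
        Dim wrongAnchors As HtmlNodeCollection = wrongDoc.DocumentNode.SelectNodes("//a[@href]")
        If wrongAnchors Is Nothing OrElse wrongAnchors.Count = 0 Then
            Console.WriteLine("LoadHtml(url): no anchors found.")
        End If

        ' Hypothetical markup standing in for the downloaded page
        Dim sampleHtml As String =
            "<ul>" &
            "<li><a href='/product/example-a/'>Example A</a></li>" &
            "<li><a href='/about/'>About</a></li>" &
            "<li><a href='/product/example-b/'>Example B</a></li>" &
            "</ul>"

        Dim rightDoc As New HtmlDocument()
        rightDoc.LoadHtml(sampleHtml)

        ' XPath contains() keeps only anchors whose href includes "/product/"
        Dim productAnchors As HtmlNodeCollection =
            rightDoc.DocumentNode.SelectNodes("//a[contains(@href, '/product/')]")
        If productAnchors IsNot Nothing Then
            For Each anchor As HtmlNode In productAnchors
                Console.WriteLine(anchor.GetAttributeValue("href", String.Empty))
            Next
        End If
    End Sub

End Module
```

For a live page, New HtmlWeb().Load(mainUrl) does the download and the parsing in one step, as in the answer above.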

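Thanks! I added Console.Read after Console.WriteLine, so I only see one link, the first one. There is a loop, so why isn't it working?

You need to be more specific about where the error is, because this works fine for me. Please edit your question.

After running the code in the console, this is the result I get.

Post all the code you have now and then we will see. Guessing like this is no use.

Sorry, my bad! I added Console.Read inside the loop. Damn.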
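
For anyone hitting the same symptom, a minimal sketch of the placement issue from the comments follows. The links array is hypothetical sample data; only the position of Console.Read() matters.

```vb
Imports System

Module ConsoleReadPlacement

    Sub Main()
        ' Hypothetical sample data standing in for the scraped hrefs
        Dim links As String() = {"/product/example-a/", "/product/example-b/", "/product/example-c/"}

        For Each link As String In links
            Console.WriteLine(link)
            ' A Console.Read() placed here would block after every line
            ' until input arrives, so only the first link appears at first.
        Next

        Console.Read() ' Placed after Next: waits once, after all links are printed
    End Sub

End Module
```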