VB.net: getting the inner text of an href with HtmlAgilityPack

I have now updated my code (thanks to Tim for helping me learn). It runs, but it does not give me the links I actually want.

This is my working code:

    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Dim webClient As New System.Net.WebClient
        Dim WebSource As String = webClient.DownloadString("http://www.google.com.ph/search?hl=en&as_q=test&as_epq=&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=countryCA&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&tbs=ctr%3AcountryCA&as_filetype=&as_rights=#as_qdr=all&cr=countryCA&fp=1&hl=en&lr=&q=test&start=20&tbs=ctr:countryCA")

        Dim doc = New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(WebSource)
        Dim links = GetLinks(doc, "test")
        For Each Link In links
            ListBox1.Items.Add(Link.ToString())
        Next
    End Sub


    Public Class Link
        Public Sub New(Uri As Uri, Text As String)
            Me.Uri = Uri
            Me.Text = Text
        End Sub
        Public Property Text As String
        Public Property Uri As Uri

        Public Overrides Function ToString() As String
            ' Return the URL directly; using it as a String.Format format string fails if it contains braces.
            Return If(Uri Is Nothing, "", Uri.ToString())
        End Function
    End Class


    Public Function GetLinks(doc As HtmlAgilityPack.HtmlDocument, linkContains As String) As List(Of Link)
        Dim uri As Uri = Nothing
        Dim linksOnPage = From link In doc.DocumentNode.Descendants()
                          Where link.Name = "a" _
                          AndAlso link.Attributes("href") IsNot Nothing _
                          Let text = link.InnerText.Trim()
                          Let url = link.Attributes("href").Value
                          Where url.IndexOf(linkContains, StringComparison.OrdinalIgnoreCase) >= 0 _
                          AndAlso uri.TryCreate(url, UriKind.Absolute, uri)

        Dim Uris As New List(Of Link)()
        For Each link In linksOnPage
            Uris.Add(New Link(New Uri(link.url, UriKind.Absolute), link.text))
        Next

        Return Uris
    End Function
I am new to HtmlAgilityPack and still learning, so please bear with me. My main goal is:

Example link:
http://www.google.com.ph/search?hl=en&as_q=test&as_epq=&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=countryCA&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&tbs=ctr%3AcountryCA&as_filetype=&as_rights=#as_qdr=all&cr=countryCA&fp=1&hl=en&lr=&q=test&start=20&tbs=ctr:countryCA

My expected output is the links that contain the word "test":


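You should use the href attribute instead. Also note that string comparison in .NET is case-sensitive by default, so pass a case-insensitive comparison explicitly: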

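' SelectNodes returns Nothing when no node matches, so guard before iterating.
Dim anchors = htmlDoc.DocumentNode.SelectNodes("//a[@href]")
If anchors IsNot Nothing Then
    For Each link As HtmlNode In anchors
        Dim href = link.Attributes("href").Value
        If href.IndexOf("test", StringComparison.OrdinalIgnoreCase) >= 0 Then
            ListBox1.Items.Add(href)
            ' or, to list the visible link text instead:
            ListBox1.Items.Add(link.InnerText)
        End If
    Next
End If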
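
To make the difference between the two values concrete, here is a minimal, self-contained sketch. The markup and the example.com URLs are made up for illustration and merely stand in for a downloaded page. It assumes a console project that references the HtmlAgilityPack package. The href attribute holds the link target, while InnerText is only the label the user sees (which is why InnerText produces entries such as "Images" or "Maps" on a search page).

```vbnet
Imports System
Imports HtmlAgilityPack

Module HrefVersusInnerText
    Sub Main()
        ' Hypothetical markup standing in for a downloaded page.
        Dim html As String =
            "<a href=""http://example.com/test-page"">Test page</a>" &
            "<a href=""/search?q=test"">Images</a>" &
            "<a name=""no-href"">Anchor without href</a>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing when nothing matches the XPath.
        Dim anchors = doc.DocumentNode.SelectNodes("//a[@href]")
        If anchors Is Nothing Then Return

        For Each anchor As HtmlNode In anchors
            ' The attribute value is the link target...
            Dim href As String = anchor.GetAttributeValue("href", String.Empty)
            ' ...while InnerText is only the label shown to the user.
            Dim label As String = anchor.InnerText.Trim()
            Console.WriteLine("{0} -> {1}", label, href)
        Next
    End Sub
End Module
```

The third anchor has no href attribute, so the XPath //a[@href] skips it.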
Here is a method that should return all links in the document as a List(Of Link). Link is a custom class with two properties, one for the text and one for the Uri.

Public Class Link
    Public Sub New(Uri As Uri, Text As String)
        Me.Uri = Uri
        Me.Text = Text
    End Sub
    Public Property Text As String
    Public Property Uri As Uri

    Public Overrides Function ToString() As String
        Return String.Format("{0} [{1}]", Text, If(Uri Is Nothing, "", Uri.ToString()))
    End Function
End Class

Public Function GetLinks(doc As HtmlAgilityPack.HtmlDocument) As List(Of Link)
    Dim uri As Uri = Nothing
    Dim linksOnPage = From link In doc.DocumentNode.Descendants()
                      Where link.Name = "a" _
                      AndAlso link.Attributes("href") IsNot Nothing _
                      Let text = link.InnerText.Trim()
                      Let url = link.Attributes("href").Value
                      Where uri.TryCreate(url, UriKind.Absolute, uri)

    Dim Uris As New List(Of Link)()
    For Each link In linksOnPage
        Uris.Add(New Link(New Uri(link.url, UriKind.Absolute), link.text))
    Next

    Return Uris
End Function
Here is the requested overload that also checks whether the URL contains a given text:

Public Function GetLinks(doc As HtmlAgilityPack.HtmlDocument, linkContains As String) As List(Of Link)
    Dim uri As Uri = Nothing
    Dim linksOnPage = From link In doc.DocumentNode.Descendants()
                      Where link.Name = "a" _
                      AndAlso link.Attributes("href") IsNot Nothing _
                      Let text = link.InnerText.Trim()
                      Let url = link.Attributes("href").Value
                      Where url.IndexOf(linkContains, StringComparison.OrdinalIgnoreCase) >= 0 _
                      AndAlso uri.TryCreate(url, UriKind.Absolute, uri)

    Dim Uris As New List(Of Link)()
    For Each link In linksOnPage
        Uris.Add(New Link(New Uri(link.url, UriKind.Absolute), link.text))
    Next

    Return Uris
End Function
Edit: this is now tested and works. Use it like this:

Dim site = File.ReadAllText("C:\Temp\website_test.htm")
Dim doc = New HtmlAgilityPack.HtmlDocument()
doc.LoadHtml(site)
Dim links = GetLinks(doc)
For Each Link In links
    ListBox1.Items.Add(Link.ToString())
Next
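
The asker wants only links that start with http:// and still contain the search word. As a rough sketch of that filter on its own, without HtmlAgilityPack: the helper name IsWantedLink and the sample hrefs below are hypothetical and not part of the original answer. It combines Uri.TryCreate with UriKind.Absolute, as in the methods above, with an explicit scheme check, so relative targets such as /search? and /url? and non-web schemes such as mailto: are dropped.

```vbnet
Imports System
Imports System.Collections.Generic

Module AbsoluteHttpFilter
    ' Hypothetical helper: True only for absolute http/https URLs that contain the search text.
    Function IsWantedLink(href As String, linkContains As String) As Boolean
        Dim parsed As Uri = Nothing
        ' Reject anything that does not parse as an absolute URI.
        If Not Uri.TryCreate(href, UriKind.Absolute, parsed) Then Return False

        ' Keep only web links; this excludes schemes such as mailto: or file:.
        Dim isWeb As Boolean = parsed.Scheme = Uri.UriSchemeHttp OrElse
                               parsed.Scheme = Uri.UriSchemeHttps

        ' Case-insensitive "contains" check, as in the GetLinks overload.
        Return isWeb AndAlso
               href.IndexOf(linkContains, StringComparison.OrdinalIgnoreCase) >= 0
    End Function

    Sub Main()
        ' Made-up href values of the kinds mentioned in the comments.
        Dim hrefs As New List(Of String) From {
            "http://example.com/test",
            "/search?q=test",
            "/url?q=http://example.com/test",
            "mailto:test@example.com",
            "https://example.com/other"
        }

        For Each href In hrefs
            If IsWantedLink(href, "test") Then Console.WriteLine(href)
        Next
    End Sub
End Module
```

With these sample values only the first entry should pass the filter. The same predicate could be used in the Where clause of GetLinks if the scheme restriction is wanted there as well.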

Comments:

What should I call to display the link? I tried link.InnerText, but still no result.

@MarcIntes: Why not href instead of link.InnerText? I have edited my answer.

The program now displays links, but it includes unwanted ones that start with /search? and /url?. I only want to display links that start with http:// and still contain the word "test". Is that possible?

I tried link.InnerText again, but it does not display links. Instead it shows things such as Images, Maps, Play, Youtube, Translate, Books.

@MarcIntes: Try the method I added to detect all links, though I have not tested it yet.