VBA "Object variable not set" error - HTML scraping
I'm trying to scrape Google, but I'm having trouble extracting multiple elements from an HTML fragment. Google displays each search result as an element with the class "card". When I run the code below, I keep getting an "Object variable not set" error.
> Option Explicit
>
> Sub StatusLetter()
> SearchandScrape ("Apple")
> End Sub
>
> Sub SearchandScrape(URL As String)
> Dim IE As New SHDocVw.InternetExplorer
> Dim HTMLDoc As MSHTML.HTMLDocument
> Dim HTMLCard As MSHTML.IHTMLElement
> Dim HTMLCards As MSHTML.IHTMLElementCollection
> Dim Temp As MSHTML.IHTMLElement
> Dim scrapedCard As New card
>
> IE.Visible = True
> IE.navigate "https://www.google.com/search?q=" & URL & "&tbm=nws&source=lnt&tbs=qdr:d&sa=X&ved=0ahUKEwjf_LHL1bngAhXqQ98KHTs2D4QQpwUIHw&biw=1282&bih=893&dpr=1"
>
> Do While IE.readyState <> READYSTATE_COMPLETE
> Loop
>
> Set HTMLDoc = IE.Document
>
> Set HTMLCards = HTMLDoc.getElementsByClassName("card")
>
> For Each HTMLCard In HTMLCards
> Temp = HTMLCard.getElementsByTagName("h3")(0)
> Debug.Print Temp.innerText
> Next
> End Sub
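I get the error inside the For Each loop. For each card in HTMLCards I want to pull the text of three tags stored in one HTML fragment: two of them are spans and the third is an h3. Any suggestions for fixing this? I can't seem to work out how to access these objects correctly. Thanks.

Wait properly for the page to load, and remember to quit the application when you are done. Only one element on the page has that class name, so I think you actually need different selectors, as shown below.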
Option Explicit

Public Sub StatusLetter()
    SearchandScrape "Apple"
End Sub

Public Sub SearchandScrape(URL As String)
    Dim IE As SHDocVw.InternetExplorer, headlines As Object, i As Long
    Dim agenciesAndTime As Object, agencies As Object, times As Object, descriptions As Object
    Dim results(), headers(), rowCount As Long

    headers = Array("Headline", "Agency&Time", "Agency", "Time", "Description")
    Set IE = New SHDocVw.InternetExplorer

    With IE
        .Visible = True
        .Navigate2 "https://www.google.com/search?q=" & URL & "&tbm=nws&source=lnt&tbs=qdr:d&sa=X&ved=0ahUKEwjf_LHL1bngAhXqQ98KHTs2D4QQpwUIHw&biw=1282&bih=893&dpr=1"

        ' Wait until the page has finished loading
        While .Busy Or .readyState < 4: DoEvents: Wend

        Set headlines = .document.querySelectorAll("h3.r")
        Set agenciesAndTime = .document.querySelectorAll("h3.r + div span")
        Set agencies = .document.querySelectorAll("h3.r + div span:nth-of-type(1)")
        Set times = .document.querySelectorAll("h3.r + div span:nth-of-type(3)")
        Set descriptions = .document.querySelectorAll("#ires div.st")

        ' Size the array only when there are matches; ReDim with (1 To 0) would raise an error
        rowCount = headlines.Length
        If rowCount > 0 Then
            ReDim results(1 To rowCount, 1 To 5)
            For i = 0 To rowCount - 1
                results(i + 1, 1) = headlines.item(i).innerText
                results(i + 1, 2) = agenciesAndTime.item(i).innerText
                results(i + 1, 3) = agencies.item(i).innerText
                results(i + 1, 4) = times.item(i).innerText
                results(i + 1, 5) = descriptions.item(i).innerText
            Next
        End If

        ' Close the browser before writing to the sheet
        .Quit

        With ThisWorkbook.Worksheets("Sheet1")
            .Cells.ClearContents
            .Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
            If rowCount > 0 Then
                .Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
            End If
        End With
    End With
End Sub
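
As a side note on the original error: the line `Temp = HTMLCard.getElementsByTagName("h3")(0)` in the question assigns an object without `Set`. In VBA, that form attempts a default-property assignment on a variable that is still `Nothing`, which is a likely source of the "Object variable not set" error. Below is a minimal sketch of the same loop with `Set` and a guard for cards that have no `h3`. The procedure name `PrintCardHeadlines` is hypothetical, the variables are declared `As Object` (late bound) to sidestep interface differences, and the `"card"` class name is taken from the question and may not match Google's current markup.

```vba
Option Explicit

' Hypothetical helper: prints the first h3 text of every element with class "card".
' htmlDoc is expected to be the loaded document, e.g. IE.document.
Public Sub PrintCardHeadlines(ByVal htmlDoc As Object)
    Dim cards As Object, card As Object
    Dim headings As Object, heading As Object

    ' Class name assumed from the question; adjust it to the real page markup
    Set cards = htmlDoc.getElementsByClassName("card")

    For Each card In cards
        Set headings = card.getElementsByTagName("h3")

        ' Skip cards without an h3 instead of dereferencing a missing element
        If headings.Length > 0 Then
            Set heading = headings.Item(0)   ' object assignment needs Set
            Debug.Print heading.innerText
        End If
    Next card
End Sub
```

It could be called as `PrintCardHeadlines IE.document` once the page has finished loading. Reading each field from inside its own card, rather than from separate page-wide lists, also keeps the values of one result together.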
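Thanks for the reply. How do I get the news agency name, the publication time and the summary from the Google cards (the green text and "7 hours ago")? I ultimately need to arrange this in rows. I can't get all the information from .getElements... because it won't match one to one, since some cards group related headlines and o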