VBA object variable not set - HTML scraping [html, excel, vba, web-scraping]

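I am trying to scrape Google, but I am having trouble extracting multiple elements from an HTML fragment. Google displays each search result as an element with the class "card". When I run the code below, I keep getting the "Object variable not set" error.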

> Option Explicit
> 
> Sub StatusLetter()
>     SearchandScrape ("Apple")
> End Sub
> 
> Sub SearchandScrape(URL As String)
>     Dim IE As New SHDocVw.InternetExplorer
>     Dim HTMLDoc As MSHTML.HTMLDocument
>     Dim HTMLCard As MSHTML.IHTMLElement
>     Dim HTMLCards As MSHTML.IHTMLElementCollection
>     Dim Temp As MSHTML.IHTMLElement
>     Dim scrapedCard As New card
>     
>     IE.Visible = True
>     IE.navigate "https://www.google.com/search?q=" & URL & "&tbm=nws&source=lnt&tbs=qdr:d&sa=X&ved=0ahUKEwjf_LHL1bngAhXqQ98KHTs2D4QQpwUIHw&biw=1282&bih=893&dpr=1"
>         
>     Do While IE.readyState <> READYSTATE_COMPLETE
>     Loop
>     
>     Set HTMLDoc = IE.Document
>     
>     Set HTMLCards = HTMLDoc.getElementsByClassName("card")
>     
>     For Each HTMLCard In HTMLCards
>         Temp = HTMLCard.getElementsByTagName("h3")(0)
>         Debug.Print Temp.innerText
>     Next
> End Sub

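I get the error inside the For Each loop. For each card in HTMLCards, I want to pull the text of three tags that are stored in one HTML segment: two of them are spans and the third is an h3. Any suggestions for fixing this? I can't seem to figure out how to access these objects correctly. Thanks.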

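Wait properly for the page to load, and remember to quit the application when you are done. Only one element on the page has that class name. I think you actually need different selectors, as shown below.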

Option Explicit    
Public Sub StatusLetter()
    SearchandScrape "Apple"
End Sub

Public Sub SearchandScrape(URL As String)
    Dim IE As SHDocVw.InternetExplorer, headlines As Object, i As Long
    Dim agenciesAndTime As Object, agencies As Object, times As Object, descriptions As Object
    Set IE = New SHDocVw.InternetExplorer
    With IE
        .Visible = True
        .Navigate2 "https://www.google.com/search?q=" & URL & "&tbm=nws&source=lnt&tbs=qdr:d&sa=X&ved=0ahUKEwjf_LHL1bngAhXqQ98KHTs2D4QQpwUIHw&biw=1282&bih=893&dpr=1"

        While .Busy Or .readyState < 4: DoEvents: Wend
        Set headlines = .document.querySelectorAll("h3.r")
        Set agenciesAndTime = .document.querySelectorAll("h3.r + div span")
        Set agencies = .document.querySelectorAll("h3.r + div span:nth-of-type(1)")
        Set times = .document.querySelectorAll("h3.r + div span:nth-of-type(3)")
        Set descriptions = .document.querySelectorAll("#ires div.st")
        Dim results(), headers(), resultCount As Long
        headers = Array("Headline", "Agency&Time", "Agency", "Time", "Description")
        ' Read the count while IE is still open; the node lists are not used after .Quit
        resultCount = headlines.Length

        If resultCount > 0 Then
            ' Size the array only when there are matches; ReDim with bounds 1 To 0 raises an error
            ReDim results(1 To resultCount, 1 To 5)
            For i = 0 To resultCount - 1
                results(i + 1, 1) = headlines.item(i).innerText
                results(i + 1, 2) = agenciesAndTime.item(i).innerText
                results(i + 1, 3) = agencies.item(i).innerText
                results(i + 1, 4) = times.item(i).innerText
                results(i + 1, 5) = descriptions.item(i).innerText
            Next
        End If
        .Quit
        With ThisWorkbook.Worksheets("Sheet1")
            .Cells.ClearContents
            .Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
            ' Write the rows only if something was scraped
            If resultCount > 0 Then .Cells(2, 1).Resize(resultCount, 5) = results
        End With
    End With
End Sub
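
On the original error itself: in the question's loop, `Temp` is an object variable but is assigned without `Set`, which is a likely cause of run-time error 91 ("Object variable or With block variable not set"). Here is a minimal sketch of the fix, assuming a late-bound `htmlfile` document and made-up sample markup rather than Google's real page; the names `DemoSetKeyword`, `card`, and `heading` are only for illustration.

```vba
Option Explicit

' Sketch only: the HTML below is invented sample markup, not Google's page structure.
Public Sub DemoSetKeyword()
    Dim doc As Object, cards As Object, card As Object
    Dim headings As Object, heading As Object

    ' Late-bound MSHTML document, so no IE window or network access is needed
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = "<div><h3>First headline</h3></div>" & _
                         "<div><h3>Second headline</h3></div>"

    Set cards = doc.getElementsByTagName("div")
    For Each card In cards
        Set headings = card.getElementsByTagName("h3")
        If headings.Length > 0 Then
            ' Object references must be assigned with Set.
            ' Without Set, VBA attempts a default-property assignment
            ' to a variable that is still Nothing, which raises error 91.
            Set heading = headings.Item(0)
            Debug.Print heading.innerText
        End If
    Next card
End Sub
```

Checking `headings.Length` before indexing also avoids touching an element that does not exist when a container has no `h3`.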

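Thanks for the reply. How can I get the news agency's name, the publication time, and the summary from a Google card (the green text and "7 hours ago")? I ultimately need to lay this out in rows. I can't get all of the information from .getElements... because it won't match one-to-one, since some cards group related headlines and o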
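
One way to keep the fields aligned when the lists do not match one-to-one is to loop over each result container and look up its pieces inside that container only, leaving a field blank when it is missing. This is a rough sketch under assumptions: the markup, the class name `result`, and the order of the spans are invented for the example and would have to be replaced with whatever the real page uses.

```vba
Option Explicit

' Sketch only: the markup and the "result" class name are hypothetical.
Public Sub DemoPerResultRows()
    Dim doc As Object, divs As Object, block As Object
    Dim heads As Object, spans As Object
    Dim headline As String, agency As String, published As String

    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = _
        "<div class='result'><h3>Headline A</h3><span>Agency A</span><span>7 hours ago</span></div>" & _
        "<div class='result'><h3>Headline B</h3><span>Agency B</span></div>"

    Set divs = doc.getElementsByTagName("div")
    For Each block In divs
        If block.className = "result" Then
            ' Search inside this container only, so values cannot drift between rows
            Set heads = block.getElementsByTagName("h3")
            Set spans = block.getElementsByTagName("span")

            headline = "": agency = "": published = ""
            If heads.Length > 0 Then headline = heads.Item(0).innerText
            If spans.Length > 0 Then agency = spans.Item(0).innerText
            If spans.Length > 1 Then published = spans.Item(1).innerText

            ' One line per result; a missing field stays empty instead of shifting the rest
            Debug.Print headline & " | " & agency & " | " & published
        End If
    Next block
End Sub
```

The same per-container loop could fill one row of a results array instead of printing, which gives the row layout asked for above.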