Excel 使用类/标记名提取数据并限制结果

Excel 使用类/标记名提取数据并限制结果,excel,vba,web-scraping,automation,Excel,Vba,Web Scraping,Automation,我正在尝试对以下网站进行web抓取: 到目前为止,我有以下代码返回每个h4标记的内部文本 Sub getContents() Dim XMLReq As New MSXML2.XMLHTTP60 Dim HTMLDoc As New MSHTML.HTMLDocument Dim SubTag As MSHTML.IHTMLElementCollection

我正在尝试对以下网站进行web抓取:

到目前为止,我有以下代码返回每个
h4
标记的内部文本

Sub getContents()
        
            Dim XMLReq As New MSXML2.XMLHTTP60
            Dim HTMLDoc As New MSHTML.HTMLDocument
            
            Dim SubTag As MSHTML.IHTMLElementCollection
            Dim SubName As MSHTML.IHTMLElement
            
            XMLReq.Open "Get", "https://echa.europa.eu/brief-profile/-/briefprofile/100.047.293", False
            XMLReq.send
            
            If XMLReq.Status <> 200 Then
            
                MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
                Exit Sub
            End If
            
            HTMLDoc.body.innerHTML = XMLReq.responseText
            
            Set SubTag = HTMLDoc.getElementsByTagName("dt")
            
            For Each SubName In SubTag
            Debug.Print SubName.innerText
            Next SubName
            
        End Sub
此信息的标记名为“dd”,但我不确定如何同时返回这两个结果。我希望在即时窗口中,我可以得到物理和化学性质的列表,并且在每个属性的右边,也会返回值

尝试这一点,我有下面的代码,导致一个不匹配的错误,但我不明白我做错了什么

Sub getContents()
        
            Dim XMLReq As New MSXML2.XMLHTTP60
            Dim HTMLDoc As New MSHTML.HTMLDocument
            
            Dim SubTag As MSHTML.IHTMLElementCollection
            Dim SubName As MSHTML.IHTMLElement
            Dim SubInfo As MSHTML.IHTMLElement
            
            XMLReq.Open "Get", "https://echa.europa.eu/brief-profile/-/briefprofile/100.047.293", False
            XMLReq.send
            
            If XMLReq.Status <> 200 Then
            
                MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
                Exit Sub
            End If
            
            HTMLDoc.body.innerHTML = XMLReq.responseText
            
            Set SubTag = HTMLDoc.getElementsByTagName("dt")
            Set SubInfo = SubTag.tags("dd")
            
            For Each SubName In SubTag
            Debug.Print SubName.innerText, SubInfo.innerText
            Next SubName
            
        End Sub
Sub-getContents()
Dim XMLReq作为新的MSXML2.XMLHTTP60
将HTMLDoc设置为新的MSHTML.HTMLDocument
将子标签变暗为MSHTML.IHTMLElementCollection
Dim子名称为MSHTML.IHTMLElement
将子信息设置为MSHTML.IHTMLElement
XMLReq.Open“Get”https://echa.europa.eu/brief-profile/-/briefprofile/100.047.293”“错
XMLReq.send
如果XMLReq.Status为200,则
MsgBox“问题”&vbNewLine&XMLReq.Status&“-”&XMLReq.statusText
出口接头
如果结束
HTMLDoc.body.innerHTML=XMLReq.responseText
Set SubTag=HTMLDoc.getElementsByTagName(“dt”)
Set SubInfo=SubTag.tags(“dd”)
对于子标记中的每个子名称
Debug.Print SubName.innerText,SubInfo.innerText
下一个子名
端接头
我很感激这是一篇很长的帖子,但如果有人能评论我做错了什么,那就太好了

更新:

下面的代码可以更好地在即时窗口中实现所需的数据

Sub GetContents()
    
        Dim XMLReq As New MSXML2.XMLHTTP60
        Dim HTMLDoc As New MSHTML.HTMLDocument
        
        Dim SubSectList As MSHTML.IHTMLElement
        Dim SubSects As MSHTML.IHTMLElementCollection
        Dim SubSect As MSHTML.IHTMLElement
    
        XMLReq.Open "Get", "https://echa.europa.eu/brief-profile/-/briefprofile/100.047.293", False
        XMLReq.send
        
        If XMLReq.Status <> 200 Then
        
            MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
            Exit Sub
        End If
        
        HTMLDoc.body.innerHTML = XMLReq.responseText
        
        Set SubSectList = HTMLDoc.getElementsByClassName("col-xs-12 col-lg-10 MainContent")(1)
        Set SubSects = SubSectList.getElementsByTagName("dt")

        'Debug.Print SubSects.Length
        
        For Each SubSect In SubSects
        Debug.Print SubSect.innerText & " : "; SubSect.NextSibling.innerText
       Next SubSect
 
    End Sub
Sub-GetContents()
Dim XMLReq作为新的MSXML2.XMLHTTP60
将HTMLDoc设置为新的MSHTML.HTMLDocument
Dim子列表为MSHTML.IHTMLElement
将子部分设置为MSHTML.IHTMLElementCollection
将子部分标注为MSHTML.IHTMLElement
XMLReq.Open“Get”https://echa.europa.eu/brief-profile/-/briefprofile/100.047.293”“错
XMLReq.send
如果XMLReq.Status为200,则
MsgBox“问题”&vbNewLine&XMLReq.Status&“-”&XMLReq.statusText
出口接头
如果结束
HTMLDoc.body.innerHTML=XMLReq.responseText
Set subsctlist=HTMLDoc.getElementsByClassName(“col-xs-12 col-lg-10 MainContent”)(1)
Set subsets=subsetlist.getElementsByTagName(“dt”)
'Debug.Print SubSects.Length
对于子集中的每个子集
Debug.Print SubSect.innerText&“:”;SubSect.NextSibling.innerText
下一子集
端接头

我认为您希望限制那些dt,它们是具有类EndpointContent的元素的子元素;然后,您可以将下一个绑定链接到相邻的dd

Option Explicit

Public Sub GetContents()
    Dim XMLReq As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
        
    XMLReq.Open "Get", "https://echa.europa.eu/brief-profile/-/briefprofile/100.047.293", False
    XMLReq.send
                  
    HTMLDoc.body.innerHTML = XMLReq.responseText

    Dim i As Long
    
    With HTMLDoc.querySelectorAll(".EndpointContent dt")
        For i = 0 To .Length - 1
            Debug.Print .Item(i).innerText & " : " & .Item(i).NextSibling.NextSibling.innerText
            Debug.Print
        Next
    End With
End Sub

你好,谢谢你的回复。运行此代码时,我似乎遇到运行时错误
91
:对象变量或未设置块变量。错误出现在行
Debug.Print.Item(i).innerText&“:”&.Item(i).NextSibling.NextSibling.innerText
有什么想法吗?我今晚需要测试。昨天运行正常。@Nick我检查了QHarrs代码,对我来说,它工作正常。如果您使用webbrowser,它会读取网站上“科学属性”选项卡中的值。我在Windows 10上使用Excel 2016 32位64位当前版本1)数据未返回-服务器/连接计时2)阻塞。很可能是1。然后你会实施重试策略。啊,是的,我想这解释了我遇到的另一个问题。非常感谢。
Option Explicit

Public Sub GetContents()
    Dim XMLReq As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
        
    XMLReq.Open "Get", "https://echa.europa.eu/brief-profile/-/briefprofile/100.047.293", False
    XMLReq.send
                  
    HTMLDoc.body.innerHTML = XMLReq.responseText

    Dim i As Long
    
    With HTMLDoc.querySelectorAll(".EndpointContent dt")
        For i = 0 To .Length - 1
            Debug.Print .Item(i).innerText & " : " & .Item(i).NextSibling.NextSibling.innerText
            Debug.Print
        Next
    End With
End Sub