Excel 使用类/标记名提取数据并限制结果
我正在尝试对以下网站进行web抓取: 到目前为止,我有以下代码返回每个Excel 使用类/标记名提取数据并限制结果,excel,vba,web-scraping,automation,Excel,Vba,Web Scraping,Automation,我正在尝试对以下网站进行web抓取: 到目前为止,我有以下代码返回每个h4标记的内部文本 Sub getContents() Dim XMLReq As New MSXML2.XMLHTTP60 Dim HTMLDoc As New MSHTML.HTMLDocument Dim SubTag As MSHTML.IHTMLElementCollection
h4
标记的内部文本
Sub getContents()
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim SubTag As MSHTML.IHTMLElementCollection
Dim SubName As MSHTML.IHTMLElement
XMLReq.Open "Get", "https://echa.europa.eu/brief-profile/-/briefprofile/100.047.293", False
XMLReq.send
If XMLReq.Status <> 200 Then
MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLReq.responseText
Set SubTag = HTMLDoc.getElementsByTagName("dt")
For Each SubName In SubTag
Debug.Print SubName.innerText
Next SubName
End Sub
此信息的标记名为“dd”,但我不确定如何同时返回这两个结果。我希望在即时窗口中,我可以得到物理和化学性质的列表,并且在每个属性的右边,也会返回值
尝试这一点,我有下面的代码,导致一个不匹配的错误,但我不明白我做错了什么
Sub getContents()
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim SubTag As MSHTML.IHTMLElementCollection
Dim SubName As MSHTML.IHTMLElement
Dim SubInfo As MSHTML.IHTMLElement
XMLReq.Open "Get", "https://echa.europa.eu/brief-profile/-/briefprofile/100.047.293", False
XMLReq.send
If XMLReq.Status <> 200 Then
MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLReq.responseText
Set SubTag = HTMLDoc.getElementsByTagName("dt")
Set SubInfo = SubTag.tags("dd")
For Each SubName In SubTag
Debug.Print SubName.innerText, SubInfo.innerText
Next SubName
End Sub
Sub-getContents()
Dim XMLReq作为新的MSXML2.XMLHTTP60
将HTMLDoc设置为新的MSHTML.HTMLDocument
将子标签变暗为MSHTML.IHTMLElementCollection
Dim子名称为MSHTML.IHTMLElement
将子信息设置为MSHTML.IHTMLElement
XMLReq.Open“Get”https://echa.europa.eu/brief-profile/-/briefprofile/100.047.293”“错
XMLReq.send
如果XMLReq.Status为200,则
MsgBox“问题”&vbNewLine&XMLReq.Status&“-”&XMLReq.statusText
出口接头
如果结束
HTMLDoc.body.innerHTML=XMLReq.responseText
Set SubTag=HTMLDoc.getElementsByTagName(“dt”)
Set SubInfo=SubTag.tags(“dd”)
对于子标记中的每个子名称
Debug.Print SubName.innerText,SubInfo.innerText
下一个子名
端接头
我很感激这是一篇很长的帖子,但如果有人能评论我做错了什么,那就太好了
更新:
下面的代码可以更好地在即时窗口中实现所需的数据
Sub GetContents()
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim SubSectList As MSHTML.IHTMLElement
Dim SubSects As MSHTML.IHTMLElementCollection
Dim SubSect As MSHTML.IHTMLElement
XMLReq.Open "Get", "https://echa.europa.eu/brief-profile/-/briefprofile/100.047.293", False
XMLReq.send
If XMLReq.Status <> 200 Then
MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLReq.responseText
Set SubSectList = HTMLDoc.getElementsByClassName("col-xs-12 col-lg-10 MainContent")(1)
Set SubSects = SubSectList.getElementsByTagName("dt")
'Debug.Print SubSects.Length
For Each SubSect In SubSects
Debug.Print SubSect.innerText & " : "; SubSect.NextSibling.innerText
Next SubSect
End Sub
Sub-GetContents()
Dim XMLReq作为新的MSXML2.XMLHTTP60
将HTMLDoc设置为新的MSHTML.HTMLDocument
Dim子列表为MSHTML.IHTMLElement
将子部分设置为MSHTML.IHTMLElementCollection
将子部分标注为MSHTML.IHTMLElement
XMLReq.Open“Get”https://echa.europa.eu/brief-profile/-/briefprofile/100.047.293”“错
XMLReq.send
如果XMLReq.Status为200,则
MsgBox“问题”&vbNewLine&XMLReq.Status&“-”&XMLReq.statusText
出口接头
如果结束
HTMLDoc.body.innerHTML=XMLReq.responseText
Set subsctlist=HTMLDoc.getElementsByClassName(“col-xs-12 col-lg-10 MainContent”)(1)
Set subsets=subsetlist.getElementsByTagName(“dt”)
'Debug.Print SubSects.Length
对于子集中的每个子集
Debug.Print SubSect.innerText&“:”;SubSect.NextSibling.innerText
下一子集
端接头
我认为您希望限制那些dt,它们是具有类EndpointContent的元素的子元素;然后,您可以将下一个绑定链接到相邻的dd
Option Explicit
Public Sub GetContents()
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
XMLReq.Open "Get", "https://echa.europa.eu/brief-profile/-/briefprofile/100.047.293", False
XMLReq.send
HTMLDoc.body.innerHTML = XMLReq.responseText
Dim i As Long
With HTMLDoc.querySelectorAll(".EndpointContent dt")
For i = 0 To .Length - 1
Debug.Print .Item(i).innerText & " : " & .Item(i).NextSibling.NextSibling.innerText
Debug.Print
Next
End With
End Sub
你好,谢谢你的回复。运行此代码时,我似乎遇到运行时错误
91
:对象变量或未设置块变量。错误出现在行Debug.Print.Item(i).innerText&“:”&.Item(i).NextSibling.NextSibling.innerText
有什么想法吗?我今晚需要测试。昨天运行正常。@Nick我检查了QHarrs代码,对我来说,它工作正常。如果您使用webbrowser,它会读取网站上“科学属性”选项卡中的值。我在Windows 10上使用Excel 2016 32位64位当前版本1)数据未返回-服务器/连接计时2)阻塞。很可能是1。然后你会实施重试策略。啊,是的,我想这解释了我遇到的另一个问题。非常感谢。
Option Explicit
Public Sub GetContents()
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
XMLReq.Open "Get", "https://echa.europa.eu/brief-profile/-/briefprofile/100.047.293", False
XMLReq.send
HTMLDoc.body.innerHTML = XMLReq.responseText
Dim i As Long
With HTMLDoc.querySelectorAll(".EndpointContent dt")
For i = 0 To .Length - 1
Debug.Print .Item(i).innerText & " : " & .Item(i).NextSibling.NextSibling.innerText
Debug.Print
Next
End With
End Sub