VBA Scrape: Get the href from each HTML element
The code below successfully loops over every element in the DOM and writes each one to an Excel worksheet (tag name, ID, class name, and so on).

My question: how do I scrape the tag attributes (title, href, etc.) of each element? Specifically, for "A" tags, how do I scrape the "href" attribute?

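' Module-level enum naming Internet Explorer's readyState values.
Enum READYSTATE
    READYSTATE_UNINITIALIZED = 0
    READYSTATE_LOADING = 1
    READYSTATE_LOADED = 2
    READYSTATE_INTERACTIVE = 3
    READYSTATE_COMPLETE = 4
End Enum

' Needs references to "Microsoft Internet Controls" and "Microsoft HTML Object Library".
' The Sub name is only a placeholder so the snippet can be run as a macro.
Sub ScrapeElements()
    Dim ie As InternetExplorer
    Dim html As HTMLDocument
    Dim element As Object
    Dim RowNumber As Long

    Set ie = New InternetExplorer
    ie.Visible = False
    ie.navigate "www.somesite.com"

    ' Wait until the page has finished loading.
    Do While ie.READYSTATE <> READYSTATE_COMPLETE
        Application.StatusBar = "Connecting..."
        DoEvents
    Loop
    Application.StatusBar = False

    Set html = ie.document

    ' Write one worksheet row per DOM element.
    RowNumber = 1
    For Each element In html.all
        Cells(RowNumber, "A").Value = element.tagName
        Cells(RowNumber, "B").Value = element.ID
        Cells(RowNumber, "C").Value = element.className
        Cells(RowNumber, "D").Value = element.innerHTML
        RowNumber = RowNumber + 1
    Next element
End Sub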

Any help would be appreciated.

Add this line just before RowNumber = RowNumber + 1:
If (element.tagName = "A") Then Cells(RowNumber, "E").Value=element.getAttribute("href")
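
The same idea can be extended to other attributes, such as title, on every element rather than only on "A" tags. The following is a minimal sketch, not a definitive implementation: AttrOrEmpty is a hypothetical helper name, and it assumes that getAttribute returns Null (or an empty value) when an element does not carry the attribute, and that the attributes requested are plain text ones like href and title.

```vba
' Hypothetical helper: returns the attribute's value as text,
' or an empty string when the element does not have that attribute.
' Intended for simple text attributes such as "href" or "title".
Function AttrOrEmpty(ByVal element As Object, ByVal attrName As String) As String
    Dim v As Variant
    v = element.getAttribute(attrName)
    If IsNull(v) Or IsEmpty(v) Then
        AttrOrEmpty = ""
    Else
        AttrOrEmpty = CStr(v)
    End If
End Function
```

Inside the loop, before RowNumber = RowNumber + 1, the helper could then be called once per attribute, for example Cells(RowNumber, "E").Value = AttrOrEmpty(element, "href") and Cells(RowNumber, "F").Value = AttrOrEmpty(element, "title"). Elements without the attribute simply leave the cell blank, so no tagName check is needed.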