VBA: unable to get data in a tabular format
I've written a script in VBA that uses IE to fetch data from a webpage. The data is not stored in any table. There are no `table`, `tr` or `td` tags. However, it looks like it is in a tabular format. For clarity, see the image below:
[image]
So far, what I've tried fetches the data in a single column, like:
$4,085
$1,620
$1,435
$35
$1,125
$905
I'd like to get them like this:
$4,085 $1,620
$1,435 $35
$1,125 $905
In other languages there is list comprehension, which would let me handle this in one line of code, but in VBA I'm stuck.
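VBA has no list comprehension, but reshaping a flat list into rows of two is a short loop using integer division and `Mod`. A minimal sketch, assuming the values have already been collected into a 1-D array with at least one element; `ReshapeRows` and `DemoReshape` are hypothetical names, not part of any library:

```vba
Option Explicit

'Hypothetical helper: reshape a flat 1-D array into a 1-based 2-D array
'with colsPerRow columns, the shape Range.Value expects.
'Assumes values contains at least one element.
Public Function ReshapeRows(ByVal values As Variant, ByVal colsPerRow As Long) As Variant
    Dim n As Long, rowCount As Long, i As Long
    n = UBound(values) - LBound(values) + 1
    rowCount = (n + colsPerRow - 1) \ colsPerRow   'round up so a partial last row is kept

    Dim out() As Variant
    ReDim out(1 To rowCount, 1 To colsPerRow)
    For i = 0 To n - 1
        'item i lands in row i \ colsPerRow and column i Mod colsPerRow, both shifted to 1-based
        out(i \ colsPerRow + 1, i Mod colsPerRow + 1) = values(LBound(values) + i)
    Next i
    ReshapeRows = out
End Function

Public Sub DemoReshape()
    Dim grid As Variant
    grid = ReshapeRows(Array("$4,085", "$1,620", "$1,435", "$35", "$1,125", "$905"), 2)
    'Write the resulting 3 x 2 block starting at A1 of the active sheet
    ActiveSheet.Range("A1").Resize(UBound(grid, 1), UBound(grid, 2)).Value = grid
End Sub
```

Building the whole 2-D array first and assigning it with a single `.Value =` is usually faster than writing cell by cell.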
The HTML element that contains the data (this is only part of the whole element):
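This works using a CSS selector. (Updated to remove the explicit wait.)

The selector is:

`#tco_detail_data > li`

That is, the `li` elements that are direct children of the element whose id is `tco_detail_data`.

Below is a sample result of running that CSS query against the webpage:
[image]

Code:

```vba
Option Explicit

'Requires a reference to Microsoft Internet Controls (for InternetExplorer)
Public Sub Get_Information()
    Dim IE As New InternetExplorer
    With IE
        .Visible = False
        .navigate "https://www.edmunds.com/ford/escape/2017/cost-to-own/?zip=43215"
        While .Busy = True Or .readyState < 4: DoEvents: Wend
    End With

    'Poll for up to 5 seconds until the container element exists.
    'querySelector returns Nothing when there is no match (querySelectorAll never does).
    Dim a As Object, exitTime As Date
    exitTime = Now + TimeSerial(0, 0, 5)
    Do
        DoEvents
        On Error Resume Next
        Set a = IE.document.querySelector("#tco_detail_data")
        On Error GoTo 0
        If Now > exitTime Then Exit Do
    Loop While a Is Nothing
    If a Is Nothing Then IE.Quit: Exit Sub

    Dim resultsNodeList As Object, i As Long, arr() As String
    'Each direct li child of the container becomes one row on the sheet
    Set resultsNodeList = IE.document.querySelectorAll("#tco_detail_data > li")
    With ActiveSheet
        For i = 0 To resultsNodeList.Length - 1
            'innerText holds the row's values separated by line breaks: drop CRs, split on LF
            arr = Split(Replace(resultsNodeList.Item(i).innerText, vbCr, vbNullString), vbLf)
            'arr is 0-based, so the column count is UBound(arr) + 1
            If UBound(arr) >= 0 Then .Cells(i + 1, 1).Resize(1, UBound(arr) + 1).Value = arr
        Next
    End With
    IE.Quit
End Sub
```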
Resulting sheet:
[image]

Additional info:
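The array part is needed because `resultsNodeList(i).innerText` comes back as a "stacked string", i.e. with line breaks between the values; see the image below. I split on those line breaks to produce an array, which I then write to the sheet. The array is 0-based, so I have to add 1 to size the range correctly.
[image]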
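The split-and-write step can be pulled into two small helpers. A sketch, assuming the stacked string may separate values with either `vbCrLf` or `vbLf`: splitting on `Chr$(10)` alone would leave a trailing `vbCr` on each piece if CRLF is returned, so CRs are stripped first. `SplitStacked` and `WriteStackedRow` are hypothetical names:

```vba
Option Explicit

'Hypothetical helper: split a "stacked" innerText string into its values.
'CRs are removed first so both vbCrLf and vbLf separators work.
Public Function SplitStacked(ByVal stacked As String) As String()
    SplitStacked = Split(Replace(stacked, vbCr, vbNullString), vbLf)
End Function

'Hypothetical helper: write one stacked string across a single row starting at startCell.
Public Sub WriteStackedRow(ByVal stacked As String, ByVal startCell As Range)
    Dim arr() As String
    arr = SplitStacked(stacked)
    If UBound(arr) < 0 Then Exit Sub   'Split("") returns an empty array with UBound = -1
    'Split is 0-based, so the number of columns is UBound(arr) + 1
    startCell.Resize(1, UBound(arr) + 1).Value = arr
End Sub
```

For example, `WriteStackedRow "$4,085" & vbCrLf & "$1,620", ActiveSheet.Range("A1")` would put the two values in A1 and B1.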
In addition to what QHarr has already shown, here is another way to achieve the same thing:
```vba
'Requires references to Microsoft Internet Controls and Microsoft HTML Object Library
Sub Get_Information()
    Dim IE As New InternetExplorer, HTML As HTMLDocument
    Dim posts As Object, post As Object, oitem As Object
    Dim R&, C&, B As Boolean
    With IE
        .Visible = False
        .Navigate "https://www.edmunds.com/ford/escape/2017/cost-to-own/?zip=43215"
        Do While .Busy = True Or .ReadyState <> 4: DoEvents: Loop
        Set HTML = .Document
    End With

    'No hardcoded delay is required: keep polling until the container element exists
    Do: Set oitem = HTML.getElementById("tco_detail_data"): DoEvents: Loop While oitem Is Nothing

    'getElementsByTagName returns every descendant li, nested ones included.
    'An li that contains li elements is treated as a row; its inner li elements are the cells.
    For Each posts In oitem.getElementsByTagName("li")
        C = 1: B = False
        For Each post In posts.getElementsByTagName("li")
            ActiveSheet.Cells(R + 1, C).Value = post.innerText
            C = C + 1: B = True
        Next post
        'Only move to the next sheet row if this li actually contained cells
        If B Then R = R + 1
    Next posts
    IE.Quit
End Sub
```
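The nested-`li` idea above can be tried offline without IE. A sketch, assuming the markup nests one inner list per row; the question's HTML was not preserved, so the markup used with it is made up for illustration. It loads a string into a late-bound `htmlfile` document and returns each row as a `|`-joined string. `CollectRows` is a hypothetical name:

```vba
Option Explicit

'Hypothetical helper: return one "|"-joined string per li that contains nested li elements,
'mirroring the row/cell logic of the loop above.
Public Function CollectRows(ByVal markup As String) As Collection
    Dim doc As Object, container As Object, rowEl As Object, cellEl As Object
    Dim result As Collection, rowText As String, hasCells As Boolean

    Set result = New Collection
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = markup
    Set container = doc.getElementById("tco_detail_data")

    If Not container Is Nothing Then
        For Each rowEl In container.getElementsByTagName("li")
            rowText = vbNullString: hasCells = False
            For Each cellEl In rowEl.getElementsByTagName("li")
                'append this cell, separated from the previous one by "|"
                If hasCells Then rowText = rowText & "|"
                rowText = rowText & cellEl.innerText
                hasCells = True
            Next cellEl
            'leaf li elements have no li children and are skipped, like "If B Then R = R + 1"
            If hasCells Then result.Add rowText
        Next rowEl
    End If
    Set CollectRows = result
End Function
```

Each returned string can then be turned back into cells with `Split(rowText, "|")`.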