VBA: unable to fetch data in tabular format


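I've written a script in VBA that uses IE to fetch data from a web page. The data are not stored in any table: there are no table, tr, or td tags. However, they look as if they are laid out in tabular format. For clarity, see the image below.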

What I've tried so far returns the data as one flat list, like:

$4,085  
$1,620
$1,435  
$35
$1,125  
$905

This is how I'd like to get them:

$4,085  $1,620
$1,435  $35
$1,125  $905

In other languages there is list comprehension, which would let me handle this in one line of code, but in VBA I'm stuck.

The HTML elements that contain the data (this is only part of the whole element):


This works using CSS selectors. Updated to remove the explicit wait.

The selector is:

#tco_detail_data > li

This matches each li that is a direct child of the element with the id tco_detail_data.

Below is a sample of the results from the web page using the CSS query:


Code:

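Option Explicit

' Requires a reference to "Microsoft Internet Controls" (Tools > References).
Public Sub Get_Information()
    Dim IE As New InternetExplorer

    With IE
        .Visible = False
        .navigate "https://www.edmunds.com/ford/escape/2017/cost-to-own/?zip=43215"
        While .Busy = True Or .readyState < 4: DoEvents: Wend
    End With

    ' Poll for up to 5 seconds until the container element exists.
    ' querySelector returns Nothing when there is no match.
    Dim a As Object, exitTime As Date
    exitTime = Now + TimeSerial(0, 0, 5)

    Do
        DoEvents
        On Error Resume Next
        Set a = IE.document.querySelector("#tco_detail_data")
        On Error GoTo 0
        If Now > exitTime Then Exit Do
    Loop While a Is Nothing

    ' Give up if the element never appeared, closing the hidden browser first
    If a Is Nothing Then
        IE.Quit
        Exit Sub
    End If

    Dim resultsNodeList As Object, i As Long, arr() As String
    Set resultsNodeList = IE.document.querySelectorAll("#tco_detail_data > li")

    With ActiveSheet
        ' Rows 0 to 9 are hard-coded; use resultsNodeList.Length - 1 to process every match
        For i = 0 To 9
            ' Each li returns its values separated by line feeds
            arr = Split(resultsNodeList(i).innerText, Chr$(10))
            ' One sheet row per li, one column per value
            .Cells(i + 1, 1).Resize(1, UBound(arr) + 1).Value = arr
        Next
    End With

    IE.Quit
End Sub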

Resulting worksheet:


Additional info:

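The array step is needed because resultsNodeList(i).innerText comes back as a "stacked string", i.e. with line breaks between the values; see the image below. I split on those line breaks to produce an array and then write it to the sheet. The array is zero-based, so I have to add 1 to UBound(arr) to size the range correctly.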
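
To make the split-and-resize step easier to follow in isolation, here is a minimal sketch that runs without a browser. The procedure name SplitStackedStringDemo and the sampleText value are hypothetical; sampleText only stands in for what innerText might return for a single li and is not taken from the page.

```vba
Option Explicit

' Illustration only: sampleText is a made-up stand-in for the
' "stacked string" that resultsNodeList(i).innerText returns.
Public Sub SplitStackedStringDemo()
    Dim sampleText As String, parts() As String

    ' Two values separated by a line feed (vbLf is the same character as Chr$(10))
    sampleText = "$4,085" & vbLf & "$1,620"

    ' Split returns a zero-based String array: parts(0) and parts(1)
    parts = Split(sampleText, vbLf)

    ' UBound(parts) is the last index, so the item count is UBound(parts) + 1.
    ' Resize A1 to one row by that many columns and write the array in one go.
    ActiveSheet.Cells(1, 1).Resize(1, UBound(parts) + 1).Value = parts
End Sub
```

Running it on an empty sheet should place the two values side by side in A1 and B1, which is the same row-wise layout the answer's loop produces for each li.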




In addition to what QHarr has already shown, there is another way to achieve the same goal:

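' Requires references to "Microsoft Internet Controls" and
' "Microsoft HTML Object Library" (Tools > References).
Sub Get_Information()
    Dim IE As New InternetExplorer, HTML As HTMLDocument
    Dim posts As Object, post As Object, oitem As Object
    Dim R As Long, C As Long, B As Boolean

    With IE
        .Visible = False
        .Navigate "https://www.edmunds.com/ford/escape/2017/cost-to-own/?zip=43215"
        Do While .Busy = True Or .ReadyState <> 4: DoEvents: Loop
        Set HTML = .Document
    End With

    ' No hardcoded delay is required: this loop waits until the container exists.
    ' Note that it never times out if the element does not appear.
    Do: Set oitem = HTML.getElementById("tco_detail_data"): DoEvents: Loop While oitem Is Nothing

    ' Outer loop: every li under the container; inner loop: the li items nested in it
    For Each posts In oitem.getElementsByTagName("li")
        C = 1: B = False

        For Each post In posts.getElementsByTagName("li")
            ' One nested li per column of the current row
            Cells(R + 1, C).Value = post.innerText
            C = C + 1: B = True
        Next post

        ' Advance to the next row only if this li contained nested items
        If B Then R = R + 1
    Next posts
    IE.Quit
End Sub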