VBA Excel: Pulling new web page data after clicking "Submit"

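I'm trying to pull some information from a website that provides oil well data by API number (the API number is a unique identifier for every oil well in the US).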

Website:

Example API: 1708300502

The problem is that when I get to page 2, IE.document.getElementsByTagName("body")(0).innerText still returns the data from the initial page. How can I get the updated page's data?

The end goal is to get to page 2, click "30570" with IE.document.getElementsByTagName("a")(0).Click, and then read the final page 3. I just can't figure out how to read the updated page :(

Option Explicit
' Requires a reference to Microsoft Internet Controls (for the InternetExplorer type)
Sub sonris_WellData()
    Dim IE As InternetExplorer
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    Dim i As Integer
    'Open SONRIS website
    Application.StatusBar = "Opening website"
    IE.Navigate "http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellapi1"
    Do While IE.readyState <> 4: DoEvents: Loop
    Application.Wait Now() + TimeValue("00:00:01")
    Application.StatusBar = False
    IE.document.forms(0).p_apinum.Value = "1708300502" 'Plug in API
    IE.document.forms(0).submit
    'Wait for next page to open
    Application.StatusBar = "Opening website"
    Do While IE.readyState <> 4: DoEvents: Loop
    Application.Wait Now() + TimeValue("00:00:01")
    Application.StatusBar = False
    'Read page - this is where the problem occurs: MsgBox keeps returning text from the first page
    MsgBox IE.document.getElementsByTagName("body")(0).innerText
    IE.Quit
End Sub

This seems to work. Instead of relying on DoEvents alone, use the WinAPI Sleep function. I also added a call to Sleep after the form submit.
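A minimal sketch of that approach, assuming VBA7 (Office 2010 or later) for the PtrSafe declaration. WaitForIE and its 200 ms polling interval are hypothetical choices, not the answer's original code (which is not preserved here); browser is assumed to be the same InternetExplorer object as IE in the question.

```vba
Option Explicit

' WinAPI Sleep: suspends the calling thread for dwMilliseconds milliseconds.
#If VBA7 Then
    Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
    Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If

' Hypothetical helper: wait until IE is idle and reports READYSTATE_COMPLETE (4).
Public Sub WaitForIE(ByVal browser As Object, Optional ByVal pollMs As Long = 200)
    Do While browser.Busy Or browser.readyState <> 4
        Sleep pollMs   ' pause without spinning the CPU
        DoEvents       ' let Excel process pending messages between polls
    Loop
End Sub
```

Typical use would be IE.document.forms(0).submit, then Sleep 1000 so the new navigation has actually started, then WaitForIE IE before reading the document.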

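We often see sites that are served dynamically (JavaScript, etc.). In those cases the browser may report READYSTATE_COMPLETE, or not Busy, even though the page has not yet rendered the "new" results.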

You can also make Excel wait until the URL changes.
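A sketch of such a wait, assuming browser is the InternetExplorer object from the question. WaitForUrlChange and its 15-second default timeout are hypothetical; the timeout is there so that, if LocationURL never changes (as one of the comments reports), the macro gives up instead of hanging.

```vba
Option Explicit

' Hypothetical helper: wait until the browser leaves oldUrl and the new page has
' loaded. Returns True on success, False if timeoutSeconds elapses first.
Public Function WaitForUrlChange(ByVal browser As Object, ByVal oldUrl As String, _
                                 Optional ByVal timeoutSeconds As Double = 15) As Boolean
    Dim deadline As Date
    deadline = Now + timeoutSeconds / 86400#     ' Date arithmetic is in days
    ' Phase 1: wait for the URL to change.
    Do While browser.LocationURL = oldUrl
        If Now > deadline Then Exit Function     ' timed out: returns False
        DoEvents
    Loop
    ' Phase 2: the URL changed; let the new document finish loading.
    Do While browser.Busy Or browser.readyState <> 4
        If Now > deadline Then Exit Function
        DoEvents
    Loop
    WaitForUrlChange = True
End Function
```

Capture IE.LocationURL before calling submit and pass it as oldUrl. Now only has one-second resolution, which is fine for a timeout measured in seconds.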

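Alternatively, you may have better luck with XMLHTTPRequest (there are plenty of examples on SO and elsewhere on the internet). That way you send the request just as the browser would, without actually driving a web browser. You then parse the returned text as HTML or XML. For that, I would add a reference to the Microsoft XML, v6.0 library.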
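A minimal sketch of the XMLHTTPRequest route, using late binding so no references are strictly required. PostForHtml and ParseHtml are hypothetical helper names; the parsing step loads the response into an MSHTML document (CreateObject("htmlfile")) so it can be queried with getElementsByTagName.

```vba
Option Explicit

' Load an HTML string into an MSHTML document for DOM-style querying.
Public Function ParseHtml(ByVal htmlText As String) As Object
    Dim doc As Object
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = htmlText
    Set ParseHtml = doc
End Function

' Send a form-style POST the way the browser would and return the parsed page.
Public Function PostForHtml(ByVal url As String, ByVal formBody As String) As Object
    Dim xhr As Object
    Set xhr = CreateObject("MSXML2.XMLHTTP.6.0")   ' Microsoft XML, v6.0
    xhr.Open "POST", url, False                     ' False = block until the response arrives
    xhr.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    xhr.send formBody
    If xhr.Status <> 200 Then
        Err.Raise vbObjectError + 513, "PostForHtml", "HTTP " & xhr.Status & " " & xhr.statusText
    End If
    Set PostForHtml = ParseHtml(xhr.responseText)
End Function
```

For this site, the other answer's code suggests a call such as PostForHtml("http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellapi2", "p_apinum=1708300502"); the endpoint may have changed since.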

① POST request: entering the well API number

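I inspected the web page where you make the selection you mention. Inspecting the web traffic, I noticed that when you submit the API number, the initial request is handled by a proxy.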


② POST request:

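The POST body has the following parameter: p_apinum is the key, and the associated value is the original well API number.

Using this information, I built a direct POST request, which bypasses your first landing page.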
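The answer's GetPage simply concatenates key and value, which is fine for an all-digit API number. If you need more fields, or values containing spaces or &, the body must be URL-encoded. A minimal sketch, assuming Excel 2013 or later for WorksheetFunction.EncodeURL; BuildFormBody is a hypothetical helper, not part of the answer.

```vba
Option Explicit

' Hypothetical helper: build an application/x-www-form-urlencoded body from
' alternating key/value arguments, e.g. BuildFormBody("p_apinum", "1708300502").
Public Function BuildFormBody(ParamArray pairs() As Variant) As String
    Dim i As Long, result As String
    If (UBound(pairs) - LBound(pairs) + 1) Mod 2 <> 0 Then
        Err.Raise 5, "BuildFormBody", "Arguments must be key/value pairs"
    End If
    For i = LBound(pairs) To UBound(pairs) Step 2
        If Len(result) > 0 Then result = result & "&"   ' separate pairs with &
        result = result & Application.WorksheetFunction.EncodeURL(CStr(pairs(i))) _
                        & "=" & Application.WorksheetFunction.EncodeURL(CStr(pairs(i + 1)))
    Next i
    BuildFormBody = result
End Function
```

EncodeURL encodes a space as %20 rather than +; most form handlers accept either.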


③ Pressing the hyperlink:

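Next, I looked at the element you want to press. Its HTML contains a relative hyperlink.

I use a helper function, GetNextURL(page.body.innerHTML), to parse the page HTML and extract this relative link; the main routine then prefixes it with BASESTRING to build the absolute URL.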
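GetNextURL assumes the first href= in the page is the link you want, and Split(...)(1) raises error 9 if the page has no href at all. A slightly more defensive sketch (FirstHref is a hypothetical name) handles double-quoted, single-quoted and unquoted values and returns an empty string when nothing is found. Like GetNextURL, it strips "about:", which MSHTML can prefix to relative links, and it still assumes the target link is the first one.

```vba
Option Explicit

' Hypothetical helper: value of the first href attribute in an HTML string,
' or "" if there is none.
Public Function FirstHref(ByVal html As String) As String
    Dim pos As Long, q As String, endPos As Long
    pos = InStr(1, html, "href=", vbTextCompare)   ' case-insensitive: HREF= also matches
    If pos = 0 Then Exit Function
    pos = pos + Len("href=")
    q = Mid$(html, pos, 1)
    If q = """" Or q = "'" Then
        pos = pos + 1
        endPos = InStr(pos, html, q)               ' matching closing quote
        If endPos = 0 Then Exit Function           ' unterminated quote
    Else
        ' Unquoted value: ends at whitespace or the closing ">"
        endPos = pos
        Do While endPos <= Len(html)
            If InStr(" >" & vbTab & vbCr & vbLf, Mid$(html, endPos, 1)) > 0 Then Exit Do
            endPos = endPos + 1
        Loop
    End If
    FirstHref = Replace$(Mid$(html, pos, endPos - pos), "about:", vbNullString)
End Function
```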


④ Making the new request:

I reuse the HTTP request function GetPage to send a second request, this time with an empty body, and grab all the tables from the returned HTML document with page.getElementsByTagName("table").
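Before writing anything to the sheet, it can help to confirm that the second response really contains the tables you expect. A small diagnostic sketch (TableRowCounts is a hypothetical name) that parses an HTML string with MSHTML and returns the number of rows in each table, or an empty array if there are none:

```vba
Option Explicit

' Hypothetical diagnostic: row count of each <table> in an HTML string.
Public Function TableRowCounts(ByVal html As String) As Variant
    Dim doc As Object, tables As Object, counts() As Long, i As Long
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = html
    Set tables = doc.getElementsByTagName("table")
    If tables.Length = 0 Then
        TableRowCounts = Array()             ' empty array: UBound = -1
        Exit Function
    End If
    ReDim counts(0 To tables.Length - 1)
    For i = 0 To tables.Length - 1
        ' getElementsByTagName also finds rows inside auto-inserted <tbody>
        counts(i) = tables(i).getElementsByTagName("tr").Length
    Next i
    TableRowCounts = counts
End Function
```

Passing the second GetPage result's body.innerHTML and printing each count with Debug.Print is a quick way to tell "wrong page" apart from "right page, nothing written".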


⑤ Writing the tables to the Excel sheet:

I loop over all the tables on the page, using the helper AddHeaders to write out each table's headers and WriteTables to write the current table to the sheet.
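WriteTables writes one cell at a time, which is fine for small pages. As an alternative (not the answer's method), a table can be copied into a 2-D array and written with a single Range assignment, which is usually faster. In this hypothetical WriteTableFast, header (th) and data (td) cells are treated alike via each row's cells collection, and colspan/rowspan are ignored, so merged cells will shift columns.

```vba
Option Explicit

' Hypothetical alternative writer: returns the number of sheet rows written.
Public Function WriteTableFast(ByVal hTable As Object, ByVal ws As Worksheet, _
                               ByVal startRow As Long) As Long
    Dim trs As Object, rowCells As Object, data() As Variant
    Dim r As Long, c As Long, maxCols As Long
    Set trs = hTable.getElementsByTagName("tr")
    If trs.Length = 0 Then Exit Function
    ' First pass: find the widest row so the array is large enough.
    For r = 0 To trs.Length - 1
        If trs(r).cells.Length > maxCols Then maxCols = trs(r).cells.Length
    Next r
    If maxCols = 0 Then Exit Function
    ReDim data(1 To trs.Length, 1 To maxCols)
    ' Second pass: copy the text of every th/td into the array.
    For r = 0 To trs.Length - 1
        Set rowCells = trs(r).cells
        For c = 0 To rowCells.Length - 1
            data(r + 1, c + 1) = rowCells(c).innerText
        Next c
    Next r
    ' One write to the sheet instead of one per cell.
    ws.Cells(startRow, 1).Resize(trs.Length, maxCols).Value = data
    WriteTableFast = trs.Length
End Function
```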



VBA:

Option Explicit
' Requires VBE > Tools > References > Microsoft HTML Object Library.
' WinHttp is late-bound (CreateObject), so it needs no reference.

Public Sub GetWellInfo()
    Dim ws As Worksheet, page As HTMLDocument, targetTable As HTMLTable, apiNumbers(), currNumber As Long
    Dim allTables As Object, nextURL As String
    Const PARAM1 As String = "p_apinum"
    Const BASESTRING As String = "http://sonlite.dnr.state.la.us/sundown/cart_prod/"
    apiNumbers = Array(1708300502, 1708300503)

    Application.ScreenUpdating = False
    Set ws = ThisWorkbook.Worksheets("Sheet1")

    With ws
        .Cells.ClearContents
        For currNumber = LBound(apiNumbers) To UBound(apiNumbers)
            ' 1) POST the API number straight to the results page (skips the landing page)
            Set page = GetPage(BASESTRING & "cart_con_wellapi2", apiNumbers(currNumber), PARAM1)
            If page Is Nothing Then GoTo NextNumber

            ' 2) Follow the relative hyperlink found on the results page
            nextURL = GetNextURL(page.body.innerHTML)
            If Len(nextURL) = 0 Then GoTo NextNumber
            Set page = GetPage(BASESTRING & nextURL)
            If page Is Nothing Then GoTo NextNumber

            ' 3) Write every table on the final page to the sheet
            Set allTables = page.getElementsByTagName("table")
            For Each targetTable In allTables
                AddHeaders targetTable, GetLastRow(ws, 1) + 2, ws
                WriteTables targetTable, GetLastRow(ws, 1), ws
            Next targetTable
NextNumber:
        Next currNumber
    End With
    Application.ScreenUpdating = True
End Sub

Public Function GetPage(ByVal url As String, Optional ByVal apiNumber As Long, Optional ByVal paramN As String = vbNullString) As HTMLDocument
    ' Sends a form-encoded POST; returns Nothing on a network error or a non-200 status
    Dim objHTTP As Object, html As New HTMLDocument, sBody As String

    Set objHTTP = CreateObject("WinHttp.WinHttpRequest.5.1")
    If Not paramN = vbNullString Then sBody = paramN & "=" & apiNumber   ' e.g. p_apinum=1708300502

    With objHTTP
        .SetTimeouts 10000, 10000, 10000, 10000
        .Open "POST", url, False
        .setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
        .setRequestHeader "Content-type", "application/x-www-form-urlencoded"
        On Error Resume Next
        .send sBody
        If Err.Number = 0 Then
            If .Status = 200 Then
                html.body.innerHTML = .responseText
                Set GetPage = html
            Else
                Debug.Print "HTTP " & .Status & " " & .statusText
            End If
        Else
            Debug.Print "Error " & Err.Number & " " & Err.Source & " " & Err.Description
        End If
        On Error GoTo 0
    End With
End Function

Public Function GetNextURL(ByVal inputString As String) As String
    ' First href value with quotes and any "about:" prefix removed, or "" if there is none
    If InStr(inputString, "href=") = 0 Then Exit Function
    GetNextURL = Replace$(Replace$(Split(Split(inputString, "href=")(1), ">")(0), Chr$(34), vbNullString), "about:", vbNullString)
End Function

Public Sub AddHeaders(ByVal hTable As Object, ByVal startRow As Long, ByVal ws As Worksheet)
    Dim headers As Object, header As Object, columnCounter As Long
    Set headers = hTable.getElementsByTagName("th")
    For Each header In headers
        columnCounter = columnCounter + 1
        ws.Cells(startRow, columnCounter) = header.innerText
    Next header
End Sub

Public Sub WriteTables(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByRef ws As Worksheet)
    If ws Is Nothing Then Set ws = ActiveSheet

    Dim tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, c As Long
    r = startRow
    c = 1                               ' worksheet columns are 1-based; Cells(r, 0) raises error 1004
    With ws                             ' write to the sheet passed in, not whatever sheet is active
        Set tRow = hTable.getElementsByTagName("tr")
        For Each tr In tRow
            Set tCell = tr.getElementsByTagName("td")
            For Each td In tCell
                .Cells(r, c).Value = td.innerText
                c = c + 1
            Next td
            r = r + 1: c = 1
        Next tr
    End With
End Sub

Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
    With ws
        GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
    End With
End Function

References:


VBE > Tools > References > Microsoft HTML Object Library.

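Still no luck. I can see the page loading, so the delay isn't the problem. I'll try using XMLHttpRequest.

Ooh, sorry, try the second suggestion, Do While IE.Location = "http://..."; I used the wrong operator there.

Even after waiting 10-15 seconds, IE.LocationURL still returns the first link, even though the page has been on the second link for more than 5 seconds.

"Try the second suggestion...": I tried the IE.Location loop and nothing happened. I reprogrammed the script to write IE.LocationURL to a new row every 5 seconds. The result is that it keeps pulling data from the old IE page instead of the new one :(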