Web scraping with VBA: changing the value of an input box

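I am experienced with VBA but new to web scraping. So far I have managed to extract tables from other web pages, but this one is giving me trouble. The page is the one my code navigates to below.

Basically, I click the dropdown arrow next to the "Exportar Cuadro" button. After that, I need to change both dates that appear there to one specific date, which will eventually come from a variable.

How can I change the input boxes on the web page? My code so far is the following: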

Option Explicit

Sub test()

Dim URL As String, URL2 As String, URL3 As String, URL4 As String
Dim IE As Object, obj As Object, colTR As Object, doc As Object, tr As Object
Dim eleColtr As MSHTML.IHTMLElementCollection 'Element collection for tr tags
Dim eleColtd As MSHTML.IHTMLElementCollection 'Element collection for td tags
Dim eleRow As MSHTML.IHTMLElement 'Row elements
Dim eleCol As MSHTML.IHTMLElement 'Column elements
Dim objCollection As Object
Dim j As String, i As Integer


URL = "https://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=18&accion=consultarCuadroAnalitico&idCuadro=CA51&locale=es"
URL2 = "https://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=18&accion=consultarCuadroAnalitico&idCuadro=CA52&locale=es"
URL3 = "https://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=18&accion=consultarCuadroAnalitico&idCuadro=CA53&locale=es"
URL4 = "http://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=6&accion=consultarCuadro&idCuadro=CF102&locale=es"
'Tipos de cambio (exchange rates)
Set IE = CreateObject("InternetExplorer.Application")

IE.Visible = True
IE.navigate URL4

Do While IE.Busy Or IE.readyState <> 4
    DoEvents
Loop

Application.Wait (Now + TimeValue("00:00:01"))

IE.document.getElementById("exportaCuadroToggle").Click

Set objCollection = IE.document.getElementsByTagName("ID")
i = 0
While i < objCollection.Length
    If objCollection(i).Value = "26/08/2019" Then
        ' Set text for search
        objCollection(i).Value = "01/08/2019"
    End If
    If objCollection(i).Name = "form-control form-control-sm fechaFin" Then
        ' Set text for search
        objCollection(i).Value = "01/08/2019"
    End If
    i = i + 1 'advance the index; without this the loop can never terminate
Wend

End Sub
Note: URL, URL2 and URL3 are used in the full code, but I am leaving that part out for now because those links already do what I want.

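I was able to change the dates by setting a breakpoint at:

Set objCollection = IE.document.getElementsByTagName("ID")

Then, in the Immediate window, I set one of the variables you had already declared:

Set eleCol = IE.document.querySelector("#selecPeriodoCuadro > div > div > input.form-control.form-control-sm.fechaInicio")

and, again in the Immediate window, changed the element's value:

eleCol.Value = "20/07/2019"

You can address the other date field with the following selector string:

"#selecPeriodoCuadro > div > div > input.form-control.form-control-sm.fechaFin"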
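The Immediate-window steps above could be wrapped in a small procedure along these lines. This is only a sketch: DateInputSelector and SetExportDates are hypothetical names that do not appear in the original answer, and it assumes the page is already loaded in IE, the dropdown has been opened, and the page still uses the same element ID and class names.

```vba
Option Explicit

'Selector prefix copied from the answer above; the class names come from the question's code.
Private Const DATE_INPUT_PREFIX As String = _
    "#selecPeriodoCuadro > div > div > input.form-control.form-control-sm."

'Hypothetical helper: builds the CSS selector for one of the two date inputs.
Public Function DateInputSelector(ByVal fieldClass As String) As String
    DateInputSelector = DATE_INPUT_PREFIX & fieldClass
End Function

'Hypothetical helper: writes the given text into both date inputs, if they are found.
Public Sub SetExportDates(ByVal doc As Object, ByVal startText As String, ByVal endText As String)
    Dim el As Object

    Set el = doc.querySelector(DateInputSelector("fechaInicio"))
    If Not el Is Nothing Then el.Value = startText

    Set el = doc.querySelector(DateInputSelector("fechaFin"))
    If Not el Is Nothing Then el.Value = endText
End Sub
```

A call such as SetExportDates IE.document, "20/07/2019", "20/07/2019" would go right after the click on exportaCuadroToggle. Checking for Nothing keeps the procedure from raising an error if the selector stops matching.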

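As far as I can see, changing the dates in the dropdown does not update the table displayed on the page, which means there is no point in scraping the table.

Unless I am missing something, it seems easier to download the Excel file and process it with VBA to get the data you need. So I will not address the "how do I change the dates in the input boxes" question, since I find it futile. Instead, I will suggest a different approach.

If you inspect the network traffic with your browser's developer tools, you will see that pressing the "Exportar cuadro" button sends a GET request that takes the start and end dates as Unix timestamps (in milliseconds, judging by the ToUnix helper below) and returns the corresponding Excel file. All you need is the URL.

Here is an example of how to get the file: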

Option Explicit

Sub Test()

Dim wb As Workbook
Dim url As String
Dim startDate As Double
Dim endDate As Double
startDate = ToUnix(DateSerial(2019, 8, 10)) '10 Aug 2019; use whichever date you want
endDate = ToUnix(DateSerial(2019, 8, 20)) '20 Aug 2019; use whichever date you want
url = "http://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=6&accion=consultarCuadro&idCuadro=CF102&locale=es&formatoXLS.x=1&fechaInicio=" & startDate & "&fechaFin=" & endDate
Set wb = Workbooks.Open(url)

End Sub

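Public Function ToUnix(dt As Date) As Double 'credits to @Tim Williams
'Milliseconds since the Unix epoch (1 Jan 1970); DateSerial avoids locale-dependent date parsing
ToUnix = DateDiff("s", DateSerial(1970, 1, 1), dt) * 1000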
End Function
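For demonstration purposes, the code above simply opens the report for two arbitrary dates. Once the workbook is stored in the Workbook variable, you can work with it as usual and do whatever you need with it.

You can modify the code to suit your needs.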
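One possible modification is to build the URL from Date values, so that the dates can come from variables as the question asks. The sketch below is an assumption-laden illustration: ToUnixMillis and BuildExportUrl are hypothetical names, and the query parameters are simply the ones used in the example above.

```vba
Option Explicit

'Milliseconds between 1 Jan 1970 and the given date (same arithmetic as ToUnix above).
Public Function ToUnixMillis(ByVal dt As Date) As Double
    ToUnixMillis = CDbl(DateDiff("s", DateSerial(1970, 1, 1), dt)) * 1000#
End Function

'Hypothetical helper: assembles the export URL for a start and end date.
Public Function BuildExportUrl(ByVal startDt As Date, ByVal endDt As Date) As String
    Const BASE_URL As String = _
        "http://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do" & _
        "?sector=6&accion=consultarCuadro&idCuadro=CF102&locale=es&formatoXLS.x=1"

    'Format$ with "0" writes the timestamp as plain digits, never in scientific notation.
    BuildExportUrl = BASE_URL & _
        "&fechaInicio=" & Format$(ToUnixMillis(startDt), "0") & _
        "&fechaFin=" & Format$(ToUnixMillis(endDt), "0")
End Function
```

The workbook could then be opened with Set wb = Workbooks.Open(BuildExportUrl(DateSerial(2019, 8, 10), DateSerial(2019, 8, 20))).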

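That said, the site offers an API with plenty of documentation and examples, which you can use to get whatever information you need quickly and reliably. I strongly suggest you look into it.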

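On another note, there is no HTML tag named "ID", so:

IE.document.getElementsByTagName("ID")

will not match anything: it should return an empty collection, so the loop body never runs.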
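If you do want to loop over elements by tag name, the tag to ask for is "input", and the class attribute is exposed as className rather than Name. A minimal sketch, assuming the class names fechaInicio and fechaFin seen in the question; HasClass and SetDatesByTagName are hypothetical helpers, not part of the original code:

```vba
Option Explicit

'Hypothetical helper: True when the space-separated class attribute contains cls as a whole word.
Public Function HasClass(ByVal classAttr As String, ByVal cls As String) As Boolean
    HasClass = InStr(1, " " & classAttr & " ", " " & cls & " ", vbBinaryCompare) > 0
End Function

'Hypothetical helper: sets both date inputs by scanning every <input> element.
Public Sub SetDatesByTagName(ByVal doc As Object, ByVal startText As String, ByVal endText As String)
    Dim inputs As Object, el As Object, i As Long

    Set inputs = doc.getElementsByTagName("input")
    For i = 0 To inputs.Length - 1
        Set el = inputs.Item(i)
        If HasClass(el.className, "fechaInicio") Then el.Value = startText
        If HasClass(el.className, "fechaFin") Then el.Value = endText
    Next i
End Sub
```

Using a For loop with an explicit counter also avoids the risk of a While loop that never advances its index.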

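Looking at the API documentation referenced by @StavrosJon, it appears you can do the following. The relevant API endpoint is the one used in the code below (series/{ids}/datos/{startDate}/{endDate}), documented at https://www.banxico.org.mx/SieAPIRest/service/v1/doc/consultaDatosSerieRango.

You need a token, which can be generated at https://www.banxico.org.mx/SieAPIRest/service/v1/token. Details on usage and limits are given in the API documentation.

The API call requires a comma-separated list of series IDs as one of its parameters. You can hardcode these or, as I do, simply grab them from the existing web page you referenced and pass them into the subsequent API call. I use a regex to extract the necessary series IDs.

The response is JSON, as detailed in the documentation, so you need a JSON parser to handle it. I use jsonconverter.bas. Download the raw code and add it to a standard module named JsonConverter. You then need to go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime.

I use some helper functions to ensure that the dates in the output are in the correct order and that missing information is handled appropriately.

If you need items paired up (for example max/min), sort the output on the titulo column. Otherwise, a custom sort can be implemented.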


VBA:

Option Explicit
Public Sub GetData()
    '<  VBE > Tools > References > Microsoft Scripting Runtime
    Dim json As Object, re As Object, s As String, xhr As Object
    Dim startDate As String, endDate As String, ws As Worksheet, ids As String

    startDate = "2019-08-18"
    endDate = "2019-08-24"

    Dim datesDict As Object, headers(), results(), key As Variant, r As Long

    Set datesDict = GetDateDictionary(startDate, endDate)
    ReDim headers(1 To datesDict.Count + 2)
    headers(1) = "idSerie"
    headers(2) = "titulo"
    r = 3

    For Each key In datesDict.keys
        headers(r) = key
        r = r + 1
    Next

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set re = CreateObject("VBScript.RegExp")
    Set xhr = CreateObject("MSXML2.XMLHTTP")

    With xhr
        .Open "GET", "http://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=6&accion=consultarCuadro&idCuadro=CF102&locale=es", False
        .send
        s = .responseText
        ids = GetIds(re, s)
        If ids = "No match" Then Exit Sub
        .Open "GET", "https://www.banxico.org.mx/SieAPIRest/service/v1/series/" & ids & "/datos/" & startDate & "/" & endDate & "", False 'https://www.banxico.org.mx/SieAPIRest/service/v1/doc/consultaDatosSerieRango
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .setRequestHeader "Bmx-Token", "aa833b22ee2a350192df6962b1eb6d8ea569ac895862ecc31b79b46859c7e74c" 'https://www.banxico.org.mx/SieAPIRest/service/v1/token  ''<== Replace with your generated token
        .send
        s = .responseText
    End With

    Set json = JsonConverter.ParseJson(s)("bmx")("series")

    ReDim results(1 To json.Count, 1 To UBound(headers))

    WriteOutResults ws, re, startDate, endDate, json, results, headers
End Sub

Public Sub WriteOutResults(ByVal ws As Worksheet, ByVal re As Object, ByVal startDate As String, ByVal endDate As String, ByVal json As Object, ByRef results(), ByRef headers())
    Dim item As Object, subItem As Object, key As Variant
    Dim r As Long, c As Long, datesDict As Object, nextKey As Variant
    re.Pattern = "\s{2,}"
    For Each item In json
        Set datesDict = GetDateDictionary(startDate, endDate)
        r = r + 1

        For Each key In item.keys
            Select Case key
            Case "idSerie"
                results(r, 1) = item(key)
            Case "titulo"
                results(r, 2) = re.Replace(item(key), Chr$(32))
            Case "datos"
                c = 3
                For Each subItem In item(key)
                    datesDict(subItem("fecha")) = subItem("dato")
                Next subItem
                For Each nextKey In datesDict.keys
                    results(r, c) = datesDict(nextKey)
                    c = c + 1
                Next
            End Select
        Next
    Next
    With ws
        .Cells(1, 1).Resize(1, UBound(headers)) = headers
        .Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
    End With
End Sub

Public Function GetIds(ByVal re As Object, ByVal responseText As String) As String
    Dim matches As Object, i As Long, dict As Object
    Set dict = CreateObject("Scripting.Dictionary")
    With re
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = "'(SF\d{5})'"                 'regex pattern to capture the series IDs (SF followed by five digits)
        If .test(responseText) Then
            Set matches = .Execute(responseText)
            For i = 0 To matches.Count - 1
                dict(matches(i).SubMatches(0)) = vbNullString
            Next
            GetIds = Join$(dict.keys, ",")
        Else
            GetIds = "No match"
        End If
    End With
End Function

Public Function GetDateDictionary(ByVal startDate As String, ByVal endDate As String) As Object
    Dim sDate As Long, eDate As Long
    Dim dateDict As Object, i As Long

    Set dateDict = CreateObject("Scripting.Dictionary")
    sDate = CDate(startDate)
    eDate = CDate(endDate)
    For i = sDate To eDate
        dateDict(Format$(i, "dd/mm/yyyy")) = vbNullString
    Next
    Set GetDateDictionary = dateDict
End Function
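The sort on the titulo column mentioned earlier could be done along these lines once GetData has written its output. This is a sketch under assumptions: SortOutputByTitulo is a hypothetical name, and it assumes the output starts at A1 with one header row and titulo in the second column, as in the code above.

```vba
Option Explicit

'Hypothetical helper: sorts the output block on its second column (titulo), keeping the header row in place.
Public Sub SortOutputByTitulo(ByVal ws As Worksheet)
    With ws.Range("A1").CurrentRegion
        .Sort Key1:=.Columns(2), Order1:=xlAscending, Header:=xlYes
    End With
End Sub
```

It could be called as SortOutputByTitulo ThisWorkbook.Worksheets("Sheet1") at the end of GetData.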