VBA web scraper to access search results from a JavaScript form

javascript excel vba web-scraping

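I am developing a web scraper in VBA.

I have a website with a JavaScript form, and I can't figure out how to access the table in that form's search results.

I know how to navigate a plain HTML site and extract the information I need. I have already entered my search keyword and clicked the "Search" button, all through VBA.

After a search (for example "SN857X00PE"), the search results are displayed in a table: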

SN857X00PE/01   StudioLine  9702 - 9709
SN857X00PE/38   StudioLine  9711 - 9801
SN857X00PE/42   StudioLine  9802 - 9804
SN857X00PE/46   StudioLine  9805 - 9806

I want to access all of the serial numbers on the left (for example SN857X00PE/01, SN857X00PE/38, and so on).

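When I open the Firefox debugger, I see many .js files and an index.xhtml. I can find everything I need in the index.xhtml file (the code below includes the example search SN857X00PE), but when I access the HTML through IE.Document.getElementById("body").InnerHtml, it does not show the contents of the index.xhtml file; it shows the contents of the TP4.js file instead (code below).

As you can see below, the TP4.js file does not contain any useful information about the search results, or any way to access them.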

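Is there a way to access the search results table of the JavaScript form? If I can enter a keyword and trigger the search from VBA, it should also be possible to access the results. I would like IE.Document to point to the contents of the index.xhtml file rather than the default TP4.js file, if that is possible.

When I use the Firefox inspector to look at the HTML that the JavaScript site finally generates, accessing my information looks very easy. Is there a way to directly access the nice, clean "end result HTML" after the browser has finished running all the JavaScript?

The website is tradeplace(DOT)com, but the JavaScript form is hidden behind a login.

Here is an overview of what the site looks like when it displays search results. On the right, I show the table from index.xhtml that contains the search results. That is the table I am trying to access, but I don't know how, because I can only reach the contents of the tp4.js file.

Because of the maximum character limit I can't include the entire HTML code. Here are the parts I consider important or relevant, so you get a rough idea of what the site looks like:

index.xhtml: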


Tradeplace Marketplace
if (window.PrimeFaces) {
    PrimeFaces.settings.locale = 'en_US';
}
$(document).ready(function () {
    TP4.initPageLayout();
    TP4.enableSubMenu();
});
$(window).unload(function () {
    TP4.hideLoadingScreen();
});
------------------------------------------------------------------------- ...skipping some code to get to the relevant part -------------------------------------------------------------------------
StudioLine
9702 - 9709
-------------------------------------------------------------- skipping code again until the end -------------------------------------------------------------
$(function () {
    PrimeFaces.cw('AjaxStatus', 'ajaxStatusWidget', {
        id: 'ajaxStatus',
        start: function () {
            TP4.showLoadingScreen();
        },
        success: function () {
            TP4.hideLoadingScreen();
        }
    });
});
Bitte warten - Datenbeschaffung
$(function () {
    PrimeFaces.cw('Dialog', 'loadingDialogWidget', {
        id: 'loadingDialog',
        resizable: false,
        modal: true
    });
});
(function (i, s, o, g, r, a, m) {
    i['GoogleAnalyticsObject'] = r;
    i[r] = i[r] || function () {
        (i[r].q = i[r].q || []).push(arguments)
    }, i[r].l = 1 * new Date();
    a = s.createElement(o),
        m = s.getElementsByTagName(o)[0];
    a.async = 1;
    a.src = g;
    m.parentNode.insertBefore(a, m)
})(window, document, 'script', 'www.google-analytics.com/analytics.js', 'ga');
ga('create', 'UA-55961901-1', 'auto');
ga('set', 'uid', '27d62c9d4ec32f32a829bed7142036c05d9516ac93c8935d18acf1fdc3d59145');
ga('send', 'pageview', {
    'title': 'ProductSearchResult'
});

CSS selector:

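So, using the HTML provided, I can use a CSS selector as follows:

#productList td > a

This matches elements the same way a stylesheet rule would target them. The "#" denotes an id, not a class. "td > a" means: select all a elements whose parent element is a td. So the selector picks the a elements that sit directly inside td elements within the element whose id is productList.

I apply the selector with the querySelectorAll method of document, which returns a nodeList of the matching elements, and then loop over the length of that list.

I am reading your HTML from a file, but you can use: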

ie.document.querySelectorAll("#productList td > a")
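As a minimal sketch of putting the matched values to use, the routine below copies the text of each matched link into column A of the active worksheet. It assumes that an InternetExplorer object already showing the results page is passed in; the procedure name WriteSerialNumbers and the choice of column A are illustrative and not part of the original answer.

```vba
Option Explicit

' Hypothetical helper: writes the text of every node matched by the selector
' into column A of the active worksheet, one value per row.
' ie is assumed to be an InternetExplorer instance already showing the results.
Public Sub WriteSerialNumbers(ByVal ie As Object)
    Dim nodes As Object, i As Long

    Set nodes = ie.document.querySelectorAll("#productList td > a")

    For i = 0 To nodes.Length - 1
        ' Worksheet rows are 1-based, the node list is 0-based
        ActiveSheet.Cells(i + 1, 1).Value = nodes.item(i).innerText
    Next i
End Sub
```

It would be called as WriteSerialNumbers IE once the search results have loaded.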

CSS query: (screenshot not preserved)

Output to the Immediate window: (screenshot not preserved)

VBA:

Option Explicit

' Requires a reference to Microsoft HTML Object Library for HTMLDocument
Public Sub HTMLQuery()
    Dim oXHTTP As Object, HTML As New HTMLDocument, aNodeList As Object, i As Long

    Set oXHTTP = CreateObject("MSXML2.XMLHTTP")
    With oXHTTP
        ' Read the saved copy of the page from disk (synchronous request)
        .Open "GET", "C:\Users\User\Desktop\index.html", False
        .send
        HTML.body.innerHTML = .responseText
    End With

    ' Select every a element that is a direct child of a td inside #productList
    Set aNodeList = HTML.querySelectorAll("#productList td > a")

    For i = 0 To aNodeList.Length - 1
        Debug.Print aNodeList.item(i).innerText
    Next i
End Sub

For your code:

Dim aNodeList As Object
Set aNodeList = IE.document.querySelectorAll("#productList td > a")

For i = 0 To aNodeList.Length - 1
    Debug.Print aNodeList.item(i).innerText
Next i

Use a timed loop to allow the page to load:

' Wait for the initial navigation to finish
While IE.Busy Or IE.readyState < 4: DoEvents: Wend

Dim t As Single
t = Timer                                   ' Seconds since midnight

Do
    DoEvents
    Set aNodeList = Nothing
    On Error Resume Next                    ' The document may not be ready yet
    Set aNodeList = IE.document.querySelectorAll("#productList td > a")
    On Error GoTo 0
    If Not aNodeList Is Nothing Then
        If aNodeList.Length > 0 Then Exit Do    ' Results have been rendered
    End If
    If Timer - t >= 3 Then Exit Do          ' Avoid an infinite loop; adjust the 3 seconds as required
Loop

If Not aNodeList Is Nothing Then
    For i = 0 To aNodeList.Length - 1
        Debug.Print aNodeList.item(i).innerText
    Next i
End If
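The timed loop above can also be wrapped in a reusable function, sketched here under a few assumptions: the name WaitForNodes, its parameters, and the default timeout of 3 seconds are hypothetical and not part of the original answer. It polls until the selector matches at least one node, and returns Nothing if the timeout passes first. Because Timer counts seconds since midnight, this sketch does not handle a wait that spans midnight.

```vba
' Hypothetical helper: polls the page until the selector matches at least one
' node or the timeout elapses. Returns the node list, or Nothing on timeout.
Public Function WaitForNodes(ByVal ie As Object, ByVal selector As String, _
                             Optional ByVal timeoutSeconds As Single = 3) As Object
    Dim nodes As Object, started As Single

    started = Timer
    Do
        DoEvents
        Set nodes = Nothing
        On Error Resume Next              ' The document may not be available yet
        Set nodes = ie.document.querySelectorAll(selector)
        On Error GoTo 0

        If Not nodes Is Nothing Then
            If nodes.Length > 0 Then
                Set WaitForNodes = nodes
                Exit Function
            End If
        End If
    Loop While Timer - started < timeoutSeconds
    ' Falls through with WaitForNodes = Nothing when nothing matched in time
End Function
```

A call such as Set aNodeList = WaitForNodes(IE, "#productList td > a", 5) would then replace the inline loop, followed by a check for Nothing before iterating.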

References: in the VBA editor, go to Tools > References and add a reference to the HTML Object Library if you use the HTMLDocument type.