Javascript: Extracting a wrapper value using MSXML2.XMLHTTP

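We are currently using MSXML2.XMLHTTP to extract data from a web page. With my code, all of the data is extracted except the data in the rvw-cnt-tx class. I want to extract the review count, 43, from the following URL: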

url = "https://www.trendyol.com/lc-waikiki/erkek-cocuk-lacivert-takim-p-78215759?boutiqueId=555784&merchantId=4171"

Web page HTML:


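That value is retrieved dynamically, so it is not in the response that MSXML2.XMLHTTP receives for the product page. However, you can append /yorumlar to the path of the current URL (leaving off the query string) to reach the reviews page, where the value is present statically. I use a regular expression to extract the part of the text where the review count appears: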

Option Explicit

Public Sub GetReviewCount()
    'tools > references > Microsoft HTML Object Library
    Dim re As Object, html As MSHTML.HTMLDocument, xhr As Object

    Set re = CreateObject("VBScript.RegExp")
    Set xhr = CreateObject("MSXML2.XMLHTTP")
    Set html = New MSHTML.HTMLDocument
    re.Pattern = "([0-9,]+)"    'first run of digits (commas allowed)

    'synchronous GET of the reviews page
    With xhr
        .Open "GET", "https://www.trendyol.com/lc-waikiki/erkek-cocuk-lacivert-takim-p-78215759/yorumlar", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        html.body.innerHTML = .responseText
    End With
    'apply the regex only to the text of the heading node that holds the count
    Debug.Print re.Execute(html.querySelector(".title h3").innerText)(0).SubMatches(0)
End Sub
The html.querySelector(".title h3") call restricts the regular expression to searching only the string in the node that contains the value.
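The step of turning a product URL into its reviews URL can be sketched as a small helper. This is only a sketch: BuildReviewsUrl and DemoBuildReviewsUrl are hypothetical names that do not appear in the answer, and the helper assumes the URL has no "#" fragment and that the reviews page always lives at the product path plus /yorumlar.

```vba
Option Explicit

' Hypothetical helper (not part of the original answer): builds the reviews-page
' URL by dropping any query string and appending "/yorumlar" to the path.
Public Function BuildReviewsUrl(ByVal productUrl As String) As String
    Dim queryPos As Long, basePath As String

    ' Keep only the part before the first "?", if there is one.
    queryPos = InStr(1, productUrl, "?")
    If queryPos > 0 Then
        basePath = Left$(productUrl, queryPos - 1)
    Else
        basePath = productUrl
    End If

    ' Avoid a double slash if the path already ends with "/".
    If Right$(basePath, 1) = "/" Then basePath = Left$(basePath, Len(basePath) - 1)

    BuildReviewsUrl = basePath & "/yorumlar"
End Function

Public Sub DemoBuildReviewsUrl()
    ' Prints the same URL that is hard-coded in GetReviewCount.
    Debug.Print BuildReviewsUrl("https://www.trendyol.com/lc-waikiki/erkek-cocuk-lacivert-takim-p-78215759?boutiqueId=555784&merchantId=4171")
End Sub
```

The result could then be passed to .Open in place of the hard-coded string, so the same routine can be reused for other product URLs.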

To get the cat variable correctly, do the following:

Option Explicit

Public Sub GetCat()
    'tools > references > Microsoft HTML Object Library
    Dim html As MSHTML.HTMLDocument, xhr As Object

    Set xhr = CreateObject("MSXML2.XMLHTTP")
    Set html = New MSHTML.HTMLDocument

    'synchronous GET of the product page
    With xhr
        .Open "GET", "https://www.trendyol.com/lc-waikiki/erkek-cocuk-lacivert-takim-p-78215759?boutiqueId=555784&merchantId=4171", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        html.body.innerHTML = .responseText
    End With

    Dim nodes As Object, cat As String, i As Long

    'join the breadcrumb items with " > ", with no trailing separator after the last item
    Set nodes = html.querySelectorAll(".breadcrumb .breadcrumb-item")
    For i = 0 To nodes.Length - 1
        cat = cat & IIf(i = nodes.Length - 1, nodes.Item(i).innerText, nodes.Item(i).innerText & " > ")
    Next
    Debug.Print cat
End Sub

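Thanks, but the count is also available in the middle of the page, under the class pr-rnr-sm-p-s. The HTML is as follows: 42 reviews 29 comments

Hi, you wanted the review count. I have returned the correct number of reviews. I specifically chose that source string because it is shorter and therefore more efficient to search.

Thanks for the clarification. I am using the following code to extract the rating, but it does not show the decimal value, i.e. it shows only 4 instead of 4.6: rating = re.Execute(html.getElementsByClassName("pr-rnr-sm-p")(0).innerText)(0)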
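On the rating question in the last comment: the pattern "([0-9,]+)" used in the answer accepts only digits and commas, so on a string such as "4.6" the match stops at the decimal point and returns "4". A minimal sketch of a pattern that keeps the decimal part follows. ExtractDecimal and DemoExtractDecimal are hypothetical names, and the sketch assumes the node's innerText contains the rating written with "." or "," as the decimal separator.

```vba
Option Explicit

' Hypothetical helper: returns the first number found in inputText, keeping a
' decimal part written with "." or ",". Returns "" when there is no number.
Public Function ExtractDecimal(ByVal inputText As String) As String
    Dim re As Object, matches As Object

    Set re = CreateObject("VBScript.RegExp")
    ' One or more digits, optionally followed by a separator and more digits.
    re.Pattern = "[0-9]+(?:[.,][0-9]+)?"

    Set matches = re.Execute(inputText)
    If matches.Count > 0 Then ExtractDecimal = matches(0).Value
End Function

Public Sub DemoExtractDecimal()
    Debug.Print ExtractDecimal("4.6")       ' 4.6
    Debug.Print ExtractDecimal("4,6 / 5")   ' 4,6
    Debug.Print ExtractDecimal("43")        ' 43
End Sub
```

With this pattern the whole match is the number, so the caller reads .Value from the first match. The function returns a String; converting it to a numeric type is left to the caller, because how "." and "," are interpreted depends on the locale settings.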