JavaScript: Extracting a wrapper value using MSXML2.XMLHTTP
Tags: javascript, html, vba, web-scraping, data-extraction

We are currently using MSXML2.XMLHTTP to extract data from a web page. With my code, all of the data is extracted except the data in the rvw cnt tx class. I want to extract the review count, 43, from the following URL:

url = "https://www.trendyol.com/lc-waikiki/erkek-cocuk-lacivert-takim-p-78215759?boutiqueId=555784&merchantId=4171"

Web page HTML:

That value is retrieved dynamically, so it is not in the response that MSXML2.XMLHTTP receives. However, you can append /yorumlar to the end of the current URL to reach the reviews page, where the value is present statically. I use a regular expression to extract the part of the text in which the review count appears.
Option Explicit

Public Sub GetReviewCount()
    ' Requires: Tools > References > Microsoft HTML Object Library
    Dim re As Object, html As MSHTML.HTMLDocument, xhr As Object

    Set re = CreateObject("VBScript.RegExp")
    Set xhr = CreateObject("MSXML2.XMLHTTP")
    Set html = New MSHTML.HTMLDocument
    re.Pattern = "([0-9,]+)"    ' a run of digits and commas

    With xhr
        ' Synchronous GET of the reviews page, where the count is static
        .Open "GET", "https://www.trendyol.com/lc-waikiki/erkek-cocuk-lacivert-takim-p-78215759/yorumlar", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        html.body.innerHTML = .responseText
    End With

    ' Match only within the heading node that holds the review count
    Debug.Print re.Execute(html.querySelector(".title h3").innerText)(0).SubMatches(0)
End Sub
The html.querySelector(".title h3") call restricts the regular expression to searching only the string in the node that contains the value.
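The reviews URL in the code above is the product URL with its query string removed and /yorumlar appended. A minimal sketch of deriving it programmatically follows; BuildReviewsUrl is a hypothetical helper name, not part of the original answer, and it assumes the query string can safely be dropped, as the answer's hard-coded URL does.

```vba
Option Explicit

' Hypothetical helper: builds the reviews-page URL from a product URL by
' dropping any query string and appending "/yorumlar".
Public Function BuildReviewsUrl(ByVal productUrl As String) As String
    Dim queryStart As Long

    ' Cut everything from the first "?" onward, if there is one
    queryStart = InStr(1, productUrl, "?")
    If queryStart > 0 Then productUrl = Left$(productUrl, queryStart - 1)

    ' Avoid a double slash if the path already ends with "/"
    If Right$(productUrl, 1) = "/" Then productUrl = Left$(productUrl, Len(productUrl) - 1)

    BuildReviewsUrl = productUrl & "/yorumlar"
End Function
```

Passing the product URL from the question to BuildReviewsUrl yields the same string that is hard-coded in the .Open call, so the result could be used there instead.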
To get the cat variable correctly, do the following:
Option Explicit

Public Sub GetCat()
    ' Requires: Tools > References > Microsoft HTML Object Library
    Dim html As MSHTML.HTMLDocument, xhr As Object

    Set xhr = CreateObject("MSXML2.XMLHTTP")
    Set html = New MSHTML.HTMLDocument

    With xhr
        ' Synchronous GET of the product page
        .Open "GET", "https://www.trendyol.com/lc-waikiki/erkek-cocuk-lacivert-takim-p-78215759?boutiqueId=555784&merchantId=4171", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        html.body.innerHTML = .responseText
    End With

    Dim nodes As Object, cat As String, i As Long

    ' Join the breadcrumb items with " > ", with no separator after the last one
    Set nodes = html.querySelectorAll(".breadcrumb .breadcrumb-item")
    For i = 0 To nodes.Length - 1
        cat = cat & IIf(i = nodes.Length - 1, nodes.Item(i).innerText, nodes.Item(i).innerText & " > ")
    Next

    Debug.Print cat
End Sub
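Comments:

Thanks, but the count is also available in the middle of the page, under the class pr-rnr-sm-p-s. The HTML there reads: 42 reviews, 29 reviews.

Hi, you asked for the number of reviews, and I have returned the correct number of reviews. I deliberately chose the source string I did because it is shorter and therefore more efficient to search.

Thanks for the clarification. I am using the following code to extract the rating, but it does not show the decimal value, i.e. it shows only 4 instead of 4.6:
rating = re.Execute(html.getElementsByClassName("pr-rnr-sm-p")(0).innerText)(0)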
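One likely cause of the truncated rating, assuming the same re.Pattern of "([0-9,]+)" is still in use, is that the character class does not include a decimal point, so the match stops at "4". A minimal sketch of a pattern that keeps an optional decimal part follows; ExtractDecimal is a hypothetical helper name, and it is an assumption that the node's text contains the rating written as "4.6" or "4,6".

```vba
Option Explicit

' Hypothetical helper: returns the first number found in a string, keeping an
' optional decimal part written with "." or ",". Returns "" if none is found.
Public Function ExtractDecimal(ByVal sourceText As String) As String
    Dim re As Object, matches As Object

    Set re = CreateObject("VBScript.RegExp")
    ' One or more digits, optionally followed by a separator and more digits
    re.Pattern = "([0-9]+(?:[.,][0-9]+)?)"

    Set matches = re.Execute(sourceText)
    If matches.Count > 0 Then ExtractDecimal = matches(0).SubMatches(0)
End Function
```

It could be called as rating = ExtractDecimal(html.getElementsByClassName("pr-rnr-sm-p")(0).innerText). The result is a string, so a value with a comma separator would still need converting before any arithmetic.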