Warning: file_get_contents(/data/phpspider/zhask/data//catemap/0/vba/15.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
Excel VBA HTML网页拉取不匹配_Excel_Vba_Web Scraping - Fatal编程技术网

Excel VBA HTML网页拉取不匹配

Excel VBA HTML网页拉取不匹配,excel,vba,web-scraping,Excel,Vba,Web Scraping,我得到了一个代码块,用来输入易趣上的物品清单和价格。这似乎在很大程度上起到了作用,只是价格上存在一些不匹配(价格比房源多…)。有没有想过为什么会发生这种情况 Public IE As New SHDocVw.InternetExplorer Sub GetData() Dim HTMLdoc As MSHTml.HTMLDocument Dim othwb As Variant Dim objShellWindows As New SHDocVw.ShellWindows Set IE =

我得到了一个代码块,用来输入易趣上的物品清单和价格。这似乎在很大程度上起到了作用,只是价格上存在一些不匹配(价格比房源多…)。有没有想过为什么会发生这种情况

Public IE As New SHDocVw.InternetExplorer

Sub GetData()

Dim HTMLdoc As MSHTml.HTMLDocument
Dim othwb As Variant
Dim objShellWindows As New SHDocVw.ShellWindows

Set IE = CreateObject("internetexplorer.application")

    With IE
        .Visible = False
        .Navigate "https://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=brooks+brothers&_sacat=1059&LH_TitleDesc=0&_osacat=1059&_odkw=brooks+brothers&LH_TitleDesc=0"
        While .Busy Or .ReadyState <> 4: DoEvents: Wend


            Set HTMLdoc = IE.Document
            ProcessHTMLPage HTMLdoc

        .Quit
    End With


End Sub

Sub ProcessHTMLPage(HTMLPage As MSHTml.HTMLDocument)

Dim HTMLItem As MSHTml.IHTMLElement
Dim HTMLItems As MSHTml.IHTMLElementCollection
Dim HTMLInput As MSHTml.IHTMLElement
Dim rownum As Long

rownum = 1

Set HTMLItems = HTMLPage.getElementsByClassName("s-item__title")

For Each HTMLItem In HTMLItems

        Cells(rownum, 1).Value = HTMLItem.innerText
        rownum = rownum + 1

Next HTMLItem

rownum = 1

Set HTMLItems = HTMLPage.getElementsByClassName("s-item__price")

For Each HTMLItem In HTMLItems

        Cells(rownum, 2).Value = HTMLItem.innerText
        rownum = rownum + 1

Next HTMLItem


End Sub
Public IE作为新的SHDocVw.InternetExplorer
子GetData()
将HTMLdoc设置为MSHTml.HTMLDocument
Dim-othwb作为变体
将objShellWindows变暗为新SHDocVw.ShellWindows
设置IE=CreateObject(“internetexplorer.application”)
与IE
.Visible=False
.导航“https://www.ebay.com/sch/i.html?_from=R40&_trksid=m570.l1313&_nkw=brooks+brothers&U sacat=1059&LH&U TitleDesc=0&U osacat=1059&U odkw=brooks+brothers&LH&U TitleDesc=0“
忙时或准备时状态4:DoEvents:Wend
设置HTMLdoc=IE.Document
ProcessHTMLPage HTMLdoc
退出
以
端接头
子进程HTMLPage(HTMLPage作为MSHTml.HTMLDocument)
将HTMLItem设置为MSHTml.IHTMLElement
将HTMLItems设置为MSHTml.IHTMLElementCollection
将HTMLInput设置为MSHTml.IHTMLElement
Dim rownum尽可能长
rownum=1
设置HTMLItems=HTMLPage.getElementsByClassName(“s-item\uu title”)
对于HTMLItem中的每个HTMLItem
单元格(rownum,1).Value=HTMLItem.innerText
rownum=rownum+1
下一个HTMLItem
rownum=1
设置HTMLItems=HTMLPage.getElementsByClassName(“s-项目价格”)
对于HTMLItem中的每个HTMLItem
单元格(rownum,2).Value=HTMLItem.innerText
rownum=rownum+1
下一个HTMLItem
端接头

首先,将选择器更改为仅限于listings主部分,以避免最近查看的项目。然后,您可以逐个处理列表。在下面的示例中,我将所有列出的价格(不包括删除线)捕获到一个数组中,并与集合中的相关标题一起存储。您可以
redim保留
数组维度,或者简单地提取lbound项以获得第一个价格。 价格

选项显式
公共子GetInfo()
Dim ie作为InternetExplorer,arr(),col
Set ie=新的InternetExplorer
Set col=新集合
与ie
.Visible=True
.导航“https://www.ebay.com/sch/i.html?_from=R40&_nkw=brooks+兄弟俩&mdascat=1059&mdash\u TitleDesc=0&mdash\u TitleDesc=0&rt=nc&mdascu-ipg=48&mdash\u-pgn=1“
当.Busy或.readyState<4:DoEvents:Wend时
Dim listedItems As Object,item As Object,price As Object,price As Object,j As Long
Set listedItems=.document.getElementById(“mainContent”).getElementsByCassName(“s项”)
对于listedItems中的每个项目
设置价格=item.getElementsByClassName(“s-项目价格”)
ReDim arr(0到prices.Length-1)'您可以通过ReDim将此限制为0到0
j=0
价格中的每一个价格
arr(j)=price.innerText
j=j+1
下一个
列添加数组(item.getElementsByClassName(“s-item_uutitle”)(0).innerText,arr)
下一个
退出
Dim item2作为变量,rowNum作为长
对于col中的每个项目2
rowNum=rowNum+1
使用此工作簿。工作表(“表1”)
.Cells(rowNum,1)=替换$(Trim$(item2(0)),Chr$(10),Chr$(32))
.Cells(rowNum,2).调整大小(1,UBound(item2(1))+1)=item2(1)
以
下一个
以
端接头

一些列出了出价的项目有>1个元素标记为
s-item\u price
-您需要过滤掉出价,只获取实际价格。这些数据去哪里了?我在Excel电子表格中没有看到从您的代码中提取的任何信息—它存储在集合中。如果在.quit之前插入单词stop,则可以在“局部变量”窗口中检查col变量的内容。然后,您可以循环将集合写入工作表。我已经更新了答案,显示了如何将集合清空到工作表。
Option Explicit    
Public Sub GetInfo()
    Dim ie As InternetExplorer, arr(), col
    Set ie = New InternetExplorer
    Set col = New Collection
    With ie
        .Visible = True
        .navigate "https://www.ebay.com/sch/i.html?_from=R40&_nkw=brooks+brothers&_sacat=1059&LH_TitleDesc=0&LH_TitleDesc=0&rt=nc&_ipg=48&_pgn=1"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim listedItems As Object, item As Object, prices As Object, price As Object, j As Long
        Set listedItems = .document.getElementById("mainContent").getElementsByClassName("s-item")
        For Each item In listedItems
            Set prices = item.getElementsByClassName("s-item__price")
            ReDim arr(0 To prices.Length - 1)    'you could limit this after by redim to 0 to 0
            j = 0
            For Each price In prices
                arr(j) = price.innerText
                j = j + 1
            Next
            col.Add Array(item.getElementsByClassName("s-item__title")(0).innerText, arr)
        Next
        .Quit

        Dim item2 As Variant, rowNum As Long
        For Each item2 In col
            rowNum = rowNum + 1
            With ThisWorkbook.Worksheets("Sheet1")
                .Cells(rowNum, 1) = Replace$(Trim$(item2(0)), Chr$(10), Chr$(32))
                .Cells(rowNum, 2).Resize(1, UBound(item2(1)) + 1) = item2(1)
            End With
        Next
    End With
End Sub