Amazon DVD details web scraping: unable to pick the required elements

Tags: html, excel, vba, web-scraping

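I pass the EAN number of a particular movie to Amazon and pull out the movie name and the ASIN number.

The problem I face on the Amazon site is that the search results sometimes also contain sponsored products (they may or may not appear). I want to extract only the results that are not sponsored.

As it stands, whenever I Debug.Print the Amazon ASIN numbers and movie names, it prints all of them, including the sponsored products.

To recognize the sponsored products, I look for data-component-type="sp-sponsored-result" in the response text.

The actual products do not carry this value in "data-component-type" at all, yet I am unable to separate out the actual movie names (everything other than the sponsored results).

I tried an If Not ... condition, but my code still prints everything. I have attached my code here.

This is my code: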

Sub Amazon_Pull()
Dim Link_2 As String
 Link_2 = "https://www.amazon.de/s?k=7321925005738&__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&ref=nb_sb_noss"
 Dim xhr As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument


    Set xhr = New MSXML2.XMLHTTP60
    Set html = New MSHTML.HTMLDocument

    With xhr
        .Open "GET", Link_2, False
        .send
         html.body.innerHTML = StrConv(.responseBody, vbUnicode)
    End With

'Debug.Print html.body.innerHTML
'Debug.Print html.getElementsByTagName("div").getAttribute("data-index").Length

Dim hTable As Object
Dim hba As Object



Set hTable = html.getElementsByTagName("div")

For Each hba In hTable

  If Left(hba.getAttribute("data-asin"), 1) = "B" Then

     If hba.getElementsByTagName("div")(2).getAttribute("data-component-type") <> "sp-sponsored-result" Then
        Debug.Print hba.getAttribute("data-asin")
     End If

   End If

Next hba


 Set xhr = Nothing
 Set html = Nothing
'-------------
End Sub

Use a CSS attribute = value selector to restrict the match to the appropriate nodes:

Dim nodeList As Object, i As Long

' Query the whole document; hba is only set inside the question's For Each loop
Set nodeList = html.querySelectorAll("[data-asin]")

For i = 0 To nodeList.Length - 1
    Debug.Print nodeList.item(i).getAttribute("data-asin")
Next
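
You can remove the conditional statements and move all of the conditional logic into the CSS selector, using the "^" (starts with) operator for the character B: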

Dim nodeList As Object, i As Long

' ^= matches data-asin values that start with "B"
Set nodeList = html.querySelectorAll("[data-asin^=B]")

For i = 0 To nodeList.Length - 1
    Debug.Print nodeList.item(i).getAttribute("data-asin")
Next

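You can do it quick and dirty, as in the Amazon_Pull listing that searches innerText for "Gesponsert". But it fails if the word "Gesponsert" is part of a movie title ;-)

In my opinion, it is better to rely on the structure of the page code than on part of its content. I know that this is not always possible, and it is often more complicated.

To check whether an offer on Amazon is sponsored, you can use the structure of the page code, as in the Amazon_Pull listing that checks data-component-type. One advantage is that it also works on the international Amazon platforms, regardless of the country's language. (Not tested, because Amazon blocked me as a bot.)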


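Unless you can pin down the exact HTML element that identifies a sponsored item, I would suggest hacking on the InnerHTML property of each div and searching it for "sp-sponsored-result" (using something like the Like operator or a regexp). Not to everyone's taste, but second-guessing an external web developer is notoriously difficult. I would rather assume that if that text string is inside the DIV, then it is a sponsored item.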
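
The string-search idea above could look roughly like the sketch below. It is only an illustration, not tested against Amazon: IsSponsoredMarkup and PrintUnsponsoredAsins are hypothetical names, html is assumed to be an MSHTML.HTMLDocument already filled as in the question (so the Microsoft HTML Object Library reference is required), and outerHTML is used instead of InnerHTML so that the marker is found whether it sits on the result block itself or on a descendant.

```vba
Option Explicit

' Hypothetical helper (not from the thread): True when the markup of one
' result block contains the sponsored marker string.
Function IsSponsoredMarkup(ByVal markup As String) As Boolean
    ' vbTextCompare makes the search case-insensitive
    IsSponsoredMarkup = InStr(1, markup, "sp-sponsored-result", vbTextCompare) > 0
End Function

' Assumption: html has already been populated from the XHR response,
' exactly as in the question's code.
Sub PrintUnsponsoredAsins(ByVal html As MSHTML.HTMLDocument)
    Dim results As Object
    Dim i As Long

    ' Result blocks whose data-asin value starts with "B"
    Set results = html.querySelectorAll("div[data-asin^=B]")

    For i = 0 To results.Length - 1
        ' Skip every block whose markup mentions the sponsored marker
        If Not IsSponsoredMarkup(results.Item(i).outerHTML) Then
            Debug.Print results.Item(i).getAttribute("data-asin")
        End If
    Next i
End Sub
```

Calling PrintUnsponsoredAsins html after the With xhr block would replace the For Each loop in the question. Keeping the string test in its own function makes it easy to swap in the Like operator or a regexp later.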
Sub Amazon_Pull()

Dim Link_2 As String
Dim xhr As MSXML2.XMLHTTP60
Dim html As MSHTML.HTMLDocument
Dim hTable As Object
Dim hba As Object
Dim i As Long

  Link_2 = "https://www.amazon.de/s?k=7321925005738"
  Set xhr = New MSXML2.XMLHTTP60
  Set html = New MSHTML.HTMLDocument

  With xhr
    .Open "GET", Link_2, False
    .send
     html.body.innerHTML = StrConv(.responseBody, vbUnicode)
  End With

  Set hTable = html.querySelectorAll("div[data-index]")

  For i = 0 To hTable.Length - 1
    If InStr(1, hTable(i).innerText, "Gesponsert") = 0 Then
      Debug.Print hTable(i).getAttribute("data-asin") & " " & hTable(i).getElementsByTagName("h2")(0).innerText
    End If
  Next i

  Set xhr = Nothing
  Set html = Nothing
End Sub
Sub Amazon_Pull()

Dim Link_2 As String
Dim xhr As MSXML2.XMLHTTP60
Dim html As MSHTML.HTMLDocument
Dim hTable As Object
Dim hba As Object
Dim i As Long
Dim check As Long
Dim sponsored As Boolean
Dim checkSponsored As Object

  Link_2 = "https://www.amazon.de/s?k=7321925005738"
  'Link_2 = "https://www.amazon.de/s?k=apple"
  Set xhr = New MSXML2.XMLHTTP60
  Set html = New MSHTML.HTMLDocument

  With xhr
    .Open "GET", Link_2, False
    .send
     html.body.innerHTML = StrConv(.responseBody, vbUnicode)
  End With

  Set hTable = html.querySelectorAll("div[data-index]")

  For i = 0 To hTable.Length - 1
    sponsored = False
    Set checkSponsored = hTable(i).querySelectorAll("div[data-component-type]")

    For check = 0 To checkSponsored.Length - 1
      ' Index into the node list; the list itself has no getAttribute method
      If checkSponsored(check).getAttribute("data-component-type") = "sp-sponsored-result" Then
        sponsored = True
      End If
    Next check

    If Not sponsored Then
      Debug.Print hTable(i).getAttribute("data-asin") & " " & hTable(i).getElementsByTagName("h2")(0).innerText
    End If
  Next i

  Set xhr = Nothing
  Set html = Nothing
End Sub