Scraping data from a website with CSS selectors in Excel VBA

Tags: html, excel, vba, web-scraping, css-selectors

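I am trying to scrape specific data from a website using CSS selectors. With QHarr's help I got it working, but the requirements have since changed. Here is my code:

Code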

Public Sub CompanyData2()

Dim html As HTMLDocument, ws As Worksheet, re As Object

Set re = CreateObject("VBScript.RegExp")
re.Pattern = "\s{2,}"
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set html = New HTMLDocument

With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", "https://www.bizi.si/iskanje?q=", False
    .send
    html.body.innerHTML = .responseText
End With

ws.Range("A4").Value = re.Replace(Join$(Array(html.querySelector("td.item a").innerText), ", "), Chr$(32))
ws.Range("A5").Value = re.Replace(Join$(Array(html.querySelector("td.item + td.item").innerText), ", "), Chr$(32))
ws.Range("B6").Value = re.Replace(Join$(Array(html.querySelector("td.item + td.item + td.item + td.item").innerText), ", "), Chr$(32))

End Sub
The result is as follows:

Website:

I would like to extract the company name into cell A3 of Sheet1, as shown here:


Thanks.

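You need REPROMAT in A1. After the initial search request, you then have to visit the actual company page to get the company name as displayed. If you use the company URL directly, you can skip the first request and start the code from the second request.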

Public Sub CompanyData()
    ' Requires a reference to Microsoft HTML Object Library (VBE > Tools > References).
    Dim html As HTMLDocument, ws As Worksheet, nodes As Object

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        ' First request: search for the company name held in A1 (EncodeURL needs Excel 2013 or later).
        .Open "GET", "https://www.bizi.si/iskanje?q=" & Application.EncodeURL(ws.Range("A1").Value), False
        .send
        html.body.innerHTML = .responseText

        ' Collect the td.item cells from the search results.
        Set nodes = html.querySelectorAll("td.item")

        With ws
            .Range("A4").Value = nodes.Item(0).FirstChild.innerText
            .Range("A5").Value = nodes.Item(1).innerText
            .Range("A6").Value = "DŠ: " & nodes.Item(3).innerText
        End With

        ' Second request: follow the company link to read the full company name.
        .Open "GET", html.querySelector("[id$=linkCompany]").href, False
        .send
        html.body.innerHTML = .responseText
        ws.Range("A3").Value = html.querySelector("#ctl00_ctl00_cphMain_cphMainCol_CompanySPLPreview1_labTitlePRS").innerText
    End With
End Sub
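
The code above assumes every selector finds a match. If a search returns no rows, `querySelector` returns `Nothing` and reading `.innerText` raises a run-time error. A minimal sketch of a guard follows; `SafeInnerText` is a hypothetical helper name, not part of the original answer, and it assumes the same Microsoft HTML Object Library reference.

```vba
' Hypothetical helper: returns the matched element's text, or "" when nothing matches.
Public Function SafeInnerText(ByVal html As HTMLDocument, ByVal selector As String) As String
    Dim node As Object

    Set node = html.querySelector(selector)

    ' querySelector yields Nothing when the selector has no match.
    If Not node Is Nothing Then
        SafeInnerText = node.innerText
    End If
End Function
```

It could be used, for example, as `ws.Range("A3").Value = SafeInnerText(html, "[id$=labTitlePRS]")`, which leaves the cell empty instead of stopping the macro; that shortened selector is itself an assumption.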

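Comments:

Please use the code snippet tool to share the HTML so we can test with it.

What value are you passing at the end of the URL to get the final output you show?

Can we have at least two example inputs with the expected output? The HTML may vary: in my testing I can find and return REPROMAT d.o.o., but not the full name you show.

OK. I worked out what is happening. Please try the edited answer.

I have one more question for QHarr. It works fine now, but when I change the company on the BIZI website I get the same result in Excel as for the previous search. I have to close and reopen Excel to extract a different company's data. Thanks.

Hi, would you mind opening a new question that explains the problem and what you have tried? You can drop a link to it here if you like. You could first try adding .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT" after the .Open lines and see whether that changes anything.

I will send you the link. Thanks.

Have you tried using the request header I mentioned?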
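
The header suggested in the comments can be sketched as below. MSXML2.XMLHTTP may reuse a cached response for a URL it has already requested, which would explain the stale result. The thread does not confirm that this resolved the problem, so treat it as something to try. `FetchFreshHtml` is a hypothetical helper name, and the code assumes a reference to Microsoft HTML Object Library.

```vba
' Hypothetical helper: requests a URL and asks the server for a fresh copy.
Public Function FetchFreshHtml(ByVal url As String) As HTMLDocument
    Dim html As HTMLDocument
    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        ' Must be set after .Open and before .send.
        ' A date far in the past asks for anything modified since then,
        ' which discourages reuse of a cached response.
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        html.body.innerHTML = .responseText
    End With

    Set FetchFreshHtml = html
End Function
```

A call such as `Set html = FetchFreshHtml("https://www.bizi.si/iskanje?q=" & Application.EncodeURL(ws.Range("A1").Value))` would replace the first request in `CompanyData`. The same header line would also be needed after the second `.Open`.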