Scraping data from a website using VBA
I am trying to scrape data from a website via VBA, such as real-time prices, i.e. the German 5-year Bobl and the US 30-year T-Bond. I tried an Excel web query, but it only pulls the whole website, and I only want the rates. Is there a way to do this?

There are several ways to do this. This is an answer I wrote hoping that all the basics of Internet Explorer automation can be found when browsing for the keywords "scraping data from website", but remember that nothing is worth as much as your own research (if you don't want to stick to pre-written code that you are not able to customize).

Please note that this is one way, and not one I like in terms of performance (since it depends on the browser speed), but it is good for understanding the rationale behind Internet automation.

1) If I need to browse the web, I need a browser! So I create an Internet Explorer browser:
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
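2) I ask the browser to navigate to the target webpage. With the property .Visible I decide whether I want to see the browser doing its job. While building the code it is nice to have Visible = True, but once the code works for scraping the data it is nicer not to see it every time, so Visible = False.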
With appIE
.Navigate "http://uk.investing.com/rates-bonds/financial-futures"
.Visible = True
End With
3) The webpage will need some time to load. So I wait while it is busy:
Do While appIE.Busy
DoEvents
Loop
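The loop above waits forever if the page never finishes loading. Below is a minimal sketch of a bounded wait, assuming the same late-bound Internet Explorer object. The helper name WaitForPage and the 30-second default are assumptions for illustration and not part of the original answer. It also checks readyState, as one of the later answers does.

```vba
' Hypothetical helper: returns True once the page has loaded,
' or False if timeoutSeconds elapse first.
' Note: Timer resets at midnight, so this sketch assumes the wait does not span midnight.
Private Function WaitForPage(ByVal ie As Object, _
                             Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Dim startTime As Double
    startTime = Timer

    ' 4 = READYSTATE_COMPLETE
    Do While ie.Busy Or ie.readyState <> 4
        DoEvents
        If Timer - startTime > timeoutSeconds Then Exit Function ' result stays False
    Loop

    WaitForPage = True
End Function
```

It could be called as If Not WaitForPage(appIE) Then ... to quit the browser and stop instead of hanging.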
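4) Well, now the page is loaded. Let's say I want to scrape the change of the US30Y T-Bond:
What I do is just press F12 in Internet Explorer to see the webpage's code, then use the element-selection pointer (circled in red in the original screenshot) to click the element I want to scrape and see how I can reach it.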
5) What I should do is straightforward. First of all, I get by its ID attribute the tr element that contains the value:
Set allRowOfData = appIE.document.getElementById("pair_8907")
Here I get a collection of td elements (specifically, tr is a row of data and the td elements are its cells). We are looking for the 8th one, so I write:
Dim myValue As String: myValue = allRowOfData.Cells(7).innerHTML
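Why did I write 7 instead of 8? Because the cells collection starts from 0, so the index of the 8th element is 7 (8-1). Briefly analysing this line of code:
.Cells() gives me access to the td elements;
innerHTML is the property of the cell that contains the value we are looking for.
Once we have our value (now stored in the myValue variable), we can close the IE browser and release the memory by setting it to Nothing: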
appIE.Quit
Set appIE = Nothing
Well, now you have your value and you can do whatever you want with it: put it into a cell (Range("A1").Value = myValue), or into a label of a form (Me.Label1.Caption = myValue).
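Put together, steps 1 to 5 might look like the sketch below. It assumes that Internet Explorer automation is still available on the machine and that the page still contains a row with the id pair_8907. The Sub name, the guard against a missing row and the target cell A1 are illustrative choices, not part of the original answer.

```vba
' Sketch combining the steps above; GetUS30YChange is a hypothetical name.
Sub GetUS30YChange()
    Dim appIE As Object
    Dim rowOfData As Object
    Dim myValue As String

    ' 1) Create the browser and 2) navigate without showing the window
    Set appIE = CreateObject("InternetExplorer.Application")
    appIE.Visible = False
    appIE.Navigate "http://uk.investing.com/rates-bonds/financial-futures"

    ' 3) Wait until the page has loaded (4 = READYSTATE_COMPLETE)
    Do While appIE.Busy Or appIE.readyState <> 4
        DoEvents
    Loop

    ' 5) Get the row by id, then read its 8th cell (index 7)
    Set rowOfData = appIE.document.getElementById("pair_8907")
    If Not rowOfData Is Nothing Then
        myValue = rowOfData.Cells(7).innerHTML
        ActiveSheet.Range("A1").Value = myValue
    Else
        MsgBox "Row pair_8907 was not found; the page layout may have changed."
    End If

    ' Close the browser and release the object
    appIE.Quit
    Set appIE = Nothing
End Sub
```

Checking for Nothing before reading the cells avoids a run-time error when the site changes its markup, which is a common reason for scrapers like this to stop working.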
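I would like to point out that this is not how StackOverflow works: here you post questions about a specific coding problem, but you should do your own search first. The reason I am answering a question that does not show much research effort is that I have seen it asked several times and, back when I was learning how to do this, I remember wishing there had been better support to get started. So I hope this answer, which is just a "study input" and not at all the best or most complete solution, can support the next user with the same problem. Because I learned how to program thanks to this community, I like to think that you and other beginners might use my input to discover the beautiful world of programming.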
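Enjoy your practice ;)

You can use a WinHttpRequest object instead of Internet Explorer, since it is better to load just the data without pictures and ads than to download the complete webpage including them. Those pictures and ads make the Internet Explorer object heavier than the WinHttpRequest object.

This question was asked a long time ago, but I think the following information will be useful for newcomers. You can actually get the value easily from the class name, like this: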
Sub ExtractLastValue()
    Dim objIE As Object
    Set objIE = CreateObject("InternetExplorer.Application")
    objIE.Top = 0
    objIE.Left = 0
    objIE.Width = 800
    objIE.Height = 600
    objIE.Visible = True
    objIE.Navigate "https://uk.investing.com/rates-bonds/financial-futures/"
    ' Wait until the page has finished loading (4 = READYSTATE_COMPLETE)
    Do
        DoEvents
    Loop Until objIE.readyState = 4
    ' Show the text of the first element with the class pid-8907-last
    MsgBox objIE.document.getElementsByClassName("pid-8907-last")(0).innerText
End Sub
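If you are not familiar with web scraping, please read this blog post.
There are also various techniques for extracting data from web pages. This article explains a few of them with examples.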
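The class-name lookup above raises a run-time error when no element matches, for example if the site renames its classes. A minimal sketch of a guarded lookup follows. The function name FirstTextByClass is hypothetical, and doc is assumed to be the document object of a loaded Internet Explorer page.

```vba
' Hypothetical helper: returns the text of the first element with the given
' class name, or an empty string when nothing matches.
Function FirstTextByClass(ByVal doc As Object, ByVal className As String) As String
    Dim matches As Object
    Set matches = doc.getElementsByClassName(className)
    If matches.Length > 0 Then
        FirstTextByClass = matches(0).innerText
    End If
End Function
```

With it, the last line of the Sub could become MsgBox FirstTextByClass(objIE.document, "pid-8907-last").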
I modified a few things that were throwing errors for me and ended up with this, which works great for extracting the data as I need it:
Sub get_data_web()
    Dim appIE As Object
    Dim allRowofData As Object, itm As Object
    Dim i As Long, Count As Long
    Dim myValue As String

    Set appIE = CreateObject("internetexplorer.application")
    With appIE
        .navigate "https://finance.yahoo.com/quote/NQ%3DF/futures?p=NQ%3DF"
        .Visible = True
    End With
    Do While appIE.Busy
        DoEvents
    Loop

    ' Each matching element is a table row; copy its first five cells to the sheet
    Set allRowofData = appIE.document.getElementsByClassName("Ta(end) BdT Bdc($c-fuji-grey-c) H(36px)")
    Count = 1
    For Each itm In allRowofData
        For i = 0 To 4
            myValue = itm.Cells(i).innerText
            ActiveSheet.Cells(Count, i + 1).Value = myValue
        Next
        Count = Count + 1
    Next

    appIE.Quit
    Set appIE = Nothing
End Sub
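Other methods were mentioned, so let us please acknowledge that, at the time of writing, we are in the 21st century. Let's park the local bus of opening a browser and fly with an XMLHTTP GET request (XHR GET for short).

"XHR is an API in the form of an object whose methods transfer data between a web browser and a web server. The object is provided by the browser's JavaScript environment."

It is a fast method for retrieving data that does not require opening a browser. The server response can be read into an HTMLDocument, and the process of grabbing the table continues from there.

Note that content rendered or dynamically added by JavaScript will not be retrieved, because no JavaScript engine is running (there is one in a browser).

In the code below, the table is grabbed by its id, cr1.

In the helper sub WriteTable we first loop over the header cells (th tags), then over the table rows (tr tags), and finally walk along each row cell by cell (td tags). Since we only need two of the columns, a Select Case statement specifies what is written out to the sheet (in the code, the 2nd and 8th cells, counting from 1).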
VBA:
Option Explicit

Public Sub GetRates()
    Dim html As HTMLDocument, hTable As HTMLTable '<== Tools > References > Microsoft HTML Object Library
    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://uk.investing.com/rates-bonds/financial-futures", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT" 'to deal with potential caching
        .send
        html.body.innerHTML = .responseText
    End With

    Application.ScreenUpdating = False
    Set hTable = html.getElementById("cr1")
    WriteTable hTable, 1, ThisWorkbook.Worksheets("Sheet1")
    Application.ScreenUpdating = True
End Sub

Public Sub WriteTable(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByVal ws As Worksheet)
    Dim tSection As Object, tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, C As Long, tBody As Object
    r = startRow: If ws Is Nothing Then Set ws = ActiveSheet

    With ws
        ' Write the 2nd and 8th header cells to the first output row
        Dim headers As Object, header As Object, columnCounter As Long
        Set headers = hTable.getElementsByTagName("th")
        For Each header In headers
            columnCounter = columnCounter + 1
            Select Case columnCounter
            Case 2
                .Cells(startRow, 1) = header.innerText
            Case 8
                .Cells(startRow, 2) = header.innerText
            End Select
        Next header
        startRow = startRow + 1

        ' Write the 2nd and 8th cells of every body row below the headers
        Set tBody = hTable.getElementsByTagName("tbody")
        For Each tSection In tBody
            Set tRow = tSection.getElementsByTagName("tr")
            For Each tr In tRow
                r = r + 1
                Set tCell = tr.getElementsByTagName("td")
                C = 1
                For Each td In tCell
                    Select Case C
                    Case 2
                        .Cells(r, 1).Value = td.innerText
                    Case 8
                        .Cells(r, 2).Value = td.innerText
                    End Select
                    C = C + 1
                Next td
            Next tr
        Next tSection
    End With
End Sub
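Because an XHR request can fail, or the page can change so that the table is missing, a slightly more defensive variant of the calling sub might look like the sketch below. It assumes the same URL, the same table id cr1, the WriteTable helper above and the same reference to the Microsoft HTML Object Library. The name GetRatesChecked and the message texts are illustrative.

```vba
' Sketch only: GetRatesChecked is a hypothetical name; WriteTable is the helper above.
Public Sub GetRatesChecked()
    Dim html As HTMLDocument, hTable As HTMLTable
    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://uk.investing.com/rates-bonds/financial-futures", False
        .send
        ' Stop early if the server did not answer with 200 OK
        If .Status <> 200 Then
            MsgBox "Request failed: " & .Status & " " & .statusText
            Exit Sub
        End If
        html.body.innerHTML = .responseText
    End With

    ' Stop if the expected table is not in the response
    Set hTable = html.getElementById("cr1")
    If hTable Is Nothing Then
        MsgBox "No element with id cr1 was found in the response."
        Exit Sub
    End If

    WriteTable hTable, 1, ThisWorkbook.Worksheets("Sheet1")
End Sub
```

The two checks turn a confusing run-time error inside WriteTable into a message that says which step went wrong.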