
Web Scraping ETF Daily Data with VBA

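I'm web scraping daily information for different ETFs. I found a site that has accurate information (the code below uses MarketWatch). The most relevant information is each ETF's open price, shares outstanding, NAV, and total net assets. Here is the link for IVV US Equity:

I have scraped web pages with VBA before, but the HTML of the pages I worked with was different, and I don't know whether that is because some of the ETF's values (such as price and volume) change constantly. The idea is to write code that extracts the relevant information and builds a database for analysing macroeconomic factors, using ETFs as a market indicator of flows between countries, regions, and so on.

My first approach is to use VBA, but once I have dug deeper into the data I would like to try Python (after I am more familiar with it) to automate the web scraping process every day.

I'm open to any suggestion that might help, or to any other website (I have tried Yahoo Finance and Morningstar and ran into the same problem with the HTML code).

Here is my bad code: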

Sub Get_Data()
    
    Dim ticker As String, enlace As String
    
    ticker = ThisWorkbook.Worksheets("ETFs").Cells(2, 2).Value 'IVV
    'link = "https://www.morningstar.com/etfs/arcx/" & ticker & "/quote.html"
    'link = "https://finance.yahoo.com/quote/" & ticker & "?p=" & ticker
    link = "https://www.marketwatch.com/investing/fund/" & ticker
        
    Application.ScreenUpdating = False
        
    Dim x As Integer
    x = ThisWorkbook.Worksheets("ETFs").Cells(Rows.Count, 1).End(xlUp).Row
    
    'Dim i As Integer
    'For i = 2 To x
    
    Dim total_net_assets As Variant, open_price As Variant, NAV As Variant, shares_out
            
    Set ie = CreateObject("InternetExplorer.application")
    With ie
        .Visible = False
        .navigate link
        While .Busy Or .readyState < 4: DoEvents: Wend
            Do
                DoEvents
                On Error Resume Next
                ' Here is where I get the problem of not knowing how to reference the values I need because the class name appears repeatedly
                total_net_assets = .document.getElementsByClassName("").Value
                open_price = .document.getElementByClassName("price").Value
                NAV = .document.getElementByClassName("").Value
                shares_out = .document.getElementByClassName("kv__value kv__primary ").Value
                On Error GoTo 0
            Loop
    End With
    ThisWorkbook.Worksheets("ETFs").Cells(2, 13).Value = total_net_assets
    ThisWorkbook.Worksheets("ETFs").Cells(2, 14).Value = NAV
    ThisWorkbook.Worksheets("ETFs").Cells(2, 15).Value = open_price
    ThisWorkbook.Worksheets("ETFs").Cells(2, 16).Value = shares_out
    ie.Quit
    'Next i
    Application.ScreenUpdating = True

End Sub

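OK, you need to create two loops. You can keep reusing the elem0, elem1 and elemColl(1) variables for each value you need. Just make sure you reset bFoundIt to False on every new iteration so that you don't exit the For loop early.

For your total_net_assets variable, you first loop over the elements with the class kv__item. Then, inside each kv__item element, you loop over the collection of elements with the class kv__label and stop when the inner text matches: Total Net Assets. Once you have a match, you use the outer element, elem0, to get the child with the class name kv__value kv__primary.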

Dim IE As Object, elem0 As Object, elem1 As Object, i As Long, bFoundIt As Boolean

Set IE = CreateObject("InternetExplorer.application")
With IE
    .Visible = False
    .navigate link
    ' Wait until the page has finished loading
    While .Busy Or .readyState < 4: DoEvents: Wend

    bFoundIt = False
    ' Outer loop: every label/value container on the page
    For Each elem0 In .document.getElementsByClassName("kv__item")
        ' Inner loop: the label(s) inside this container
        For Each elem1 In elem0.getElementsByClassName("kv__label")
            If elem1.innerText = "Total Net Assets" Then
                bFoundIt = True
                total_net_assets = elem0.getElementsByClassName("kv__value kv__primary ")(0).innerText
                Exit For
            End If
        Next elem1
        If bFoundIt Then Exit For
    Next elem0
End With
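
Since the same two loops are needed for every value, they could be wrapped in a small function instead of repeating them and resetting bFoundIt each time. The following is only a sketch: GetValueByLabel is a hypothetical helper name that is not part of the answer above, and it assumes each kv__item holds one kv__label and at least one kv__value element, as the page appeared to at the time.

```vba
' Hypothetical helper (not from the original answer): returns the text of the
' value that sits in the same kv__item as the given label, or an empty string
' when no label matches.
Public Function GetValueByLabel(ByVal doc As Object, ByVal labelText As String) As String
    Dim elem0 As Object, elem1 As Object, vals As Object

    For Each elem0 In doc.getElementsByClassName("kv__item")
        For Each elem1 In elem0.getElementsByClassName("kv__label")
            ' Case-insensitive comparison, ignoring surrounding whitespace
            If StrComp(Trim$(elem1.innerText), labelText, vbTextCompare) = 0 Then
                Set vals = elem0.getElementsByClassName("kv__value")
                If vals.Length > 0 Then GetValueByLabel = Trim$(vals(0).innerText)
                Exit Function   ' leaving the function replaces the bFoundIt flag
            End If
        Next elem1
    Next elem0
End Function
```

Inside the With IE block it could then be called once per value, for example total_net_assets = GetValueByLabel(.document, "Total Net Assets") and NAV = GetValueByLabel(.document, "NAV").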

Approach:

' Class module: clsHTTP
Option Explicit
Private http As Object

Private Sub Class_Initialize()
    Set http = CreateObject("MSXML2.XMLHTTP")
End Sub

Public Function GetString(ByVal url As String) As String
    Dim sResponse As String
    With http
        .Open "GET", url, False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
        GetString = sResponse
    End With
End Function

Public Function GetInfo(ByVal html As HTMLDocument) As Object
    Dim dict As Object, i As Long
    Set dict = CreateObject("Scripting.Dictionary")
    dict.Add "Open", vbNullString
    dict.Add "Shares Outstanding", vbNullString
    dict.Add "Total Net Assets", vbNullString
    dict.Add "NAV", vbNullString

    Dim values As Object, labels As Object

    With html
        Set values = .querySelectorAll(".kv__value.kv__primary")
        Set labels = .querySelectorAll(".kv__label")

        For i = 0 To labels.Length - 1
            If dict.Exists(labels.item(i).innerText) Then dict(labels.item(i).innerText) = values.item(i).innerText
        Next
    End With
    Set GetInfo = dict
End Function

' Standard module (HTMLDocument requires a reference to Microsoft HTML Object Library)
Option Explicit
Public Sub GetFundInfo()
    Dim sResponse As String, html As HTMLDocument, http As clsHTTP, i As Long
    Dim headers(), funds(), url As String, results As Collection, ws As Worksheet
    Const BASE_URL As String = "https://www.marketwatch.com/investing/fund/"

    Application.ScreenUpdating = False

    headers = Array("Open", "Shares Outstanding", "Total Net Assets", "NAV")
    Set results = New Collection
    Set http = New clsHTTP
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set html = New HTMLDocument

    funds = Application.Transpose(ws.Range("A2:A3").Value) '<== Change the range here to the single column range containing your fund tickers.

    For i = LBound(funds) To UBound(funds)
        If Not IsEmpty(funds(i)) Then
            url = BASE_URL & funds(i)
            html.body.innerHTML = http.GetString(url)
            results.Add http.GetInfo(html).Items
        End If
    Next

    If results.Count > 0 Then
        Dim item As Variant, r As Long, c As Long
        r = 2: c = 2
        With ws
            .Cells(1, c).Resize(1, UBound(headers) + 1) = headers
            For Each item In results
                .Cells(r, c).Resize(1, UBound(item) + 1) = item
                r = r + 1
            Next
        End With
    End If
    Application.ScreenUpdating = True
End Sub

I use XMLHTTP requests because they are faster than opening IE.

Code notes:

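The code reads the fund tickers into an array from column A of Sheet1, starting at A2. You can easily extend this by adding more funds in column A and adjusting the range.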
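
If you would rather not edit the range each time you add a fund, the last used row in column A can be looked up at run time. This is a minimal sketch, assuming the tickers sit in one contiguous block below A1; ReadFunds is a hypothetical helper name, not part of the code above.

```vba
' Hypothetical helper: returns the tickers in column A (row 2 downwards)
' as a one-dimensional Variant array.
Public Function ReadFunds(ByVal ws As Worksheet) As Variant
    Dim lastRow As Long
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    If lastRow < 2 Then
        ReadFunds = Array()                        ' no tickers: empty array
    ElseIf lastRow = 2 Then
        ' A single cell's .Value is a scalar, not an array, so wrap it
        ReadFunds = Array(ws.Range("A2").Value)
    Else
        ReadFunds = Application.Transpose(ws.Range("A2:A" & lastRow).Value)
    End If
End Function
```

In GetFundInfo the fixed range line could then become funds = ReadFunds(ws). The existing loop already uses LBound and UBound, so it copes with the different lower bounds that Array and Transpose return.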

The array is looped over, and an XMLHTTP request is issued for each fund by concatenating its ticker onto the BASE_URL constant.

I use a class, clsHTTP, to hold the XMLHTTP object for efficiency: there is no need to keep creating and destroying the object.

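I give this class two methods. One retrieves the target page's HTML (GetString), and the other extracts the required info, if available (GetInfo). I use a dictionary to test whether each label found on the page is one of the labels I am looking for. If it is, I grab the associated value. If a label is never found, its dictionary entry keeps the vbNullString placeholder.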
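
The placeholder idea can be tried on its own, without any web request. The sketch below is illustrative only: the labels in the Array call and the text "sample value" are made up to stand in for what a page might return.

```vba
' Illustrative only: pre-seed the wanted labels, then fill in the ones
' that actually turn up. Labels that never appear keep vbNullString.
Public Sub DictionaryPlaceholderDemo()
    Dim dict As Object, labelText As Variant

    Set dict = CreateObject("Scripting.Dictionary")
    dict.Add "Open", vbNullString
    dict.Add "NAV", vbNullString

    ' Pretend the page produced one wanted label and one unwanted label
    For Each labelText In Array("Open", "Volume")
        ' Exists() tests without adding the key, so "Volume" is skipped
        If dict.Exists(labelText) Then dict(labelText) = "sample value"
    Next labelText

    Debug.Print "Open=[" & dict("Open") & "]"
    Debug.Print "NAV=[" & dict("NAV") & "]"
End Sub
```

Because every wanted label is added up front, .Items always returns the same number of entries in the order they were added, which is what lets each result row line up with the headers array.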

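I add each scrape result to a collection called results. At the end, I loop over this collection to write it out to the sheet. Keeping most of the work in memory makes the scraping faster.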


Retrieving info from HTML:

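The labels, e.g. Open, and their values occur in pairs.

You can use the querySelectorAll method to collect the label elements by their class name, kv__label. It returns a nodeList (think of it as a collection, like the one returned by getElementsByClassName). The leading "." is a class selector.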

Set labels = .querySelectorAll(".kv__label") '<== nodeList of labels
Standard module 1: