Excel VBA: storing multiple items in a dictionary to print them later
I've written a script in VBA to scrape different categories of coffee shop data from a web page. The categories I'm trying to parse are shopname, address and phone. I've already defined the selectors in my script. The problem I'm facing is that I can't store them in a dictionary to print them later.
When there are only two items, I can handle them the way I've already shown. I get confused when a third item comes into play, such as phone (it is currently commented out below).
How can I store three items in a dictionary and print them?
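References to add in order to run the above script:
Microsoft XML, v6.0
Microsoft HTML Object Library
I would like to learn how to store multiple items in a dictionary so that I can print them later.
Expected output: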
It seems I can get the result with the code below. If a better approach comes along, I'll withdraw my answer:
    For Each post In Html.getElementsByClassName("info")
        shopName = post.querySelector(".business-name span").innerText
        address = post.querySelector(".adr").innerText
        phone = post.querySelector(".phones").innerText
        ' Pack all three fields into one "|"-delimited key; the item value is unused
        idic(shopName & "|" & address & "|" & phone) = 1
    Next post

    ' Split each key back into its three parts when printing
    For Each key In idic.keys
        R = R + 1: Cells(R, 1) = Split(key, "|")(0)
        Cells(R, 2) = Split(key, "|")(1)
        Cells(R, 3) = Split(key, "|")(2)
    Next key
I like the answer already given (+). You can also load an array into the dictionary item:
    For Each post In Html.getElementsByClassName("info")
        shopName = post.querySelector(".business-name span").innerText
        address = post.querySelector(".adr").innerText
        phone = post.querySelector(".phones").innerText
        ' The element object is the key; the item holds all three fields
        idic(post) = Array(shopName, address, phone)
    Next post

    For Each key In idic.keys
        R = R + 1: ActiveSheet.Cells(R, 1) = idic(key)(0)
        ActiveSheet.Cells(R, 2) = idic(key)(1)
        ActiveSheet.Cells(R, 3) = idic(key)(2)
    Next key
You can also just use an array, which should be fast:
    Dim list As Object, arr(), post As Object, index As Long
    Set list = Html.getElementsByClassName("info")
    ReDim arr(1 To list.Length)

    For Each post In list
        index = index + 1
        shopName = post.querySelector(".business-name span").innerText
        address = post.querySelector(".adr").innerText
        phone = post.querySelector(".phones").innerText
        arr(index) = Array(shopName, address, phone)
    Next

    For index = LBound(arr) To UBound(arr)
        ' Array() is 0-based under the default Option Base 0, so the row holds UBound + 1 values
        ActiveSheet.Cells(index, 1).Resize(1, UBound(arr(index)) + 1) = arr(index)
    Next
However, I would load Html.getElementsByClassName("info") into a variable and use that variable in both cases.
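As a variation on the array approach, the rows can be copied into a single 2D array and written to the sheet with one assignment instead of one write per row. This is only a sketch: the procedure name `WriteRowsInOneGo` and the sample values are made up for illustration and stand in for the scraped data.

```vba
Option Explicit

' Sketch: copy per-row arrays into one 2D array and write it in a single assignment.
' WriteRowsInOneGo and the sample rows are hypothetical placeholders.
Sub WriteRowsInOneGo()
    Dim samples As Variant, results() As Variant
    Dim i As Long, j As Long, r As Long, c As Long

    ' Stand-in for Array(shopName, address, phone) built per post
    samples = Array( _
        Array("Shop A", "1 Example St", "555-0100"), _
        Array("Shop B", "2 Example Ave", "555-0101"))

    ReDim results(1 To UBound(samples) - LBound(samples) + 1, 1 To 3)

    For i = LBound(samples) To UBound(samples)
        r = i - LBound(samples) + 1
        For j = LBound(samples(i)) To UBound(samples(i))
            c = j - LBound(samples(i)) + 1
            results(r, c) = samples(i)(j)
        Next j
    Next i

    ' One write for the whole block, starting at A1 of the active sheet
    ActiveSheet.Cells(1, 1).Resize(UBound(results, 1), UBound(results, 2)).Value = results
End Sub
```

Using LBound and UBound throughout keeps the sketch independent of the module's Option Base setting.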
In addition, the data is present as a JSON string inside a script tag, so with a JSON parser you could also do the following:
    Dim json As Object, item As Object, results(), i As Long
    ' Parse the JSON held in the application/ld+json script tag
    Set json = JsonConverter.ParseJson(Html.querySelectorAll("script[type='application/ld+json']").item(1).innerHTML)
    ReDim results(1 To json.Count)
    i = 1
    For Each item In json
        results(i) = Array(item("name"), Join$(item("address").Items, " ,"), item("telephone"))
        i = i + 1
    Next
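Another possibility is to create a simple class for the data and add instances of that class to the dictionary. Two further classes, WebData and InfoDataCollection, help to separate the code and improve readability. The code consists of a GetDictItems method in a standard module and three class modules: WebData, InfoData and InfoDataCollection.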
idic(shopName) = Array(address, phone), etc., and then Cells(R, 2) = idic(key)(0): Cells(R, 3) = idic(key)(1)
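Spelled out, the suggestion in this comment might look like the sketch below. It is an illustration under assumptions: the procedure name `StoreArrayPerKey` and the sample values are placeholders, and the dictionary is created late-bound so no extra reference is required.

```vba
Option Explicit

' Sketch: key each entry by shop name and keep the other fields in an array item.
' StoreArrayPerKey and the sample values are hypothetical placeholders.
Sub StoreArrayPerKey()
    Dim idic As Object, key As Variant, R As Long
    Set idic = CreateObject("Scripting.Dictionary")

    idic("Shop A") = Array("1 Example St", "555-0100")
    idic("Shop B") = Array("2 Example Ave", "555-0101")

    For Each key In idic.Keys
        R = R + 1
        ActiveSheet.Cells(R, 1).Value = key            ' shop name
        ActiveSheet.Cells(R, 2).Value = idic(key)(0)   ' address
        ActiveSheet.Cells(R, 3).Value = idic(key)(1)   ' phone
    Next key
End Sub
```

Adding a fourth or fifth field only means appending it to the Array call and reading one more index when printing. Two shops with the same name would overwrite each other, because the name is the key.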
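Sorry for any misunderstanding, @Tim. What if there are 5 items? I've updated the post. Maybe you can see what I mean now. Thanks.

The point of a dictionary is that you enter a word in English and get its translation in French. Your need is more like a phone book: you enter a name, an address and a phone number. That's fine, except that you mention printing a list. You don't mention lookup. For printing, a simple 2D array would be enough, such as Dim Arr(1 To 5, 1 To 3) with Arr(1, 1) = "Outerlands", Arr(1, 2) = "4001 Judah St", Arr(1, 3) = 4156616140. Each element in the array has 3 parts.

I haven't come across one of your solutions in a long time, @dee. A nice implementation, as always.

Although the question already has an accepted answer, I wanted to add some more code that shows a further possibility: how to divide the code into classes.

To be honest, I need a user manual to run your script, @dee. I'm still a beginner in VBA. Any guidance on how to use it would be much appreciated. Thanks a lot.

No problem, have a look. It just separates the concerns so that each class deals with one part of the problem.

Because HtmlLevel doesn't support querySelector. Updated both versions.

Some element types do support it (the test cases I used do). Now I can take a further look with a URL.

This is exactly the logic that does the trick: idic(post) = Array(shopName, address, phone). This is really great. Although Tim Williams suggested it first in his comment, it wasn't clear enough without a real demonstration. Thanks, QHarr, for making my day.

Sorry to bother you, @QHarr. If I follow this logic there seems to be a serious problem, because it produces duplicate values, whereas a dictionary never stores duplicate values. Do you think this is a valid approach (does it actually act as a dictionary)? Thanks.

A dictionary can store duplicate values but not duplicate keys. Are you saying you have duplicate keys? If so, add a counter variable and use it as the key.

You are right, I think it is because an object is used as the key. They may be of the same object type, but technically they are not the same object.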
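The counter-variable suggestion from the comments could be sketched as follows. The procedure name `StoreWithCounterKey` and the sample data are hypothetical; the duplicate shop name is included on purpose to show that both rows survive.

```vba
Option Explicit

' Sketch: use a running counter as the key so that records sharing a shop name
' are all kept. StoreWithCounterKey and the sample data are hypothetical.
Sub StoreWithCounterKey()
    Dim idic As Object, names As Variant
    Dim i As Long, counter As Long, key As Variant

    Set idic = CreateObject("Scripting.Dictionary")
    names = Array("Shop A", "Shop A", "Shop B")   ' duplicate name on purpose

    For i = LBound(names) To UBound(names)
        counter = counter + 1
        idic(counter) = Array(names(i), "address " & counter, "phone " & counter)
    Next i

    ' All three records are present because every key is unique
    For Each key In idic.Keys
        Debug.Print key, idic(key)(0), idic(key)(1), idic(key)(2)
    Next key
End Sub
```

With a counter as the key the dictionary is no longer used for lookup, so it behaves much like the plain array approach; it mainly helps when the rest of the code already expects a dictionary.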
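GetDictItems method (standard module)

    Const url = "https://www.yellowpages.com/search?search_terms=Coffee%20Shops&geo_location_terms=San%20Francisco%2C%20CA&page=2"

    Sub GetDictItems()
        ' Download the page, parse it and write the results to the sheet
        With New WebData
            .Load url
            .PrintToExcel
        End With
    End Sub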
WebData class module

    Private m_html As HTMLDocument
    Private m_data As InfoDataCollection

    Private Sub Class_Initialize()
        Set m_html = New HTMLDocument
        Set m_data = New InfoDataCollection
    End Sub

    ' Fetch the page and hand the parsed document to the collection
    Public Sub Load(url As String)
        With New XMLHTTP60
            .Open "GET", url, False
            .setRequestHeader "User-Agent", "Mozilla/5.0"
            .send
            m_html.body.innerHTML = .responseText
        End With
        m_data.Add m_html
    End Sub

    ' Write one row per stored InfoData instance
    Public Sub PrintToExcel()
        Dim key As Variant
        Dim R As Long
        Dim info As InfoData
        For Each key In m_data.Keys
            R = R + 1
            Set info = m_data.Items(key)
            Cells(R, 1) = info.ShopName
            Cells(R, 2) = info.Address
            Cells(R, 3) = info.Phone
        Next key
    End Sub
InfoData class module

    Private m_shopName As String
    Private m_address As String
    Private m_phone As String

    Public Property Get ShopName() As String
        ShopName = m_shopName
    End Property

    Public Property Let ShopName(ByVal vNewValue As String)
        m_shopName = vNewValue
    End Property

    Public Property Get Address() As String
        Address = m_address
    End Property

    Public Property Let Address(ByVal vNewValue As String)
        m_address = vNewValue
    End Property

    Public Property Get Phone() As String
        Phone = m_phone
    End Property

    Public Property Let Phone(ByVal vNewValue As String)
        m_phone = vNewValue
    End Property
InfoDataCollection class module

    Private m_dictionary As Object

    Private Sub Class_Initialize()
        Set m_dictionary = CreateObject("Scripting.Dictionary")
    End Sub

    ' Build one InfoData per "info" element and store it under the shop name
    Public Sub Add(html As HTMLDocument)
        Dim info As InfoData
        Dim post As HTMLDivElement
        m_dictionary.RemoveAll
        For Each post In html.getElementsByClassName("info")
            Set info = New InfoData
            info.ShopName = post.querySelector(".business-name span").innerText
            info.Address = post.querySelector(".adr").innerText
            info.Phone = post.querySelector(".phones").innerText
            ' A repeated shop name replaces the earlier entry
            Set m_dictionary(info.ShopName) = info
        Next post
    End Sub

    Public Property Get Keys() As Variant()
        Keys = m_dictionary.Keys
    End Property

    Public Property Get Items() As Object
        Set Items = m_dictionary
    End Property