Efficient way to create an inverted index in VBA

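I am building an inverted index: a dictionary of words in which each word maps to the list of row numbers where it appears (starting from row numbers and the list of words that appear in a given cell of each row).

I have managed to get some code working, but appending to the arrays (the values in the dictionary) is cumbersome, and I wonder whether there is a more efficient or more elegant way to handle it.

I am open to using arrays, collections, or any other easily searchable data type to store the list of row numbers in the dictionary values. I have pasted a simplified version of my code below to demonstrate the core problem. The question is really only about the BuildInvertedIndex procedure; the rest is included to make the scenario easier to recreate: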

Sub Test()
' minimum included here to demonstrate use of buildInvertedIndex procedure

    Dim vRange As Range
    Dim vDict As Dictionary

    Set vRange = ActiveSheet.Range("F2:F20585")
    Set vDict = New Dictionary

    BuildInvertedIndex vDict, vRange

    ' test values returned in dictionary (word: [line 1, ..., line n])
    Dim k As Variant, vCounter As Long
    vCounter = 0
    For Each k In vDict.Keys
        Debug.Print k & ": " & ArrayToString(vDict.Item(k))
        vCounter = vCounter + 1
        If vCounter >= 10 Then
            Exit For
        End If
    Next


End Sub


Sub BuildInvertedIndex(pDict As Dictionary, pRange As Range)

    Dim cell As Range
    Dim words As Variant, word As Variant, val As Variant
    Dim tmpArr() As Long
    Dim newLen As Long, i As Long

    ' loop through cells (one col wide so same as looping through lines)
    For Each cell In pRange.Cells

        ' loop through words in line
        words = Split(cell.Value)
        For Each word In words

            If Not pDict.exists(word) Then
                ' start line array with first row number
                pDict.Add word, Array(cell.Row())
            Else
                i = 0
                If Not InArray(cell.Row(), pDict.Item(word)) Then
                    newLen = UBound(pDict.Item(word)) + 1
                    ReDim tmpArr(newLen)
                    For Each val In tmpArr
                        If i < newLen Then
                            tmpArr(i) = pDict.Item(word)(i)
                        Else
                            tmpArr(i) = cell.Row()
                        End If
                        i = i + 1
                    Next val
                    pDict.Item(word) = tmpArr
                End If
            End If
        Next word
    Next cell

End Sub


Function ArrayToString(vArray As Variant, _
                       Optional vDelim As String = ",") As String
' only included to support test (be able to see what is in the arrays)

    Dim vDelimString As String
    Dim i As Long

    For i = LBound(vArray) To UBound(vArray)
        vDelimString = vDelimString & CStr(vArray(i)) & _
                       IIf(i < UBound(vArray), vDelim, "")
    Next

    ArrayToString = vDelimString
End Function
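
The code above calls InArray, which the post does not show. A minimal sketch of what such a helper might look like, assuming a plain linear search over a one-dimensional array (the name matches the call above, but the body is an assumption and not the author's original):

```vba
' Assumed implementation: returns True if needle occurs in the
' one-dimensional array haystack, using a simple linear scan.
Function InArray(ByVal needle As Variant, haystack As Variant) As Boolean
    Dim i As Long

    For i = LBound(haystack) To UBound(haystack)
        If haystack(i) = needle Then
            InArray = True
            Exit Function
        End If
    Next i
    ' falls through with the default return value False
End Function
```

A linear scan like this runs once per word occurrence, which may be part of why the array-based version slows down as the row lists grow.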

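To run this you need values (sentences) in column F of the active worksheet and, if you do not already have it, a reference to Microsoft Scripting Runtime in the VBA environment so that the Dictionary data type is available (Tools -> References -> Microsoft Scripting Runtime).

As you will see from the code, it gets messy where I have to insert a new row number into an existing array (stored as a value in the dictionary). Since I do not know how to extend this array without clearing the existing values, I use the variable tmpArr to create an array of the right size, copy the values one by one from the existing array in the dictionary, and then add the current row number at the end. The temporary array then replaces the existing value for that key (the current word).

Any suggestions would be much appreciated.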
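
On the point about extending an array without clearing it: VBA's ReDim Preserve grows the last dimension of an array while keeping its contents. A minimal sketch under stated assumptions: AppendRowToIndex is a hypothetical helper name, the Microsoft Scripting Runtime reference is set, and rows are visited in ascending order so that a repeated word in the same row can only duplicate the last element.

```vba
' Hypothetical helper: appends rowNum to the array stored under word.
' Assumes rows are processed in ascending order, so only the last
' element needs to be checked to avoid duplicates.
Sub AppendRowToIndex(pDict As Dictionary, ByVal word As String, ByVal rowNum As Long)
    Dim arr As Variant

    If Not pDict.Exists(word) Then
        ' first occurrence: start the row list
        pDict.Add word, Array(rowNum)
        Exit Sub
    End If

    arr = pDict.Item(word)                  ' copy the array out of the dictionary
    If arr(UBound(arr)) <> rowNum Then
        ' grow by one element while keeping the existing values
        ReDim Preserve arr(LBound(arr) To UBound(arr) + 1)
        arr(UBound(arr)) = rowNum
        pDict.Item(word) = arr              ' store the grown copy back
    End If
End Sub
```

Inside the word loop this could stand in for the whole If/Else block, for example AppendRowToIndex pDict, CStr(word), cell.Row. The array is still copied out and back on every append, so this is mainly tidier rather than necessarily faster.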
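
"I am open to using arrays, collections, or any other data type"

As I see it, using a Collection instead of an array would be much simpler: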
Sub BuildInvertedIndex(pDict As Dictionary, pRange As Range)
    Dim cell As Range
    Dim words, word
    Dim i As Long    
    ' loop through cells (one col wide so same as looping through lines)
    For Each cell In pRange.Cells    
        ' loop through words in line
        words = Split(cell.Value)
        For Each word In words    
            If Not pDict.Exists(word) Then
                ' initialize collection
                pDict.Add word, New Collection
            End If
            ' Try to add the row to the collection. If the row is already there,
            ' Add fails and the error is skipped, so the key guarantees unique rows.
            On Error Resume Next
            pDict.Item(word).Add Item:=cell.Row, Key:=CStr(cell.Row)
            On Error GoTo 0                
        Next word
    Next cell
End Sub

The next step is to adapt ArrayToString slightly into ColToString:
Function ColToString(vCol As Collection, _
                   Optional vDelim As String = ",") As String
' only included to support test (be able to see what is in the arrays)

    Dim vDelimString As String
    Dim i As Long

    For i = 1 To vCol.Count
        vDelimString = vDelimString & CStr(vCol.Item(i)) & _
                       IIf(i < vCol.Count, vDelim, "")
    Next

    ColToString = vDelimString
End Function
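
Because each row number is stored under its own string key, the same collection can also be searched by key rather than by looping. A minimal sketch, assuming the keys were added as CStr(row) as in the code above (HasRow is a hypothetical name, not part of the original answer):

```vba
' Hypothetical helper: True if rowNum was stored in pCol under Key:=CStr(rowNum).
Function HasRow(pCol As Collection, ByVal rowNum As Long) As Boolean
    Dim tmp As Variant

    On Error Resume Next
    Err.Clear
    ' looking up a missing key raises a run-time error
    tmp = pCol.Item(CStr(rowNum))
    HasRow = (Err.Number = 0)
    On Error GoTo 0
End Function
```

For example, HasRow(vDict.Item("word"), 42) would report whether "word" occurs in row 42, provided that key exists in the dictionary.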

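Update:
To speed the code up, you can read the range into an array first (the approach below works only for single-column ranges, but you can easily adapt it):
Test sub: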
Sub TestWithArray()
' minimum included here to demonstrate use of buildInvertedIndex procedure

    Dim vRange As Range
    Dim vDict As Dictionary
    Dim myArr As Variant

    Set vDict = New Dictionary
    Set vRange = ActiveSheet.Range("F2:F20585")
    myArr = vRange.Value
    BuildInvertedIndexWithArr vDict, myArr, vRange.Row

    ' test values returned in dictionary (word: [line 1, ..., line n])
    Dim k As Variant, vCounter As Long
    vCounter = 0
    For Each k In vDict.Keys
        Debug.Print k & ": " & ColToString(vDict.Item(k))
        vCounter = vCounter + 1
        If vCounter >= 10 Then
            Exit For
        End If
    Next

    'clean up memory
    Set vDict = Nothing
End Sub

New version, BuildInvertedIndexWithArr:
Sub BuildInvertedIndexWithArr(pDict As Dictionary, pArr, firstRow As Long)
    Dim cell, words, word
    Dim i As Long, j As Long

    j = firstRow
    ' loop through cells (one col wide so same as looping through lines)
    For Each cell In pArr

        ' loop through words in line
        words = Split(cell)
        For Each word In words

            If Not pDict.exists(word) Then
                ' initialize collection
                pDict.Add word, New Collection
            End If

            On Error Resume Next
            pDict.Item(word).Add Item:=j, Key:=CStr(j)
            On Error GoTo 0

        Next word
        j = j + 1
    Next cell
End Sub
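
The single-column restriction comes from walking the array with For Each and a running counter j: Range.Value returns a two-dimensional, 1-based array, and with more than one column the counter would no longer line up with worksheet rows. A minimal sketch of one way to adapt it, indexing by row and column instead. BuildInvertedIndexFrom2D is a hypothetical name, and the code assumes pRange is a single contiguous area.

```vba
' Hypothetical variant of BuildInvertedIndexWithArr for multi-column ranges.
' Assumes pRange is a single contiguous area.
Sub BuildInvertedIndexFrom2D(pDict As Dictionary, pRange As Range)
    Dim data As Variant, words As Variant, word As Variant
    Dim r As Long, c As Long, rowNum As Long

    If pRange.Cells.Count = 1 Then
        ' a single-cell range returns a scalar, so wrap it in a 1 x 1 array
        ReDim data(1 To 1, 1 To 1)
        data(1, 1) = pRange.Value
    Else
        data = pRange.Value
    End If

    For r = 1 To UBound(data, 1)
        rowNum = pRange.Row + r - 1         ' worksheet row of array row r
        For c = 1 To UBound(data, 2)
            If Not IsError(data(r, c)) Then ' skip error cells such as #N/A
                words = Split(CStr(data(r, c)))
                For Each word In words
                    If Not pDict.Exists(word) Then
                        pDict.Add word, New Collection
                    End If
                    ' the key keeps row numbers unique, as in the code above
                    On Error Resume Next
                    pDict.Item(word).Add Item:=rowNum, Key:=CStr(rowNum)
                    On Error GoTo 0
                Next word
            End If
        Next c
    Next r
End Sub
```

It takes the range itself rather than an array plus a first row, so a call would look like BuildInvertedIndexFrom2D vDict, ActiveSheet.Range("F2:G20585").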

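You cannot work directly with an array stored in a dictionary. The usual approach is to pull it out of the dictionary, modify it, and store it back in the same slot.

Thanks for the detailed reply, trying it now.

Brilliant, thanks. Not only does it look cleaner, it is also more than 10 times faster. On just over 20k rows the timings are: 11327.0713492417 ms to index 20585 rows using arrays as the dictionary values, and 727.502338194183 ms to index 20585 rows using collections as the dictionary values.

That's cool :) One more trick to speed the code up: read the range into an array first (it can make the code 10-20 times faster). I have updated my answer with this approach. Could you try it and report the timing? Just curious :)

A small further improvement: 508.731182722124 ms to index 20585 rows, using collections as the dictionary values and an array for the range. I also wrote the results to a sheet to compare them and found that the results are not the same for some words that appear in many rows. I am now investigating which is correct. It looks like it is my original version, whose arrays occasionally contain duplicate values, but they seem to be ...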