Warning: file_get_contents(/data/phpspider/zhask/data//catemap/8/variables/2.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
Vba 使用2列删除重复项_Vba_Excel - Fatal编程技术网

Vba 使用2列删除重复项

Vba 使用2列删除重复项,vba,excel,Vba,Excel,我正在尝试删除工作表中的重复ID。例如,这里有几行数据 ID | Department | Sales | Update Date 1 | Sales | 100 | 2 | Marketing | 100 | 2 | Marketing | 200 | 30/06/2015 2 | Marketing | 300 | 05/07/2015 我想删除重复的ID,但以更新日期列为基础。因此,我

我正在尝试删除工作表中的重复ID。例如,这里有几行数据

ID   |  Department |  Sales   | Update Date
1    | Sales       | 100      | 
2    | Marketing   | 100      | 
2    | Marketing   | 200      | 30/06/2015
2    | Marketing   | 300      | 05/07/2015
我想删除重复的ID,但以更新日期列为基础。因此,我只想保留以下内容:

ID   |  Department |  Sales   | Update Date
1    | Sales       | 100      | 
2    | Marketing   | 300      | 05/07/2015
因此,它会检查该ID的最新更新行,并删除其他行


任何关于使用VBA或宏来执行此操作的建议都是很好的,因为它将构成自动脚本的一部分。

实现此目的的一种方法是读取所有行并遍历每个重复行,然后根据找到的最高更新日期查找要保留的内容和要删除的内容

我已经成功地编写了一个宏来实现这一点。这是我的密码:

首先:在VBA编辑器中创建一个空白模块并粘贴以下代码:

Public Type Row

    id As String
    updated As Date

    row_number As Integer    'to know which rows to delete later
    is_duplicate As Boolean  'to mark if current row is duplicate
    to_keep As Boolean       'to decide whether to keep or to delete
    verified As Boolean      'needed to avoid evaluating all rows with the same ID

End Type



Sub RemoveDuplicates()

Range("a2").Select    'go to first row
Dim cnt As Integer    'keep record of how many rows
cnt = 0               'begin with an empty array
Dim rows() As Row     'declared without the count


'== step 1: read all data and store in array ===============
Do While ActiveCell.Value <> ""
    cnt = cnt + 1
    ReDim Preserve rows(cnt)   'expand the size of the array by ONE
    rows(cnt - 1).row_number = ActiveCell.Row   'keep record of current row address

    If ActiveCell.Offset(1, 0).Value = ActiveCell.Value Or _
       ActiveCell.Offset(-1, 0).Value = ActiveCell.Value Then

       'if the cell above/below has the samve ID as the current cell, then it's duplicates

        rows(cnt - 1).is_duplicate = True
    Else
        rows(cnt - 1).is_duplicate = False
    End If

    rows(cnt - 1).id = ActiveCell.Value                   'store the id
    rows(cnt - 1).updated = ActiveCell.Offset(0, 3).Value 'store the date
    ActiveCell.Offset(1, 0).Select                        'move to the next row below
Loop


'=== step 2: iterating throw the array and deciding what to keep, what to delete =========
For i = 0 To cnt - 1
    If rows(i).is_duplicate And Not rows(i).verified Then
        'the current ID is duplicated, and all of the other rows with the same ID has not been verified
        find_to_keep rows, rows(i).id, cnt   'helper Sub to analyze each row

    End If
Next


'==== step 3: iterating throw the array to delete ones marked to delete ==========

For i = cnt - 1 To 0 Step -1  'we have to reverse the order because deleted rows will contain data from other valid rows

    If rows(i).is_duplicate And Not rows(i).to_keep Then
        'if the current row is duplicate and is not marked (to keep) then it must be deleted

        Dim r As Integer
        r = rows(i).row_number   'get the rows number (address) of the row

        Range(r & ":" & r).EntireRow.Delete shift:=xlShiftUp   'delete the row and shift the other rows below UP

    End If

Next

End Sub

Sub find_to_keep(ByRef rows() As Row, ByVal id As String, ByVal cnt As Integer)
    Dim max_date As Date   'temparary variable to hold the maximum date

    Dim to_keep As Integer  'temporary variable to hold the location of row to keep


    ' -- step a: go throw the array and find all rows with id specified in the sub parameter
    For i = 0 To cnt - 1
        If rows(i).id = id Then
            'if that row has a date that is higher than our current max_date, the read its date
            If rows(i).updated > max_date Then
                max_date = rows(i).updated
                to_keep = i
            End If
        End If
    Next



    '-- step b: now that we know what row to keep, we need to do:
    '           1- mark all other rows having the same ID as verified (to avoid looping through them again)
    '           2- mark the row with the highest date to (to_keep) = true

    For i = 0 To cnt - 1
        If rows(i).id = id Then
            If i = to_keep Then
                rows(i).to_keep = True
            Else
                rows(i).to_keep = False
            End If
            rows(i).verified = True

        End If
    Next

End Sub
公共类型行
id作为字符串
更新为日期
行\u编号为整数,以了解以后要删除哪些行
_duplicate As Boolean'标记当前行是否重复
_keep As Boolean'决定是保留还是删除
需要验证为布尔值,以避免计算具有相同ID的所有行
端型
子移除副本()
范围(“a2”)。选择“转到第一行”
Dim cnt As Integer“记录有多少行
cnt=0'以空数组开始
Dim rows()作为未包含计数的行声明
'==步骤1:读取所有数据并存储在数组中===============
当ActiveCell.Value“”时执行此操作
cnt=cnt+1
ReDim Preserve rows(cnt)'将数组的大小扩展一
行(cnt-1).row_number=ActiveCell.row'保留当前行地址的记录
如果ActiveCell.Offset(1,0).Value=ActiveCell.Value或_
ActiveCell.Offset(-1,0).Value=ActiveCell.Value然后
'如果上面/下面的单元格具有samve ID作为当前单元格,则它是重复的
行(cnt-1)。是否重复=真
其他的
行(cnt-1)。是否重复=False
如果结束
行(cnt-1).id=ActiveCell.Value'存储id
行(cnt-1).updated=ActiveCell.Offset(0,3).Value'存储日期
ActiveCell.Offset(1,0)。选择“移动到下面的下一行”
环
'==第2步:迭代抛出数组并决定保留什么、删除什么=========
对于i=0到cnt-1
如果行(i).重复,而不是行(i).已验证,则
'当前ID是重复的,具有相同ID的所有其他行尚未验证
查找以保存行,行(i).id,cnt的帮助器子对象以分析每行
如果结束
下一个
'==步骤3:迭代抛出数组以删除标记为删除的数组==========
对于i=cnt-1到0步骤-1'我们必须颠倒顺序,因为删除的行将包含来自其他有效行的数据
如果行(i).重复,而不是行(i).保留,则
'如果当前行重复且未标记(保留),则必须将其删除
作为整数的Dim r
r=行(i).行编号'获取行的行编号(地址)
范围(r&“:”&r).EntireRow.Delete shift:=xlShiftUp'删除该行并将下面的其他行向上移动
如果结束
下一个
端接头
Sub find_to_keep(ByRef rows()作为行,ByVal id作为字符串,ByVal cnt作为整数)
Dim max_date As date临时变量以保存最大日期
Dim to_keep As Integer临时变量,用于保存要保留的行的位置
'--步骤a:抛出数组并查找子参数中指定id的所有行
对于i=0到cnt-1
如果行(i).id=id,则
'如果该行的日期高于当前的max_日期,则会读取其日期
如果行(i).已更新>最大日期,则
最大日期=行(i)。已更新
保持
如果结束
如果结束
下一个
’--步骤b:既然我们知道要保留哪一行,我们需要做:
'1-将具有相同ID的所有其他行标记为已验证(以避免再次循环)
'2-标记日期最高的行to(to_keep)=true
对于i=0到cnt-1
如果行(i).id=id,则
如果我想继续那么
第(i)行。保留=真
其他的
第(i)行。保留=False
如果结束
行(i).已验证=真
如果结束
下一个
端接头
下面是它的外观:


如果您愿意,我已经附上了整个工作簿供您参考:

浏览所有行(从上到下),并为每个ID收集最长更新日期的数组。然后再次抛出行并删除与数组不匹配的记录。@Ralph不幸的是,我对数组不太熟悉,仍在研究如何使用它们,然后在
更新日期
旁边添加另一列,在该列中,您可以要求Excel为您计算最长日期。类似于
=MAX(如果($A$2:$A$5=A2;$D$2:$D$5))
。有关更多详细信息,请阅读以下内容:之后,您只需比较更新日期是否等于新列或它们是否不同(在这种情况下,您希望删除该行)。抱歉,我无意中将更新日期检查的列偏移到了错误的列!它确实有效。谢谢,回答得很好!