This post consists of two topics, both involving VBA’s RemoveDuplicates method. First I discuss an error and a shortcoming in the RemoveDuplicates documentation that have confounded myself and others. After those are cleared up I’ll show how to use RemoveDuplicates to return a two-dimensional array of unique items from a range containing one or more columns.
RemoveDuplicates was added to VBA (along with it’s front-end counterpart Data>Remove Duplicates) in Excel 2007. It’s very handy, allowing you to select one or more columns in a range and and winnow it down to the rows where the combined values of those columns are unique. It does this “in-place,” leaving other areas of the sheet intact.
RemoveDuplicates’ Columns Argument isn’t Really Optional
The Excel 2013 MSDN documentation describes the variant Columns argument as an optional “Array of indexes of the columns that contain the duplicate information. If nothing is passed then it assumes all columns contain duplicate information.” (my emphasis)
Later it gives an example of leaving out the Columns argument to remove duplicates from the entire range:
“The following code sample removes duplicates with all columns from a range.”
However, testing reveals that the above code does nothing. This is confirmed by the perplexed comments of various forum visitors. In truth, you have to specify the columns variant array in all cases. The help example should look like this:
If you want to base the removal on just some of the columns in a range, for example columns A and C, you do it like this:
Using a Dynamic Columns Array
If you use RemoveDuplicates in code, there’s a good chance you’ll want to determine the array of column numbers at runtime and not hard-code them as above. As a simple test, I tried something like:
ColumnsToCheck = Array(1,3)
However this results in a Runtime Error ‘5’ – Invalid Procedure Call or Argument.
After looking around the web a while I found that the answer is to wrap ColumnsToCheck in parentheses, so the code becomes:
ColumnsToCheck = Array(1,3)
I know, from one of my all-time favorite DDOE posts that parentheses force evaluation of what’s inside them. Others on the web said that these parens “coerce” the evaluation of the array. I don’t really see how that applies here, so if you can explain why this works, please leave a comment.
One other note about the Columns variant array – it must be zero-based.
Unique Unique Per-Row Values Array Function
Once I’d sorted out the above points, I was able to use RemoveDuplicates as the basis of a function that returns a two-dimensional variant array containing the unique rows in a range. The range can consist of multiple columns that aren’t necessarily adjacent, as in this table of hometowns and sports of 2008 US Olympic athletes:
In the example above I want to return a variant array of all the unique combinations of the State and Sport fields. The specific use is for a worksheet splitter, where I prompt the user for one or two columns and the cycle through the array of unique combinations, creating a separate workbook for each combination.
The code is simple enough. It takes a range as it’s argument, for example the State and Sport fields above. It copies the range it to a worksheet in a newly added workbook. We then use RemoveDuplicates on the entire UsedRange of that new sheet. Because the columns are now contiguous the code can easily assign the modified range back to a variant array. The temporary workbook is then closed without saving.
Dim wbTemp As Excel.Workbook
Dim wsTemp As Excel.Worksheet
Dim ColNums As Variant
Dim i As Long
'if only one cell selected then can't pass to two-dimensional array
If rng.Cells.Count = 1 Then
'if the single cell is blank
If rng.Value = vbEmpty Then
GetUniqueRowValues = rng.Value
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
'create a temp workbook
'and copy range to it
Set wbTemp = Workbooks.Add
Set wsTemp = wbTemp.Worksheets(1)
'rng can consist of non-adjactent columns
'pasting to a new workbook results in a
'new range with only those columns
ReDim ColNums(0 To wsTemp.UsedRange.Columns.Count - 1)
'Create the array of column numbers for the
'RemoveDuplicates' Columns parameter
For i = LBound(ColNums) To UBound(ColNums)
ColNums(i) = i + 1
.UsedRange.RemoveDuplicates Columns:=(ColNums), Header:=xlGuess
'assign the values of the remaining,
'non-duplicated rows to the function result
GetUniqueRowValues = .UsedRange
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Note that is the original range was just one cell, the returned variant will not be an array. But you’d test for that before running this in the first place, wouldn’t you?
This code is fast for big ranges, handling over one million rows in less than five seconds on my laptop. On the other hand, it has to open and close a workbook, which takes part of a second and causes screen flicker. It’s a lot more flexible at least than my attempts to modify the Collection technique to handle large ranges with non-adjacent columns.