In Data Normalizer I showed you how I normalize worksheet data using arrays and For/Next loops. I’ve been doing a fair amount of SQL in VBA lately, and thought I’d rewrite the code using that approach.
A little searching revealed the T-SQL/SQL Server “Unpivot” command, which normalizes your data and sets the new field names all in one swell foop. It’s not available in Access/Jet SQL though, so can’t be used on Excel. Instead, the preferred method is to use a series of Selects that pick one normalizing column at a time (along with the repeating columns) and Unions them together.
I tried ADO first, using the method of SaveCopyAs’ing the workbook-to-be-normalized in order to avoid the ADO memory leak. ADO was way slower than DAO, something like four times slower with 3000 records of 16 columns. So I went with DAO, which still takes about twice as long as the array method. Turning the ADO to DAO was easy, especially with this concise sample from XL-Dennis.
As Jeff Weir pointed out in a comment, this does require a reference (Tools>References) to the Microsoft DAO 3.5 Object Library. I’ve been switching some code over to late binding, but there doesn’t seem to be much enthusiasm for this with DAO. I’m not sure if that’s because it’s so pervasive and well-established, or for some other reason.
The core logic of the routine is pretty simple. In pseudo-English:
Select all repeating columns
and Select (create) a new column, giving it the same name each time ("Team" in this example)
and Select the column with that team's data, giving it the same name each time ("Home Runs" in this example
and Union it to the Select statement created in the next loop iteration
Without further ado (heh heh) here’s the routine.
'List: The range to be normalized.
'RepeatingColsCount: The number of columns, starting with the leftmost,
' whose headings remain the same.
'NormalizedColHeader: The column header for the rolled-up category.
'DataColHeader: The column header for the normalized data.
'NewWorkbook: Put the sheet with the data in a new workbook?
'NOTE: The data must be in a contiguous range and the
'rows that will be repeated must be to the left,
'with the rows to be normalized to the right.
Sub NormalizeList_SQL_DAO(List As Excel.Range, RepeatingColsCount As Long, _
NormalizedColHeader As String, DataColHeader As String, _
Optional NewWorkbook As Boolean = False)
Dim FirstNormalizingCol As Long, NormalizingColsCount As Long
Dim RepeatingColsHeaders As Variant, NormalizingColsHeaders As Variant
Dim RepeatingColsIndex As Long, NormalizingColsIndex As Long
Dim wbSource As Excel.Workbook, wbTarget As Excel.Workbook
Dim wsTarget As Excel.Worksheet
Dim daoWorkSpace As DAO.Workspace
Dim daoWorkbook As DAO.Database
Dim daoRecordset As DAO.Recordset
Dim strSql As String
Dim strExtendedProperties As String
'If the normalized list won't fit, you must quit.
If .Rows.Count * (.Columns.Count - RepeatingColsCount) > .Parent.Rows.Count Then
MsgBox "The normalized list will be too many rows.", _
vbExclamation + vbOKOnly, "Sorry"
'List.Parent.Parent is the lists Workbook
Set wbSource = List.Parent.Parent
'The columns to normalize must be to the right of the columns that will repeat
FirstNormalizingCol = RepeatingColsCount + 1
NormalizingColsCount = .Columns.Count - RepeatingColsCount
'Get the header names of the repeating columns
RepeatingColsHeaders = List.Cells(1).Resize(1, RepeatingColsCount).Value
'Get the header names of the normalizing columns
NormalizingColsHeaders = List.Cells(FirstNormalizingCol).Resize(1, NormalizingColsCount).Value
strSql = vbNullString
'loop through each normalizing column
For NormalizingColsIndex = 1 To NormalizingColsCount
'Create an individual Select for the normalizing column
strSql = strSql & " SELECT "
'Select all the repeating columns
For RepeatingColsIndex = 1 To RepeatingColsCount
strSql = strSql & RepeatingColsHeaders(1, RepeatingColsIndex) & ", "
'Select the normalizing column and assign the NormalizedColHeader field name
'and select the data being counted and assign it the DataColHeader field name
strSql = strSql & "'" & NormalizingColsHeaders(1, NormalizingColsIndex) & "'" & " AS " & NormalizedColHeader & _
", " & NormalizingColsHeaders(1, NormalizingColsIndex) & " AS " & DataColHeader
strSql = strSql & " FROM [" & List.Parent.Name & _
"$" & List.Address(rowabsolute:=False, columnabsolute:=False) & "]"
If NormalizingColsIndex < NormalizingColsCount Then
'Union the Select statements created for the normalizing columns
strSql = strSql & " UNION ALL"
'Set up the DAO connection
strExtendedProperties = "Excel 8.0;HDR=Yes;IMEX=1"
Set daoWorkSpace = DBEngine.Workspaces(0)
Set daoWorkbook = daoWorkSpace.OpenDatabase(wbSource.FullName, False, True, strExtendedProperties)
Set daoRecordset = daoWorkbook.OpenRecordset(strSql, dbOpenForwardOnly)
'Put the normal data in the same workbook, or a new one.
If NewWorkbook Then
Set wbTarget = Workbooks.Add
Set wsTarget = wbTarget.Worksheets(1)
Set wbSource = List.Parent.Parent
Set wsTarget = .Add(after:=.Item(.Count))
'copy the headers and DAO recordset to the new worksheet
.Cells(1, 1).Resize(1, RepeatingColsCount).Value = RepeatingColsHeaders
.Cells(1, RepeatingColsCount + 1) = NormalizedColHeader
.Cells(1, RepeatingColsCount + 2) = DataColHeader
.Cells(2, 1).CopyFromRecordset daoRecordset
Set daoRecordset = Nothing
Set daoWorkbook = Nothing
Set daoWorkSpace = Nothing
If you break after strSql is created, it looks like this:
SELECT League, Year, 'CHC' AS Team, CHC AS HomeRuns FROM [HR-NL$A1:R110]
SELECT League, Year, 'MIL' AS Team, MIL AS HomeRuns FROM [HR-NL$A1:R110]
Call it like this:
One pitfall of this SQL version is the mixed data-type issue with the Excel ISAM driver. In this example it converts all the home run counts from numbers to text because of the blanks in the data.
All in all, I think the array approach is better than SQL for this use. The core skill of creating SQL in VBA is a valuable one though, and one I’m glad to be developing.
I updated the Data Normalizer .xls to include this code.