Pivot Multiple Worksheets

Pivot Multiple Worksheets

As I’ve mentioned before, these days I use Excel more and more for developing and testing SQL code. As part of that I often compare of sets of output from SQL. And as part of that I sometimes I find it useful to pivot multiple worksheets.

For example, I just finished a project of translating a query from one data warehouse to another. The new database has a completely different schema than the old – new tables, new fields, new behaviors. My goal was to develop a query that returned the same results from the new database as those from the old.

To compare the outputs, I created two tables (listobjects) in a single workbook. The first table had a connection to the old data warehouse and uses the old query as its Command Text. The second table is connected to the new data warehouse, and was where I’d test the SQL I was developing.

Especially at first, there were quite a few differences in the output of these two queries in these two tables. Comparing the outputs in a pivot table let me see these differences clearly, both in summary and in detail.

Here’s a very simple example using my trusty pie data. In this example I have two different tables on two sheets with slightly different pie orders. Here’s the output from data warehouse 1…

pie table 1

and here it is from data warehouse 88b…

pie table 2

I conveniently placed differences in the Quantity column near the top, so you may be able to just pick them out by eye. And you may even have caught the one date field discrepancy. However, after combining the two tables into one, adding a “Source” column and then pivoting, the differences become easy to pick out, especially with a little conditional formatting:

pie pivot comparison

In the pivot above, “2”s in the Grand Total column represent all the records where the two queries returned the same results. The “1”s point to the discrepancies.

This is a flexible and powerful comparison method. Benefits include:

  • You can quickly add or subtract fields from the pivot to pinpoint the differences.
  • You can change the orders of the fields.
  • If you add subtotals you can then double-click on those with disrepancies to drill down to just those results.

For a while I created these combined source tables manually, just pasting the two sets of results together, adding a column “Source” column with “DW_1” and “DW_88b.” This worked fairly well, but after several times it cried out for automation.

The VBA below keys off of selected sheets in a workbook. Just select the ones you want to pivot and then run the code. Here you can see that both sheets are selected, and I’ve added the “Pivot Multiple Sheets” macro to the tab’s right-click menu (with MenuRighter, of course).

ply menu

The code first collects all the data necessary for the connection and then closes the source workbook. (I did this to avoid memory leaks or whatever it is that makes things go wonky if the workbook is open at the same time I’m creating a connection to it.) It uses that data to create the Source and SQL strings. The SQL is just a series of SELECTS, one for each selected worksheet, connected with UNION ALLs.

The newly created worbook contains a table with the connection to the source workbook and a pivot table pointed at that table. The table’s “Source” field becomes a column in the pivot table, containing the names of the two or more worksheets. The rest of the table columns become pivot table row fields. The connection in this workbook is live, so that if you make changes to the source they will appear in this workbook once you refresh the data and pivot:

Sub Pivot_Multiple_Sheets()

Dim wbToPivot As Excel.Workbook
Dim SheetsToPivot As Excel.Sheets
Dim SourceFullName As String
Dim SourceString As String
Dim wbWithPivot As Excel.Workbook
Dim wsWithQueryTable As Excel.Worksheet
Dim SheetsToPivotCount As Long
Dim SheetsToPivotNames() As String
Dim qt As Excel.QueryTable
Dim i As Long
Dim SqlSelects() As String
Dim sql As String
Dim pvt As Excel.PivotTable
Dim pvtField As Excel.PivotField

If ActiveWorkbook Is Nothing Then
    MsgBox "No active workbook."
    Exit Sub
End If
Set wbToPivot = ActiveWorkbook

If Not wbToPivot.Saved Then
    MsgBox "Please save this workbook before running." & vbCrLf & _
                 "Workbook will be closed by this utility" & _
                 "after the process is completed."
    Exit Sub
End If

'This code acts on the Selected Sheets
Set SheetsToPivot = wbToPivot.Windows(1).SelectedSheets
If SheetsToPivot.Count = 1 Then
    MsgBox "Please select two or more worksheets (no charts)."
    Exit Sub
End If

SheetsToPivotCount = SheetsToPivot.Count

For i = 1 To SheetsToPivotCount
    If Not TypeName(SheetsToPivot(i)) = "Worksheet" Then
        MsgBox "Please select two or more worksheets (no charts)."
        Exit Sub
    End If
Next i

SourceFullName = wbToPivot.FullName
ReDim SheetsToPivotNames(1 To SheetsToPivotCount)
For i = 1 To SheetsToPivotCount
    SheetsToPivotNames(i) = SheetsToPivot(i).Name
Next i
'Change Selection to only one sheeet
SheetsToPivot(1).Select
'Close the source workbook before creating the new one and its connections
'Save it so not prompted
wbToPivot.Close True

Set wbWithPivot = Workbooks.Add
'Delete any extra worksheets
For i = wbWithPivot.Worksheets.Count To 2 Step -1
    Application.DisplayAlerts = False
    wbWithPivot.Worksheets(i).Delete
    Application.DisplayAlerts = True
Next i
Set wsWithQueryTable = wbWithPivot.Worksheets(1)
wsWithQueryTable.Name = "Data Table"
'Don't know why this is needed, but otherwise .CommandText line below fails
wsWithQueryTable.Activate

'I got rid of a lot of fields in connection - still seems to work
SourceString = "ODBC;DSN=Excel Files;DBQ=" & SourceFullName
'Create an array of SELECT statements
ReDim SqlSelects(1 To SheetsToPivotCount)
For i = 1 To SheetsToPivotCount
    SqlSelects(i) = "SELECT" & vbCrLf & _
                    "'" & SheetsToPivotNames(i) & "' as Source," & vbCrLf & _
                    "Sheet" & i & ".*" & vbCrLf & _
                    "FROM" & vbCrLf & _
                    "`" & SourceFullName & "`.[" & SheetsToPivotNames(i) & "$] AS Sheet" & i
Next i
'Connect the SELECTS with UNION ALL
For i = LBound(SqlSelects) To UBound(SqlSelects) - 1
    sql = sql & SqlSelects(i) & vbCrLf & "UNION ALL" & vbCrLf
Next i
sql = sql & SqlSelects(i)

Set qt = wsWithQueryTable.ListObjects.Add(SourceType:=0, Source:=SourceString, Destination:=wsWithQueryTable.Range("$A$1")).QueryTable
With qt
    .CommandText = sql
    .ListObject.DisplayName = "tbl" & Format(Now(), "yyyymmddhhmmss") & Right(Format(Timer, "#0.00"), 2)
    .RowNumbers = False
    .FillAdjacentFormulas = False
    .PreserveFormatting = True
    .RefreshOnFileOpen = False
    .BackgroundQuery = True
    .RefreshStyle = xlInsertDeleteCells
    .SavePassword = False
    .SaveData = True
    .RefreshPeriod = 0
    .PreserveColumnInfo = True
    'I like it to preserve the widths the first time it's run, and below turn it to false
    .AdjustColumnWidth = True
    .Refresh BackgroundQuery:=False
    .AdjustColumnWidth = False
End With
wbWithPivot.Worksheets.Add
With ActiveSheet
    .Name = "Pivot"
    Set pvt = .Parent.PivotCaches.Create(SourceType:=xlDatabase, SourceData:=qt.ListObject.Name).CreatePivotTable(TableDestination:=.Range("A1"))
    pvt.AddDataField Field:=pvt.PivotFields("Source"), Function:=xlCount
    With pvt.PivotFields("Source")
        .Orientation = xlColumnField
        .Position = 1
    End With
    For Each pvtField In pvt.PivotFields
        If pvtField.Name <> "Source" Then
            pvtField.Orientation = xlRowField
            pvtField.Position = pvt.RowFields.Count
        End If
    Next pvtField
End With
End Sub

To use this code put it in your Personal.xlsb or any workbook besides the one with the source data.

This code could use some more error-checking. For example, if the two sheets have a different number of columns. Even more important is the addition of whatever kind of general error handling you use so you exit gracefully from bad connection strings and other such inevitable problems.

Speaking of bad connection strings, you may notice that I’ve ditched the Default Directory, DriverId, BufferSize, MaxPageTimeOuts and whatnot from the connection. I did that to see if it worked. It did, so I never added them back. I see that they reappear in the connection properties for the table:

connection properties

I ran this code in Excel 2010 and 2013. I don’t know how portable this code is to other Excel versions. I also don’t know if you’ll have performance issues if you have the source and pivot workbooks open at the same time.

If you’re interested in this topic be sure to take a look at Kirill Lapin’s method, posted on Contextures. His method keeps the source and the pivot table in one workbook, deleting the connection in between refreshes of the pivot table. I think Kirill’s method is nice for more traditional pivot table use where you want to merge different data sets with the same format, e.g., eastern and western sales regions.

I like my method because it requires no setup for the source workbook, keeps a refreshable connection and arranges the pivot table for comparison.

I’d love to hear anybody’s opinion on the stability of this method, i.e., when opening both the source and the connected data at the same time. Also, I’m curious if this code works in other versions besides 2010 and 2013. These are areas where my knowledge is pretty piecemeal, so any help would be appreciated.

Importing SQL Files Into Data Connections

Importing SQL Files Into Data Connections

By now you may know that I love data connections in Excel. Sometimes I use them for the front-ends in finished projects, but mostly I use them for testing SQL. With its formulas, tables and pivot tables, Excel makes a great test environment for validating SQL results. You can of course just paste query output straight from SQL Server Management Studio or other development environments, but the it doesn’t always format correctly. For instance Varchar ID fields that are all numbers lose leading zeros and dates lose their formats. In my experience those problems don’t happen with data connections

In this post, we’ll start with the basics of a reusable Table/SQL connection to which you can then add your SQL. Then I’ll share some code that lets you point at one or more .sql files and creates a connected table for each one. (An .sql file is just a text file with SQL in it and an .sql extension for handy identification.)

A Reusable Table/SQL Connection

At work I have a default data connection to the main database we query, all set up with the Connection, Command Type and some dummy Command Text. Whenever I want to run some SQL against that database in Excel, I just click on that connection in Data > Existing Connections. If I worked at home and used SQL Server and kept the corporate database on my laptop, the connection could look like this.

SQL Server template connection

I created it by going to Data > Connections > Other Sources > From SQL Server. After following the wizard, I modified the connection by changing the Command Type to SQL and the Command Text to the meaningless, but super-speedy query “SELECT ‘TEMP’ FROM TEMP.”

So now I’ve got a template I can call from Data > Existing Connections and quickly modify the SQL, say to something like:

SQL Server template connection 2

Inserting SQL Directly From .sql Files

Recently I thought I’d take this a bit further and pull the CommandText directly from an .sql file. So I wrote some code that has you pick one or more .sql files, and then creates a new Worksheet/Table/Query for each one in a new workbook. The main query is below. The heart of it looks a lot like what you got if you ran the macro recorder while creating a new connection:

Sub AddConnectedTables()
Dim wbActive As Excel.Workbook
Dim WorksheetsToDelete As Collection
Dim ws As Excel.Worksheet
Dim qt As Excel.QueryTable
Dim sqlFiles() As String
Dim ConnectionIndex As Long

sqlFiles = PickSqlFiles
If IsArrayEmpty(sqlFiles) Then
    Exit Sub
End If

Workbooks.Add
Set wbActive = ActiveWorkbook
'Identify the empty sheet(s) the workbook has on creation, for later deletion
Set WorksheetsToDelete = New Collection
For Each ws In wbActive.Worksheets
    WorksheetsToDelete.Add ws
Next ws

For ConnectionIndex = LBound(sqlFiles) To UBound(sqlFiles)
    wbActive.Worksheets.Add after:=ActiveSheet
    '*** Modify the location below to match your computer ***
    Set qt = ActiveSheet.ListObjects.Add(SourceType:=0, _
        Source:="ODBC;DSN=Excel Files;DBQ=E:\DOCS\YOURSUMBUDDY\BLOG\POST_72_SQL_IMPORTER\Post72_Data.xlsx;DriverId=1046;MaxBufferSize=2048;PageTimeout=5;", _
        Destination:=Range("$A$1")).QueryTable
    With qt
        'Temporary command text makes the formatting for the real query work
        .CommandText = ("SELECT 'TEMP' AS TEMP")
        .ListObject.DisplayName = "tbl" & Format(Now(), "yyyyMMddhhmmss") & "_" & ConnectionIndex
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .PreserveColumnInfo = True
        'Refresh first with just the template query
        .Refresh BackgroundQuery:=False
        .CommandText = ReadSqlFile(sqlFiles(ConnectionIndex))
        'Refresh again with the new SQL. Doing this in two steps makes the formatting work.
        .Refresh BackgroundQuery:=False
        .AdjustColumnWidth = False
        'Name the just-created connection and table
        .ListObject.DisplayName = Replace("tbl" & Mid$(sqlFiles(ConnectionIndex), InStrRev(sqlFiles(ConnectionIndex), Application.PathSeparator) + 1, 99) & Format(Now(), "yyyyMMddhhmmss") & "_" & ConnectionIndex, ".sql", "")
        wbActive.Connections(1).Name = .ListObject.DisplayName
    End With
Next ConnectionIndex

'Delete the empty sheet(s) the worbook had on creation
Application.DisplayAlerts = False
For Each ws In WorksheetsToDelete
    WorksheetsToDelete(1).Delete
Next ws
Application.DisplayAlerts = True
End Sub

Notice that the code refreshes the querytable twice. If I just go straight to the query from the .sql file, I end up with the same type of formatting problem described at the beginning of this post. For example, dates come through without formatting, like 41985. Starting with a dummy query of SELECT ‘TEMP’ AS TEMP, refreshing it, setting the .CommandText to the correct SQL and refreshing again results in correct formatting.

The code also sets .AdjustColumnWidth twice because I like to start with correct column widths and then not have them adjust after that.

You’ll also note that the connection in the code above isn’t to a SQL Server database anymore, but to an Excel workbook. That’s because I created a downloadable folder for you to try this out in, and the easiest data source to include is an Excel workbook. See the end of this post for the link and a few instructions.

(Also as a weird bonus in the code above is something I came up with to delete the one or more vestigial empty worksheets that get created in a situation like this where your creating a new workbook in code.)

Below are the three functions called from the module above. One uses a File Dialog to pick one or more .sql files.

Private Function PickSqlFiles() As String()
Dim fdFileDialog As FileDialog
Dim SelectedItemsCount As Long
Dim sqlFiles() As String
Dim i As Long

Set fdFileDialog = Application.FileDialog(msoFileDialogOpen)
With fdFileDialog
    .ButtonName = "Select"
    .Filters.Clear
    .Filters.Add "SQL Files (*.sql)", "*.sql"
    .FilterIndex = 1
    .InitialView = msoFileDialogViewDetails
    .Title = "Select SQL Files"
    .ButtonName = "Select"
    .AllowMultiSelect = True
    .Show
    If .SelectedItems.Count = 0 Then
        GoTo Exit_Point
    End If
    SelectedItemsCount = .SelectedItems.Count
    ReDim sqlFiles(1 To SelectedItemsCount)
    For i = 1 To SelectedItemsCount
        sqlFiles(i) = .SelectedItems(i)
    Next i
End With
PickSqlFiles = sqlFiles

Exit_Point:
End Function

This one returns the SQL from the .sql file, so that it can then be stuffed into the QueryTable’s .CommandText property:

Private Function ReadSqlFile(SqlFileFullName As String)
Dim SqlFileLine As String
Dim Sql As String

Open SqlFileFullName For Input As #1
Do Until EOF(1)
    Line Input #1, SqlFileLine
    Sql = Sql & SqlFileLine & vbNewLine
Loop
'Sql = Input$ '(LOF(#1), #1)
Close #1
ReadSqlFile = Sql
End Function

And this is Chip Pearson’s code for checking if an array, specifically that returned by the PickSqlFiles function, is empty:

Public Function IsArrayEmpty(Arr As Variant) As Boolean
'Chip Pearson
Dim LB As Long
Dim UB As Long

Err.Clear
On Error Resume Next
If IsArray(Arr) = False Then
    ' we weren't passed an array, return True
    IsArrayEmpty = True
End If
UB = UBound(Arr, 1)
If (Err.Number <> 0) Then
    IsArrayEmpty = True
Else
    Err.Clear
    LB = LBound(Arr)
    If LB > UB Then
        IsArrayEmpty = True
    Else
        IsArrayEmpty = False
    End If
End If
End Function

Download and Instructions

This download marks a new level of complexity, so it’s got instructions.

After you download you’ll need to unzip the folder to wherever you want. It contains five files, the xlsm with the code, the workbook data source and three .sql files with queries against that data source:

Unzipped folder

There’s further instructions in the xlam file. As noted there, you’ll need to change the path in the VBA to your unzipped folder (technically, you don’t because Excel will prompt you when it can’t find the folder in the VBA, but it will be cooler if you do). There’s a handy Cell formula in the Post72_Import_SQL.xlsm which will give you the correct file path.

Here’s the downloadable folder. Let me know what you think!

Tables’ Edit Query Dialog

Tables’ Edit Query Dialog

One of my favorite Excel features is the Edit OLE DB Query dialog, where you can edit a table’s data connection and SQL. If you’re not familiar with table data connections, I’m talking about tables created by clicking something like “From Access” in the ribbon’s Data menu. Besides Access, you can connect to other databases, Excel files, the web, and who knows what else. Here’s a nice example of connecting to Access.

To get to the Edit OLE DB Query dialog, right-click in a table and choose “Table” then “Edit…”

Each time I do so I’m filled with child-like wonder at the ability to throw in a new connection string, switching, for example, from Access to SQL Server.

Edit Query dialog 2

In addition you can change from the default Command Type of “Table” – which returns all the contents of a table, query or view – to “SQL,” which allows you to enter SQL directly into the Command Text box.

As far as I can tell this SQL can be as complex as what you would use directly in that type of database. For example, you can use With statements with a SQL Server connection.

That being said, and depending on my access privilegees, I try to limit the amount of SQL on the Excel side. If possible, I connect to an existing database view and then maybe filter it in the Command Text box. So with a SQL Server view called vwScoresAllYears, I might narrow the results with SQL like “SELECT * FROM vwScoresAllYears WHERE vwScoresAllYears.year IN (2012,2013).”

As cool as that is – and it is – it pales next to the ability to switch connections to a completely different type of database. One time we were porting a project from Access to SQL Server. The front end was a big old workbook with a data table pointing at an Access query. The table had lots of calculated columns and several pivot tables pointing at it. To do the switch, we could have created a new table pointing at the SQL Server view, rebuilt the calculated columns and pointed the pivot tables at it. Instead, I just switched the connection string so that it pointed at the SQL Server view. Since the output columns were exactly the same, the transition was barely noticeable. I just hit refresh and the SQL Server data poured into the table.

The following two pics show what I mean. The first connection is to an Access database…
Access connection

The second connection is to a SQL Server database:SQL Server connection

If you’re wondering, the connection shown in the first picture in this post is to another Excel workbook. In that case there are three worksheets serving as tables. The SQL looks a bit different, because when referring to Excel sheets or ranges, the names are surrounded in brackets, and followed by dollar signs in the case of sheets. If the idea of using SQL on Excel workbooks is new to you, this MSDN page is a good start.

I often find myself copying query settings from one table to another. For a while I’d open the Edit OLE DB Query dialog for a table, copy the connection data, open the dialog for the second one and paste the data. That got kind of boring, so guess what? That’s right, I wrote a tool to do what I want:

Copy Properties tool

The way it works is you select a Source table – by clicking into a table and clicking the form button – and then do the same for the Target. You then select which parts of the query – Connection, Type, or Text – to copy over. You can also just edit the text in the Connection or Command Text boxes. Clicking the Copy button just copies the selected properties from the source side of the form to the target side – it doesn’t change the query properties themselves. You can make further edits in the target textboxes as needed. Clicking the Set Properties button copies applies the properties to the target table and attempts to refresh the table. If the refresh fails you get an error message. At that point you can tinker with the properties some more or click the Restore Last Good button, which will revert the table properties to the last working query.

This is different than the built-in dialog, which just reverts to the last working query. I find this ability to tweak a query until it works quite handy. Another advantage is that it opens up queries created by MS Query without the dialog telling you that you won’t be able to edit it.

To make it user-friendly for Dick and other shortcut-only types, the checkboxes can be reached by clicking the Alt Key combo for that property twice.

I uploaded an empty workbook that has a button to display this modeless form. If you download it be sure to save your work before trying this tool. I’ve used it for a few months now with no big issues, but better safe than sorry. Here’s the link.

Data Normalizer – the SQL

Data Normalizer – the SQL

In Data Normalizer I showed you how I normalize worksheet data using arrays and For/Next loops. I’ve been doing a fair amount of SQL in VBA lately, and thought I’d rewrite the code using that approach.

abnormal data

normalized data

A little searching revealed the T-SQL/SQL Server “Unpivot” command, which normalizes your data and sets the new field names all in one swell foop. It’s not available in Access/Jet SQL though, so can’t be used on Excel. Instead, the preferred method is to use a series of Selects that pick one normalizing column at a time (along with the repeating columns) and Unions them together.

I tried ADO first, using the method of SaveCopyAs’ing the workbook-to-be-normalized in order to avoid the ADO memory leak. ADO was way slower than DAO, something like four times slower with 3000 records of 16 columns. So I went with DAO, which still takes about twice as long as the array method. Turning the ADO to DAO was easy, especially with this concise sample from XL-Dennis.

As Jeff Weir pointed out in a comment, this does require a reference (Tools>References) to the Microsoft DAO 3.5 Object Library. I’ve been switching some code over to late binding, but there doesn’t seem to be much enthusiasm for this with DAO. I’m not sure if that’s because it’s so pervasive and well-established, or for some other reason.

The core logic of the routine is pretty simple. In pseudo-English:

For each column in the columns to be normalized
Select all repeating columns
and Select (create) a new column, giving it the same name each time ("Team" in this example)
and Select the column with that team's data, giving it the same name each time ("Home Runs" in this example
and Union it to the Select statement created in the next loop iteration

Without further ado (heh heh) here’s the routine.

'Requires a reference to DAO 3.5 or later
'Arguments
'List: The range to be normalized.
'RepeatingColsCount: The number of columns, starting with the leftmost,
'   whose headings remain the same.
'NormalizedColHeader: The column header for the rolled-up category.
'DataColHeader: The column header for the normalized data.
'NewWorkbook: Put the sheet with the data in a new workbook?
'
'NOTE: The data must be in a contiguous range and the
'rows that will be repeated must be to the left,
'with the rows to be normalized to the right.

Sub NormalizeList_SQL_DAO(List As Excel.Range, RepeatingColsCount As Long, _
                          NormalizedColHeader As String, DataColHeader As String, _
                          Optional NewWorkbook As Boolean = False)

Dim FirstNormalizingCol As Long, NormalizingColsCount As Long
Dim RepeatingColsHeaders As Variant, NormalizingColsHeaders As Variant
Dim RepeatingColsIndex As Long, NormalizingColsIndex As Long
Dim wbSource As Excel.Workbook, wbTarget As Excel.Workbook
Dim wsTarget As Excel.Worksheet

Dim daoWorkSpace As DAO.Workspace
Dim daoWorkbook As DAO.Database
Dim daoRecordset As DAO.Recordset
Dim strSql As String
Dim strExtendedProperties As String

With List
    'If the normalized list won't fit, you must quit.
    If .Rows.Count * (.Columns.Count - RepeatingColsCount) > .Parent.Rows.Count Then
        MsgBox "The normalized list will be too many rows.", _
               vbExclamation + vbOKOnly, "Sorry"
        Exit Sub
    End If
    'List.Parent.Parent is the lists Workbook
    Set wbSource = List.Parent.Parent
    'The columns to normalize must be to the right of the columns that will repeat
    FirstNormalizingCol = RepeatingColsCount + 1
    NormalizingColsCount = .Columns.Count - RepeatingColsCount
    'Get the header names of the repeating columns
    RepeatingColsHeaders = List.Cells(1).Resize(1, RepeatingColsCount).Value
    'Get the header names of the normalizing columns
    NormalizingColsHeaders = List.Cells(FirstNormalizingCol).Resize(1, NormalizingColsCount).Value

    strSql = vbNullString
    'loop through each normalizing column
    For NormalizingColsIndex = 1 To NormalizingColsCount
        'Create an individual Select for the normalizing column
        strSql = strSql & " SELECT "
        'Select all the repeating columns
        For RepeatingColsIndex = 1 To RepeatingColsCount
            strSql = strSql & RepeatingColsHeaders(1, RepeatingColsIndex) & ", "
        Next RepeatingColsIndex
        'Select the normalizing column and assign the NormalizedColHeader field name
        'and select the data being counted and assign it the DataColHeader field name
        strSql = strSql & "'" & NormalizingColsHeaders(1, NormalizingColsIndex) & "'" & " AS " & NormalizedColHeader & _
                 ", " & NormalizingColsHeaders(1, NormalizingColsIndex) & " AS " & DataColHeader
        strSql = strSql & " FROM [" & List.Parent.Name & _
                 "$" & List.Address(rowabsolute:=False, columnabsolute:=False) & "]"
        If NormalizingColsIndex < NormalizingColsCount Then
            'Union the Select statements created for the normalizing columns
            strSql = strSql & " UNION ALL"
        End If
    Next NormalizingColsIndex
End With
'Set up the DAO connection
strExtendedProperties = "Excel 8.0;HDR=Yes;IMEX=1"
Set daoWorkSpace = DBEngine.Workspaces(0)
Set daoWorkbook = daoWorkSpace.OpenDatabase(wbSource.FullName, False, True, strExtendedProperties)
Set daoRecordset = daoWorkbook.OpenRecordset(strSql, dbOpenForwardOnly)

'Put the normal data in the same workbook, or a new one.
If NewWorkbook Then
    Set wbTarget = Workbooks.Add
    Set wsTarget = wbTarget.Worksheets(1)
Else
    Set wbSource = List.Parent.Parent
    With wbSource.Worksheets
        Set wsTarget = .Add(after:=.Item(.Count))
    End With
End If

'copy the headers and DAO recordset to the new worksheet
With wsTarget
    .Cells(1, 1).Resize(1, RepeatingColsCount).Value = RepeatingColsHeaders
    .Cells(1, RepeatingColsCount + 1) = NormalizedColHeader
    .Cells(1, RepeatingColsCount + 2) = DataColHeader
    .Cells(2, 1).CopyFromRecordset daoRecordset
End With

'clean up
daoRecordset.Close
daoWorkbook.Close
daoWorkSpace.Close
Set daoRecordset = Nothing
Set daoWorkbook = Nothing
Set daoWorkSpace = Nothing
End Sub

If you break after strSql is created, it looks like this:

SELECT League, Year, 'ATL' AS Team, ATL AS HomeRuns FROM [HR-NL$A1:R110]
 UNION ALL
 SELECT League, Year, 'CHC' AS Team, CHC AS HomeRuns FROM [HR-NL$A1:R110]
 UNION ALL
 ...
 SELECT League, Year, 'MIL' AS Team, MIL AS HomeRuns FROM [HR-NL$A1:R110]

Call it like this:

NormalizeList_SQL_DAO ActiveSheet.UsedRange, 2, "Team", "HomeRuns", False

One pitfall of this SQL version is the mixed data-type issue with the Excel ISAM driver. In this example it converts all the home run counts from numbers to text because of the blanks in the data.

All in all, I think the array approach is better than SQL for this use. The core skill of creating SQL in VBA is a valuable one though, and one I’m glad to be developing.

I updated the Data Normalizer .xls to include this code.