Table Join Operation in Excel

The interactivity of Excel makes it a very handy tool for data analysis. But Excel excels only at numeric analysis. Unfortunately, tabular/set analysis is not inherently supported in Excel and you have to jump through hoops to do that. I’m posting the source code for a VBA subroutine (Excel Macro) that performs one of the most common tabular operations – a Table Join. The subroutine here takes 2 tables in the form of ranges, and the name of the join field as input and produces a new worksheet with the joined table as output.

Join Operation

To illustrate the join operation, let’s look at an example. Let’s say both I (Matt)  and my friend (Heather) like movies. We have seen a handful of movies lately and want to figure out which movies we have both seen and how our preferences for compare against each other. So, here are 2 lists:

moviematch_pre

With a list of 14 movies, this problem is still tractable, but add another few movies and it becomes time consuming and undesirable. By running the code in this article, you get the 2 tables aligned and merged into this table instantly:

moviematch_post

The rows in which the Movie column is empty under Heather’s list correspond to movies that Matt knows about but Heather doesn’t. Similarly, rows where Matt’s Movie column is empty indicate movies Matt doesn’t know about, but Heather does. The movies that both share on the list are the ones where the Movie column is filled for both. You can also easily compare the movie preferences. For example, it’s easy to see that Heather and Matt both liked Rocky but both disliked Code Name: The Cleaner. Heather liked Sound of Music, but Matt didn’t. And so on.

Running the Script

If you are copying and pasting the code into your worksheet, you have to do some (minimal) extra work to run the subroutine.

  1. Open the VBA Editor from Excel (Alt + F11)
  2. Record a new Macro (what you record doesn’t matter – you are going to replace the macro code)
  3. Open the Module1 module from the Project panel in VBA Editor.
  4. Copy the contents of the subroutine JoinTables as listed below.
  5. Select Tools->References in the VBA Editor window. Scroll down to Microsoft Scripting Runtime, and click the checkbox. Click OK to close the dialog box.
  6. Replace the contents of your newly created macro with this code: JoinTables Range1, Range2, "MyJoinFieldName".
  7. Replace ‘Range1’ with the range for the first table, Range 2 with range for 2nd table. e.g., in the example above, Range 1 is ‘Worksheets(“Sheet1”).Range(“A3:C17”)’, Range 2 is ‘Worksheets(“Sheet1”).Range(“E3:G17”)’
  8. Make sure you have a header row for the 2 tables and that the header row is part of the range.
  9. Replace ‘MyJoinFieldName’ with the name of the header column on which you are trying to join the table. In the example above, that column is called ‘Movie’. The 2 column names should match exactly (no extra space, no difference in capitalization).
  10. Run the newly created and replaced macro from Tools->Macro->Macros->(Select proper macro)->Run.

If you want to run the subroutine multiple times, make sure you rename the output sheet something other than ‘Joined Table’.

You can also transfer the subroutine and the helper macro into a module file and import the module whenever you need the functions.

Code

Sub TestRun()
'
' TestRun Macro
' Macro recorded 8/16/2009 by Rudrava Roy
'
 
    JoinTables Worksheets("Sheet1").Range("A3:C17"), _
		Worksheets("Sheet1").Range("E3:G17"), "Movie"
End Sub
 
 
Public Sub JoinTables(rngTable1 As Range, rngTable2 As Range, _
	strJoinField As String)
    ' Existing tables passed as range arguments
    '   Field names (headers) to join on passed as strings
    '   First row must be table headers
 
    ' Measure length of both tables choose long table (A) and short table (B)
    Dim rngA As Range, rngB As Range
    If rngTable1.Rows.Count >= rngTable2.Rows.Count Then
        Set rngA = rngTable1
        Set rngB = rngTable2
    Else
        Set rngA = rngTable2
        Set rngB = rngTable1
    End If
    ' Create new sheet with headers from longer table (A) 
	'	followed by shorter table (B)
    Dim newSheet As Worksheet
    Set newSheet = Worksheets.Add
    newSheet.Name = "Joined Table"
    rngA.Rows(1).Copy (newSheet.Cells(1))
    rngB.Rows(1).Copy (newSheet.Cells(1, rngA.Columns.Count + 1))
    ' Locate join columns in both tables
    Dim cellB As Range
    Dim cellA As Range
    Dim rngJoinColB As Range
    Dim tableAJoinCol As Integer
    For Each cellA In rngA.Rows(1).Columns
        If cellA.Value = strJoinField Then
            tableAJoinCol = cellA.Column - rngA.Columns(1).Column + 1
        End If
    Next
    For Each cellB In rngB.Rows(1).Columns
        If cellB.Value = strJoinField Then
            Set rngJoinColB = rngB.Columns( _ 
				cellB.Column - rngB.Columns(1).Column + 1)
        End If
    Next
    ' Resize join column to exclude header row
    Set rngJoinColB = rngJoinColB.Offset(1, 0).Resize( _
			rngJoinColB.Rows.Count - 1, _
            rngJoinColB.Columns.Count)
    ' Create dictionary of table B on join column
    Dim dictTableB As Dictionary
    Set dictTableB = New Dictionary
    Dim rngTmp As Range
    For Each cellB In rngJoinColB.Rows
        Set rngTmp = rngB.Rows(cellB.Row - rngB.Rows(1).Row + 1)
        dictTableB.Add Trim(cellB.Value), rngTmp
    Next
    ' Resize ranges to exclude header rows
    Set rngA = rngA.Offset(1, 0).Resize(rngA.Rows.Count - 1, _
            rngA.Columns.Count)
    Set rngB = rngB.Offset(1, 0).Resize(rngB.Rows.Count - 1, _
            rngB.Columns.Count)
    ' Iterate through each row in A
    Dim i As Integer
    i = 2 ' row to start inserting at
    For Each cellA In rngA.Rows
        ' Copy row to new table (J) as new row
        cellA.Copy newSheet.Cells(i, 1)
        ' If join column of row from A has dictionary match, 
        '       copy row from B & mark row 'done'
        If dictTableB.Exists(cellA.Columns(tableAJoinCol).Value) Then
           dictTableB.Item(cellA.Columns(tableAJoinCol).Value).Copy _
				newSheet.Cells(i, rngA.Columns.Count + 1)
           dictTableB.Remove (cellA.Columns(tableAJoinCol).Value)
        End If
        i = i + 1
    Next
 
    ' Iterate through each remaining row in B
    Dim tmpRow
    For Each tmpRow In dictTableB.Items
        ' Copy row to new row in J
        tmpRow.Copy newSheet.Cells(i, rngA.Columns.Count + 1)
        i = i + 1
    Next
 
    ' Resize all columns in new sheet
    For i = 1 To (rngA.Columns.Count + rngB.Columns.Count)
        newSheet.Columns(i).EntireColumn.AutoFit
    Next
End Sub

Add a Comment

Your email address will not be published. Required fields are marked *