Index/Sumproduct or Index/Match for Multiple Criteria Lookups?

Index/Sumproduct or Index/Match for Multiple Criteria Lookups?

You can use an INDEX/MATCH formula to look up an item in a list, as explained in this Contextures page. It’s trickier than a VLOOKUP formula, but it can look to the left and adjusts well when data columns are added or deleted. The generic layout of a single-criterion INDEX/MATCH is:

=INDEX(ColumnToIndex,MATCH(ItemToMatch, ColumnWithMatch, 0))

The MATCH section results in a row number that gets applied to the ColumnToIndex.

When looking up items with more than one criteria, I like to use an INDEX/SUMPRODUCT formula, replacing the MATCH part of the single criterion formula with SUMPRODUCT array multiplication, as descibed by Chandoo. Very generically that looks like:

=INDEX(ColumnToIndex,SUMPRODUCT(Multiply a bunch of columns and criteria))

This post started as an explanation of that approach. But in looking at the Contextures “INDEX/MATCH – Example 4″ in the link above, I think it might be better. So this post is now a comparison of the two approaches.

Below is a live, downloadable worksheet with pairs of double-criteria lookup formulas – an INDEX/SUMPRODUCT and INDEX/MATCH in each case. Each row has two lookups: the first returns the number of home runs for that team and year, the second looks up a relative ranking.

I actually broke the formulas into two columns, which is what I’d do in a real project. Columns I and J contain INDEX formulas which refer to the lookup formulas in column H. The INDEX formulas first check if the lookup in column H actually returned a row. IF not “NA” is returned.

I break the lookup formula into two parts like this for a couple of reasons. The first is that if there’s no match the SUMPRODUCT in column H returns 0, which is important to see. The second is there’s often more than one formula referring to the calculated row, so it’s more efficient to only calculate it once and then use it in the different INDEX formulas. In this case the row is used for both the column I and column J INDEX formulas.

Here are the two lookup formulas, in cells H2 and H3, that are used to match the correct row. They are matching 2006 for the year and CHC for the team:

=SUMPRODUCT(($A$1:$A$33=$F2)*($B$1:$B$33=$G2)*(ROW($A$1:$A$33)-(ROW($A$1)-1)))
=MATCH(1,($A$1:$A$33=$F3)*($B$1:$B$33=$G3),0)

At their core they contain the same logic: ($A$1:$A$33=$F2)*($B$1:$B$33=$G2), which multiplies TRUEs and FALSES to return an array of Ones and Zeros. Here’s the SUMPRODUCT version with the above section evaluated by highlighting it and pressing F9. Note the 1 in the 19th position:

The SUMPRODUCT then multiplies that array times an adjusted row number, while the MATCH version finds the position of the number 1 in the array. Both result in row 19.

As mentioned above, the SUMPRODUCT formula returns 0 if no match is found. The INDEX formulas in columns I and J need to deal with that, otherwise they will return incorrect results. For example, if you have something like =INDEX({1,2,3,4,5},0), the result is 1. You might be surprised that it returns anything at all, but the nature of an INDEX formula is that if the row argument is set to 0 it evaluates the entire column. The actual row returned depends on the row of the formula. Confused? Me too.

The MATCH version, on the other hand, returns “#N/A” if no match is found, which is probably what you’d expect, and therefore good. And it’s shorter, also good. The only pitfall is that it’s an array formula and must be entered using Ctrl-Shft-Enter.

I’ve tried to make both sets of formulas more robust by including the headers in the ranges – $A1, $B1, etc. That way if a line in inserted below the header and before the first row of data, the formulas still work. I also want to ensure that the formulas still work if rows are inserted above the table’s header. This causes more complications in the SUMPRODUCT version, in the (ROW($A$1:$A$33)-(ROW($A$1)-1)) part. If you know the table will always start in Row 1 then it can be simplified to ROW($A$1:$A$33)-1. The MATCH version doesn’t have this issue.

Below the first pair of formulas are two more pairs, showing the results if no match is found, and if multiple matches are found. When there’s no match, the INDEX formula result in “NA” in both cases.

If there’s more than one match the SUMPRODUCT version adds together the matched rows. This results in 41 in row 12. Since 41 is outside the ColumnToIndex range, the result is #REF!. The MATCH version returns the first match. Obviously some kind of check for duplicates is good, such as the conditional formatting used here to highlight rows 20 and 21 of the data.

This being Excel, there are other ways to skin this cat. Here’s a post by JP Pinto, with his explanation of these two approaches, plus a LOOKUP version. (There’s a small error in the SUMPRODUCT version which is addressed in the comments.)

So which do I prefer now? Well, even though the MATCH version is simpler in most respects, I tend to avoid array formulas when possible, mainly because I too often forget that they are array formulas, do some editing, don’t hit Ctrl-Shft-Enter and have a moment, or several, of disorientation staring at the #N/A result. So I’ll probably stick with INDEX/SUMPRODUCT.

What do you think?

8 thoughts on “Index/Sumproduct or Index/Match for Multiple Criteria Lookups?

  1. Good post.

    DGET would be perfect for this. =DGET(A1:D33,C1,F1:G2)

    You could easily make it more robust to row additions/insertions, and you can actually do very very very (yes I know that’s three) sophisticated conditional filtering on the fly – the like of which would otherwise need SQL. In fact, you can do stuff with Excels D functions that a pivot table simply won’t accomodate.

    And if you couple it with a data table, you can also allow such an approach to be dragged down to handle changing parameters.

    DGET needs to be coupled with a data table or an alternate approach if you want to drag the DGET function down to accommodate a range of different inputs, on account that it needs to have a well-formed criteria range to filter by. The use of Excel’s “datatables” functionality in this case is handy, because you can ‘swap’ up to 2 inputs of a formula with a designated range of values, so you can see what difference a change in those two input parameters makes to your model output.

    I didn’t think that either function on their own was that much of a deal until I stumbled across this great article at http://support.microsoft.com/kb/282851 about combining them. I had not realised until I read the article that you can use datatables to ‘swap out’ the criteria in a DSUM or any other database function. Very very powerful indeed. So you can set up amazing filtering run directly off the spreadsheet in a way that users can easily interact with, and do stuff that would otherwise require SQL. Blows pivots out of the water, on account that pivots only allow additive filtering. Whereas with a D function you can say I want this AND that, or this BUT NOT that, and what the heck, give me the other as well if it’s a Tuesday night…

    Hui has a great article on DataTables here http://chandoo.org/wp/2010/05/06/data-tables-monte-carlo-simulations-in-excel-a-comprehensive-guide/ (Hat tip to Sam for the link). See the section where he talks about multi input (more than 2) data tables.

    • Thanks for the thorough comment! I haven’t used D functions in years and my forays into data tables have been less than productive. I can see I’ll have to try again, and Hui’s Chandoo article looks like it will help get me over the hump.

      • Jeff and Doug, I think the D functions are totally awesome. I use them at work for data analysis on the fly. Especially, I was able to build a spreadsheet that can sort stuff in 150 different ways just based on D functions. As Jeff mentioned, you can use them to sort some, then change the criteria and sort some more. The technique I used allowed sorting based on output criteria as well. What I mean by this is, I can search for a record which has a certain output. Here is a sample file if you are interested http://pankaj.dishapankaj.com/share its the file with Use of Excel as Database in the name. The data is simulated to protect the confidentiality.

  2. There are other faster ways of generating the arrays of 0’s and 1’s
    Rather than saying (Rng1=Crt1)*(Rng2=Crt2)….etc
    You can try =Countifs(Crt1,Rng1,Crt2,Rng2) etc
    This generates an identical array but is faster.
    So in your example

    SUMPRODUCT(($A$1:$A$33=$F2)*($B$1:$B$33=$G2)*(ROW($A$1:$A$33)-(ROW($A$1)-1)))

    can be replaced with

    =SUMPRODUCT(COUNTIFS($F2,$A$1:$A$33,$G2,$B$1:$B$33)*ROW($A$1:$A$33))

    Its 1 multiplication less but if you are searching on say 5 criteria, this method is faster.

    • Thanks for the tip, sam. I’ve recently read that this is faster but haven’t tried it yet. I shall do so at the next opportunity! I should note that COUNTIFS (and the other IFS functions) aren’t available until XL 2007.

      Do you know how much faster it is?

  3. Pingback: Excel Links – Going to Togo Edition | Chandoo.org - Learn Microsoft Excel Online

  4. Pingback: Excel Links – Going to Togo Edition | Feasibility.pro

Speak Your Mind

Your email address will not be published. Required fields are marked *

To post code, do this: <code> your vba here </code>