The Best Min Function, I Think

A few weeks ago I wrote a Min array function to determine the minimum of a subset of items in a table. It was not the best Min function, in fact it didn’t work. I’d based it on the Max function I’d written moments earlier – a rookie mistake – and it resulted in zero when it shouldn’t have. I fixed it, but the fix was ugly. Then I realized I could use an If statement in an array formula, which helped a lot. Then I read this informative and lively post.

I’ll summarize what I learned, and propose my own Best Min Function, using the interactive workbook below.

The Workbook

The table on the left contains record high and low Fahrenheit temperatures from various cities around the world, listed by continent. (Column B contains the countries. You can unhide it.)

The formulas are in the table in the top right. They all look for the lowest of the record lows by continent.

Below the formulas are a couple of cells which trigger a couple of pitfalls of the Min function. Below that are text versions of the formulas.

The Formulas

The first formula, “multiplication,” is my original failure. The conditions (record type and continent) and the record lows are all multiplied:

{=MIN((tblRecLows[Record Type]="Low")*(tblRecLows[Continent]=H$1)*
(tblRecLows[Record]))}

As described in the post linked above, this only works for Min if at least one value is equal to or less than zero. Multiplying arrays of conditions like this will always yield some zeros – whenever all the conditions aren’t met – and so the minimum can never be higher than zero.

This list low temps is illustrates the problem well. For some continents the record low is negative, but for others it’s positive. You can see that this formula fails for continents where the record low is positive. For example Africa should show 30 degrees, but it shows zero.

My fix was to force all the numbers in the table to be negative, by subtracting the maximum record temperature from each number, find the minimum, and then add the maximum back. This eliminates the zero problem…

{=MIN((tblRecLows[Record Type]{="Low")*(tblRecLows[Continent]{=H$1)*
(tblRecLows[Record]-(MAX(tblRecLows[Record])+1)))+(MAX(tblRecLows[Record])+1)}

… but it’s ugly. And anyways, both multiplication versions fail if there is any stray text in your list. I noticed this because Florence had “NA’s” for both high and low records. The multiplication chokes on the text value, just as it would on 2 * “NA”, and the formula fails. You can see this by changing the “Has Text” in the “Potential Pitfalls” section to True. Both formulas yield #VALUE.

So, as faithful readers of DDOE already knew – at least if their memories are better than mine – If statements are the way to go. The “nested If” and the “Elias one-if” versions above work because they return an array filled either with temperatures that meet all the criteria, or Falses. The Min function ignores the Falses, and so returns the correct minimum.

Now for the second pitfall: Empty cells in your list are a problem because an IF statement cannot return a null. It converts blank cells to zeroes which, you guessed it, become the result of the Min function if there’s no lower number.

You can see this by changing the “Has Blanks” in the “Potential Pitfalls” section to True and following the instructions. You’ll see that the “nested if” and “Elias one-if” both fail for Africa, and will for any warm continent with a blank cell.

The proposed “Best Min” solves this problem by adding a criteria of non-blank cells. (It also eliminates the “>0” check of the multiplied conditions from Elias’s version as it seems to evaluate to the same thing.)

{MIN(IF(((tblRecLows[Record Type]="Low")*(tblRecLows[Continent]=H$1)*
(tblRecLows[Record]<>"")),tblRecLows[Record]))}

There is a runner-up at the end of the list: a GetPivotData formula looking into a pivot table, or perhaps just a pivot table itself (you’ll see it if you scroll down). It might be my first choice if if didn’t require refreshing.

One last thought

I just realized that the Max function is subject to the exact same false-zero-minimum problems if all values are equal to or below zero, the inverse of the Min issue.

Solving the NPR Sunday Puzzle – #2

This week’s NPR Sunday Puzzle was array formula nirvana.

Take the trees hemlock, myrtle, oak and pine. Rearrange the letters in their names to get four other trees, with one letter left over. What trees are they?

I made this workbook to solve the puzzle:

At the left is a list of trees whose names meet three criteria:

  • The name only contains letters from the four starting trees.
  • The name doesn’t contain more of any letter than are in the four starting names combined.
  • The name isn’t in the original list.

To solve the puzzle, enter TRUE next to trees to choose them. The list of letters on the right will change as you do. Letters shaded blue are still available and the number remaining is shown. If a letter is shaded orange, you’ve overused it and the “remaining” number is negative. Unshaded letters have been used perfectly.

When the number of trees selected is four and one letter is unused, you’ve solved the puzzle. Go ahead and try it! And, oh yeah, there may be something tricky about the list.

Now for the array formulas, all entered with Ctrl-Shft-Enter. There’s one in the “Used” column:

{=SUM(LEN(IF(tblTrees[Use?],tblTrees[Tree],"")))-
SUM(LEN(SUBSTITUTE(IF(tblTrees[Use?],tblTrees[Tree],""),[@[Letters To Use]],"")))}

It compares the length of the names chosen against the length of their names after removing the letter, i.e., how many times that letter is used.

To get the list of tree names my family did some brainstorming, which we augmented with a list from the web. It was sliced and diced using Text to Columns and Remove Duplicates from the Excel 2010 Data menu.

Two more array formulas winnow the list to names that meet our three criteria. They’re on the “Good Trees” tab of the workbook. The “Contains Only Valid Letters” formula determines if a name meets the first criteria:

{=MATCH(0,COUNTIF(tblLetters[Letters To Use],
MID($A2,{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15},1)),0)>LEN($A2)}

This formula assumes that the longest name is 15 letters. The CountIf increments through the letters in the word, counting the occurrences of each letter in the list, which will either be one or zero. It uses Match to find the position of the first zero. If that’s greater than the number of letters in the word it passes the test.

The “Contains Less Of Each Letter Than Total” formula checks our second criteria:

{=ISERROR(MATCH(FALSE,(tblLetters[Count Of Letter]>=
(LEN(A2)-LEN(SUBSTITUTE(A2,tblLetters[Letters To Use],"")))),0))}

This also uses the trick of comparing the length of the name against its length with the character removed, to get the count of that letter in the word. It then assumes that the count of that letter overall is greater than or equal to the count in the word, for each letter in the word. If that assumption is FALSE in any letter position the word is no good.

The “Not In Original List” formula checks … you know. The last one combines the first three and if they’re all true the word makes the cut.

I’d love to hear how other people would do these tests. I’m curious too if others are using Excel to solve puzzles, NPR or otherwise. Do you have a good source of lists? I always just google around, with mixed results.

After I had done all the processing I used the workbook to get the answer. I chose three names – elm, hickory and lemon, and had a, e, k, p and t left over. Four of those letters spelled the last tree. It wasn’t on the list but I added it. You could just put it where pecan is now if you want. If you already did, then “teak” a bough. Yew should be proud.

Speaking of tree-related words, one of my favorite words is “pitch.” It’s arboreal, nautical, romantic, and so much more.

A Prefix Function to Save You From VBA Magic Numbers, Sometimes

Magic Numbers in Formulas

The last post referred to “magic numbers” and the pitfalls of using them in formulas. An example might be this product list, where the quality level is represented by a single digit before the dash in the part number, the “quality prefix.”

lookup formula with magic number

The formula generating the name in C2 is a simple one. It does a lookup of the quality prefix – 1, 2, or 3 – in the “Cutlery Lookup” table, yielding a quality of “Cheap,” “Nice” or “Best.” This is added to the product type, resulting in a name such as “Nice Spork.”

=VLOOKUP(LEFT($A2,1),CutleryLookup,2,FALSE) & " " &B2

One day the proprietors realize these lackluster brand names are a drag on sales. They create new codes and names for the products, adding a 0 to the quality prefix and new quality descriptions to the table. They then print up 2,000 parts lists, failing to notice that the formula is still generating the same lousy names.

new quality prefix - same names
This is due to the magic number “1” in the “VLOOKUP(LEFT($A2,1)” part. It still specifies the length of the quality prefix as 1, meaning the lookup is still seeking the prefixes 1, 2 and 3. (Admittedly, their luck was bad in choosing new codes that didn’t result in #NA and in leaving in the old ones, but they were probably forced to by other bad design practices.)

A more robust approach is to have the formula look for the dash separating the quality code from the rest of the product number, like:

=VLOOKUP(LEFT($A2,SEARCH("-",$A2)-1),CutleryLookup,2,FALSE) & " " &B2

lookup formula with calculated number

This will accommodate different length prefixes.

Magic Numbers and Prefixes in VBA

Prefixes in VBA can also lead to magic numbers. Say you have a workbook with some worksheets identified by the prefix “final” in the name. To process these sheets you might write code like:

Dim ws As Excel.Worksheet
For Each ws In ThisWorkbook.Worksheets
    If Left(ws.Name, 5) = "Final" Then
        ProcessSheet ws
    End If
Next ws

Of course, it’s always dangerous to use the word “final” in a name because nothing’s ever finished. So the next week when the “Really Final” worksheets need to be processed, you change your code to:

If Left(ws.Name, 5) = "Really Final" Then

and the Left function finds no sheet names whose first 5 letters are “Really Final.”

I’ve done something like this more than once, so I wrote a HasPrefix function to end it:

Function HasPrefix(StringToCheck, Prefix, Optional CaseSensitive As Boolean = False) As Boolean
If CaseSensitive Then
    HasPrefix = Left((StringToCheck), Len(Prefix)) = Prefix
Else
    HasPrefix = Left(LCase(StringToCheck), Len(Prefix)) = LCase(Prefix)
End If
End Function

You pass it the string to check, along with the prefix you’re checking for (and whether it’s case-sensitive if you want). It uses the length of the prefix in the Left function, so the length won’t ever be wrong.

I used this recently to check whether a workbook was located in the AppData folder in the user profile, meaning it was most likely opened from an email:

If HasPrefix(ActiveWorkbook.FullName, Environ("LOCALAPPDATA")) then

(Environ is a handy Windows function for checking on your computer’s settings and JP has an informative article on it.)

Two-dimensional Index/Match Formula with Variable-Length Lookup

When I was a housing developer I used Excel for budgets and pro formas. I wrote PPMT formulas, did lots of Goal Seeking, and never used pivot tables. Now I use them all the time, and my favorite formula is an Index/Match. It’s gotten to where I can type a fancy lookup pretty quickly, though my array formulas generally take a bit of hacking. I’m no Barry Houdini, but I do all right.

Yesterday I realized you can do a Match into an array of substrings of cell values. For example, in this baseball stats sheet (I think it’s “rhubarbs” per year) you can match against just the beginning few characters of each column heading, like “SF” or “STL.”

Here’s the formula in cell I3. Note that it’s an array formula, entered with Ctrl-Shft-Enter:

=INDEX(tblRhubarb,
MATCH($G$3,tblRhubarb[YEAR],0),
MATCH(TEXT($H$3,"0"),LEFT(tblRhubarb[#Headers],LEN($H$3)),0))
  • The first part, “INDEX(Table2”, says to index into the whole table. In other words, it’s a two-dimensional lookup.
  • The second part, “MATCH($G$3,tblRhubarb[YEAR],0),” says to look in the year column for a match to the year in G3.
  • The third part, “MATCH(TEXT($H$3,”0″),LEFT(tblRhubarb[#Headers],LEN($H$3)),0))” says to look in the column headers for the team abbreviation in H3.

I wasn’t sure that last section would work. It says to look only in the left part of each header. When you analyze this last part by highlighting it and hitting F9 it looks like…

MATCH(TEXT($H$3,"0"),{"YE","LA"," P","SF","ST"},0))

… which is pretty cool.

The final thing to mention is about this part of the formula:

LEN($H$3)

This says to look at the leftmost number of characters equal to the length of the string in H3. That’s important because not all the team abbreviations are the same length. It would be important even if they were, because you don’t want to rely on “magic numbers” in formulas or code.

If you’re interested in these types of formulas be sure to go back to the top of the post and click the barry houdini link. Not only does he have a great name, he could write a formula that would find your car keys, and it wouldn’t even have to be array-entered.

Solving the NPR Sunday Puzzle

The yoursumbuddy official smartphone has two alarms. One wakes me up at the reasonable hour of 6:23, and one goes off at 9:35 every Sunday with a reminder that NPR’s Sunday Puzzle starts in three minutes. Last Sunday’s lent itself nicely to some Excel fun:

Name two fictional characters – the first one good, the second one bad. Each is a one-word name. Drop the last letter of the name of the first character. Read the remaining letters in order from left to right. The result will be a world capital. What is it?

To solve this, I wanted a list of villains and another of world capitals.

I recently realized that Excel’s Data > From Web feature is easier than copying stuff straight from the web. A lot of web lists have weird html formatting and this feature cuts past much of that. So I sucked in a list of villains from kaijuphile.com

Villain List

… and one of capitals from, that’s right, Wikipedia. Now down to work.

With both lists in a sheet, I added a couple of columns for each. For the villains, the first column gets rid of numbers and the word “The.” The 2nd strips out all villains with names longer than one word, per Mr. Shortz’s instructions. Here’s the formula:

=IFERROR(TEXT(SEARCH(" ",B2),""),B2)

Setup

There’s two fun things here:

1. The IfError part is formed backwards from its normal usage. We actually want the part that returns an error when it doesn’t find a space in the villain’s name: a one-word name.

2. However, this means that all the more-than-one-word villains will return a number – the location of the space. The Text part of the formula fixes that by returning blanks for numbers and leaving strings intact. For example Text(23,””) returns a blank, but Text(“twerp”,””) returns “twerp.” Hey, maybe the answer is Antwerp!

For the capitals, there’s just a bunch of columns, each one lopping off one more letter from the beginning. The villainous name forms the 2nd part of the capital, so if we’re lucky one of the froncated (front-truncated) strings will match a villain. The formula is:

=IFERROR(RIGHT($D2,LEN($D2)-COLUMNS($E:E)),"")

Conditional formatting in column E rightwards turns a cell orange if it matches any of the villain names in column C.  And, sure enough:

Iago

“Santiago” yields “Iago,” who as we all know, flew too near the sun and made a lot of people mad.

This means there’s a good fictional character whose name starts “Sant” followed by one more letter. Hmmm… I guess he wasn’t thinking of the Billie Bob Thornton version.