Occasionally, I need to compare two or more lists of names to determine any matches across the lists. Depending on the sources, the formatting of the lists may be completely different. Even minor differences in formatting can result in a challenging comparison in Microsoft Excel.
If you run a quick Google search online regarding the removal of a middle initial from lists, the top results are elaborate formulas which generally assume the middle names will have an initial and a period. In reality, this is not always the case, and many people don’t have the time or desire to change a long excel formula with multiple nested commands. I will show you the “quick and dirty” way to make this comparison using a combination of functions in Microsoft Excel.
Suppose you had the following names from a list we will call “Source A.”
- John Doe
- Alred B. hitchcock
- Larry F Sanders
- Ray Smith
- Jo-Anna Q Rogers
- B Jones
Then suppose you wanted to see how many names from Source A matched the following names from “Source B.”
- John X. Doe
- Oprah Winfrey
- Bart Simpson
- Quincy Williams
- Jo-Anna Rogers
- Bill F. Jackson
- Alred b Hitchcock
Since there are only 7 names in Source B, the obvious solution would be to compare them manually. However, this is just an example which can easily scale to much larger lists.
Step 1: List your names in Microsoft Excel. That was easy, right?
Step 2: Separate the first, middle and last names using the “Text to Columns” function in Microsoft Excel. Do this by highlighting all of the names in your list and clicking on the “Text to Column” button on the “Data” ribbon. Use the “delimited” option and under the delimiters list of choices, check the “space” option to make spaces serve as the delimiter.
Step 3: After using the “text to columns” function, the names will now be distributed across 2 to 3 columns, depending on if a middle initial was used or not. What I will call the “middle initial column” is shown in Column B of the screenshot. Since some names don’t have middle initials, this column is filled with both middle initials and last names. We want to isolate the rows that only have middle initials by checking the length of the “middle initial column” value. Add a new column, shown in the screenshot as Column “D”, and use the LEN function to measure the length of the “middle initial column” value (or Column B, in my example). The formula, “=LEN(B3)”, is simply returning the number of characters in cell B3. Since a middle initial will have either 1 or 2 characters, depending on if a period is used, we will use this to isolate middle initials.
Step 4: Filter for lengths of just 1 or 2. This will isolate the names with middle initials. Depending on how “sloppy” the data is on your list, experiment with including lengths of 3 and 4. You never know what people are capable of when it comes to data entry…
Step 5: Click on the first result of the filter for middle initials and use the “=” character to set the value to the corresponding last name. In my example, the middle initial for Alred Hitchcock was “B.”, but we want the middle initial to be replaced by “Hitchcock,” so we set the B5 cell value to the value of C5 using the formula “=C5”. Once the first middle initial is changed, drag the corner of the cell down to fill the rest of your results.
Step 6: Where exactly are we going with all of this? Bear with me… Next, you need to combine (or concatenate) the first and last names. Why did we have to go through the agony of steps 3 through 5 if we had first and last names at the end of step 2? Because now, the last names are all in one column, column B in the example, and we can combine them quickly and easily. Use the concatenate function to combine cell values. In my example, you can see that I combined the first name (cell A4) and the “new” last name column (cell B4) to get the new “normalized” name which does not include the inconsistent middle initials.
Step 7: Perform Steps 1-6 on the second list. In my example, we are referring to the second list as “Source B.”
Step 8: After performing the same steps on the “Source B” list of names, we now have two lists of users without any middle initials. Since the format of the two lists is the same – i.e. first name, last name – the lists are now much easier to compare using the VLOOKUP function in Excel. As shown by the screenshot, use the formula “=VLOOKUP(E3,$E$12:$E$18,1,0)” to compare the names on Source A to the names on Source B.
As shown by the final screenshot, there are three matches in the comparison: John Doe, Alred Hitchcock, and Jo-Anna Rogers. Names without a match will show a value of “#N/A” in the VLOOKUP cell.
When performing your own comparison, remember that two or more people may have the same name and different middle initials. For this reason, check any matches, just to make sure there are no “false positives.” Also, remember to check common nick names as the same person may be referred to as “Joe” in one list and “Joseph” in another.
There are no comments yet. Why not be the first to speak your mind.