As we have seen in our introduction, historical pricing inquiries (HPI) can be pulled from reliable data sources like Yahoo Finance and The Wall Street Journal.
We’re going to go ahead and pull this data for IMAX from The Wall Street Journal.
https://quotes.wsj.com/IMAX/historical-prices
and we’re going to see what happens when some of this data is missing and piece it back together using the VLOOKUP() formula.
So, after downloading the historical prices based upon the date range of 06/22/2020 - 06/22/2021, the following spreadsheet is directly exported into Excel. Highlighting column A automatically brings up Excel’s built-in summary statistics dashboard at the very bottom of the screen, and we can instantly see that the count is 1,008, telling us that this column of data contains 1,008 cells. Without the header cell, (A1) , it’s 1,007 dates from (6/22/2020 – 6/22/2021). So, this is our master list of data because it holds the original integrity of the report.
Let’s now say, for example, that some of the data in this list was lost and several rows went missing. How would we tackle the problem of finding these missing rows? This is precisely when the VLOOKUP() function comes in handy. Let’s look at what happens when we receive this data partially (with only 346 rows of data). Let’s find out what happened to the 661 missing rows of data. We know that entire rows of data are missing because if only cells were missing, blanks would show up in the midst of the dataset.
What do we know about the data?
- Dates in column A are in reverse chronological order.
- The rest of the columns B-F are quantitative variables tied in with column A.
So if we have the dates, can we find the corresponding info? Well, almost…
We are going to tell excel to search the data up and down until an exact match is found. The VLOOKUP() tells excel to vertically lookup values in one data set, and find it in another.
- We’re going to go back to the HistoricalPrices (2) workbook and add columns G-K and label them to reference that the data is coming from the HPI Partial worksheet. This is the worksheet that contains only the partial data.
- We’re going to enter the VLOOKUP() function in cell G2:
=VLOOKUP( lookup_value , table_array , col_index_num , [range_lookup] )
where:
lookup_value: the value we want to look up (find)
table_array: the range of data where the lookup value is located
col_index_num: the column # in the range that has the value we are looking for
range_lookup: we want to specify FALSE for an exact match, otherwise, if we omit or put in TRUE , we only get an approximate match
Application of the VLOOKUP()
In cell G2 , we are telling Excel to vertically lookup cell A2 (which is the first value on this sheet) in worksheet ‘HPI Partial’ that spans the range of $A$1:$F$347 (we want to absolute reference this range with the $ sign in the middle to lock in the ROWS ONLY), where the open price is located in column #2, and we want an EXACT match!
Closing the parentheses and pressing enter returns the value of the exact match. Let’s bring this formula in cell G2 down to the bottom, to populate column G with the full range of data. However, in this process, we find that some of the values in column G are returned as #N/A . This tells us exactly where the data is missing.
We can leave the formula as is, but prefer to clean up the #N/A’s for aesthetic reasons, and amend the formula to replace these errors as hyphens (-) . In so doing, we modify the VLOOKUP() function to the following:
IFERROR(VLOOKUP(A2,‘HPI Partial’!A$1:F$347,2,FALSE), “-”)
↑
If there is an error, run the VLOOKUP() function anyway, and fill the #NA’s with “-” .
Now that the formula is modified to account for #N/A errors, as a shortcut to the process, we apply the same formula across columns G-K , remembering to ONLY change the column index numbers (this corresponds to our respective columns of interest within the partial list we are looking up the values from).
Once we run the formula across columns G-K , we can proceed to do a dropdown filter on row 1 , choosing column G as a baseline, unselecting all, and re-selecting ONLY the hyphens. Once the filter is applied, this shows us (in columns A-F ) the exact data that was missing from our partial list. The dashboard at the bottom of the screen also shows us that 661 records were found. In our original problem, we were missing exactly 661 rows of data! We are done!