Power Query: Cleanse, Tone and Upload
13 December 2017
Welcome to our Power Query blog. This week I look at combining several Power Query functions in order to standardise some incoming data.
For today’s example, I will use my reliable fictional salespeople – but as usual, they have thrown some problems into my data.
data:image/s3,"s3://crabby-images/0226a/0226ae2cda8fe018b99e5a19944e6815dd15f131" alt=""
Derek has helpfully entered the expense codes as I would like to describe them, but Mary and Paul have been less consistent. I would like to change the expense codes to the standard names so that I can calculate totals, but without having to go through and change each entry manually – in a large dataset many people may have used ‘Gas’ instead of ‘Petrol’. First, I need to create a query for my data, which I do on the ‘Data’ tab in the ‘Get and Transform’ section. I opt to create a query ‘From Table’:
data:image/s3,"s3://crabby-images/12e2e/12e2e96a8e17be361460165160dbc01189588445" alt=""
My data is not yet in a table, so this will be done as part of the query creation process; I am prompted for the boundaries and whether I have titles. I can then create my query.
data:image/s3,"s3://crabby-images/eb445/eb44574be07181439cb55fd2c83868bc82c32567" alt=""
From the ‘File’ or ‘Home’ tab, I choose to ‘Close and Load To’, in order to create a connection.
data:image/s3,"s3://crabby-images/c7f8a/c7f8acc732a8c03881f73a150c270d2995fb8ed6" alt=""
I don’t want to load data to my workbook at this point, as nothing has yet changed.
To replace the non-standard entries in Expense Code, I need a reference table. This avoids the need to ‘hard code’ anything (if you don’t know what this means ask a programmer and watch them grimace). Therefore, I right click on my query and look at my options:
data:image/s3,"s3://crabby-images/7e611/7e611e87292b73a823ee4f4d52a1f350214cc1a3" alt=""
I have a ‘Reference’ option, so I choose this. The difference between ‘Duplicate’ and ‘Reference’ is subtle: if I choose ‘Reference’, the new query is based on the results of the first table, rather than replicating it entirely.
data:image/s3,"s3://crabby-images/f584f/f584f345fc0ea13a395fc9b94ef6b12c58529ae9" alt=""
Thus, it looks just like my first table – I rename this table to ‘Expense Codes Substitutes’ so that it’s easier to understand its purpose. I remove the Name column as I am only interested in the expense codes for now.
data:image/s3,"s3://crabby-images/072b4/072b4a61efb6fcbc242f1cc3c74cea656ec3adaf" alt=""
I have too many similar entries, so in the ‘Reduce Rows’ section, I choose the ‘Remove Rows’ dropdown and select ‘Remove Duplicate Rows’.
data:image/s3,"s3://crabby-images/f7924/f79245bd4c175641010a38a2d29775673a65f706" alt=""
My next step will be to decide which cells are standard entries and which are substitute entries. Now that my initial data has been reduced to unique values, I need to go back to Excel to indicate manually which values need to be substituted with a standard value. I close and load to the existing worksheet.
data:image/s3,"s3://crabby-images/9372d/9372d326a85a64d8f9fcd5961401b9ead271886f" alt=""
Next, I need to add a column to my new table called Substitute and reorganise my data so that Power Query can read what and when to substitute for each entered expense code.
data:image/s3,"s3://crabby-images/a453d/a453d26ef2c50082db1582cc1d615189d13c2fcc" alt=""
I have indicated substitutes where appropriate (and deliberately missed one!) I now need to create a new query for my edited table:
data:image/s3,"s3://crabby-images/1ab31/1ab31ab20e1d57689531894927771b6a79cf7288" alt=""
I call the new query ‘Substitutes’ and create it as connection only (since I have not changed it, I don’t need to load it).
Back in the worksheet, I right-click my original query ‘Table1’ and join it to ‘Substitutes’, using the ‘Merge’ option. Since the data in ‘Substitutes’ came from ‘Table1’ this is known as self-referencing.
data:image/s3,"s3://crabby-images/8b1db/8b1db01619bb612b37c5b686d8a1f5a25baedc63" alt=""
I select Expense Code from each table and choose the ‘Left Outer’ join option and click ‘OK’.
data:image/s3,"s3://crabby-images/e1b0b/e1b0b00862224b1f62864c3ae874aaa99b5c5393" alt=""
I can now expand my Substitutes column:
data:image/s3,"s3://crabby-images/0542e/0542ee0d68926eb790ecfb588f296ccd90f831b4" alt=""
I only want the Substitute column from the ‘Substitutes’ table:
data:image/s3,"s3://crabby-images/31f18/31f18abfc0f138792cb357fe24c5819b76682fbc" alt=""
I can now delete the original Expense Code field and rename Substitute to be the new Expense Code… but wait, Mary has no substitute set up for ‘Sundries’ (see I missed one on purpose!). To show that everything can be refreshed, I continue with my deletion and renaming here, and load my query to the same worksheet.
data:image/s3,"s3://crabby-images/40c1f/40c1fcc0aaaac599404bed789d994d0905d9cf8b" alt=""
I manually edit my ‘Substitutes’ table to make sure all the substitutes are populated (including ‘Sundries’), and refresh the data using the ‘Refresh All’ option on the ‘Data’ tab.
data:image/s3,"s3://crabby-images/d4d51/d4d514250aa0c359420bfbca81d72ca3719675cc" alt=""
My data is now updated with standard expense code names. I will still need to maintain the substitute table, but I have no need to trawl through my data to find any non-standard entries that don’t already appear on my substitute table, as an update will add any unrecognised entries to my substitute table. To prove this, I add an ‘A4 folder’ to Paul’s expenses ready to add it to my table:
data:image/s3,"s3://crabby-images/85c11/85c1191e29597f7bd23dd1869b5fc4181705f051" alt=""
Obviously, the update is not quite sure how to treat the extra information in my manual table, but a line has appeared and I need to adjust the data. A new row has appeared for Paul in the final table. I simply update my manual table and refresh again.
data:image/s3,"s3://crabby-images/06af8/06af8f695898729138ca34ace254c74499ceb3b8" alt=""
All the data is now showing the correct expense code, as required.
Want to read more about Power Query? A complete list of all our Power Query blogs can be found here.