Power Query: PDF Pandemonium – Part 2
22 September 2021
Welcome to our Power Query blog. This week, I start to transform some data from a PDF file.
The tent business is doing well, and the UK division have plans to expand the workforce. I have a PDF file, and it contains tables for 10 stores. Last week, I imported my data, viz.
The key to making my transformations as immune to change as possible is to keep the data I need rather than delete the data I don’t. Looking at the columns, the easiest way to see if there is any useful data in there is to use the filter icon; Column1 is clearly very useful.
However, Column8 is not:
However, rather than delete Column8, I should keep what I need. On the Home tab, there is an option to ‘Choose Columns’:
I can use this to specify columns I want to keep. It’s much easier than selecting them all for large tables!
I choose to select the first seven [7] columns.
I can see that the heading data from the tables is in Column1, which suggests that transposing my data would be useful. I can do this from the Transform tab.
This swaps the rows and the columns and is much closer to the format I want to see.
I can check the data in my columns again to see which ones I want to keep. However, it is clear that this time the column names will change with the extra text that is present in my source data.
Before I decide which columns to keep, I need some way of identifying them. I will promote the first column to the column headings, which I can do from the Transform Tab.
I choose ‘Use First Row as Headers’:
Power Query has created a ‘Changed Type’ step, but this references column names, so I delete it. I can pick the columns I want to keep in the same way as I did earlier.
I have the data I want to keep, but there are two tables in here: the store data and the pay scales.
I can keep this query, which I will call All Data, and make Reference queries: one for the store table and one for the pay scales table. I can create reference queries from the ‘Home’ tab.
I described the benefits of using reference queries in the blog Reliable References. I call this Reference Query Pay Scales.
I also create another Reference Query, Stores. I will transform Pay Scales next time…
Come back next time for more ways to use Power Query!