Power Query: PDF Pandemonium – Part 2
22 September 2021
Welcome to our Power Query blog. This week, I start to transform some data from a PDF file.
The tent business is doing well, and the UK division have plans to expand the workforce. I have a PDF file, and it contains tables for 10 stores. Last week, I imported my data, viz.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2021/power-query/251/image1.png/e774d10cbbb9450fc45efbe51abdf434.jpg)
The key to making my transformations as immune to change as possible is to keep the data I need rather than delete the data I don’t. Looking at the columns, the easiest way to see if there is any useful data in there is to use the filter icon; Column1 is clearly very useful.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2021/power-query/251/image2.png/f32e5a15e2cf9c3e4d2d058458ce054d.jpg)
However, Column8 is not:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2021/power-query/251/image3.png/f1140ff857fc3b6f5f97a6a24f4a6fc7.jpg)
However, rather than delete Column8, I should keep what I need. On the Home tab, there is an option to ‘Choose Columns’:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2021/power-query/251/image4.png/72aa864d2854c6fefb1083fba0ab5792.jpg)
I can use this to specify columns I want to keep. It’s much easier than selecting them all for large tables!
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2021/power-query/251/image5.png/36776d1da4d05b45bb5a5d09375f407c.jpg)
I choose to select the first seven [7] columns.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2021/power-query/251/image6.png/23912d3b1671861e02bebcd5183f1607.jpg)
I can see that the heading data from the tables is in Column1, which suggests that transposing my data would be useful. I can do this from the Transform tab.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2021/power-query/251/image7.png/6f49c288a0d88a66b427eaf4ece923d6.jpg)
This swaps the rows and the columns and is much closer to the format I want to see.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2021/power-query/251/image8.png/b9ee28d90e6b5bc92ea4aeafdad51628.jpg)
I can check the data in my columns again to see which ones I want to keep. However, it is clear that this time the column names will change with the extra text that is present in my source data.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2021/power-query/251/image9.png/0485ccbc83bdeec1d741bad442a1ea5f.jpg)
Before I decide which columns to keep, I need some way of identifying them. I will promote the first column to the column headings, which I can do from the Transform Tab.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2021/power-query/251/image10.png/daf8c4f0259ce428269c0d3d4badd32b.jpg)
I choose ‘Use First Row as Headers’:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2021/power-query/251/image11.png/22c6daeb82d7d69ac88f878227e04b28.jpg)
Power Query has created a ‘Changed Type’ step, but this references column names, so I delete it. I can pick the columns I want to keep in the same way as I did earlier.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2021/power-query/251/image12.png/a1537847463e660a31158c8032525438.jpg)
I have the data I want to keep, but there are two tables in here: the store data and the pay scales.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2021/power-query/251/image13.png/917da985be13220165c8d2823e95344f.jpg)
I can keep this query, which I will call All Data, and make Reference queries: one for the store table and one for the pay scales table. I can create reference queries from the ‘Home’ tab.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2021/power-query/251/image14.png/8c3be7af9f73d031acae69ed85a2e148.jpg)
I described the benefits of using reference queries in the blog Reliable References. I call this Reference Query Pay Scales.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2021/power-query/251/image15.png/e63c0a4c21afc9afb438aacc09a59317.jpg)
I also create another Reference Query, Stores. I will transform Pay Scales next time…
Come back next time for more ways to use Power Query!