Power Query: Big Comparison
23 April 2020
Welcome to our Power Query blog. This week, I look at comparing data in two large CSV files.
I am looking at some vaccination data available from the WHO. I have two versions of the data, and I want to detect the differences between them.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2020/power-query/177/image1.png/e774d10cbbb9450fc45efbe51abdf434.jpg)
I begin by extracting my first CSV file to Power Query by using ‘From CSV’ on the ‘From File’ section of the ‘New Query’ option in the ‘Get & Transform’ section of the ‘Data’ tab (that’s a mouthful!).
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2020/power-query/177/image2.png/f32e5a15e2cf9c3e4d2d058458ce054d.jpg)
I find the CSV file and import it.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2020/power-query/177/image3.png/f1140ff857fc3b6f5f97a6a24f4a6fc7.jpg)
I select ‘Transform Data’ so that I can edit the query before choosing how to load it.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2020/power-query/177/image4.png/72aa864d2854c6fefb1083fba0ab5792.jpg)
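For anyone who prefers to read the M code behind the import, here is a minimal sketch of what this step roughly amounts to. The sample text, the title row and the column count are my own assumptions for illustration, not the real WHO file, and I build the binary inline so the query can be pasted into the Advanced Editor without needing a file.

```powerquery
let
    // Hypothetical stand-in for the CSV contents (a title row, then the real headers).
    // In practice the binary would come from File.Contents with the path to the CSV file.
    SampleCsv = Text.ToBinary(
        "WHO vaccination data,,#(lf)Country,2018,2017#(lf)Afghanistan,64,62#(lf)Albania,94,96"
    ),
    // Parse the binary as comma-delimited UTF-8 text with three columns
    Imported = Csv.Document(
        SampleCsv,
        [Delimiter = ",", Columns = 3, Encoding = 65001, QuoteStyle = QuoteStyle.None]
    ),
    // Mimic the automatic promotion of the top row to headers
    Promoted = Table.PromoteHeaders(Imported, [PromoteAllScalars = true])
in
    Promoted
```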
My data is initially extracted, with the top row automatically promoted to headers. However, I don’t want to use this row as my headers. Thus, I demote this row.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2020/power-query/177/image5.png/36776d1da4d05b45bb5a5d09375f407c.jpg)
I can do this from the Home tab, using the ‘Use First Row as Headers’ dropdown. The ‘Use Headers as First Row’ option demotes the headers accordingly.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2020/power-query/177/image6.png/23912d3b1671861e02bebcd5183f1607.jpg)
I remove the first row using the ‘Remove Top Rows’ option on the ‘Remove Rows’ dropdown.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2020/power-query/177/image7.png/6f49c288a0d88a66b427eaf4ece923d6.jpg)
I promote the new top row into headers using the ‘Use First Row as Headers’ option.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2020/power-query/177/image8.png/b9ee28d90e6b5bc92ea4aeafdad51628.jpg)
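The three header steps above (demote, remove the top row, promote) can be sketched in M as follows. This is only an illustration: the inline table and its column names are assumptions standing in for the imported data, and the step names are my own.

```powerquery
let
    // Hypothetical table as it might look after import: the unwanted title row
    // has been promoted to headers, and the real headers sit in the first data row
    Source = #table(
        {"WHO vaccination data", "Column2", "Column3"},
        {
            {"Country", "2018", "2017"},
            {"Afghanistan", "64", "62"},
            {"Albania", "94", "96"}
        }
    ),
    // 'Use Headers as First Row': push the current headers back into the data
    Demoted = Table.DemoteHeaders(Source),
    // 'Remove Top Rows' with a count of 1: discard the unwanted title row
    RemovedTopRow = Table.Skip(Demoted, 1),
    // 'Use First Row as Headers': promote the real header row
    Promoted = Table.PromoteHeaders(RemovedTopRow, [PromoteAllScalars = true])
in
    Promoted
```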
I close and load this query as ‘connection only’ since I do not need the overhead of loading it to the workbook.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2020/power-query/177/image9.png/0485ccbc83bdeec1d741bad442a1ea5f.jpg)
I create a similar query for the ‘vaccinations – Copy’ CSV.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2020/power-query/177/image10.png/daf8c4f0259ce428269c0d3d4badd32b.jpg)
I merge my queries using a full outer join so that I have all rows in case any exist in one query but not the other. I use Country to merge my queries.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2020/power-query/177/image11.png/22c6daeb82d7d69ac88f878227e04b28.jpg)
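This gives me the columns of ‘vaccinations’ plus a column of nested tables linking to ‘vaccinations – Copy’. Because this is a full outer join, any country that appears only in ‘vaccinations – Copy’ also gets a row, with nulls in the ‘vaccinations’ columns.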
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2020/power-query/177/image12.png/a1537847463e660a31158c8032525438.jpg)
I expand the ‘vaccinations – Copy’ column, using the original column name as a prefix so I can distinguish between my similar columns.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2020/power-query/177/image13.png/917da985be13220165c8d2823e95344f.jpg)
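The merge and expand steps might look something like this in M. It is a sketch under assumptions: the two inline tables and their values are invented for illustration, and I have assumed just two year columns rather than the full set in the real data.

```powerquery
let
    // Hypothetical stand-ins for the two connection-only queries
    Vaccinations = #table(
        {"Country", "2018", "2017"},
        {{"Afghanistan", "64", "62"}, {"Albania", "94", "96"}}
    ),
    VaccinationsCopy = #table(
        {"Country", "2018", "2017"},
        {{"Afghanistan", "64", "60"}, {"Algeria", "80", "79"}}
    ),
    // Full outer join on Country keeps rows that exist in either table
    Merged = Table.NestedJoin(
        Vaccinations, {"Country"},
        VaccinationsCopy, {"Country"},
        "vaccinations – Copy",
        JoinKind.FullOuter
    ),
    // Expand the nested table, prefixing each new column with the query name
    Expanded = Table.ExpandTableColumn(
        Merged,
        "vaccinations – Copy",
        {"Country", "2018", "2017"},
        {"vaccinations – Copy.Country", "vaccinations – Copy.2018", "vaccinations – Copy.2017"}
    )
in
    Expanded
```

With this sample data, Albania appears only in the first table and Algeria only in the second, so each gets a row with nulls on the opposite side.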
I need a column which will tell me if any of the yearly totals don’t match, so I add a conditional column called Differences from the ‘Add Column’ tab.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2020/power-query/177/image14.png/8c3be7af9f73d031acae69ed85a2e148.jpg)
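A conditional column of this kind can also be written directly in M. The sketch below assumes only two year columns and invented values; the real query would need one comparison per year. I use the quoted form `[#"2018"]` for the field names, since the column names begin with digits.

```powerquery
let
    // Hypothetical merged and expanded table (names and values are assumptions)
    Expanded = #table(
        {"Country", "2018", "2017",
         "vaccinations – Copy.Country", "vaccinations – Copy.2018", "vaccinations – Copy.2017"},
        {
            {"Afghanistan", "64", "62", "Afghanistan", "64", "60"},
            {"Albania", "94", "96", "Albania", "94", "96"}
        }
    ),
    // Flag a row as "Yes" if any year differs between the two versions
    AddedDifferences = Table.AddColumn(
        Expanded,
        "Differences",
        each
            if [#"2018"] <> [#"vaccinations – Copy.2018"] then "Yes"
            else if [#"2017"] <> [#"vaccinations – Copy.2017"] then "Yes"
            else "No"
    )
in
    AddedDifferences
```

Since M treats a comparison between null and a value as not equal, a country that exists in only one file would also be flagged "Yes" under this logic.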
In order to see which rows have differences, I filter on Differences to select those rows where the value is ‘Yes’.
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2020/power-query/177/image15.png/e63c0a4c21afc9afb438aacc09a59317.jpg)
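The filter is a single step in M. Here is a small sketch, again using an invented table in place of the real query.

```powerquery
let
    // Hypothetical table with the Differences column already added
    WithDifferences = #table(
        {"Country", "2017", "vaccinations – Copy.2017", "Differences"},
        {
            {"Afghanistan", "62", "60", "Yes"},
            {"Albania", "96", "96", "No"}
        }
    ),
    // Keep only the rows flagged as different
    Filtered = Table.SelectRows(WithDifferences, each [Differences] = "Yes")
in
    Filtered
```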
Once I have my rows, I reorder the columns to make it easier to see which columns have differences:
![](http://sumproduct-4634.kxcdn.com/img/containers/main/blog-pictures/2020/power-query/177/image16.png/d082e3477129350b8a2a589156028e63.jpg)
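Reordering can be done by dragging columns, or with one M step. The sketch below assumes the aim is to place each year next to its counterpart from the copy; the column list and values are illustrative and not necessarily the exact order shown in the screenshot.

```powerquery
let
    // Hypothetical filtered table (names and values are assumptions)
    Filtered = #table(
        {"Country", "2018", "2017",
         "vaccinations – Copy.Country", "vaccinations – Copy.2018", "vaccinations – Copy.2017",
         "Differences"},
        {{"Afghanistan", "64", "62", "Afghanistan", "64", "60", "Yes"}}
    ),
    // Put each year beside its counterpart so mismatches are easy to spot
    Reordered = Table.ReorderColumns(
        Filtered,
        {"Country", "vaccinations – Copy.Country",
         "2018", "vaccinations – Copy.2018",
         "2017", "vaccinations – Copy.2017",
         "Differences"}
    )
in
    Reordered
```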
I can now see where the differences occur.
Come back next time for more ways to use Power Query!