Power Query: Faulty Filtering
7 July 2021
Welcome to our Power Query blog. This week I look at an issue with filtering.
I have some simple tent data which I am going to upload and process in Power Query. What could possibly go wrong?
I upload the data to Power Query using ‘From Table/Range’ on the ‘Get & Transform Data’ section of the Data tab.
I am prompted to choose the data for the table; I take the default area and select ‘My table has headers’.
My very simple table is created. I opt to filter on Tent Type to remove the null values.
This is a simple process; just untick the null values and click ‘OK’.
That all looks fine, so I close and load the data. For this example, I will load the data to the same sheet as the source data.
I can now view the tables side by side.
John has news. He sold a Medium tent in May. I add this to the table and refresh the query:
John is not happy: his tent is missing!
So, what happened? I need to go to my Power Query steps and check why the tent is missing.
The Changed Type step looks fine: all the data is there. Something is wrong with the Filtered Rows step.
Here is the problem. The M code for this step has kept data where Tent Type is “Large” instead of removing null values:
= Table.SelectRows(#"Changed Type", each ([Tent Type] = "Large"))
To be clear, I haven’t asked Power Query to do this. And yet, if I click on the gear icon next to the Filtered Rows step, it shows that only values equal to “Large” have been selected. Since Power Query is designed to process substantial amounts of data, it contains algorithms to create the most efficient code for each step. When I only had values of “Large” and null, it took less processing to keep “Large”, rather than remove null. This is something to watch out for. I need to check that the M code is doing what I asked for. The cog next to a step will present this information in a user-friendly way, and I can change this so that I remove null values.
Clicking ‘OK’ will give me the data I want.
The M code for the Filtered Rows step is now:
= Table.SelectRows(#"Changed Type", each [Tent Type] <> null)
which is what I wanted from the start. Now John is happy!
I could also have achieved this for the extended table by deleting and recreating the Filtered Rows step now that I have three values to choose from.
Now that removing the null values is a more efficient route, Power Query is happy to oblige!
One last point here: there is a way to do this so that Power Query will write the M code in the way that I want for the Filtered Rows step for the original data.
The M code here is:
= Table.SelectRows(#"Changed Type", each [Tent Type] <> null and [Tent Type] <> "")
I didn’t write the M code for this step; I made a different choice from the user interface:
This pushes Power Query to specifically take out nulls and empty entries, which will work as long as I want to get rid of the empty values too.
Come back next time for more ways to use Power Query!