Power Query: Blog List – Part 4
20 December 2023
Welcome to our Power Query blog. I would like to get a list of all the Power Query blogs I have posted. Today, I improve the query to extract data from all pages by extracting the last page number from the data.
I started writing this blog series way back in 2016:
data:image/s3,"s3://crabby-images/4c682/4c682d8b44ad907d3d59cc11660073cb5ed297b7" alt=""
There have been many developments since then, some of which I have reported upon. Since I am going to revisit more of the areas that have been improved, I am going to start with a reminder of what we have covered so far. In order to do this, I am going to use Power Query in Power BI, since there are some functions that will help me to access web data, which are not yet recognised in Power Query (Get & Transform) in Excel.
In Part 1, I noted that to get a full list, I am going to need several pages:
data:image/s3,"s3://crabby-images/ea4bc/ea4bc8c65c68a4739ecf0e7c133f205bb1334716" alt=""
I extracted and tidied up the data for the first page:
data:image/s3,"s3://crabby-images/52872/5287290d8a6f3a42da934e343acf7b1dfff741b5" alt=""
In Part 2, I changed the Source step of my query to accept a parameter P_PageNumber, which allows me to get the data from a specific page. I used the following M code:
= Web.BrowserContents("https://www.sumproduct.com/blog/power-query-blogs?page="&Number.ToText(P_PageNumber))
I checked that this still gave the same results for the query:
data:image/s3,"s3://crabby-images/148ac/148aca964801c42f10b2887a960745b003a052f0" alt=""
Last week, I converted this query into a function, and generated a list of page numbers to concatenate my data:
data:image/s3,"s3://crabby-images/a5338/a5338f5f08c4323cb09bb728501f66e0b6d21dd3" alt=""
This time, I start by going back to the function behind the query to review the step that I used to remove repetitive data from Description:
data:image/s3,"s3://crabby-images/298b1/298b151b96c6b4385766b95880d377fe27808673" alt=""
I remove the final step to see what is happening to the data in Description on the earlier rows in my combined query:
data:image/s3,"s3://crabby-images/e0acb/e0acb6dc8289d44e5bf9d4ab0d19f192aef803db" alt=""
Clearly, I haven’t always been as consistent! A better approach would be to replace any repetitive text with empty space. I can do this in the combined query, currently called Query1:
data:image/s3,"s3://crabby-images/49696/49696ac775ce229d6b8d07c0888e670da01c0d3c" alt=""
I could improve my query further by removing the hard coded page number from my list generation and instead, extract the number of pages from the data. To do this, I begin by making a Duplicate of Table1:
data:image/s3,"s3://crabby-images/77690/776904dcda3f8e8d2a18fee46f11c7f3690e56dc" alt=""
I (optimistically) call my new query P_LastPage. I only want to keep the Source step:
data:image/s3,"s3://crabby-images/79b48/79b4873be0d561ae78ca8abdbe71d20ba2396b5e" alt=""
I have highlighted the text I need, and I need the total number of pages from this:
<p>Page 1 of 37.</p>
Now, I need a way of extracting the number I want. My options at the moment appear to be limited:
data:image/s3,"s3://crabby-images/35dba/35dba4c949e763d266aafd4447ce449989cd4880" alt=""
I want to look for text before the delimiter “<p>Page 1 of”, and after the delimiter “.</p>”. To access this functionality, I need to convert the text to a Table:
data:image/s3,"s3://crabby-images/d7282/d7282cff004f35554d8e42b05bab9527f12037dc" alt=""
This makes the data harder to see, but I know the delimiters I need to use:
data:image/s3,"s3://crabby-images/adf1a/adf1af3310fc77c810f413b49204ed3909e393fe" alt=""
On the Transform tab, I choose to Extract ‘Text Between Delimiters’:
data:image/s3,"s3://crabby-images/224bf/224bf3319b370b7c7819604b05f47d6c197d55ad" alt=""
I enter the required delimiters in the dialog:
data:image/s3,"s3://crabby-images/8d887/8d887319287f3b3dcdbd1dceaae22a954c4bbfb4" alt=""
This gives me the data I need, and I change the datatype to ‘Whole Number’ to allow me to use it in the list generation.
data:image/s3,"s3://crabby-images/81949/81949fe36d64231b302004ef52708bcdff0bbc4d" alt=""
Now, I can right-click on and ‘Drill Down’ into this value:
data:image/s3,"s3://crabby-images/84894/84894fc2c0042fb8c56ffdac33019e54602bd2bf" alt=""
This gives me my parameter P_LastPage:
data:image/s3,"s3://crabby-images/fb54e/fb54eef47c56a03c54da8a3e7828d9b7bf5765c7" alt=""
I have renamed my main query to Extract Blogs. I need to change the current Source step:
data:image/s3,"s3://crabby-images/c21f1/c21f136e4cd4edbadbf1124c1de8b0b02b629079" alt=""
Instead of the hard-coded value 40, I use P_LastPage:
data:image/s3,"s3://crabby-images/3dbff/3dbff7a4a9e0b649b5f424ed1bbdc287bcadd754" alt=""
Now I have the exact number of pages, I no longer need the ‘Filtered Rows’ step which was removing empty rows; therefore, I delete it and ignore warnings about deleting an intermediate step:
data:image/s3,"s3://crabby-images/6cce2/6cce2248495872eb5e94a5c2827bf4cfacd4545d" alt=""
My query is ready to load, so I select all my data and ‘Detect Datatypes’ from the Transform tab:
data:image/s3,"s3://crabby-images/6aca2/6aca22924756faa49b97697731a6075ce72b87e6" alt=""
Now the list is complete:
data:image/s3,"s3://crabby-images/a6080/a6080bba68c5eac4f0b8725b3d7f5c0fbdcb36f0" alt=""
Next time, I will look at why this cannot be achieved in Power Query for Excel.
Come back next time for more ways to use Power Query!