Power Query: Blog List – Part 2
6 December 2023
Welcome to our Power Query blog. I would like to get a list of all the Power Query blogs I have posted. Today, I extend the query to extract data from all pages.
I started writing this blog series way back in 2016:
data:image/s3,"s3://crabby-images/c3f46/c3f46fa89e66eb3aca4a966ecbeb64a454cd98bd" alt=""
There have been many developments since then, some of which I have reported upon. Since I am going to revisit more of the areas that have been improved, I am going to start with a reminder of what we have covered so far. In order to do this, I am going to use Power Query in Power BI, since there are some functions that will help me to access web data, which are not yet recognised in Power Query (Get & Transform) in Excel.
Last week, I noted that to get a full list, I am going to need several pages:
data:image/s3,"s3://crabby-images/2d37a/2d37a8d7a2304b0a4480216f3366575f7ec374d5" alt=""
I extracted and tidied up the data for the first page:
data:image/s3,"s3://crabby-images/ad665/ad6653f8b8babbe82b44d9e7340f77eda9391c55" alt=""
Now I am happy with this page, I shall go back to the website to check out the URL for the second page:
data:image/s3,"s3://crabby-images/8544b/8544b58f1fce63114c9e5b0791cf67e489640aea" alt=""
If I compare the URL I used for the first page:
https://www.sumproduct.com/blog/power-query-blogs
to the URL for the second page:
https://www.sumproduct.com/blog/power-query-blogs?page=2
I can see I have more characters, i.e. “?page=2”
If I go back to the query I created last week, and look at the source step:
data:image/s3,"s3://crabby-images/d01fa/d01fa89c68c91e60c5d48d2c27b462b1a5527161" alt=""
It is reasonable to assume that I could add the characters “?page=1” to this, and still get the same results. I try this:
data:image/s3,"s3://crabby-images/9478f/9478fc80e62cc52b01aa080e6d80d03f00ac6a20" alt=""
As I expected, the results are the same. However, I want to extract all the pages, therefore, I am going to use a parameter for the page number. On the Home tab, I can create a ‘New Parameter’ from the ‘Manage Parameters’ dropdown:
data:image/s3,"s3://crabby-images/97f77/97f7758369673b499e46adf738a5925fe7dcbae4" alt=""
I create P_PageNumber:
data:image/s3,"s3://crabby-images/912f9/912f916c7a16b7aff699b11c93fc3096a903eb0e" alt=""
I make this a ‘Decimal Number’ (since I do not have the option of making it a whole number) and give it an initial value of 1. If I use this parameter in the Source step of my query, I should see no change to the data extracted. Instead of the current M code:
= Web.BrowserContents("https://www.sumproduct.com/blog/power-query-blogs?page=1")
I need to use the following code:
= Web.BrowserContents("https://www.sumproduct.com/blog/power-query-blogs?page="&Number.ToText(P_PageNumber))
I need to use the function Number.ToText() for P_PageNumber, because the file location string is a character datatype, and P_PageNumber is a number. As before, I can test that this works by checking that the results are the same:
data:image/s3,"s3://crabby-images/13b43/13b4343e43aab11372eefca06511b2133e16d92d" alt=""
Next time, I will convert this query into a function and generate a list of page numbers to concatenate my data.
Come back next time for more ways to use Power Query!