Power Query: Blog List – Part 3
13 December 2023
Welcome to our Power Query blog. I would like to get a list of all the Power Query blogs I have posted. Today, I continue to extend the query to extract data from all pages.
I started writing this blog series way back in 2016:
There have been many developments since then, some of which I have reported upon. Since I am going to revisit more of the areas that have been improved, I am going to start with a reminder of what we have covered so far. In order to do this, I am going to use Power Query in Power BI, since there are some functions that will help me to access web data, which are not yet recognised in Power Query (Get & Transform) in Excel.
In Part 1, I noted that to get a full list, I am going to need several pages:
I extracted and tidied up the data for the first page:
Last week, I changed the Source step of my query to accept a parameterP_PageNumber, in order to get the data from a specific page. I used the following code:
= Web.BrowserContents("https://www.sumproduct.com/blog/power-query-blogs?page="&Number.ToText(P_PageNumber))
I checked that this still gave the same results for the query:
This time, I will convert this query into a function and generate a list of page numbers to concatenate my data. To convert the query into a function, I simply right-click and choose ‘Create Function’:
I choose to call the function fn_ExtractBlogs:
When I click OK, Power Query creates the function, and keeps the underlying query Table1, to enable me to see the steps easily without going into the ‘Advanced Editor’. The query, the function and the parameter are all automatically moved to a new folder (or group) with the same name as the function:
To get all the pages, I will need to generate a list. I right-click in the Queries pane, and choose to create a ‘Blank Query’:
In the query, I create a list from 1 to 40. I have chosen 40 as a number that is more than the current number of pages. This is a placeholder, as I will be using a parameter here.
The M code is simply:
= {1..40}
In the ‘List Tools’ tab, I ‘Convert’ ’To Table’:
I take the default settings:
I rename the first column Page Number, change the datatype to ‘Whole Number’ and add a column using ‘Invoke Custom Function’ from the ‘Add Column’ tab:
Having chosen to use fn_ExtractBlogs, I use Page Number as the parameter value, and click OK.
I can extract the data from each ‘Table’ by using the icon next to the Blog heading:
I select all the data and choose not to ‘Use original column name as prefix’:
I have all the blogs, but since the first page number has the latest blogs, I sort in descending order on Page Number:
This leaves me with the list items at the top that did not link to any pages. I can remove these by filtering on Dateand removing the null values:
This gives me the full list, but I see that the Description is no longer working as I intended:
Next time, I will make some improvements to the process.
Come back next time for more ways to use Power Query!