Power Query: A Tangled Web Part 2

5 March 2025

Welcome to our Power Query blog. This week, I load the web data I extracted last time to the Power Query editor.

When extracting data from the web version of Power Query, it utilises M code to access the data in a form that can be used in reports. Over the next few blogs, I am planning to examine the M code produced. The data I am going to use to investigate web data is our very own website:

This page contains a table of upcoming courses, along with various buttons and text:

Last time, I used the web connector to examine the data in the Navigator dialog.

I also created more tables using the ‘Add Table Using Examples’ functionality.

I found that I cannot combine data from inside a table and data outside of that table in the same custom table. Therefore, I added two [2] custom tables:

The training contact data has been captured in Table 2 but not in Table 1.

Now, I will use ‘Transform Data’ to view the M code used to capture the data I have chosen. Since I want to load all the data to the Power Query editor, I first enable the ‘Select Multiple Items’ box, and then choose all available data:

I choose to ‘Transform Data’.

I have five [5] new queries, two [2] of which are text (HTML Code and Displayed Text), and three [3] of which are Tables. Let’s look at HTML Code first:

As the name suggests, this is HyperText Markup Language (HTML). The M code is:

= Web.BrowserContents("https://www.sumproduct.com/courses")

The M function Web.BrowserContents() has the following syntax:

Web.BrowserContents(url as text, optional options as nullable record) as text

According to Microsoft’s documentation, this function returns the HTML for the specified url, as viewed by a web browser. An optional record parameter, options, may be provided to specify additional properties. The record can contain the following fields:

ApiKeyName: if the target site has a notion of an API key, this parameter can be used to specify the name (not the value) of the key parameter that must be used in the URL. The actual key value is provided in the credential
WaitFor: specifies a condition to wait for before downloading the HTML, in addition to waiting for the page to load (which is always done). This can be a record containing Timeout and/or Selector fields. If only a Timeout is specified, the function will wait the amount of time specified before downloading the HTML. If both a Selector and Timeout are specified, and the Timeout elapses before the Selector exists on the page, an error will be thrown. If a Selector is specified with no Timeout, a default Timeout of 30 seconds is applied. An example of a selector is the CSS selector "div.ready".

The HTML Code query is giving me the raw HTML data. Let’s see how this differs from the Displayed Text query:

This time, there are two steps. The source step is the same as the HTML Code source step:

The next step is called ‘Extracted Table From Html’, and it does just that:

I will come back to this step in a moment. The final step is a navigation step:

This drills down into Column1 of the previous Table.

The M code for ‘Extracted Table From Html’ is:

= Html.Table(Source, {{"Column1", "BODY"}})

The syntax for Html.Table() is:

Html.Table(html as any, columnNameSelectorPairs as list, optional options as nullable record) as table

The documentation says that this function returns a table containing the results of running the specified CSS selectors (columnNameSelectorPairs) against the provided html. An optional record parameter, options, may be provided to specify additional properties. The record can contain the following fields:

RowSelector: This is a way of filtering the data selected, for example only hrefs could be selected if the RowSelector each [Attributes][href] was used.

In the case Displayed Text, the ‘BODY’ of the Html source is extracted into Column1, accessing the text.

Next time, I will look at the remaining queries.

Come back next time for more ways to use Power Query!

Blog

Power Query: A Tangled Web Part 2

Request a free assessment