Integrating datasets and transformation with Power Query

By Riya Jain & Abhinash Jena on August 22, 2024

Integrating datasets in Power BI is the process of combining data from multiple sources to create a unified view for analysis. It is essential for building comprehensive reports and dashboards that draw insights from diverse sources such as databases, cloud services, files and web APIs.

EXAMPLE

Sales data for 2023 and 2024, stored as separate files, can be combined by appending the datasets.

Integrating datasets by appending them to create a continuous table

Appending datasets means stacking two or more datasets on top of each other to create one continuous table. It is common in data analysis, particularly with time-series data or data collected from multiple sources, and is typically done when the datasets have the same structure (i.e., the same columns). In many organizations, data is stored in different files, databases, or even geographic locations. Stacking these datasets into a single table makes it possible to spot new trends and to draw quarterly or yearly comparisons.

Furthermore, as the data grows, appending helps to scale datasets without restructuring or redesigning the data model. Keeping all time periods or sources in one table, with a single set of columns and data types, also makes it easier to keep the data consistent.

EXAMPLE

Banks and financial institutions append transactional data to monitor account activities, detect fraud, and assess credit risks over time.

Steps to append datasets in Power BI

Figure: Appending datasets in Power BI

Before appending, make sure that both datasets are loaded. They appear as tables in the Power BI Data pane on the right. Open the Power Query Editor by clicking “Transform data” on the Home tab and inspect the tables in the left pane of the Power Query Editor. Confirm that both datasets (2023 and 2024) have the same structure, that is, the same column names and data types.

  • In the Power Query Editor, go to the “Home” tab.
  • Click on “Append Queries” and select “Append Queries as New”.
  • Choose the datasets or tables to append and click on OK to proceed.

A new table, named ‘Append1’ by default, is created with the combined data from the selected tables. Review this new dataset to ensure that all rows from the selected datasets are included. Then proceed to data preprocessing steps such as formatting, removing duplicates and filtering.
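The logic of an append can be sketched outside Power BI in a few lines of Python. This is only an illustration of the idea, not how Power Query implements it: the function name `append_tables`, the column names and the sample rows are all hypothetical, and the strict column check is a choice made for this sketch to mirror the "same structure" advice above.

```python
def append_tables(*tables):
    """Stack tables (lists of row dicts) on top of each other.

    Assumption for this sketch: every row must have the same set of
    column names, otherwise a ValueError is raised.
    """
    combined = []
    expected = None
    for table in tables:
        for row in table:
            columns = set(row)
            if expected is None:
                expected = columns  # first row seen defines the structure
            elif columns != expected:
                raise ValueError(f"Column mismatch: {sorted(columns)}")
            combined.append(dict(row))
    return combined


# Hypothetical rows standing in for the 2023 and 2024 sales files.
sales_2023 = [
    {"Order ID": 1, "Customer ID": "C01", "Amount": 250.0},
    {"Order ID": 2, "Customer ID": "C02", "Amount": 100.0},
]
sales_2024 = [
    {"Order ID": 3, "Customer ID": "C01", "Amount": 300.0},
]

append1 = append_tables(sales_2023, sales_2024)
print(len(append1))  # 3 rows: two from 2023 followed by one from 2024
```

The row count of the result should equal the sum of the row counts of the inputs, which is the same check suggested above when reviewing ‘Append1’.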

Integrating datasets by merging them on a common column

Merging combines two or more datasets based on a common key, that is, a column that identifies each record and is present in all of the datasets. The process aligns rows from the different datasets wherever the values in the specified key columns match. Merging is typically done to bring additional information into one dataset from another. The same common keys also establish the relationship between the datasets. It is ideal for integrating related information held in separate datasets.

EXAMPLE

Adding customer names to sales transactions by merging on the unique Customer ID that is common to the two datasets, i.e. customer details and sales transactions.

Steps to merge data in Power BI

  • Load the customer details file and ensure that the table is loaded properly.
  • In the Power Query Editor, go to the “Home” tab.
  • With the Append1 query selected, choose “Merge Queries” to create a new column with the merged data.
  • In the Merge dialogue box, select ‘customers’ in the second drop-down list.
  • Select the Customer ID column in both tables as the common key. Ensure that both columns are highlighted.
  • Select Left Outer Join (recommended for this case) to include all rows from the Append1 table and the matching rows from the customers table.
  • Click OK to merge, and a new column ‘customers’ will be added to the Append1 table.
  • Click the expand icon on the ‘customers’ column.
  • Select the columns to include in the final dataset. In this case, check Name and uncheck others as they are not required.
Figure: Integrating datasets by merging them on a common column
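The steps above amount to a lookup of the customer name by Customer ID. A minimal Python sketch of that idea follows. The function name `add_customer_name`, the sample rows and the `City` column are hypothetical, and the sketch assumes that each Customer ID appears only once in the customers table.

```python
def add_customer_name(sales_rows, customer_rows, key="Customer ID"):
    """Left outer merge that keeps only the Name column from customers.

    Assumption: the key is unique in customer_rows, so no sales row is
    duplicated by the merge.
    """
    names_by_id = {customer[key]: customer["Name"] for customer in customer_rows}
    # Every sales row is kept; a missing customer yields None (a null).
    return [{**row, "Name": names_by_id.get(row[key])} for row in sales_rows]


# Hypothetical stand-ins for the Append1 and customers tables.
append1 = [
    {"Order ID": 1, "Customer ID": "C01", "Amount": 250.0},
    {"Order ID": 2, "Customer ID": "C02", "Amount": 100.0},
]
customers = [
    {"Customer ID": "C01", "Name": "Asha", "City": "Pune"},
    {"Customer ID": "C03", "Name": "Ravi", "City": "Delhi"},
]

merged = add_customer_name(append1, customers)
for row in merged:
    print(row["Order ID"], row["Name"])
```

Order 1 receives the name "Asha", while order 2 has no matching customer and therefore gets an empty value, which is how a Left Outer Join treats unmatched rows from the left table.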

Thus, the merge process involves matching rows based on the common key and combining them into a single dataset. The type of join determines which rows are included in the result.

  • Inner Join: Includes only rows that have matching values in both datasets.
  • Left Outer Join: Includes all rows from the left dataset and the matching rows from the right dataset; left rows without a match get nulls in the columns that come from the right dataset.
  • Right Outer Join: Includes all rows from the right dataset and the matching rows from the left dataset; right rows without a match get nulls in the columns that come from the left dataset.
  • Full Outer Join: Includes all rows from both datasets; wherever a row has no match, the columns from the other dataset are null.
  • Anti Join: Includes only the rows from one dataset that have no matching rows in the other dataset (Left Anti or Right Anti in the Merge dialogue box).
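One way to reason about the join types is to look only at which key values survive each of them. The Python sketch below does that with set operations. The function name `keys_after_join`, the join labels and the sample IDs are hypothetical, and the sketch deliberately ignores the other columns and any duplicate keys.

```python
def keys_after_join(left_keys, right_keys, kind):
    """Return the set of key values present in the result of a join."""
    left, right = set(left_keys), set(right_keys)
    if kind == "inner":
        return left & right   # keys found in both datasets
    if kind == "left outer":
        return left           # every left key, matched or not
    if kind == "right outer":
        return right          # every right key, matched or not
    if kind == "full outer":
        return left | right   # every key from either dataset
    if kind == "left anti":
        return left - right   # left keys with no match on the right
    if kind == "right anti":
        return right - left   # right keys with no match on the left
    raise ValueError(f"Unknown join kind: {kind}")


# Hypothetical Customer IDs in the sales and customers tables.
sales_ids = ["C01", "C02"]
customer_ids = ["C01", "C03"]

for kind in ["inner", "left outer", "right outer",
             "full outer", "left anti", "right anti"]:
    print(kind, sorted(keys_after_join(sales_ids, customer_ids, kind)))
```

With these sample IDs, only C01 survives an inner join, while the left anti join returns C02, the customer that has sales but no entry in the customers table.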

The choice between appending and merging depends on the structure of the datasets and the goals of the analysis.

Grouping rows of a table in Power BI

Grouping organizes the data into categories based on the values in one or more columns. It reduces the complexity of large datasets by condensing numerous individual data points into summary groups. Grouping supports drill-down analysis, where the data can be analysed at different levels of detail, and it allows for interactive data exploration through features like slicers, filters, and hierarchies. Furthermore, the idea of grouping underlies cluster analysis, a technique used in data mining and machine learning to identify natural groupings within data.

EXAMPLE

Instead of analyzing every single sales transaction, group the data by Product Category to understand which product lines are performing well.

Steps to group the data

  • Select the table to group.
  • Identify the columns to group by.
  • In the Power Query Editor, go to the “Transform” tab.
  • Click on the “Group By” button.
  • Select the column(s) to group by.
  • In “New Column Name”, name the column where the aggregated data will be stored.
  • Choose the aggregation operation, such as Sum, Count, Min or Max.
  • Click on OK to group the data.
Figure: Grouping rows of a table in Power BI
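The effect of Group By can be sketched in Python as collecting the values of each group and then applying an aggregation to them. The function name `group_by`, the column names and the sample rows below are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict


def group_by(rows, key, value, aggregate):
    """Group rows by one column and aggregate another column per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {group: aggregate(values) for group, values in groups.items()}


# Hypothetical sales transactions.
sales = [
    {"Product Category": "Laptops", "Amount": 900.0},
    {"Product Category": "Phones", "Amount": 400.0},
    {"Product Category": "Laptops", "Amount": 1100.0},
]

total_sales = group_by(sales, "Product Category", "Amount", sum)
order_count = group_by(sales, "Product Category", "Amount", len)
largest_sale = group_by(sales, "Product Category", "Amount", max)

print(total_sales)   # {'Laptops': 2000.0, 'Phones': 400.0}
print(order_count)   # {'Laptops': 2, 'Phones': 1}
```

Passing `sum`, `len`, `min` or `max` as the aggregation corresponds loosely to choosing Sum, Count, Min or Max in the Group By dialogue box.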

Splitting and slicing columns

Splitting a column is useful when the data is combined in a single column that can be separated into multiple columns. Here the original column will be split into two or more new columns, depending on the desired outcome.

EXAMPLE

A column with full names can be split into two, such as “First Name” and “Last Name.”

Configure the split with these options:

  • Delimiter: Split based on a specific character (e.g., comma, space, dash).
  • Number of Characters: Split the column based on a fixed number of characters.
  • Positions: Split at specific character positions.
  • Lowercase to Uppercase: Split where lowercase changes to uppercase.
  • Digit to Non-Digit: Split where digits transition to non-digits, and vice versa.
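The non-delimiter options can be illustrated with short Python functions. These are sketches of the ideas only: the function names are hypothetical, positions are assumed to be zero-based, the fixed-length split is assumed to repeat along the whole text, and the last function covers only the digit to non-digit direction.

```python
import re


def split_by_length(text, size):
    """Number of Characters: cut the text into chunks of a fixed size."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def split_at_positions(text, positions):
    """Positions: start a new piece at each zero-based position."""
    bounds = list(positions) + [len(text)]
    return [text[start:end] for start, end in zip(bounds, bounds[1:])]


def split_lower_to_upper(text):
    """Lowercase to Uppercase: split where a lowercase letter meets an uppercase one."""
    return re.split(r"(?<=[a-z])(?=[A-Z])", text)


def split_digit_to_non_digit(text):
    """Digit to Non-Digit: split where a digit is followed by a non-digit."""
    return re.split(r"(?<=\d)(?=\D)", text)


print(split_by_length("ABCDEFG", 3))            # ['ABC', 'DEF', 'G']
print(split_at_positions("ABCDEFG", [0, 2, 5])) # ['AB', 'CDE', 'FG']
print(split_lower_to_upper("FirstName"))        # ['First', 'Name']
print(split_digit_to_non_digit("2024Sales"))    # ['2024', 'Sales']
```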

Steps to split a column

  • Select the column to split.
  • Go to the “Home” or “Transform” tab.
  • Click on the “Split Column” dropdown.
  • Choose a split option (for example, “By Delimiter”) and specify the delimiter.
  • Click “OK” to perform the split.
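As a rough illustration of the outcome, the Python sketch below splits a full-name column into two columns at the first space. The function name `split_column`, the column names and the sample rows are hypothetical. Splitting only at the first delimiter is an assumption of this sketch, and values with no delimiter leave the second column empty.

```python
def split_column(rows, column, delimiter, new_columns):
    """Replace one text column with several columns split on a delimiter."""
    result = []
    for row in rows:
        # Split at most len(new_columns) - 1 times, starting from the left.
        parts = row[column].split(delimiter, len(new_columns) - 1)
        # Pad with None when the text has fewer parts than new columns.
        parts += [None] * (len(new_columns) - len(parts))
        new_row = {name: value for name, value in row.items() if name != column}
        new_row.update(zip(new_columns, parts))
        result.append(new_row)
    return result


# Hypothetical rows with a combined name column.
people = [
    {"Full Name": "Asha Rao"},
    {"Full Name": "Ravi Kumar Singh"},
    {"Full Name": "Madonna"},
]

for row in split_column(people, "Full Name", " ", ["First Name", "Last Name"]):
    print(row)
```

Names with more than two words show why the split option matters: here "Ravi Kumar Singh" becomes "Ravi" and "Kumar Singh", whereas splitting at every space would need a third column.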

Slicing a column means extracting a specific part of its values, such as the first few characters or a substring. The result can either be placed in a new column (for example, by adding a Custom Column) or replace the values in the existing column.

Steps to slice a column

  • Select the column to slice.
  • Decide whether to add the result as a new column or to replace the existing column.
  • To add a new column, go to the “Add Column” tab; to replace the existing column, go to the “Transform” tab.
  • Click on “Extract” and choose from:
    1. First Characters: Extract a specified number of characters from the start.
    2. Last Characters: Extract a specified number of characters from the end.
    3. Range: Extract a specific range of characters by specifying the starting position and the number of characters.
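The three Extract options correspond to simple string slices, sketched here in Python. The function names and the sample invoice code are hypothetical, and the starting position of the range is assumed to be zero-based in this sketch.

```python
def first_characters(text, count):
    """First Characters: take `count` characters from the start."""
    return text[:count]


def last_characters(text, count):
    """Last Characters: take `count` characters from the end."""
    return text[-count:] if count > 0 else ""


def character_range(text, start, count):
    """Range: take `count` characters beginning at zero-based `start`."""
    return text[start:start + count]


# Hypothetical invoice code used only for illustration.
code = "INV-2024-0042"

print(first_characters(code, 3))    # INV
print(last_characters(code, 4))     # 0042
print(character_range(code, 4, 4))  # 2024

# Adding the extracted part as a new column, leaving the original intact.
rows = [{"Invoice": code}]
rows_with_year = [{**row, "Year": character_range(row["Invoice"], 4, 4)} for row in rows]
print(rows_with_year)
```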
