VERSICH

Understanding Merging Methods in Power Query (Power BI)

understanding merging methods in power query (power bi)

Intro

In this article, we'll explore the different merging methods used in Power BI’s Power Query. Grasping these merge types is essential when combining two tables, as the selected merge type affects the output significantly. Power BI offers six types of joins:

  • Left Outer Join

  • Right Outer Join

  • Full Outer Join

  • Inner Join

  • Left Anti Join

  • Right Anti Join

The diagram above illustrates how tables behave during these joins. Before diving into each join in detail, let’s clarify some terminology related to the different types of joins.

Left and Right

As illustrated with the two circles, each type of merge signifies two tables in Power BI. This is why we refer to these merges as actions between a Left and a Right table. The Left Join emphasizes that the data from the Left table is predominant, so after the join, we'll include all records from the Left table and some from the Right table. Conversely, in a Right Join, all records from the Right table are included, with only a selection from the Left table.

You might be curious about the interface in Power Query. In the merge menu, the Left table appears first, while the Right table follows below. For instance, in the screenshot provided, the employee table functions as the Left table and the Sales table acts as the Right table.

Inner and Outer

The Inner Join is represented by the overlap area of the two circles in the Venn diagram. This join targets rows with matching values from both tables, only returning results when there’s a match. Any rows lacking matching values will not be included.

On the other hand, the Outer Join encompasses the outer parts of the circles. Unlike the Inner Join, it does not exclude rows that lack matches. It retains every row and displays null for those without a counterpart.

Anti

The Anti Join works oppositely to the regular joins. Instead of yielding rows with matching values between the two tables, it only returns rows that lack matches. With the terminology clarified, let’s take a closer look at each join type.

Left Outer Join - Power BI

According to Power BI, a Left Outer Join entails “all from first and matching from second.” This means it returns every available row from the Left table alongside matching rows from the Right table. This is the default join type in Power Query and may feel familiar if you've used VLOOKUP in Excel.

To further understand this concept, let's assume we have two tables in Power BI. We can merge them utilizing EmpID, a common column in both tables. Once we select these columns, Power BI will identify the number of matching rows-here, 78 rows from the Sales table correspond with EmpID found in the Employee table.

After selecting OK, we’ll generate a new column where we can expand additional fields. In this case, we would choose EmpName, EmpDesignation, and EmpCountry, which will now integrate into the Sales dataset. Consequently, we’ll incorporate EmpName, EmpDesignation, and EmpCountry into the Sales table. With 78 out of 100 rows containing matching values, we will see populated values in the expanded columns. The 22 rows that don’t match remain in the final table but display as null.

Right Outer Join - Power BI

Power BI describes a Right Outer Join as “all from second, matching from first.” This operates as the inverse of the Left Outer Join, yielding every row from the Right table, along with only the matching rows from the Left table.

Let’s apply this merge type to the identical two tables. Merge the Sales table (Left) to the Employee table (Right) based on EmpID using a Right Outer Join. Notice how Power BI now reflects 78 rows out of 98 instead of the previous 78 out of 100. This adjustment occurs because we’re now considering the Employee table (Right) as our primary table. When we expand the rows, the results change accordingly.

Full Outer Join - Power BI

You can view a Full Outer Join as a blend of both Right and Left Outer joins. Power BI describes it as “all rows from both.” This join returns every row from both tables, including both matching and non-matching rows. This means that we can expect an increase in the total row count after the join.

As for the null values, we will witness all resulting nulls derived from the Right and Left Outer joins, combined together. Merging the Sales table (Left) with the Employee table (Right) on EmpID using a Full Outer Join yields the results shown in the screenshot. We see 78 from the Left table plus 78 from the Right table, with null values reflecting non-matching rows since a full outer join returns everything.

Inner Join - Power BI

Unlike Outer Join, which includes all rows, the Inner Join filters out nulls automatically. Imagine executing a Full Outer Join and then eliminating all resulting nulls-this aligns perfectly with the output of an Inner Join. Power BI defines a Full Inner Join as “only matching rows.”

This join exclusively returns rows that match between both tables, ensuring there are no null entries in the final result. Merging the Sales table (Left) with the Employee table (Right) on EmpID through Inner Join showcases no records with null values in the resultant dataset.

Left Anti Join - Power BI

Power BI interprets a Left Anti Join as “Rows only in first.” This join returns all rows from the Left table which do not have corresponding matches in the Right table.

When merging the Sales table (Left) to the Employee table (Right) on the basis of EmpID through a Left Anti Join, we gather the results reflected in the screenshot, which shows that 78 of 100 rows match from the Sales table, hence the remaining 22 rows are returned by this join.

Right Anti Join - Power BI

Power BI describes a Right Anti Join as “Rows only in second.” This join retrieves all rows from the Right table that lack matches in the Left table. Merging the Sales table (Left) and the Employee table (Right) using the full outer join shows 78 of 98 rows matching from the Sales dataset, meaning the remaining 20 rows will be captured by this join type.

Bonus: Fuzzy Merge - Power BI

Apart from standard value matching, Power BI allows for approximate matches through fuzzy merging. You can choose to match records with 50%, 70%, or even 90% similarity. Simply tick the box in the merge table dialog to enable fuzzy merge options.

Various configurations can be adjusted when utilizing fuzzy matching, including setting the Similarity Threshold, disregarding case, matching by combining text segments, and utilizing a transformation table.

For instance, consider two tables: a City Table and a Population Table. When merging these tables using a regular merge compared to fuzzy matching:

Normal merge output: This method only matches 2 records, ignoring Paris and New York due to their alternate spellings as Pari and New york in the Right table.

Fuzzy matching output: The fuzzy approach succeeds in recognizing these variations and returns 4 matching records.

Conclusion

I trust that this article has been helpful and engaging! Enjoy your journey of merging datasets!