Eugene Lebedev is a Power BI specialist by trade. Before establishing Versich in 2021, he developed Power BI reporting for Autodesk. His reports were employed by executives at Autodesk, including VPs of Finance and C-suite members. Through Versich, Eugene has created dashboards for prominent clients like Google, Teleperformance, Delta Airlines, and over 200 other clients globally.
Transforming your data can significantly help in achieving your business objectives. Our areas of expertise include Data Analysis, Power BI, Google Data Studio, and Tableau. Power Query offers robust tools that enable users to sort and modify their datasets, making them easier to analyze and utilize. If you're embarking on your Power BI journey, understanding Power Query is crucial. This is the platform where all transformations take place, allowing you to work efficiently and produce insightful visualizations quickly. In this article, you’ll explore various functionalities within Power Query and how to implement them on your dataset.
Step 1: Retrieve or Alter Data
You can access Power Query in multiple ways within Power BI. Upon opening your Power BI file, you can either select “Get Data” or “Transform Data” if your data is already loaded. For this tutorial, we will click “Get Data,” which will present a list of data sources available. We are using the Text/CSV connector to load Netflix titles from the Kaggle dataset. This resource allows users to download any datasets they require for free. [Kaggle Dataset Link](https://www.kaggle.com/datasets/shivamb/netflix-shows)
Step 2: Import Dataset
Next, we'll click on “Netflix titles” to load it into our Power BI file. A window will open, giving us a preview of our dataset. Instead of directly loading the data, we will transform it, which will direct us to Power Query. This is where all modifications will be made during this tutorial.
Step 3: Features, Menus, and Capabilities
Before proceeding, it's essential to familiarize yourself with the various menus and sections on the screen. To the left, you'll see a list of queries representing all the tables you've loaded. In the center, a preview of your data is displayed, allowing you to scroll both horizontally and vertically to assess its appearance. The right section contains the Query Settings, Properties, and Applied Steps.
For instance, Applied Steps shows the transformations you've carried out on your data. By clicking ‘Source’ under “Applied Steps,” you can review the alterations.
The ribbon at the top groups functionalities related to data transformation. In this grouping, you will find options like “home,” “transform,” “add column,” “view,” “tools,” and “help.” Under “View,” you can choose what features to display within Power Query. For example, the formula bar shows the text of the actual formulas related to your data transformations, which are written in a language called “M.” While you may not use this language frequently, it’s good to be aware of its existence.
You can also monitor column quality, identifying blanks or errors in your dataset. Column distribution lets you see various attributes, such as the number of movies versus TV shows, while the column profile provides a more detailed assessment.
Retain or Eliminate a Column or Row
The left section allows for managing the tables in your file; however, let's direct our attention to the options on the right. One of the primary functions here is “Choose columns” which lets you load only the necessary columns into your Power BI files.
This feature allows you to eliminate unnecessary columns. For example, if you decide you do not need the cast, description, or rating, you can unselect them and click “OK” to remove those columns from the dataset. A new step titled ‘Removed columns’ will appear, showing the columns you've deleted. You can also undo this step or opt to remove specific columns, such as the director column, by selecting it and clicking “Remove column.” To select multiple columns, hold the Ctrl key and click on the desired columns, then click “Remove column.”
Additionally, you can choose to “Keep rows” or “Remove rows.” Under “Remove rows,” options are available to eliminate errors or duplicates, while the “Keep rows” provides the option to retain errors or duplicates based on your needs. For filtering, click the arrow next to the “director” column, select “blank” from the dropdown, and confirm by clicking “OK.” This adds a filter icon to the column. To remove a filter, click “Clear Filter” after selecting the filter icon.
Another available function is to “Sort columns” alphabetically, either in ascending or descending order from the ribbon menu. It’s also worth noting potential data issues in your dataset. The top two rows are not formatted correctly, showing country names in the “directors” column and dates in the “cast” column. Since this is erroneous data, we will need to eliminate the first two rows. Click “Remove rows” in the menu, choose “Remove top rows,” input “2” in the box, and hit “OK.” This action cleans up our dataset.
Split Columns
Next, let’s explore the “Split columns” functionality. Clicking this option opens a selection of methods to split your column based on conditions, such as ‘By Delimiter’, ‘Number of Characters’, or ‘By Positions’. The first three methods are generally the most frequently utilized.
I’ll walk you through a demonstration using the column labeled “Duration.” Select ‘By Delimiter’ within “Split Column.” This option allows you to define a specific symbol to split on, such as a comma or space. Next, decide whether you want to split at the ‘left-most delimiter,’ ‘right-most delimiter,’ or ‘each occurrence of the delimiter.’ We’ve selected ‘each occurrence’ and clicked “OK.” Now, we see two columns for duration; the left column displays the number and the right column shows the text. Let’s rename the left column to ‘Length’ and the right one to ‘Minute/season’ for clarity.
Set Data Types
Now, let’s learn how to assign data types in Power Query. This option can be accessed from the top ribbon, where you can choose which columns to assign data types to. The possible data types include decimal number, whole number, percentage, date/time, duration, text, etc. For instance, if you wish to change the release year from a number to text, select the “release year” and choose “text” as the data type. This changes the data icon from a number (123) to text (ABC).
The data icons indicate whether your data is text or numeric, along with a calendar icon representing dates; these are the icons you are most likely to encounter in Power BI.
Change Values vs. Substitute With
Next, we need to understand how to replace specific values. Select the Minute/Seasons column, then navigate to the “Replace values” option on the top ribbon. Prompt boxes will open for ‘Value to find’ and ‘Replace with’. We want to replace ‘m ’ with ‘M’ in the Minute/Seasons column. This helps maintain the capitalization for “Seasons.” After making the change, click "OK," and the update will be reflected as shown on the screen.
Conclusion
Power Query offers a fantastic array of functionalities to work with! Another important tool in Power BI is the Merge Queries and Append Queries, but they require a more detailed explanation. Look forward to another tutorial focusing on data transformation steps using Power Query. I hope you found this tutorial helpful and informative!
