VERSICH

Prepare and Refine Data Using Power Query Tools

prepare and refine data using power query tools

Introduction

Every Power BI report we build at Versich starts long before the first chart appears on a dashboard. It starts with the data itself, and more specifically, with how clean and consistent that data is before it ever reaches the reporting layer. This is the work that Power Query was built to handle, and it is one of the most underused tools in the Power BI ecosystem.

Power Query is the data preparation engine built into Power BI, Excel, and several other Microsoft products. It lets us connect to almost any data source, reshape that data into a usable structure, and load it into a model that is ready for analysis. We rely on it on nearly every Power BI engagement we run, because the quality of a report is only as good as the quality of the data feeding it.

In this article, we walk through how we use Power Query to prepare and refine data, the core transformations we apply most often, and the practices that keep our queries reliable as data sources change over time. If your organization is dealing with messy spreadsheets, inconsistent data formats, or reports that take too long to build, this is the layer of the process worth paying attention to.

What Power Query Is and Why It Matters

Power Query is an ETL tool, meaning it extracts data from a source, transforms that data into a cleaner shape, and loads it into a destination such as a Power BI data model or an Excel worksheet. We use it as the first stop for nearly all the data that flows into the dashboards we build for clients.

What makes Power Query valuable is that it does this work without requiring a separate data warehouse or a dedicated ETL platform for smaller and mid-sized projects. The tool sits directly inside Power BI Desktop, so we can connect, clean, and shape data in the same environment where we build the final report.

We also value Power Query because every transformation step we apply is recorded and repeatable. When a client adds new rows to a source file next month, our query reruns the same cleaning steps automatically. We are not rebuilding the report from scratch every time the underlying data changes.

There is also a cost dimension to this that clients appreciate once they see it in practice. Manual data cleanup in Excel, where someone deletes a few rows, fixes a date format, or copies values from one tab to another, has to be repeated by hand every reporting cycle. Power Query replaces that recurring manual effort with a query that runs the same way every time, which means the hours spent on data cleanup at the start of a project do not need to be spent again the following month.

Connecting to the Right Data Sources

The first step in any Power Query workflow is establishing a connection to the data. Power BI supports a wide range of source types, and we typically work across several of the following on a single project:

  • Excel workbooks and CSV files stored locally or on SharePoint
  • SQL Server, Oracle, and other relational databases
  • Cloud platforms such as Azure, Salesforce, and Dynamics
  • Web pages and REST APIs for external or public data
  • NetSuite and other ERP exports, which we frequently combine with financial and operational data for our clients

Choosing the right connection method matters as much as the connection itself. We pay close attention to whether a source supports query folding, which means Power Query can push transformation logic back to the source system rather than processing everything locally. This has a direct impact on performance, particularly with larger datasets.

We also think carefully about how a connection should authenticate and how often it needs to be refreshed. A query connecting to a finance system, for example, often needs scheduled credentials configured through the Power BI Service gateway, while a one-time analysis pulling from a folder of CSV files might only need a local connection. Setting this up correctly the first time avoids a common problem we see on projects that were not originally built with Power BI in mind, where a report works perfectly on someone's laptop but fails the moment it needs to refresh automatically on a schedule.

Working Inside the Power Query Editor

Once a connection is established, we move into the Power Query Editor, where the actual data preparation happens. The editor presents data in a familiar table view, and every action we take, from renaming a column to filtering a row, is captured as a step in the Applied Steps pane.

This step-by-step record is one of the most practical features in the entire tool. We can click back through previous steps to see exactly how the data has been transformed, reorder steps when the sequence matters, or remove a step entirely without breaking the rest of the query. For us, this transparency is what makes Power Query suitable for production reporting rather than a one-off cleanup exercise.

Behind every action in the editor is M, the formula language that drives Power Query. We do not need to write M code for most transformations because the editor generates it automatically, but having that code available gives us the flexibility to write custom logic when a transformation cannot be achieved through the interface alone.

A common example is conditional logic that depends on values across several columns at once, or a custom function that needs to run against every row in a folder of files. In those cases, we write the M code directly in the Advanced Editor. Most of the work we do for clients does not require this, but knowing when to reach for it is part of what separates a query that handles routine data cleanly from one that can also handle the edge cases a business inevitably runs into.

Core Transformations We Apply Most Often

While Power Query supports dozens of transformation types, most of the value in a typical project comes from a smaller set of core operations. The table below summarizes the stages we move through on nearly every engagement.

Power Query Stage

What It Does

Why It Matters for Reporting

Connect

Pulls data from sources such as Excel files, SQL databases, SharePoint lists, and web APIs

Gives us one entry point for data that would otherwise live in scattered files and systems

Transform

Filters rows, splits or merges columns, changes data types, and removes duplicates

Produces a clean, consistent dataset before it ever reaches a report or dashboard

Combine

Appends or merges multiple queries into a single table

Lets us bring together data from different departments or time periods into one view

Parameterize

Uses parameters to control file paths, date ranges, or filters

Makes a query reusable across projects instead of rebuilding it each time

Load

Sends the finished table into the Power BI data model or an Excel worksheet

Hands off a reliable, query-ready dataset to the reporting layer

Within these stages, a few specific transformations come up again and again in our work:

  • Removing duplicate rows that result from inconsistent data entry or repeated exports
  • Splitting columns by delimiter, such as separating a full name into first and last name fields
  • Changing data types so that dates, numbers, and text are recognized correctly by the model
  • Unpivoting columns to convert wide, spreadsheet-style data into a long format suitable for analysis
  • Replacing null or blank values with defaults that prevent reporting errors downstream
  • Trimming and cleaning text fields to remove extra spaces or inconsistent capitalization

Combining Data from Multiple Sources

Most of the reporting challenges we solve for clients involve data that lives in more than one place. A sales team might track pipeline activity in one system while finance tracks revenue in NetSuite. Power Query gives us two primary ways to bring this data together.

Merging queries works like a database join. We match rows from two tables based on a common column, such as a customer ID or invoice number, and bring fields from both tables into a single result. We use merges constantly when combining operational data with financial data, since the two often originate from completely different systems.

Appending queries works differently. Instead of joining columns side by side, it stacks rows from multiple tables that share the same structure. This is the approach we use when a client has separate files for each month or each region, and the goal is to combine them into one continuous dataset for trend analysis.

Building Reusable Queries with Parameters

One of the practices we apply consistently is the use of parameters inside Power Query. A parameter lets us define a value, such as a file path, a date range, or a folder location, once, and reference that value across multiple queries.

This matters most when a report needs to move between environments. A query built with a hardcoded file path breaks the moment that file moves or the project shifts to a new server. A query built with a parameter only needs that single value updated, and every dependent query updates with it.

We also use parameters to let business users adjust certain inputs themselves, such as switching between a current year and prior year view, without needing to open the underlying query logic at all.

Keeping Queries Fast and Refreshes Reliable

A Power Query workflow that looks clean in testing can slow down significantly once it runs against full production data. We build in a few habits early to avoid this.

  • Filtering rows as early as possible in the query, so later steps work with a smaller dataset
  • Disabling load on staging queries that exist only to support other queries, rather than appearing in the data model directly
  • Checking for query folding wherever a source supports it, so transformation work happens at the source rather than locally
  • Avoiding unnecessary preview refreshes while building, since each one re-queries the live source

These habits become especially important once a report moves into scheduled refresh on the Power BI Service. A query that takes a few seconds during development can take considerably longer against a live production database, and refresh failures are almost always traced back to one of these issues.

We also build in a routine for testing refreshes before a report goes live, rather than assuming it will behave in production the way it did during development. This usually means running a full refresh against the actual production data volume, not a sample, and checking how long each query takes individually rather than only looking at the total refresh time. When a refresh does fail, having that breakdown available makes it far faster to identify which query caused the problem instead of troubleshooting the entire pipeline from scratch.

Validating Data Quality Before It Reaches a Report

Cleaning data and validating it are not the same step, and we treat them separately. Cleaning addresses formatting and structure. Validation asks whether the numbers themselves make sense once the cleaning is done. We build a short set of checks into most queries to catch problems before they reach a dashboard rather than after a client spots them.

  • Row counts before and after key transformation steps, to confirm a filter or merge has not unexpectedly dropped records
  • Spot checks on totals against a known source, such as comparing a revenue figure in the model to the same figure in NetSuite
  • Checks for unexpected nulls in fields that should always be populated, such as a customer ID or transaction date
  • Reviewing categorical fields for inconsistent labels, such as a region listed as both 'US' and 'United States' in the same column

These checks take relatively little time to build compared to the time they save later. A reporting error caught during development is a quick fix. The same error caught after a client has already presented numbers in a leadership meeting is a much harder conversation, and it is the kind of issue that erodes confidence in a dashboard even after the underlying mistake is corrected.

Why This Preparation Work Pays Off

Clients sometimes ask why we spend as much time on data preparation as we do on the dashboard itself. The honest answer is that a report is only as trustworthy as the data underneath it. A dashboard built on inconsistent data types, duplicate records, or unmerged sources will eventually produce numbers that someone questions in a meeting, and at that point the report loses credibility regardless of how good it looks.

We see Power Query as the layer that protects that credibility. By the time data reaches the model and the visuals, it has already been validated, cleaned, and structured in a way that holds up under scrutiny. That is what allows the reports we build to become tools people actually rely on for decisions, rather than dashboards that get checked against a spreadsheet before anyone trusts the numbers.

This is also the foundation of the broader Power BI consulting and development services we provide, where data preparation, modeling, and report design come together as one engagement rather than separate handoffs.

How We Approach Power Query on Client Engagements

On every engagement, we treat data preparation as its own phase of work rather than a quick step before building visuals. We start by mapping out the source systems involved, then design a query structure that separates raw source connections from the transformation logic applied on top of them. This separation makes it far easier to update a query later without re-doing the entire pipeline.

We also document the transformation steps we apply, particularly on larger projects where multiple team members may need to maintain the model over time. A query that makes sense to the person who built it can be difficult to follow months later without that context.

Our Power BI consulting services cover this entire process, from initial data assessment through to dashboard delivery and refresh setup, so clients get a reporting solution that holds up as their data and their business continue to grow.

You can see examples of this work in our Power BI portfolio, which includes dashboards built across finance, sales, and operational use cases.

Conclusion

Power Query is the part of Power BI that does not get the same attention as a well-designed dashboard, but it is the part that determines whether that dashboard can be trusted. Connecting to the right sources, applying consistent transformations, combining data correctly, and keeping queries fast and maintainable are the practices that turn raw, scattered data into something a business can actually act on.

At Versich, we treat this preparation work as a core part of every Power BI engagement we deliver, not an afterthought. If your team is working with data that is difficult to clean, combine, or trust, we would be glad to talk through how we can help.

Reach out through our Contact Us page to start the conversation.