Data Management Solution to Consolidate 30-Year Longitudinal Research Data

  • Study Participants: 15,000+
  • Parameters Standardised: 35,000+
  • Historical Data Unified: 30+ years
  • Faster Data Retrieval: 3x

Background: The Client Challenge

Our client is a leading US research university with a long-standing history and a strong reputation in interdisciplinary medical and social science research. Over the past three decades, the university has conducted a large-scale longitudinal study tracking the long-term health outcomes of over 15,000 participants, documenting lifestyle habits, clinical measurements, demographic data, and environmental factors across hundreds of follow-up cycles.

Despite the enormous scientific value of this dataset, the university faced a critical challenge: all research data had been collected and stored in hundreds of disconnected Excel and CSV files, each coded according to different internal codebooks and conventions. This made meaningful analysis nearly impossible without significant manual effort. The core challenges included:

  • Millions of data entries spread across hundreds of disparate Excel and CSV files with no unified structure
  • The same parameters (e.g. age, BMI, blood pressure) coded differently across files and study cycles, making cross-file analysis unreliable
  • No centralized database or single source of truth for the full 30-year dataset
  • Researchers spent hours manually searching and compiling data for each report or analysis request
  • No self-service analytics capability: every data query required involvement from the IT team
  • Risk of data loss and inconsistency growing with each new study cycle added to the existing file-based system

The university partnered with Versich, which brought deep expertise in healthcare data analytics and research data management, to design and build a centralized data management and analytics solution on the Microsoft technology stack.

Our Solution

Versich assigned a business analyst, two data engineers, and a project manager to the engagement. The team worked closely with the university's research leads and statisticians to elicit requirements and design a solution that would serve both technical and non-technical research staff.

Discovery & Data Audit

  • Conducted structured interviews with research team leads, statisticians, and IT staff to document requirements for the centralized data system
  • Audited all provided research files (hundreds of Excel and CSV files spanning 30 years of study data) to understand structure, volume, coding conventions, and inconsistencies
  • Identified that the same parameters (e.g. gender, BMI, blood pressure readings) were coded differently across files and study cycles due to evolving internal codebooks
  • Mapped out all 35,000+ documented parameters, created unified codes, and defined a standardized schema for the target database
  • Documented data mapping rules and parameter definitions to serve as the foundation for the database build and future data loading
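
To make the harmonization step concrete, here is a minimal Python sketch of how legacy codes might be mapped to unified codes. The cycle names, legacy values, and unified codes are all hypothetical and chosen only for illustration; the actual codebooks and mapping rules from the engagement are not reproduced here.

```python
from typing import Optional

# Hypothetical mapping rules: (study cycle, parameter, legacy value) -> unified code.
# In practice these would be loaded from the documented mapping rules, not hard-coded.
MAPPING_RULES = {
    ("cycle_1995", "gender", "1"): "M",
    ("cycle_1995", "gender", "2"): "F",
    ("cycle_2010", "gender", "male"): "M",
    ("cycle_2010", "gender", "female"): "F",
}


def unify(cycle: str, parameter: str, legacy_value: str) -> Optional[str]:
    """Return the unified code, or None when no mapping rule exists."""
    key = (cycle, parameter, legacy_value.strip().lower())
    return MAPPING_RULES.get(key)


def harmonize(rows):
    """Split rows into mapped records and unmapped ones that need manual review."""
    mapped, unmapped = [], []
    for row in rows:
        code = unify(row["cycle"], row["parameter"], row["value"])
        if code is None:
            unmapped.append(row)
        else:
            mapped.append({**row, "value": code})
    return mapped, unmapped


rows = [
    {"participant_id": 1, "cycle": "cycle_1995", "parameter": "gender", "value": "2"},
    {"participant_id": 1, "cycle": "cycle_2010", "parameter": "gender", "value": "Female"},
    {"participant_id": 2, "cycle": "cycle_2010", "parameter": "gender", "value": "n/a"},
]
mapped, unmapped = harmonize(rows)
```

Keeping unmapped values in a separate list, rather than dropping or guessing them, is one way to surface codebook gaps for the research team to resolve.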

Solution Architecture Design

  • Designed a centralized Microsoft-stack architecture aligned to the university's existing IT infrastructure, enabling future scalability and automated data loading
  • Data storage layer: Microsoft SQL Server as the central research database, hosting millions of standardized data entries across all study parameters
  • Data integration layer: Azure Data Factory for orchestrating data ingestion, transformation, and standardization pipelines from source files
  • Security layer: Azure Key Vault for secure credential and access management across all platform components
  • Business intelligence layer: Microsoft Power BI (web and desktop) as the primary interface for data exploration, search, and reporting

Data Standardization & Database Build

  • Applied unified coding rules across all 35,000+ parameters, mapping legacy codebook values to a consistent, analytics-ready schema
  • Loaded the full 30-year dataset (millions of entries covering demographics, clinical measurements, lifestyle data, and follow-up records) into the centralized SQL Server database
  • Implemented detailed data validation checks during loading to ensure accuracy, completeness, and consistency across all historical records
  • Configured native role-based access control in Power BI and SQL Server, ensuring researchers access only the data relevant to their study role
  • Connected Power BI web and desktop apps to the centralized database, enabling multifaceted data exploration across all parameters
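
The validation-on-load idea can be sketched as follows. This is an illustrative Python example only: it uses the standard-library sqlite3 module as a stand-in for SQL Server, and the table name, column names, and plausible-range limits are assumptions rather than details from the project. It also covers numeric parameters only; categorical values would need their own checks.

```python
import sqlite3

# Assumed plausible ranges per unified parameter code (illustrative values only).
PLAUSIBLE_RANGES = {"bmi": (10.0, 80.0), "systolic_bp": (50.0, 300.0)}


def validate(record, seen_keys):
    """Return a list of problems found in one record (empty list means valid)."""
    problems = []
    # Completeness: every required field must be present and non-empty.
    for field in ("participant_id", "cycle", "parameter", "value"):
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    if problems:
        return problems
    # Consistency: one value per participant, cycle, and parameter.
    key = (record["participant_id"], record["cycle"], record["parameter"])
    if key in seen_keys:
        problems.append("duplicate entry")
    # Accuracy: the value must be numeric and within the assumed range.
    low, high = PLAUSIBLE_RANGES.get(record["parameter"], (float("-inf"), float("inf")))
    try:
        value = float(record["value"])
    except (TypeError, ValueError):
        problems.append("non-numeric value")
    else:
        if not low <= value <= high:
            problems.append("value out of range")
    return problems


def load(conn, records):
    """Insert valid records; return rejected records together with their problems."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurement ("
        "participant_id INTEGER, cycle TEXT, parameter TEXT, value REAL, "
        "PRIMARY KEY (participant_id, cycle, parameter))"
    )
    seen, rejected = set(), []
    for record in records:
        problems = validate(record, seen)
        if problems:
            rejected.append((record, problems))
            continue
        seen.add((record["participant_id"], record["cycle"], record["parameter"]))
        conn.execute(
            "INSERT INTO measurement VALUES (?, ?, ?, ?)",
            (record["participant_id"], record["cycle"], record["parameter"],
             float(record["value"])),
        )
    conn.commit()
    return rejected


conn = sqlite3.connect(":memory:")
records = [
    {"participant_id": 1, "cycle": "2010", "parameter": "bmi", "value": "24.5"},
    {"participant_id": 1, "cycle": "2010", "parameter": "bmi", "value": "24.5"},  # duplicate
    {"participant_id": 2, "cycle": "2010", "parameter": "bmi", "value": "245"},   # out of range
    {"participant_id": 3, "cycle": "2010", "parameter": "bmi", "value": None},    # missing
]
rejected = load(conn, records)
loaded = conn.execute("SELECT COUNT(*) FROM measurement").fetchone()[0]
```

Rejected records are returned with their reasons instead of being silently skipped, so they can be reviewed and corrected at the source before a reload.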

Training & Knowledge Transfer

  • Created a comprehensive Power BI user manual covering data exploration, report generation, parameter search, and self-service updates to parameter definitions
  • Included step-by-step instructions for non-IT users to manage access for new researchers and update mapped parameter attributes (e.g. definitions, codes, descriptions) independently
  • Delivered hands-on training sessions for research staff and university IT team to ensure confident adoption of the new platform
  • Provided documentation for future development phases, including automated data loading, built-in cleaning mechanisms, and custom analytics features

Business Impact

30 Years of Research Data Unified and Accessible

Millions of research data entries gathered across three decades, previously scattered across hundreds of disconnected files, are now consolidated into a single, structured SQL Server database, giving researchers one reliable source of truth for the entire study dataset.

3x Faster Data Retrieval

Researchers now locate specific participant data, parameter values, or study-period records in seconds using Power BI, compared to hours of manual cross-file searching under the previous setup.

Automated Report Generation

Reports for internal review, grant submissions, or external partner sharing, which previously required manual data compilation across multiple files, are now generated directly from Power BI in minutes.

Self-Service for Non-IT Users

With comprehensive user manuals and training, research staff independently manage access permissions, update parameter definitions, and run custom analyses without requiring IT team involvement, freeing technical resources for higher-value work.

REVIEWS

What Clients Say About Us

5.0

We hired Versich to rebuild our analytics stack after an internal project stalled. They came in, assessed the situation quickly, and delivered production-ready Power BI dashboards within weeks. Their DAX knowledge and data modelling skills are exceptional.

Marcus Webb

CTO

5.0

Versich understood our finance workflows from day one. They built dashboards that connected directly to our ERP and gave our leadership team real-time visibility into cash flow, margins, and budget vs actuals. The quality of the work and the speed of delivery were both outstanding.

Priya Nair

Finance Director

5.0

Before Versich, our reporting was scattered across spreadsheets with no single source of truth. They built us a Power BI environment that connects our warehouse, finance, and sales data in one place. Our operations team now makes decisions in hours instead of days.

Daniel Okonkwo

Head of Operations