🏛️ Database Design & Warehousing

Databases That Scale. Warehouses That Answer.

From a clean relational schema to a full dimensional warehouse with automated ETL — I design, build and migrate data platforms on SQL Server, Microsoft Fabric, Databricks, Synapse and Snowflake. The largest build in production tracks 5.6 million records on a Snowflake-schema warehouse, and every pipeline ships with CI/CD and validation built in.

Database & Warehousing Packages

🧩
Database Schema Design

A relational database designed properly from day one — or rescued from a spreadsheet sprawl: normalised tables, sensible keys, indexing strategy and an ERD your whole team can read.

Includes
  • Entity-relationship diagram (ERD) & data dictionary
  • Normalisation to 3NF (denormalised where it pays)
  • Indexing, constraints & integrity rules
  • Production-ready DDL scripts you keep
SQL Server PostgreSQL ERD
Data Warehouse & Dimensional Modelling

A reporting-grade warehouse built on Kimball dimensional modelling — star or snowflake schema, conformed dimensions and history tracking — the same approach behind a live 5.6M-record inventory warehouse.

Includes
  • Fact & dimension design (star / snowflake schema)
  • Slowly changing dimensions (SCD 1/2) for history
  • Aggregations tuned for BI & dashboard queries
  • Source-to-target mapping documentation
Star Schema Kimball SCD
🔀
ETL & Data Pipeline Development

Pipelines that move data reliably every night without babysitting — SSIS, Azure Data Factory or Python, with incremental loads, validation checks and CI/CD deployment through GitHub.

Includes
  • Source extraction: APIs, files, databases, SaaS
  • Incremental / delta loads, not full reloads
  • Row-count & quality validation at every stage
  • CI/CD deployment, alerting & runbooks
SSIS Azure Data Factory Python
☁️
Cloud Migration & Lakehouse

Move an on-premise database or ageing warehouse to a modern cloud platform — Microsoft Fabric, Databricks, Synapse or Snowflake — with Delta Lake architecture and validated, reconciled cut-over.

Includes
  • Platform assessment & migration roadmap
  • Schema & data migration with reconciliation counts
  • Medallion lakehouse (bronze/silver/gold) on Delta Lake
  • Cost-optimised compute & storage setup
Microsoft Fabric Databricks Snowflake

How It Works

1 📋
Discovery & Data Audit

We map your sources, reporting questions and growth plans. You get a written proposal with the recommended architecture, platform and fixed price.

2 🏗️
Design & Build

Schema, warehouse and pipelines are built in versioned scripts with CI/CD — you see working increments and row-count validation as we go, not a big-bang reveal at the end.

3 🚀
Handover & Support

You receive ERDs, a data dictionary, runbooks and a walkthrough session. Optional support retainer keeps pipelines monitored after go-live.

Architecture Proven in Production, Not Just Diagrams

🏗️

5.6M Records in Production

Designed and run a Snowflake-schema warehouse tracking 5.6 million inventory components — dimensional modelling tested at real scale, not whiteboard scale.

🛡️

Governance Built In

Secured 600+ enterprise tables with Microsoft Purview, dynamic masking and role-based access — GDPR compliance is part of the design, not an afterthought.

🔁

CI/CD From Day One

Every schema and pipeline lives in version control and deploys through GitHub CI/CD — the same workflow used across 20+ managed reporting environments.

📊

Built For Reporting

I build the dashboards too (Power BI, SSRS, SSAS tabular), so the warehouse is modelled for the questions your business actually asks — fast queries, no workarounds.

Outgrowing spreadsheets or a creaking database?

Describe what you have and what you need it to do — you’ll get an honest architecture recommendation and a fixed quote, usually the same day.

FAQ

SQL Server (including Azure SQL), PostgreSQL and MySQL for relational design; Microsoft Fabric, Databricks, Synapse Analytics and Snowflake for warehousing and lakehouse builds. Pipelines run on SSIS, Azure Data Factory or Python — whichever fits your stack and budget.
Star schema is the default for BI: fewer joins, faster dashboards, easier for analysts. Snowflake schema normalises the dimensions and suits very large or highly shared dimension data — it’s what I used for a 5.6M-record component warehouse. I’ll recommend one based on your data volumes and reporting tools, and explain the trade-off in plain English.
Usually, yes. Most “slow database” problems are missing indexes, bad query patterns or un-normalised tables — fixable without a rebuild. I start with a performance and schema audit, then give you a prioritised fix list; a full redesign is only recommended when patching genuinely costs more than rebuilding.
Every migration runs with reconciliation built in: row counts, checksums and spot-value comparisons between source and target at each stage, plus a parallel-run period where the old system stays live until the new one is proven. You sign off on the validation report before anything is switched off.
ERDs, a data dictionary describing every table and column, source-to-target mappings for pipelines, and runbooks for routine operations and failure recovery — so your team (or any future developer) can maintain the platform without me. Need the data cleaned before it goes in? See Data Cleaning & Preparation.

More Services