ADF + Databricks → Medallion Architecture (Bronze/Silver/Gold)

Architecture Diagram

[Figure: ADF + Databricks medallion architecture (Bronze/Silver/Gold)]

Overview

Files land in the ADLS Gen2 raw zone, and an ADF pipeline ingests them into Bronze as Delta tables. A Databricks notebook reads Bronze and applies cleaning and conformance to produce Silver: row-level quality checks, correct data types, and SCD-ready joins. A second notebook aggregates and enriches Silver into Gold for BI and ML. Tables are partitioned (e.g., by date) and Z-Ordered on commonly filtered columns; periodic `OPTIMIZE` and `VACUUM` runs compact small files and remove stale ones, keeping storage and query performance healthy. ADF triggers orchestrate the Bronze → Silver → Gold sequence and record each run in the pipeline run logs.
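
The maintenance step described above can be sketched as a small helper that builds the Delta SQL statements for a table. This is a minimal illustration, not the project's actual notebook: the table name `silver.orders` and the columns `order_date` and `customer_id` are hypothetical, and the 168-hour retention simply mirrors Delta's default of 7 days. In a Databricks notebook each statement would be passed to `spark.sql(...)`.

```python
from typing import List, Sequence


def optimize_sql(table: str,
                 zorder_cols: Sequence[str] = (),
                 partition_cols: Sequence[str] = ()) -> str:
    """Build an OPTIMIZE statement, optionally with a ZORDER BY clause."""
    # Delta does not allow Z-Ordering on partition columns, so fail early.
    overlap = set(zorder_cols) & set(partition_cols)
    if overlap:
        raise ValueError(f"cannot Z-Order on partition columns: {sorted(overlap)}")
    stmt = f"OPTIMIZE {table}"
    if zorder_cols:
        stmt += " ZORDER BY (" + ", ".join(zorder_cols) + ")"
    return stmt


def vacuum_sql(table: str, retain_hours: int = 168) -> str:
    """Build a VACUUM statement with an explicit retention window in hours."""
    if retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    return f"VACUUM {table} RETAIN {retain_hours} HOURS"


def maintenance_statements(table: str,
                           zorder_cols: Sequence[str],
                           partition_cols: Sequence[str]) -> List[str]:
    """Return the statements to run, in order: compact first, then clean up."""
    return [
        optimize_sql(table, zorder_cols, partition_cols),
        vacuum_sql(table),
    ]


if __name__ == "__main__":
    # Hypothetical table partitioned by order_date and filtered by customer_id.
    for stmt in maintenance_statements("silver.orders",
                                       zorder_cols=["customer_id"],
                                       partition_cols=["order_date"]):
        print(stmt)  # in Databricks: spark.sql(stmt)
```

Keeping statement generation separate from execution makes the schedule easy to review and unit test before it runs against real tables.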

What You Will Build

  • ADLS Gen2 layout: raw (landing), bronze, silver, gold (clear folder/table conventions).
  • ADF pipeline to ingest CSV/JSON into Bronze (landing → Bronze Delta).
  • Databricks notebooks to transform Bronze → Silver (dedupe, schema, joins) and Silver → Gold (aggregations/KPIs).
  • Delta performance ops: partitioning, Z-Order, `OPTIMIZE` + `VACUUM` schedule.
  • (Optional) Lightweight DQ checks (row counts, null %, simple constraints).
  • (Optional) ADF triggers for end-to-end orchestration + dependencies.
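
The lightweight DQ checks listed above (row counts, null %, simple constraints) might look like the following sketch. It runs on plain Python dictionaries so the logic is easy to follow in isolation; in a notebook the same thresholds would be evaluated against a DataFrame instead. The column names `order_id` and `amount` and all thresholds are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def null_pct(rows: List[Dict[str, Any]], column: str) -> float:
    """Percentage of rows where `column` is missing or None."""
    if not rows:
        return 0.0
    nulls = sum(1 for row in rows if row.get(column) is None)
    return 100.0 * nulls / len(rows)


def run_checks(rows: List[Dict[str, Any]],
               min_rows: int,
               max_null_pct: Dict[str, float],
               non_negative: Iterable[str] = ()) -> List[str]:
    """Return human-readable failures; an empty list means all checks passed."""
    failures: List[str] = []

    # Row-count check: guards against empty or truncated loads.
    if len(rows) < min_rows:
        failures.append(f"row count {len(rows)} is below minimum {min_rows}")

    # Null-percentage check per column.
    for column, limit in max_null_pct.items():
        pct = null_pct(rows, column)
        if pct > limit:
            failures.append(f"{column}: {pct:.1f}% nulls exceeds {limit:.1f}%")

    # Simple constraint: listed columns must not hold negative values.
    for column in non_negative:
        bad = sum(1 for row in rows
                  if row.get(column) is not None and row[column] < 0)
        if bad:
            failures.append(f"{column}: {bad} negative value(s)")

    return failures


if __name__ == "__main__":
    sample = [
        {"order_id": 1, "amount": 10.0},
        {"order_id": 2, "amount": None},
        {"order_id": 3, "amount": -5.0},
        {"order_id": None, "amount": 7.5},
    ]
    for failure in run_checks(sample,
                              min_rows=1,
                              max_null_pct={"order_id": 0.0, "amount": 50.0},
                              non_negative=["amount"]):
        print(failure)
```

Returning a list of failures rather than raising on the first one lets the pipeline log every problem in a single run and then decide whether to fail the Silver load.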

Tech Stack

Azure Data Factory, ADLS Gen2, Databricks (PySpark), Delta Lake (OPTIMIZE/VACUUM)

Learning Outcomes

  • Lakehouse-ready data on Azure using the Medallion pattern.
  • Reliable Delta tables with ACID, schema enforcement, and time travel.
  • Faster queries via partitioning, Z-Order, and periodic OPTIMIZE/VACUUM.
