ADF + Databricks → Medallion Architecture (Bronze/Silver/Gold)
- Difficulty: Intermediate
- Tech stack: Azure Data Factory, ADLS Gen2, Databricks (PySpark), Delta Lake (OPTIMIZE/VACUUM)
- Estimated time: 2 hrs
Overview
Files land in an ADLS Gen2 raw zone and are ingested by an ADF pipeline into Bronze as Delta tables. A Databricks notebook reads Bronze and applies cleaning and conformance to produce Silver (row-level quality, proper types, SCD-ready joins); a second notebook aggregates and enriches Silver into Gold for BI/ML. Tables are partitioned (e.g., by date) and Z-Ordered on common filter columns, and periodic `OPTIMIZE`/`VACUUM` keeps storage layout and query performance healthy. ADF triggers orchestrate the Bronze→Silver→Gold sequence with clear run logs.
Outcome
- Lakehouse-ready data on Azure using the Medallion pattern.
- Reliable Delta tables with ACID transactions, schema enforcement, and time travel.
- Faster queries via partitioning, Z-Order, and periodic OPTIMIZE/VACUUM.
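The periodic maintenance above boils down to two Spark SQL statements, typically run from a small scheduled notebook. A sketch assuming a hypothetical `silver.orders` table that is most often filtered by `customer_id`:

```sql
-- Compact small files and co-locate rows sharing common filter values
OPTIMIZE silver.orders ZORDER BY (customer_id);

-- Delete data files no longer referenced by the table; RETAIN 168 HOURS
-- (7 days, the Delta default) preserves a week of time travel
VACUUM silver.orders RETAIN 168 HOURS;
```

Shortening the `VACUUM` retention window reclaims storage sooner but trims the time-travel history available for debugging and rollback.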
What you’ll build
- ADLS Gen2 layout: raw (landing), bronze, silver, and gold zones with clear folder/table naming conventions.
- ADF pipeline to ingest CSV/JSON into Bronze (landing → Bronze Delta).
- Databricks notebooks to transform Bronze → Silver (dedupe, schema, joins) and Silver → Gold (aggregations/KPIs).
- Delta performance ops: partitioning, Z-Order, `OPTIMIZE` + `VACUUM` schedule.
- (Optional) Lightweight DQ checks (row counts, null %, simple constraints).
- (Optional) ADF triggers for end-to-end orchestration + dependencies.
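The optional DQ checks can be as simple as thresholds on row counts and null percentages. A Spark-free sketch of that logic over plain records (in the notebooks you would compute the same numbers with PySpark aggregations; column names and thresholds are illustrative):

```python
def null_percent(rows, column):
    """Percentage of records where `column` is missing or None."""
    if not rows:
        return 0.0
    nulls = sum(1 for r in rows if r.get(column) is None)
    return 100.0 * nulls / len(rows)

def check_batch(rows, min_rows, max_null_pct, required_columns):
    """Return a list of failed-check messages (empty list = batch passes)."""
    failures = []
    if len(rows) < min_rows:
        failures.append(f"row count {len(rows)} < {min_rows}")
    for col in required_columns:
        pct = null_percent(rows, col)
        if pct > max_null_pct:
            failures.append(f"{col} null% {pct:.1f} > {max_null_pct}")
    return failures

batch = [
    {"order_id": 1, "amount": 9.99},
    {"order_id": 2, "amount": None},
    {"order_id": None, "amount": 5.00},
]
print(check_batch(batch, min_rows=2, max_null_pct=10.0,
                  required_columns=["order_id", "amount"]))
# → ['order_id null% 33.3 > 10.0', 'amount null% 33.3 > 10.0']
```

A non-empty failure list would fail the notebook run, which in turn surfaces as a failed activity in the ADF pipeline's run logs.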