Raw Data → Bronze & Silver Lakehouse

Azure • Lakehouse • Intermediate • Payments

Architecture Diagram

Overview

Data arriving in a data lake is rarely ready for analytics.

Files may come from multiple systems, contain duplicates, use different formats, or require standardization before they can be trusted by downstream teams.

In this pipeline, you will build the Bronze and Silver layers of a Medallion Architecture on Azure.

You will read raw data from ADLS Gen2, create Bronze tables with lineage and audit information, transform the data into trusted Silver tables, and maintain clean business-ready datasets using Delta Lake.
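As a rough illustration of the Bronze step described above, the sketch below tags each raw record with lineage and audit fields before it is stored. It uses plain Python rather than Spark so that it runs anywhere. The field names (`_source_file`, `_ingested_at`, `_run_id`), the helper `add_audit_columns`, and the sample file path are assumptions made for illustration, not names taken from this pipeline.

```python
from datetime import datetime, timezone
import uuid


def add_audit_columns(records, source_file, run_id=None, ingested_at=None):
    """Return copies of raw records with lineage/audit fields attached.

    All field names are hypothetical; use whatever fits your own conventions.
    """
    run_id = run_id or str(uuid.uuid4())
    ingested_at = ingested_at or datetime.now(timezone.utc)
    return [
        {
            **record,  # raw payload is carried through unchanged
            "_source_file": source_file,              # lineage: where the row came from
            "_ingested_at": ingested_at.isoformat(),  # audit: when it was loaded
            "_run_id": run_id,                        # audit: which pipeline run loaded it
        }
        for record in records
    ]


raw_payments = [
    {"payment_id": "P1", "amount": "100.50"},
    {"payment_id": "P2", "amount": "20.00"},
]

bronze_rows = add_audit_columns(
    raw_payments,
    source_file="raw/payments/2024-01-01.csv",  # hypothetical path
    run_id="run-001",
)
```

On Databricks the same idea would normally be expressed by adding these fields as DataFrame columns at read time; the point of the sketch is only that Bronze keeps the raw values intact and adds traceability alongside them.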

What You Will Build

  • Build Bronze lakehouse tables on ADLS Gen2
  • Add lineage and audit information during ingestion
  • Store metadata and control totals for pipeline monitoring
  • Standardize raw datasets into business-friendly structures
  • Remove duplicate records using latest-record logic
  • Create trusted Silver tables using Delta Lake
  • Track pipeline runs and data quality checks
  • Query Silver tables using Synapse Serverless SQL
  • Explore Delta Lake features such as Time Travel and Optimize
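
The deduplication and control-total items in the list above can be sketched in a few lines. This is a minimal, library-free illustration of the logic, not the pipeline's actual implementation: the helpers `keep_latest` and `control_totals`, the column names `payment_id`, `amount`, and `updated_at`, and the sample rows are all assumptions.

```python
from decimal import Decimal


def keep_latest(records, key_field, order_field):
    """Keep one record per key: the one with the greatest order_field value."""
    latest = {}
    for record in records:
        key = record[key_field]
        current = latest.get(key)
        if current is None or record[order_field] > current[order_field]:
            latest[key] = record
    return list(latest.values())


def control_totals(records, amount_field):
    """Row count and summed amount, used to reconcile one layer against another."""
    total = sum((Decimal(r[amount_field]) for r in records), Decimal("0"))
    return {"row_count": len(records), "amount_total": total}


# Hypothetical Bronze rows; P1 arrives twice with different update times.
# ISO-8601 timestamps in the same format sort correctly as plain strings.
bronze_rows = [
    {"payment_id": "P1", "amount": "100.50", "updated_at": "2024-01-01T10:00:00"},
    {"payment_id": "P1", "amount": "120.00", "updated_at": "2024-01-02T09:30:00"},
    {"payment_id": "P2", "amount": "20.00", "updated_at": "2024-01-01T11:15:00"},
]

silver_rows = keep_latest(bronze_rows, key_field="payment_id", order_field="updated_at")
bronze_totals = control_totals(bronze_rows, "amount")
silver_totals = control_totals(silver_rows, "amount")
```

Comparing `bronze_totals` with `silver_totals` shows how many rows the deduplication removed and how the summed amount changed, which is the kind of figure a monitoring table could store per run. In Spark, latest-record logic is commonly written with a window function partitioned by the business key and ordered by the update timestamp.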

Tech Stack

Azure Data Lake Storage Gen2 • Azure Databricks • Delta Lake • Apache Spark • Synapse Serverless SQL • Azure Key Vault • Python

Learning Outcomes

After completing this pipeline, you will be able to:

  1. Build Bronze and Silver lakehouse layers on Azure
  2. Add lineage and audit tracking to data pipelines
  3. Standardize raw datasets into trusted business tables
  4. Remove duplicate records using latest-record logic
  5. Create Delta Lake-based Silver tables
  6. Track control totals and pipeline execution metadata
  7. Query Delta tables using Synapse Serverless SQL
  8. Understand Delta Lake features such as Time Travel and Optimize