Silver Lakehouse → Gold Core Layer

AWS • Lakehouse • Intermediate • Healthcare

Architecture Diagram

Overview

In the previous pipeline, raw batch and CDC data was processed into clean Silver lakehouse tables.

Silver tables, however, still mirror the structure of the source systems. For analytics and reporting, the data usually needs to be reshaped into business-friendly tables that teams can query directly.

In this pipeline, you will build the Gold Core layer.

You will take trusted Silver healthcare data and create Gold tables such as dimensions, facts, audit logs, and data quality metrics. These Gold tables become the curated layer that downstream analytics and reporting pipelines can use.
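
To make the dimension and fact idea concrete, here is a minimal sketch in plain Python. It is not the pipeline's actual Glue job: the column names (`facility_code`, `charge_amount`, and so on), the sample rows, and both helper functions are assumptions made up for illustration. It only shows the general shape of the transformation: deduplicate descriptive attributes into a dimension with a surrogate key, then have the fact rows reference that key.

```python
# Hypothetical Silver encounter rows; all column names and values are assumptions.
silver_encounters = [
    {"encounter_id": "E1", "facility_code": "F01",
     "facility_name": "North Clinic", "charge_amount": 120.0},
    {"encounter_id": "E2", "facility_code": "F02",
     "facility_name": "South Hospital", "charge_amount": 300.0},
    {"encounter_id": "E3", "facility_code": "F01",
     "facility_name": "North Clinic", "charge_amount": 80.0},
]


def build_dim_facility(rows):
    """Deduplicate facilities and assign each one a surrogate key."""
    dim = {}
    for row in rows:
        code = row["facility_code"]
        if code not in dim:
            dim[code] = {
                "facility_key": len(dim) + 1,  # simple sequential surrogate key
                "facility_code": code,
                "facility_name": row["facility_name"],
            }
    return dim


def build_fact_encounter(rows, dim_facility):
    """Keep measures on the fact row and swap the natural key for the surrogate key."""
    return [
        {
            "encounter_id": row["encounter_id"],
            "facility_key": dim_facility[row["facility_code"]]["facility_key"],
            "charge_amount": row["charge_amount"],
        }
        for row in rows
    ]


dim_facility = build_dim_facility(silver_encounters)
fact_encounter = build_fact_encounter(silver_encounters, dim_facility)
```

In this sketch the facility name lives only in the dimension, so a reporting query joins the fact to the dimension on `facility_key` instead of repeating descriptive text on every encounter row.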

What You Will Build

  • Build business-ready Gold tables from Silver data
  • Create dimension tables for facilities, departments, providers, patients, and dates
  • Create fact tables for encounters, vitals, labs, medications, charges, diagnoses, procedures, allergies, problems, and discharge summaries
  • Store Gold tables as Iceberg tables on Amazon S3
  • Track pipeline runs using audit tables
  • Create data quality metrics for important Silver tables
  • Validate Gold tables using Athena
  • Run the Gold pipeline using MWAA and Airflow
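
The audit and data quality items above can be sketched in a few lines of plain Python. This is an illustrative assumption, not the pipeline's real schema: the metric chosen (null rate per required column), the field names, the table names, and the `null_rate_metrics` and `audit_record` helpers are all hypothetical.

```python
from datetime import datetime, timezone


def null_rate_metrics(table_name, rows, required_columns):
    """Return one metric row per required column: row count, null count, null rate."""
    total = len(rows)
    metrics = []
    for column in required_columns:
        nulls = sum(1 for row in rows if row.get(column) is None)
        metrics.append({
            "table_name": table_name,
            "column_name": column,
            "row_count": total,
            "null_count": nulls,
            "null_rate": nulls / total if total else 0.0,
        })
    return metrics


def audit_record(run_id, table_name, rows_written, status):
    """Describe one pipeline step so that runs can be traced later."""
    return {
        "run_id": run_id,
        "table_name": table_name,
        "rows_written": rows_written,
        "status": status,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical Silver patient rows; two of the four are missing a birth date.
silver_patients = [
    {"patient_id": "P1", "birth_date": "1980-01-01"},
    {"patient_id": "P2", "birth_date": None},
    {"patient_id": "P3", "birth_date": "1975-06-30"},
    {"patient_id": "P4", "birth_date": None},
]

dq_metrics = null_rate_metrics(
    "silver.patients", silver_patients, ["patient_id", "birth_date"]
)
audit = audit_record("run-001", "gold.dim_patient", len(silver_patients), "SUCCESS")
```

Writing metrics and audit rows as ordinary records like these means they can be stored as tables alongside the dimensions and facts and queried the same way.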

Tech Stack

Amazon S3 • AWS Glue • Apache Iceberg • Amazon Athena • Amazon MWAA (Apache Airflow) • AWS Glue Data Catalog • Apache Parquet

Learning Outcomes

After completing this pipeline, you will be able to:

  1. Transform Silver lakehouse tables into Gold tables
  2. Build dimension and fact tables for analytics use cases
  3. Create curated healthcare datasets on Amazon S3
  4. Track data pipeline runs using audit tables
  5. Create basic data quality metrics for important tables
  6. Validate Gold tables using Athena
  7. Orchestrate Gold layer processing using Airflow and MWAA
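
Orchestration mostly comes down to running steps in a valid dependency order. The sketch below uses only Python's standard library `graphlib` module and is not Airflow code. The task names and the dependencies between them are assumptions for illustration, and the real DAG in this pipeline may be organized differently.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
# Task names and dependencies are hypothetical.
gold_tasks = {
    "build_dimensions": set(),
    "build_facts": {"build_dimensions"},
    "compute_dq_metrics": set(),
    "write_audit_log": {"build_facts", "compute_dq_metrics"},
    "validate_with_athena": {"write_audit_log"},
}

# static_order() yields the tasks in an order that respects every dependency.
run_order = list(TopologicalSorter(gold_tasks).static_order())

for step, task in enumerate(run_order, start=1):
    print(f"{step}. {task}")
```

A scheduler such as Airflow expresses the same idea as a DAG: facts wait for dimensions because fact rows reference dimension keys, and validation runs last.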