Raw Data → Refined BigQuery Tables

GCP • Lakehouse • Intermediate • Retail

Architecture Diagram

Overview

Raw data is useful, but it is rarely ready for analytics.

Customer details may need standardization, product data may need category and brand enrichment, orders may need the latest status, and order items may need to be connected back to products and orders.

In this pipeline, you will build the refined layer on GCP.

You will take raw retail tables from BigQuery, clean and enrich the data using BigQuery stored procedures, create refined business tables, and track watermarks so the pipeline can process only new or changed records.
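
The watermark idea can be sketched in a few lines. The pipeline itself does this in BigQuery stored procedures; the Python below only illustrates the logic. The `updated_at` column, the `select_incremental` helper, and the sample rows are assumptions made for this example, not part of the actual pipeline.

```python
from datetime import datetime


def select_incremental(rows, watermark):
    """Return rows changed after the watermark, plus the advanced watermark.

    Assumes each row carries an `updated_at` timestamp (hypothetical column).
    """
    changed = [r for r in rows if r["updated_at"] > watermark]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return changed, new_watermark


# Hypothetical raw order rows, for illustration only.
raw_orders = [
    {"order_id": 1, "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"order_id": 2, "updated_at": datetime(2024, 1, 2, 9, 0)},
    {"order_id": 3, "updated_at": datetime(2024, 1, 3, 9, 0)},
]

# Watermark stored at the end of the previous run.
last_watermark = datetime(2024, 1, 1, 12, 0)

changed, new_watermark = select_incremental(raw_orders, last_watermark)
print([r["order_id"] for r in changed])  # [2, 3]
```

Only orders 2 and 3 are picked up, because order 1 was last updated before the stored watermark. The run would then save `new_watermark` so the next run starts from that point.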

What You Will Build

  • Create refined BigQuery tables for retail analytics
  • Standardize customer and address data
  • Build product, category, brand, and variant tables
  • Enrich product variants with brand and category details
  • Build order status history and current order views
  • Derive order city, state, pincode, and current status
  • Build order item tables with product enrichment
  • Track pipeline state and watermarks
  • Orchestrate refined transformations using Cloud Composer
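
As one illustration of the items above, deriving a current order view from status history comes down to keeping the most recent status event per order. The sketch below shows that logic in plain Python; the pipeline would express it in BigQuery SQL. The field names (`order_id`, `status`, `changed_at`) and the sample events are hypothetical.

```python
from datetime import datetime


def current_status(status_history):
    """Map each order_id to the status of its most recent event."""
    latest = {}
    for event in status_history:
        order_id = event["order_id"]
        seen = latest.get(order_id)
        # Keep the event with the greatest changed_at for each order.
        if seen is None or event["changed_at"] > seen["changed_at"]:
            latest[order_id] = event
    return {order_id: e["status"] for order_id, e in latest.items()}


# Hypothetical status history rows, for illustration only.
history = [
    {"order_id": 101, "status": "PLACED", "changed_at": datetime(2024, 1, 1)},
    {"order_id": 101, "status": "SHIPPED", "changed_at": datetime(2024, 1, 2)},
    {"order_id": 102, "status": "PLACED", "changed_at": datetime(2024, 1, 3)},
    {"order_id": 101, "status": "DELIVERED", "changed_at": datetime(2024, 1, 4)},
    {"order_id": 102, "status": "CANCELLED", "changed_at": datetime(2024, 1, 5)},
]

result = current_status(history)
print(result)  # {101: 'DELIVERED', 102: 'CANCELLED'}
```

The full history stays available for auditing, while the derived mapping answers the "what is the order's status right now" question that most reports need.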

Tech Stack

BigQuery • BigQuery Stored Procedures • Cloud Composer (Apache Airflow) • SQL

Learning Outcomes

After completing this pipeline, you will be able to:

  1. Transform raw BigQuery tables into refined business tables
  2. Standardize customer and address data
  3. Enrich products with brand and category information
  4. Build current-state order tables from order and status data
  5. Create join-ready tables for downstream analytics
  6. Track watermarks and pipeline state in BigQuery
  7. Orchestrate SQL transformations using Cloud Composer