Relational Database → ADLS Gen2 (Batch Ingestion)

Azure • Foundations • Beginner • Payments

Architecture Diagram

Overview

Payment applications store important day-to-day data such as customers, merchants, KYC records, wallets, ledger entries, settlements, refunds, disputes, and device activity inside databases.

This data is not always suitable for reporting or analytics straight from the application database, because analytical queries can compete with live transaction workloads. In real projects, data is usually copied from the source system into a data lake first, so that other teams and pipelines can use it safely.

In this pipeline, you will build that first step on Azure.

You will extract fintech data from MySQL, load it into Azure Data Lake Storage Gen2, track each run in metadata tables, and load only new or changed records on each run instead of reloading everything.
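To make the idea of loading only new or changed records more concrete, here is a minimal sketch of watermark-based extraction. It is not the pipeline's actual implementation: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name ledger_entries, the column updated_at, and both function names are hypothetical, chosen only for illustration.

```python
import sqlite3


def extract_incremental(conn, table, watermark_column, last_watermark):
    """Return rows changed after last_watermark, plus the new watermark.

    Table and column names are assumed to come from trusted configuration,
    so they are interpolated directly; the watermark value is bound as a
    query parameter.
    """
    query = (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_column} > ? "
        f"ORDER BY {watermark_column}"
    )
    cursor = conn.execute(query, (last_watermark,))
    columns = [col[0] for col in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    # Rows are sorted by the watermark column, so the last row holds the
    # highest value. With no new rows, the watermark stays unchanged.
    new_watermark = rows[-1][watermark_column] if rows else last_watermark
    return rows, new_watermark


def make_demo_source():
    """Build an in-memory table that imitates a source ledger (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ledger_entries ("
        "entry_id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO ledger_entries VALUES (?, ?, ?)",
        [
            (1, 100.00, "2024-01-01 09:00:00"),
            (2, 250.50, "2024-01-02 10:30:00"),
            (3, 75.25, "2024-01-03 08:15:00"),
        ],
    )
    return conn


if __name__ == "__main__":
    source = make_demo_source()
    changed, watermark = extract_incremental(
        source, "ledger_entries", "updated_at", "2024-01-01 23:59:59"
    )
    print(len(changed), "rows extracted; new watermark:", watermark)
```

Running the same extraction again with the returned watermark yields no rows, which is what makes the load repeatable. A strict greater-than comparison can miss rows that share the exact watermark timestamp but are committed after the extract, so real pipelines often pair the lower bound with an upper bound captured at the start of the run.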

What You Will Build

  • Set up a MySQL source database with fintech data
  • Prepare Azure SQL metadata tables to track ingestion runs
  • Configure source entities that need to be loaded
  • Load database tables into ADLS Gen2
  • Load only new or changed records using watermarks
  • Store raw data in organized ADLS folders
  • Track row counts, watermarks, and run status
  • Simulate daily incremental changes in the source database
  • Validate ingestion runs using metadata queries
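
As an illustration of what organized ADLS folders can look like, the sketch below builds a raw-zone path from the source system, entity, load date, and run ID. The folder layout, the function names, and the example values (mysql_fintech, examplelake, run-0001) are assumptions made for this example, not a layout prescribed by this pipeline. The abfss URI pattern is the standard ADLS Gen2 form.

```python
from datetime import date


def build_raw_path(source_system, entity, load_date, run_id):
    """Build a folder path inside the container for one entity and one run.

    Assumed layout (hypothetical):
    <source_system>/<entity>/load_date=YYYY-MM-DD/run_id=<run_id>/
    """
    return (
        f"{source_system}/{entity}/"
        f"load_date={load_date.isoformat()}/run_id={run_id}/"
    )


def to_abfss_uri(account, container, path):
    """Combine storage account, container, and path into an abfss URI."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"


if __name__ == "__main__":
    path = build_raw_path(
        "mysql_fintech", "ledger_entries", date(2024, 1, 3), "run-0001"
    )
    print(to_abfss_uri("examplelake", "raw", path))
```

Keeping the load date and run ID in the path means each run writes to its own folder, so a failed run can be rerun or removed without touching files from earlier runs.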

Tech Stack

MySQL • Azure Data Factory • Azure Data Lake Storage Gen2 • Azure SQL Database • Azure Key Vault • Azure IAM / RBAC • SQL

Learning Outcomes

After completing this pipeline, you will be able to:

  1. Extract data from relational databases into Azure
  2. Build repeatable batch ingestion pipelines
  3. Load only new or changed records during pipeline execution
  4. Use metadata tables to track ingestion runs
  5. Maintain watermarks for incremental loading
  6. Store source data in a cloud data lake
  7. Validate ingestion runs using control and metadata queries
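
The kind of metadata query used to validate ingestion runs can be sketched as follows. This is only an illustration under stated assumptions: sqlite3 stands in for Azure SQL Database, and the table ingestion_run_log, its columns, the status values, and the function names are all hypothetical.

```python
import sqlite3


def create_run_log(conn):
    """Create a hypothetical run-log table that tracks one row per entity per run."""
    conn.execute(
        """
        CREATE TABLE ingestion_run_log (
            run_id TEXT NOT NULL,
            entity TEXT NOT NULL,
            status TEXT NOT NULL,
            source_row_count INTEGER,
            rows_written INTEGER,
            old_watermark TEXT,
            new_watermark TEXT
        )
        """
    )


def record_run(conn, run_id, entity, status, source_rows, rows_written,
               old_watermark, new_watermark):
    """Insert the outcome of loading one entity in one run."""
    conn.execute(
        "INSERT INTO ingestion_run_log VALUES (?, ?, ?, ?, ?, ?, ?)",
        (run_id, entity, status, source_rows, rows_written,
         old_watermark, new_watermark),
    )


def find_suspect_runs(conn):
    """Return runs that did not succeed or whose row counts do not match."""
    cursor = conn.execute(
        """
        SELECT run_id, entity, status, source_row_count, rows_written
        FROM ingestion_run_log
        WHERE status <> 'Succeeded'
           OR source_row_count <> rows_written
        ORDER BY run_id, entity
        """
    )
    return cursor.fetchall()


if __name__ == "__main__":
    log = sqlite3.connect(":memory:")
    create_run_log(log)
    record_run(log, "run-0001", "customers", "Succeeded", 120, 120,
               None, "2024-01-01 23:00:00")
    record_run(log, "run-0001", "refunds", "Succeeded", 40, 38,
               None, "2024-01-01 22:45:00")
    for row in find_suspect_runs(log):
        print(row)
```

A check like this flags both outright failures and runs that report success but wrote fewer rows than were read from the source, which is a common signal that a load needs a closer look.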