# On-Prem RDBMS → ADF → ADLS → Synapse (Batch Ingestion + ELT)
- Difficulty: Intermediate
- Tech stack: ADF (Self-Hosted IR), ADLS Gen2, Synapse (COPY INTO/PolyBase), Key Vault (optional)
- Estimated time: 2-3 hrs
## Overview
ADF, using a Self-Hosted Integration Runtime, securely connects to your on-prem RDBMS and extracts full or delta data sets into ADLS Gen2. Files land in a raw zone and are then (optionally) standardized to Parquet in a curated zone. Synapse ingests from ADLS into staging tables using COPY INTO (or PolyBase/external tables), and ELT SQL transforms produce facts and dims with appropriate partitioning and indexing. ADF triggers schedule the flow and log runs, Key Vault stores credentials, and simple row-count checks keep the pipeline trustworthy.
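The raw → curated lake layout described above comes down to a consistent folder convention. A minimal sketch, assuming a container named `datalake` and a placeholder storage account (both hypothetical; substitute your own names and partitioning scheme):

```python
from datetime import date

# Hypothetical ADLS Gen2 root -- replace <storageaccount> with yours.
CONTAINER = "abfss://datalake@<storageaccount>.dfs.core.windows.net"

def raw_path(source: str, table: str, load_date: date) -> str:
    """Raw zone: files land as extracted (e.g. CSV), partitioned by load date."""
    return (f"{CONTAINER}/raw/{source}/{table}/"
            f"year={load_date:%Y}/month={load_date:%m}/day={load_date:%d}/")

def curated_path(source: str, table: str) -> str:
    """Curated zone: standardized Parquet, the input for Synapse COPY INTO."""
    return f"{CONTAINER}/curated/{source}/{table}/"

print(raw_path("sqlserver", "sales_orders", date(2024, 3, 15)))
```

Date-partitioned raw paths make reruns idempotent: a re-extract for a given day overwrites only that day's folder, and the curated zone is rebuilt from whichever raw partitions changed.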
## Outcome
- Real-world migration path from on-prem DBs to Azure analytics.
- Durable lake-first pattern (ADLS raw → curated) feeding Synapse.
- Repeatable delta loads on a schedule with centralized monitoring.
## What you’ll build
- ADF pipelines using Self-Hosted IR to extract from SQL Server/Oracle.
- Lake layout on ADLS Gen2: raw/curated (Parquet/CSV, partitioned).
- Synapse staging + modeled tables; loads via COPY INTO (or PolyBase/external tables if preferred).
- Incremental load pattern (watermark/`last_updated`) with validation (row counts).
- (Optional) Secrets in Key Vault, alerts/monitoring in ADF/Synapse.
- (Optional) Orchestration dependencies (raw → stage → model) with ADF triggers.
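The COPY INTO staging load above can be parameterized from ADF (e.g. via a stored procedure or script activity). A minimal sketch of the statement builder, assuming the Synapse workspace's managed identity has read access to the storage account; the `stg.sales_orders` target and the URL are illustrative:

```python
def copy_into_sql(target: str, source_url: str, file_type: str = "PARQUET") -> str:
    """Build a Synapse COPY INTO statement for a staging load.

    Assumes managed-identity auth; swap the CREDENTIAL clause for a SAS
    token or storage key if your environment requires it.
    """
    return (
        f"COPY INTO {target}\n"
        f"FROM '{source_url}'\n"
        f"WITH (\n"
        f"    FILE_TYPE = '{file_type}',\n"
        f"    CREDENTIAL = (IDENTITY = 'Managed Identity')\n"
        f");"
    )

print(copy_into_sql(
    "stg.sales_orders",
    "https://<storageaccount>.blob.core.windows.net/datalake/curated/sqlserver/sales_orders/",
))
```

For CSV sources you would add options such as `FIELDTERMINATOR` and `FIRSTROW` to the WITH clause; Parquet carries its own schema, which is one reason to standardize in the curated zone first.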
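The watermark pattern and row-count validation from the list above can be sketched as follows. This is a minimal illustration, assuming a `last_updated` change-tracking column and a `dbo.sales_orders` source table (both hypothetical); in practice the watermark lives in a control table and the counts come from the source query and the Synapse staging table:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Watermark:
    table: str
    last_updated: datetime  # high-water mark persisted from the previous run

def delta_query(wm: Watermark, new_high: datetime) -> str:
    """Extract only rows changed since the stored watermark.

    The half-open window (> old, <= new) means no row is read twice and
    none is skipped between runs.
    """
    return (f"SELECT * FROM {wm.table} "
            f"WHERE last_updated > '{wm.last_updated:%Y-%m-%d %H:%M:%S}' "
            f"AND last_updated <= '{new_high:%Y-%m-%d %H:%M:%S}'")

def validate(source_count: int, loaded_count: int) -> None:
    """Simple row-count check: fail the pipeline run on any mismatch."""
    if source_count != loaded_count:
        raise ValueError(
            f"Row count mismatch: source={source_count}, loaded={loaded_count}")

wm = Watermark("dbo.sales_orders", datetime(2024, 3, 14))
print(delta_query(wm, datetime(2024, 3, 15)))
```

Only after the load and the row-count check succeed should the pipeline advance the stored watermark to `new_high`; otherwise the next run retries the same window.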