# On-Prem RDBMS → ADF → ADLS → Synapse (Batch Ingestion + ELT)
- Difficulty: Intermediate
- Tech stack: ADF (Self-Hosted IR), ADLS Gen2, Synapse (COPY INTO/PolyBase), Key Vault (optional)
- Estimated time: 2-3 hrs
## Overview
ADF, using a Self-Hosted Integration Runtime, securely connects to your on-prem RDBMS and extracts full or delta data sets into ADLS Gen2. Files land in a raw zone and are then (optionally) standardized to Parquet in a curated zone. Synapse ingests from ADLS into staging tables using COPY INTO (or PolyBase/external tables), and ELT SQL transforms produce facts and dims with appropriate partitioning and indexing. ADF triggers schedule the flow and log runs, Key Vault stores credentials, and simple row-count checks keep the pipeline trustworthy.
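The raw → curated lake layout described above comes down to a consistent folder convention. A minimal sketch, assuming a container named `datalake` and a placeholder storage account (both hypothetical; substitute your own names and partitioning scheme):

```python
from datetime import date

# Hypothetical ADLS Gen2 root -- replace <storageaccount> with yours.
CONTAINER = "abfss://datalake@<storageaccount>.dfs.core.windows.net"

def raw_path(source: str, table: str, load_date: date) -> str:
    """Raw zone: files land as extracted (e.g. CSV), partitioned by load date."""
    return (f"{CONTAINER}/raw/{source}/{table}/"
            f"year={load_date:%Y}/month={load_date:%m}/day={load_date:%d}/")

def curated_path(source: str, table: str) -> str:
    """Curated zone: standardized Parquet, the input for Synapse COPY INTO."""
    return f"{CONTAINER}/curated/{source}/{table}/"

print(raw_path("sqlserver", "sales_orders", date(2024, 3, 15)))
```

Date-partitioned raw paths make reruns idempotent: a re-extract for a given day overwrites only that day's folder, and the curated zone is rebuilt from whichever raw partitions changed.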
## Outcome
- Real-world migration path from on-prem DBs to Azure analytics.
- Durable lake-first pattern (ADLS raw → curated) feeding Synapse.
- Repeatable delta loads on a schedule with centralized monitoring.
## What you’ll build
- ADF pipelines using Self-Hosted IR to extract from SQL Server/Oracle.
- Lake layout on ADLS Gen2: raw/curated (Parquet/CSV, partitioned).
- Synapse staging + modeled tables; loads via COPY INTO (or PolyBase/external tables if preferred).
- Incremental load pattern (watermark/`last_updated`) with validation (row counts).
- (Optional) Secrets in Key Vault, alerts/monitoring in ADF/Synapse.
- (Optional) Orchestration dependencies (raw → stage → model) with ADF triggers.
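The COPY INTO staging load above can be parameterized from ADF (e.g. via a stored procedure or script activity). A minimal sketch of the statement builder, assuming the Synapse workspace's managed identity has read access to the storage account; the `stg.sales_orders` target and the URL are illustrative:

```python
def copy_into_sql(target: str, source_url: str, file_type: str = "PARQUET") -> str:
    """Build a Synapse COPY INTO statement for a staging load.

    Assumes managed-identity auth; swap the CREDENTIAL clause for a SAS
    token or storage key if your environment requires it.
    """
    return (
        f"COPY INTO {target}\n"
        f"FROM '{source_url}'\n"
        f"WITH (\n"
        f"    FILE_TYPE = '{file_type}',\n"
        f"    CREDENTIAL = (IDENTITY = 'Managed Identity')\n"
        f");"
    )

print(copy_into_sql(
    "stg.sales_orders",
    "https://<storageaccount>.blob.core.windows.net/datalake/curated/sqlserver/sales_orders/",
))
```

For CSV sources you would add options such as `FIELDTERMINATOR` and `FIRSTROW` to the WITH clause; Parquet carries its own schema, which is one reason to standardize in the curated zone first.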
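The watermark pattern and row-count validation from the list above can be sketched as follows. This is a minimal illustration, assuming a `last_updated` change-tracking column and a `dbo.sales_orders` source table (both hypothetical); in practice the watermark lives in a control table and the counts come from the source query and the Synapse staging table:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Watermark:
    table: str
    last_updated: datetime  # high-water mark persisted from the previous run

def delta_query(wm: Watermark, new_high: datetime) -> str:
    """Extract only rows changed since the stored watermark.

    The half-open window (> old, <= new) means no row is read twice and
    none is skipped between runs.
    """
    return (f"SELECT * FROM {wm.table} "
            f"WHERE last_updated > '{wm.last_updated:%Y-%m-%d %H:%M:%S}' "
            f"AND last_updated <= '{new_high:%Y-%m-%d %H:%M:%S}'")

def validate(source_count: int, loaded_count: int) -> None:
    """Simple row-count check: fail the pipeline run on any mismatch."""
    if source_count != loaded_count:
        raise ValueError(
            f"Row count mismatch: source={source_count}, loaded={loaded_count}")

wm = Watermark("dbo.sales_orders", datetime(2024, 3, 14))
print(delta_query(wm, datetime(2024, 3, 15)))
```

Only after the load and the row-count check succeed should the pipeline advance the stored watermark to `new_high`; otherwise the next run retries the same window.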