Azure DevOps CI/CD for Data Pipelines
- Difficulty: Intermediate
- Tech stack: Azure DevOps (Repos/Pipelines), ADF (Git + ARM/Bicep), Databricks (CLI/dbx/Jobs), Key Vault
- Estimated time: 1-2 hrs
Overview
You’ll connect ADF to Git (collaboration branch) so each publish generates ARM templates in `adf_publish`. A build pipeline packages those templates together with the Databricks assets (notebooks, job configs). A multi-stage YAML pipeline then deploys the artifact: the Dev stage runs ARM/Bicep to update ADF and uses the Databricks CLI/`dbx` to import notebooks and create or update Jobs; the same artifact is promoted to QA and Prod with environment-scoped variables and approvals. Secrets (connection strings, tokens) are referenced from Azure Key Vault so no secrets live in the repo.
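The flow above can be sketched as a multi-stage pipeline skeleton. This is a minimal, hedged example: the artifact, variable-group, and environment names (`adf_templates`, `data-pipelines-dev`, `dev`, `prod`) are placeholders you would replace with your own, and approvals are assumed to be configured on the Azure DevOps environments rather than in YAML.

```yaml
# Minimal multi-stage sketch; names are placeholders, approvals live on the environments.
trigger:
  branches:
    include: [ main ]

stages:
- stage: Build
  jobs:
  - job: Package
    pool:
      vmImage: 'ubuntu-latest'
    steps:
    # Publish the ARM/Bicep templates and notebooks as build artifacts
    - publish: $(Build.SourcesDirectory)/adf
      artifact: adf_templates
    - publish: $(Build.SourcesDirectory)/notebooks
      artifact: databricks_notebooks

- stage: Dev
  dependsOn: Build
  variables:
  - group: data-pipelines-dev          # assumed Key Vault-backed variable group
  jobs:
  - deployment: DeployDev
    environment: dev
    strategy:
      runOnce:
        deploy:
          steps:
          - script: echo "ADF + Databricks deploy steps go here"

- stage: Prod
  dependsOn: Dev
  variables:
  - group: data-pipelines-prod
  jobs:
  - deployment: DeployProd
    environment: prod                  # manual approval gate configured on this environment
    strategy:
      runOnce:
        deploy:
          steps:
          - script: echo "same artifact, prod-scoped variables"
```

The key design point is that only the `variables` groups and `environment` names differ between stages; the deployed artifact is identical, which is what makes the promotion repeatable.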
Outcome
- Git-driven workflows for ADF/Databricks with branch policies and PR reviews.
- Repeatable releases to Dev/QA/Prod via pipelines, approvals, and variables.
- Secure secrets through Key Vault-backed service connections.
What you’ll build
- Azure Repos (mono-repo or split): ADF JSON + Databricks notebooks (repo structure & naming).
- Build pipeline (YAML) to validate ADF artifacts, lint notebooks, and publish build artifacts (the ARM/Bicep templates from `adf_publish`, plus a notebook bundle).
- Release pipeline / multi-stage YAML to:
  - Deploy the ADF ARM/Bicep templates with `az deployment` to each environment.
  - Deploy Databricks notebooks (Databricks CLI/REST or `dbx`) and update Jobs.
- Environments with manual approvals for Prod, variable groups per env.
- Key Vault integration for secrets (ADF linked services, Databricks tokens, JDBC creds).
- (Optional) Policy & quality gates: branch protections, build validations, unit tests for SQL/py code.
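The deploy steps from the list above could look like the following fragment of a deployment job. It is a sketch under assumptions: the service connection (`my-service-connection`), resource group (`rg-data-dev`), vault (`kv-data-dev`), secret name (`databricks-token`), and workspace path are all hypothetical; `ARMTemplateForFactory.json` is the template name ADF generates in `adf_publish`.

```yaml
# Hedged sketch of deploy steps; connection, vault, and resource names are assumptions.
steps:
- download: current
  artifact: adf_templates
- download: current
  artifact: databricks_notebooks

# Pull secrets from Key Vault into pipeline variables (no secrets in the repo)
- task: AzureKeyVault@2
  inputs:
    azureSubscription: 'my-service-connection'
    KeyVaultName: 'kv-data-dev'
    SecretsFilter: 'databricks-token'

# Deploy the ADF ARM template generated in adf_publish
- task: AzureCLI@2
  displayName: Deploy ADF ARM template
  inputs:
    azureSubscription: 'my-service-connection'
    scriptType: bash
    scriptLocation: inlineScript
    inlineScript: |
      az deployment group create \
        --resource-group rg-data-dev \
        --template-file $(Pipeline.Workspace)/adf_templates/ARMTemplateForFactory.json \
        --parameters factoryName=adf-data-dev

# Import notebooks with the Databricks CLI, authenticated via the Key Vault secret
- script: |
    pip install databricks-cli
    databricks workspace import_dir \
      $(Pipeline.Workspace)/databricks_notebooks /Shared/etl --overwrite
  displayName: Import Databricks notebooks
  env:
    DATABRICKS_HOST: https://<workspace-url>     # replace with your workspace URL
    DATABRICKS_TOKEN: $(databricks-token)        # mapped from the Key Vault task above
```

Note that the Key Vault secret is consumed only as a pipeline variable at runtime, so rotating the token in the vault requires no pipeline change.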