Reusable Azure DevOps Pipeline Templates for Databricks Asset Bundles
The Problem: Repeating CI/CD Setup Across Projects
If you’ve worked on multiple Databricks projects, you’ve probably experienced the tedium of setting up CI/CD pipelines for each one. Every project needs:
- Python environment setup
- Databricks CLI installation and configuration
- Bundle validation
- Deployment to multiple environments
- Job execution
- Unit test integration
Copy-pasting pipeline YAML between projects leads to drift, inconsistencies, and maintenance headaches. When you need to update authentication logic or add a new step, you’re touching dozens of files across repositories.
The Solution: Centralized Pipeline Templates
I created devops-databricks-asset-bundles—a repository of reusable Azure DevOps pipeline templates that standardize Databricks deployments across all projects.
Architecture
```
devops-databricks-asset-bundles/
├── databricks-bundle-pipeline-template.yml   # Main template
└── steps/
    ├── checkout-self.yml                     # Repository checkout
    ├── setup-python.yml                      # Python + pytest
    ├── install-databricks-cli.yml            # CLI installation
    ├── configure-databricks-cli.yml          # Azure auth + CLI config
    ├── run-unit-tests.yml                    # Dynamic test discovery
    ├── validate-databricks-bundle.yml        # Bundle validation
    ├── deploy-databricks-bundle.yml          # Deployment with auto-approve
    └── run-databricks-jobs.yml               # Job execution
```
Each step template handles one specific concern. The main template orchestrates them into a complete CI/CD pipeline with validation, staging, and production stages.
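To make the orchestration concrete, here is a condensed sketch of how the main template might chain the step templates. The stage layout and parameter names are illustrative, not the exact contents of the repository:

```yaml
# Illustrative excerpt of databricks-bundle-pipeline-template.yml
parameters:
  - name: azureSubscription
    type: string
  - name: workingDirectory
    type: string
    default: '.'

stages:
  - stage: Validate
    jobs:
      - job: ValidateBundle
        steps:
          - template: steps/checkout-self.yml
          - template: steps/setup-python.yml
          - template: steps/install-databricks-cli.yml
          - template: steps/configure-databricks-cli.yml
            parameters:
              azureSubscription: ${{ parameters.azureSubscription }}
          - template: steps/validate-databricks-bundle.yml
            parameters:
              workingDirectory: ${{ parameters.workingDirectory }}
```

The staging and production stages reuse the same step templates, swapping in the deploy and run-jobs steps.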
How It Works
1. Import Once, Use Everywhere
Import the template repository into your Azure DevOps organization once. Then reference it as a resource in any project:
```yaml
resources:
  repositories:
    - repository: templates
      type: git
      name: YourOrg/devops-databricks-asset-bundles

extends:
  template: databricks-bundle-pipeline-template.yml@templates
  parameters:
    projectName: 'MyDataPipeline'
    workingDirectory: 'databricks'
    azureSubscription: 'Azure-Prod-Connection'
    jobNames: ['DailyETL', 'WeeklyAggregation']
    devVariableGroup: 'databricks-dev'
    stagingVariableGroup: 'databricks-staging'
    prodVariableGroup: 'databricks-prod'
```
That’s it. Your project now has a complete CI/CD pipeline with:
- Validation on feature branches
- Deployment to staging on the `dev` branch
- Deployment to production on the `main` branch
- Automatic job execution post-deployment
2. Azure AD Authentication
One of the trickier parts of Databricks CI/CD is authentication. The template handles this using Azure service connections:
```yaml
# From configure-databricks-cli.yml
- task: AzureCLI@2
  inputs:
    azureSubscription: ${{ parameters.azureSubscription }}
    scriptType: "bash"
    inlineScript: |
      DATABRICKS_TOKEN=$(az account get-access-token \
        --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d \
        --query "accessToken" -o tsv)
      echo "##vso[task.setvariable variable=DATABRICKS_TOKEN]$DATABRICKS_TOKEN"
```
The magic number `2ff814a6-3304-4ab8-85cb-cd0e6f879c1d` is the well-known Azure AD application ID for the Azure Databricks service (it is the same in every tenant). This approach uses your existing Azure service connection to generate short-lived Databricks tokens—no need to manage separate Databricks PATs.
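Downstream steps can then authenticate purely through environment variables, which the Databricks CLI reads natively. A minimal sketch of such a step (the variable names come from the variable groups and the task above):

```yaml
# Hypothetical follow-on step: the Databricks CLI picks up
# DATABRICKS_HOST and DATABRICKS_TOKEN from the environment,
# so no ~/.databrickscfg file is required.
- script: databricks bundle validate -t $(env)
  displayName: Validate bundle
  env:
    DATABRICKS_HOST: $(DATABRICKS_HOST)    # from the variable group
    DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)  # set by the AzureCLI task
```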
3. Automatic Unit Test Discovery
Tests are discovered and executed automatically if they exist:
```bash
# From run-unit-tests.yml
if [ -d "tests" ] && [ "$(find tests -name 'test_*.py' -o -name '*_test.py' | wc -l)" -gt 0 ]; then
  python -m pytest tests/ -v --junitxml=test-results.xml --cov=. --cov-report=xml
fi
```
No tests? The pipeline continues. Have tests? They run automatically with coverage reports published to Azure DevOps. Failed tests block deployment.
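The publishing half of that step is not shown above; a plausible sketch using the standard Azure DevOps tasks (task versions and file names here are assumptions):

```yaml
# Publish results even when tests fail, so failures are visible
# in the Azure DevOps Tests tab before the deployment is blocked.
- task: PublishTestResults@2
  condition: succeededOrFailed()
  inputs:
    testResultsFiles: 'test-results.xml'
    testRunTitle: 'Unit Tests'

- task: PublishCodeCoverageResults@2
  condition: succeededOrFailed()
  inputs:
    summaryFileLocation: 'coverage.xml'
```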
4. Smart Stage Conditions
The template is designed to avoid redundant work:
```yaml
stages:
  - stage: Validate
    # Only validate on feature branches (not main/dev)
    condition: |
      and(
        ne(variables['Build.SourceBranch'], 'refs/heads/main'),
        ne(variables['Build.SourceBranch'], 'refs/heads/dev')
      )

  - stage: toStaging
    # Deploy to staging only from the dev branch
    condition: eq(variables['Build.SourceBranch'], 'refs/heads/dev')
    dependsOn: []  # Run in parallel; don't wait for Validate
```
Why skip validation on main and dev? Because those branches run the full deployment which includes validation. Running validation separately would mean setting up Python and the CLI twice.
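The production stage follows the same pattern, gating on `main`. A plausible sketch (stage and job names are illustrative):

```yaml
- stage: toProduction
  # Deploy to production only from the main branch
  condition: eq(variables['Build.SourceBranch'], 'refs/heads/main')
  dependsOn: []
  variables:
    - group: ${{ parameters.prodVariableGroup }}
  jobs:
    - job: DeployProd
      steps:
        - template: steps/deploy-databricks-bundle.yml
        - template: steps/run-databricks-jobs.yml
```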
Key Features
Auto-Approve for Automated Pipelines
Databricks bundle deployments can prompt for confirmation when making destructive changes. For automated CI/CD, you often want to bypass this:
```yaml
extends:
  template: databricks-bundle-pipeline-template.yml@templates
  parameters:
    autoApprove: true  # Adds --auto-approve flag
    # ...
```
Use with caution in production—this will automatically approve resource deletions.
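Inside the deploy step template, the flag can be spliced in with a compile-time template expression. One way to write it (a sketch, not necessarily the repository's exact implementation):

```yaml
# deploy-databricks-bundle.yml sketch: emit one of two script steps
# depending on the autoApprove parameter.
steps:
  - ${{ if eq(parameters.autoApprove, true) }}:
      - script: databricks bundle deploy -t $(env) --auto-approve
        displayName: Deploy bundle (auto-approve)
  - ${{ if ne(parameters.autoApprove, true) }}:
      - script: databricks bundle deploy -t $(env)
        displayName: Deploy bundle
```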
Wait (or Don’t) for Jobs
Some Databricks jobs run for hours, and waiting for them would time out your pipeline:
```yaml
extends:
  template: databricks-bundle-pipeline-template.yml@templates
  parameters:
    jobNames: ['LongRunningETL']
    waitForJobs: false  # Trigger and continue
    # ...
```
When `waitForJobs: false` is set, each job is triggered with `--no-wait` and the pipeline succeeds immediately; check the Databricks UI for the actual job status.
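A sketch of how `run-databricks-jobs.yml` might expand the `jobNames` parameter (the `${{ each }}` loop is standard template syntax; the step details are assumptions):

```yaml
# One script step per job; --no-wait is added when waitForJobs is false.
steps:
  - ${{ each job in parameters.jobNames }}:
      - ${{ if eq(parameters.waitForJobs, true) }}:
          - script: databricks bundle run ${{ job }} -t $(env)
            displayName: Run ${{ job }}
      - ${{ if ne(parameters.waitForJobs, true) }}:
          - script: databricks bundle run ${{ job }} -t $(env) --no-wait
            displayName: Trigger ${{ job }}
```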
Environment Variables via Variable Groups
Each environment (dev, staging, prod) uses its own Azure DevOps variable group:
```yaml
parameters:
  devVariableGroup: 'databricks-dev'  # Contains DATABRICKS_HOST for dev
  stagingVariableGroup: 'databricks-staging'
  prodVariableGroup: 'databricks-prod'
```
Variable groups typically contain:
- `DATABRICKS_HOST`: Workspace URL
- `env`: Target name matching your `databricks.yml` targets
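For reference, here is a hypothetical `databricks.yml` targets section that the `env` variable would select from (the workspace hosts are placeholders):

```yaml
# Each variable group's env value names one of these targets.
targets:
  dev:
    mode: development
    workspace:
      host: https://adb-1111111111111111.11.azuredatabricks.net  # placeholder
  staging:
    workspace:
      host: https://adb-2222222222222222.22.azuredatabricks.net  # placeholder
  prod:
    mode: production
    workspace:
      host: https://adb-3333333333333333.33.azuredatabricks.net  # placeholder
```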
Example Project Structure
Here’s how a Databricks project looks when using these templates:
```
my-databricks-project/
├── azure-pipelines.yml              # References the template
├── databricks/
│   ├── databricks.yml               # Databricks bundle config
│   ├── src/
│   │   ├── main.py
│   │   └── transformations.py
│   └── resources/
│       └── jobs.yml
└── tests/
    ├── __init__.py
    ├── test_transformations.py
    └── test_main.py
```
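For illustration, a minimal `resources/jobs.yml` that would define the `daily_etl` job used in this example (the cluster settings are placeholders, not recommendations):

```yaml
resources:
  jobs:
    daily_etl:
      name: daily_etl
      tasks:
        - task_key: main
          spark_python_task:
            python_file: ../src/main.py
          new_cluster:
            spark_version: 15.4.x-scala2.12   # placeholder runtime
            node_type_id: Standard_DS3_v2     # placeholder node type
            num_workers: 1
```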
The `azure-pipelines.yml` is minimal:
```yaml
trigger:
  branches:
    include: [main, dev]

pr:
  branches:
    include: [main]

resources:
  repositories:
    - repository: templates
      type: git
      name: MyOrg/devops-databricks-asset-bundles

extends:
  template: databricks-bundle-pipeline-template.yml@templates
  parameters:
    projectName: 'my-databricks-project'
    workingDirectory: 'databricks'
    azureSubscription: 'Azure-Production'
    jobNames: ['daily_etl']
    devVariableGroup: 'databricks-dev'
    stagingVariableGroup: 'databricks-staging'
    prodVariableGroup: 'databricks-prod'
```
Why Not GitHub Actions?
This template is Azure DevOps-specific because:
- Enterprise adoption: Most organizations using Databricks on Azure are already invested in Azure DevOps
- Service connections: Azure DevOps service connections integrate seamlessly with Azure AD authentication
- Variable groups: Centralized secret management across environments
- Template inheritance: the Azure DevOps `extends` keyword provides clean template inheritance
A GitHub Actions version would require different authentication patterns (OIDC, service principals) and secret management. That’s a potential future project.
Getting Started
1. Import the repository into your Azure DevOps organization: go to Repos → Import and use the URL `https://github.com/brianjmurray/devops-databricks-asset-bundles.git`
2. Set up variable groups for each environment with `DATABRICKS_HOST` and `env`
3. Create an Azure service connection with access to your Databricks workspaces
4. Reference the template in your project's `azure-pipelines.yml`
Future Improvements
Ideas for expanding the templates:
- Artifact publishing: Wheel/egg builds for library projects
- Integration tests: Post-deployment validation against real data
- Rollback support: Automatic rollback on job failures
- Notifications: Slack/Teams integration for deployment status
- Multi-workspace: Deploy same bundle to multiple workspaces
Conclusion
Centralizing CI/CD templates eliminates the “copy-paste pipeline” anti-pattern. Update authentication logic once, and all projects benefit. Add a new step (like security scanning), and it rolls out everywhere.
The devops-databricks-asset-bundles repository is open source. Feel free to fork it, adapt it to your needs, or contribute improvements back.
Repository: github.com/brianjmurray/devops-databricks-asset-bundles