Nov 18, 2025 - 8 MIN READ

dbt on Databricks: Data Transformation Pipelines

How to connect dbt Cloud to Databricks Unity Catalog — step by step: SQL Warehouse, Unity Catalog, access tokens, project initialisation, and repository setup.

Radek Řezáč

dbt (data build tool) and Databricks are two of the most important tools in the modern data engineering stack. dbt handles transformation — turning raw data into structured, tested, documented models using SQL and Python. Databricks provides the compute and storage layer via Spark, Delta Lake, and Unity Catalog. This article walks through connecting them.

dbt Labs

dbt Labs produces both:

dbt Core — the open-source CLI for authoring models, macros, tests, and sources
dbt Cloud — a hosted platform with a UI, scheduling, and collaboration features

Key offerings:

SQL-first transformations — write SELECT statements, and dbt generates the surrounding DDL and DML (CREATE TABLE AS, INSERT, MERGE) for the chosen materialisation
Testing and documentation — built-in generic tests (not_null, unique, accepted_values, relationships), historically called schema tests, plus YAML-defined documentation
Lineage — automatic DAG visualisation of model dependencies
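
To make the SQL-first and lineage ideas concrete, here is a toy sketch in Python. It is not how dbt is implemented; it only mimics two behaviours described above: finding dependencies from ref() calls to build a DAG, and wrapping a SELECT in a CREATE TABLE AS statement. The model names, the `analytics` schema, and the helper functions are all hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models: name -> SELECT statement using dbt's ref() syntax.
MODELS = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "fct_orders": (
        "select o.*, c.region "
        "from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c on o.customer_id = c.id"
    ),
}

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def dependencies(sql: str) -> set[str]:
    """Return the model names referenced via {{ ref('...') }}."""
    return set(REF_PATTERN.findall(sql))


def build_order(models: dict[str, str]) -> list[str]:
    """Order models so each one comes after the models it refs."""
    graph = {name: dependencies(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


def wrap_as_ctas(name: str, sql: str, schema: str = "analytics") -> str:
    """Toy table materialisation: resolve refs, then wrap the SELECT."""
    resolved = REF_PATTERN.sub(lambda m: f"{schema}.{m.group(1)}", sql)
    return f"create or replace table {schema}.{name} as {resolved}"


if __name__ == "__main__":
    for model_name in build_order(MODELS):
        print(wrap_as_ctas(model_name, MODELS[model_name]))
```

The point of the sketch is that the author only writes the SELECT; the build order and the wrapping statement are derived from it.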

Databricks

Databricks is a unified data analytics platform built around Apache Spark. Core features relevant to dbt integration:

Lakehouse Architecture — combines data lakes and data warehouses in a single platform using Delta tables
Unity Catalog — centralised governance layer for tables, volumes, schemas, and grants
SQL Warehouses — serverless or classic compute endpoints for running SQL queries (used by dbt as the execution target)
Collaborative Notebooks — support for Python, SQL, Scala, and R

Connecting dbt to Databricks: Step by Step

Step 1: Create a SQL Warehouse

In Databricks, create a SQL Warehouse that dbt will use to execute queries.

[Figure: SQL Warehouse creation]

Note the Server Hostname and HTTP Path — you will need these for the dbt connection.
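
Copy-pasted connection details often pick up a stray `https://` prefix or a missing leading slash. The helper below is a small sketch for tidying the two values before you enter them anywhere. The function names are hypothetical, and the check that the HTTP Path starts with `/sql/` is an assumption based on the usual shape of Databricks paths, so relax it if your workspace differs.

```python
from urllib.parse import urlparse


def normalise_hostname(value: str) -> str:
    """Strip scheme, path and trailing slash so only the hostname remains."""
    value = value.strip()
    if "://" in value:
        value = urlparse(value).netloc
    return value.split("/")[0]


def normalise_http_path(value: str) -> str:
    """Ensure the HTTP Path has one leading slash and no trailing slash."""
    return "/" + value.strip().strip("/")


def connection_details(hostname: str, http_path: str) -> dict[str, str]:
    """Return cleaned values, raising ValueError if they look wrong."""
    host = normalise_hostname(hostname)
    path = normalise_http_path(http_path)
    if not host:
        raise ValueError("Server Hostname is empty")
    if not path.startswith("/sql/"):
        # Assumption: Databricks HTTP Paths begin with /sql/.
        raise ValueError(f"unexpected HTTP Path: {path}")
    return {"host": host, "http_path": path}
```

For example, `connection_details("https://example.cloud.databricks.com/", "sql/1.0/warehouses/abc123")` returns the bare hostname and a path with a leading slash; both input values here are placeholders.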

Step 2: Create a Unity Catalog

Create a new catalog in Unity Catalog for dbt to write to, and confirm that it can be queried from the SQL Warehouse created in Step 1.

[Figure: Create Unity Catalog]
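
If you prefer to script this step rather than click through the UI, the sketch below generates SQL statements you could run in the Databricks SQL editor. The catalog name and principal are placeholders, and the two grants shown (USE CATALOG and CREATE SCHEMA) are only one plausible starting point; the privileges you actually need depend on your governance setup, and some accounts require extra options such as a managed location when creating a catalog.

```python
def quote(identifier: str) -> str:
    """Backtick-quote an identifier, escaping embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def catalog_setup_sql(catalog: str, principal: str) -> list[str]:
    """Build statements that create a catalog and let a principal use it."""
    cat, who = quote(catalog), quote(principal)
    return [
        f"CREATE CATALOG IF NOT EXISTS {cat}",
        # Lets the principal see and reference the catalog.
        f"GRANT USE CATALOG ON CATALOG {cat} TO {who}",
        # Lets dbt create its target schemas inside the catalog.
        f"GRANT CREATE SCHEMA ON CATALOG {cat} TO {who}",
    ]


if __name__ == "__main__":
    # Both values are hypothetical placeholders.
    for statement in catalog_setup_sql("dbt_demo", "dbt-user@example.com"):
        print(statement + ";")
```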

Step 3: Create a dbt Connection

In dbt Cloud, go to Account Settings → Connections → New Connection:

Select Databricks
Enter the Server Hostname and HTTP Path from Step 1
Optionally set the catalog from Step 2 as the default catalog

[Figure: dbt connection]

[Figure: dbt connection settings]
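
The steps above use the dbt Cloud UI. If you run dbt Core instead, the same details usually live in a `profiles.yml` file. The sketch below renders a minimal entry for the dbt-databricks adapter; the profile name, target name, schema, and thread count are placeholder assumptions, and the token is referenced through dbt's `env_var()` function rather than written into the file.

```python
def render_profile(
    profile_name: str,
    host: str,
    http_path: str,
    catalog: str,
    schema: str,
    token_env_var: str = "DATABRICKS_TOKEN",
    threads: int = 4,
) -> str:
    """Render a minimal profiles.yml entry for the dbt-databricks adapter."""
    # The token is resolved from an environment variable at run time,
    # so the secret itself never lands in the rendered file.
    token_ref = "\"{{ env_var('" + token_env_var + "') }}\""
    lines = [
        f"{profile_name}:",
        "  target: dev",
        "  outputs:",
        "    dev:",
        "      type: databricks",
        f"      catalog: {catalog}",
        f"      schema: {schema}",
        f"      host: {host}",
        f"      http_path: {http_path}",
        f"      token: {token_ref}",
        f"      threads: {threads}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All argument values are hypothetical placeholders.
    print(render_profile(
        "my_project",
        "example.cloud.databricks.com",
        "/sql/1.0/warehouses/abc123",
        "dbt_demo",
        "dbt_dev",
    ))
```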

Step 4: Generate a Databricks Access Token

In Databricks, go to Settings → Developer → Access Tokens → Generate New Token. Copy the token straight away, because Databricks displays it only once and you will need it in the next step.

[Figure: Databricks token]
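
Outside the dbt Cloud UI, it is safer to keep the token in an environment variable than to paste it into scripts or config files. A minimal sketch, assuming the variable is called `DATABRICKS_TOKEN` (the name is a convention, not a requirement) and using hypothetical helper names:

```python
import os


def load_token(env_var: str = "DATABRICKS_TOKEN") -> str:
    """Read the access token from the environment, failing loudly if absent."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"{env_var} is not set")
    return token


def mask(token: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the value is safe to log."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]
```

Logging `mask(load_token())` instead of the raw value lets you confirm which token is in use without exposing it.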

Step 5: Create and Initialise the dbt Project

In dbt Cloud, go to Account Settings → Projects → New Project:

Enter the project name
Select the connection created in Step 3
Select Token as the authentication method and paste the Databricks token
Leave schema as default
Set up a repository (create a new one or connect an existing one)
Go to the Studio tab
Initialise the project and commit to a new branch

[Figure: dbt new project]

[Figure: dbt create repo]

What You Get

Once connected, you can author dbt models as SQL SELECT statements and dbt will materialise them as Delta tables in Unity Catalog. The full power of dbt — incremental models, snapshots, macros, tests, and sources — is available against Databricks compute, with lineage visible both in dbt Cloud and in Unity Catalog's governance layer.
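
Models do not have to be SQL. As a rough illustration, a dbt Python model is a file that defines a `model(dbt, session)` function and returns a DataFrame. In the sketch below, `stg_orders` and `order_id` are hypothetical names, and on Databricks Python models may need compute other than the SQL Warehouse set up earlier, so check the adapter documentation before relying on this.

```python
def model(dbt, session):
    # Materialise the result as a table.
    dbt.config(materialized="table")

    # dbt.ref() returns a DataFrame for the upstream model;
    # "stg_orders" is a hypothetical model name.
    orders = dbt.ref("stg_orders")

    # Keep one row per order_id (a hypothetical column).
    return orders.dropDuplicates(["order_id"])
```

The equivalent SQL model would be a plain SELECT against `{{ ref('stg_orders') }}`; for most transformations SQL remains the simpler choice.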

This integration suits teams already running Databricks pipelines who want to bring engineering discipline (version control, testing, documentation) to their SQL transformation layer without switching platforms.
