What Is the dbt DAG? A Guide to Lineage and Dependencies
What Is a Directed Acyclic Graph?
A directed acyclic graph (DAG) is a structure made up of nodes and edges, where each edge has a direction and no path leads back to its starting node. In plain terms: things depend on other things, and those dependencies never form a loop.
In the context of dbt, each node is a model, source, seed, snapshot, or exposure. Each edge is a dependency, usually created by a ref() or source() call in your SQL (exposures declare theirs in YAML under depends_on). The “directed” part means the relationship has a direction: model B depends on model A, not the other way around. The “acyclic” part means circular dependencies are not allowed: if B depends on A, then A cannot also depend on B, directly or through any chain of intermediate models. If you introduce a cycle, dbt reports an error instead of building the project.
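To make the definition concrete, here is a minimal sketch of a dependency graph as a plain Python dictionary, with a depth-first check for cycles. The node names and the `find_cycle` helper are hypothetical and only illustrate the idea; this is not how dbt itself stores or validates its graph.

```python
# Hypothetical project: each key is a node, each value lists the nodes it depends on.
# In a dbt project these edges would come from ref() and source() calls.
deps = {
    "source.stripe.payments": [],
    "stg_stripe__payments": ["source.stripe.payments"],
    "int_orders__joined": ["stg_stripe__payments"],
    "fct_orders": ["int_orders__joined"],
}


def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for parent in graph.get(node, []):
            state = color.get(parent, WHITE)
            if state == GRAY:
                # The parent is already on the current path, so we found a loop.
                return stack[stack.index(parent):] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Make a staging model depend on a mart to create a loop.
cyclic_deps = {**deps, "stg_stripe__payments": ["fct_orders"]}
cycle = find_cycle(cyclic_deps)
```

Calling `find_cycle(deps)` returns `None` for the valid project, while `cycle` holds a path that starts and ends at the same node.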
How Does dbt Use the DAG?
Every time you write {{ ref('some_model') }} in a dbt model, you create an edge in the DAG. dbt parses all models, resolves all ref() calls, and constructs the complete graph before running anything. This graph determines build order — dbt executes models in topological order so that every model runs only after its upstream dependencies are complete.
This is why dbt can parallelize builds: models that do not depend on each other can run simultaneously, while dependent models wait for their parents to finish.
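The build-order idea can be sketched with Python's standard-library `graphlib.TopologicalSorter`. The model names below (including `stg_shop__orders`) are hypothetical, and grouping nodes into batches is a simplification for illustration: a real scheduler can start each model as soon as its own parents finish rather than waiting for a whole batch.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical project: node -> set of upstream nodes it depends on.
deps = {
    "stg_stripe__payments": set(),
    "stg_shop__orders": set(),
    "int_orders__joined": {"stg_stripe__payments", "stg_shop__orders"},
    "fct_orders": {"int_orders__joined"},
    "dim_customers": {"stg_shop__orders"},
}

sorter = TopologicalSorter(deps)
sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle

batches = []
while sorter.is_active():
    # Every node returned here has all of its parents finished,
    # so the nodes within one batch could run in parallel.
    ready = sorted(sorter.get_ready())
    batches.append(ready)
    sorter.done(*ready)

for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {batch}")
```

In this sketch the two staging models form the first batch, `dim_customers` and `int_orders__joined` the second, and `fct_orders` runs last because it waits for `int_orders__joined`.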
How Do You Read the dbt DAG?
A typical dbt DAG flows left to right through several layers:
| Layer | Description | Examples |
|---|---|---|
| Sources | Raw data ingested from external systems | source('stripe', 'payments') |
| Staging | Cleaned, renamed, lightly transformed | stg_stripe__payments |
| Intermediate | Business logic, joins, aggregations | int_orders__joined |
| Marts | Final business-facing models | fct_orders, dim_customers |
| Exposures | Downstream consumers (dashboards, ML) | exposure: revenue_dashboard |
Reading the DAG from left to right tells you the story of your data: where it comes from, how it is transformed, and where it ends up. Reading right to left tells you the lineage of any specific model — which upstream models contributed to it.
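Reading right to left amounts to walking the graph upstream from a single node. The sketch below assumes a simple `parents` dictionary with hypothetical node names; the `upstream` helper is illustrative, not a dbt API.

```python
# Hypothetical project: node -> list of direct upstream nodes.
parents = {
    "source.stripe.payments": [],
    "stg_stripe__payments": ["source.stripe.payments"],
    "int_orders__joined": ["stg_stripe__payments"],
    "fct_orders": ["int_orders__joined"],
    "dim_customers": [],
}


def upstream(graph, node):
    """Return every node that contributes to `node`, directly or indirectly."""
    seen = set()
    frontier = list(graph.get(node, []))
    while frontier:
        current = frontier.pop()
        if current not in seen:
            seen.add(current)
            frontier.extend(graph.get(current, []))
    return seen


lineage = upstream(parents, "fct_orders")
print(sorted(lineage))
```

Here the lineage of `fct_orders` contains the intermediate model, the staging model, and the raw source, but not the unrelated `dim_customers`.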
Why Does DAG Complexity Grow Over Time?
A new dbt project with ten models has a DAG you can understand at a glance. An enterprise project with 500 models has a DAG that looks like a dense web of interconnections. This growth is natural — as a business adds use cases, the DAG accumulates models, cross-references, and shared intermediate logic.
The problem is not complexity itself but the review burden it creates. When you modify a model in a 500-model DAG, understanding which downstream models are affected requires tracing paths through a graph that no human can hold in working memory. This is where tooling becomes essential.
What Is the Difference Between Static and Diff-Aware Lineage?
Static lineage shows the current state of your DAG: all models and their dependencies as they exist right now. This is what you see in the documentation site built by dbt docs generate (and viewed with dbt docs serve) and in the dbt Cloud IDE. It answers the question: “What does my project look like?”
Lineage diff compares the DAG between two states — typically your PR branch and production — and highlights what changed. It answers a different and more actionable question: “What did my changes affect?”
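At its simplest, a lineage diff is a comparison of two snapshots of the project. The sketch below assumes each state is reduced to a dictionary mapping a node name to a checksum of its definition. That shape, the checksum values, and the `lineage_diff` helper are all hypothetical simplifications; real tools work from richer project metadata.

```python
def lineage_diff(base, current):
    """Compare two project states given as {node_name: definition_checksum}."""
    base_nodes, current_nodes = set(base), set(current)
    return {
        "added": sorted(current_nodes - base_nodes),
        "removed": sorted(base_nodes - current_nodes),
        "modified": sorted(
            name for name in base_nodes & current_nodes if base[name] != current[name]
        ),
    }


# Hypothetical states: production (base) versus the PR branch (current).
production = {
    "stg_stripe__payments": "a1",
    "int_orders__joined": "b1",
    "fct_orders": "c1",
    "legacy_report": "d1",
}
pr_branch = {
    "stg_stripe__payments": "a1",
    "int_orders__joined": "b2",  # definition changed in the PR
    "fct_orders": "c1",
    "fct_refunds": "e1",  # new model in the PR
}

diff = lineage_diff(production, pr_branch)
print(diff)
```

The result lists only the nodes whose status differs between the two states, which is the starting point for scoping a review.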
| Aspect | Static Lineage (dbt docs) | Lineage Diff |
|---|---|---|
| Shows | Current state of all models | Difference between two states |
| Purpose | Exploration and documentation | Impact analysis and PR review |
| Scope | Entire DAG | Only changed and affected models |
| Use case | Understanding the project | Reviewing a specific change |
Static lineage is valuable for onboarding and documentation. But for day-to-day PR review, it forces you to mentally filter hundreds of unchanged models to find the ones that matter. A lineage diff does that filtering for you.
What Is the Modified+ View?
The modified+ view shows the modified models in your PR plus all their downstream dependents. This represents the potential impact radius of your changes — every model that could be affected by what you changed.
Consider an example: you modify int_orders__joined. The modified+ view shows that model plus the five mart models and two exposures downstream of it. Instead of scanning the entire DAG, you now have a focused list of eight nodes to validate: six models and two exposures.
This scoping is the default starting point for data impact analysis. You examine each model in the modified+ set — checking schema diffs, row counts, and profile diffs — to confirm the change behaved as expected and did not introduce unintended side effects.
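The modified+ set can be computed by inverting the dependency map and walking downstream from each modified node. This is a minimal sketch with hypothetical node names and a hypothetical `modified_plus` helper; dbt's own node selection syntax offers a comparable `state:modified+` selector, which this code does not call.

```python
from collections import defaultdict, deque

# Hypothetical project: node -> list of direct upstream nodes.
parents = {
    "stg_stripe__payments": [],
    "stg_shop__orders": [],
    "int_orders__joined": ["stg_stripe__payments", "stg_shop__orders"],
    "fct_orders": ["int_orders__joined"],
    "dim_customers": ["stg_shop__orders"],
    "exposure.revenue_dashboard": ["fct_orders"],
}


def modified_plus(graph, modified):
    """Return the modified nodes plus everything downstream of them."""
    # Invert the parent map so we can walk from a node to its dependents.
    children = defaultdict(set)
    for node, ups in graph.items():
        for up in ups:
            children[up].add(node)

    selected = set(modified)
    queue = deque(modified)
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


impact = modified_plus(parents, ["int_orders__joined"])
print(sorted(impact))
```

In this example `dim_customers` stays out of the review scope because it does not depend on the modified model.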
How Does Lineage Diff Help with PR Review?
When reviewing a dbt pull request, the first question is always: “What is the blast radius?” A lineage diff answers this immediately by showing:
- Which models were directly modified — the ones the author changed
- Which models are downstream — the ones that could be indirectly affected
- Which models were added or removed — structural changes to the DAG itself
This information scopes the review. Instead of reading every SQL file in the diff, the reviewer focuses on the modified models and their downstream dependents. For each model in the impact radius, the reviewer checks whether the data changed as expected using data diffs — schema comparisons, row count checks, and value-level validation.
Without a lineage diff, reviewers either check too little (only the directly modified models, missing downstream breakage) or too much (the entire project, wasting time on unaffected models).
How Does Column-Level Lineage Add Granularity?
Model-level lineage tells you that model B depends on model A. But if you changed only one column in model A, you may not need to review all of model B — only the columns that depend on the one you changed.
Column-level lineage provides this precision. It traces individual columns through transformations, showing exactly which downstream columns are derived from your changed column. On large DAGs, this can reduce the review scope from dozens of models to a handful of specific columns.
Column-level lineage is the next level of granularity beyond the model-level DAG. It does not replace model-level lineage — it refines it.
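The same downstream walk works at column granularity if each node is a (model, column) pair. The mapping below is hand-written and hypothetical. In practice it would have to be derived by parsing each model's SQL, which this sketch does not attempt.

```python
from collections import defaultdict, deque

# Hypothetical column lineage: (model, column) -> columns it is derived from.
column_parents = {
    ("int_orders__joined", "amount"): [("stg_stripe__payments", "amount")],
    ("int_orders__joined", "order_id"): [("stg_shop__orders", "order_id")],
    ("fct_orders", "total_amount"): [("int_orders__joined", "amount")],
    ("fct_orders", "order_id"): [("int_orders__joined", "order_id")],
}


def downstream_columns(lineage, changed):
    """Return every column derived, directly or indirectly, from `changed`."""
    children = defaultdict(set)
    for column, sources in lineage.items():
        for source in sources:
            children[source].add(column)

    affected = set()
    queue = deque([changed])
    while queue:
        column = queue.popleft()
        for child in children[column]:
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


affected = downstream_columns(column_parents, ("stg_stripe__payments", "amount"))
print(sorted(affected))
```

Both downstream models appear in the result, but only their amount columns do. The `order_id` columns are untouched, so they can be left out of the review.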
DAG Complexity and the Case for Tooling
The relationship between DAG size and review effort is not linear. The number of dependency paths can grow much faster than the number of models, because every shared intermediate model multiplies the routes through the graph. A 50-model DAG might have a few dozen dependency paths. A 500-model DAG can have thousands. Manual impact analysis at that scale is slow, error-prone, and inconsistent between team members.
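A small experiment illustrates how path counts can outpace node counts. The sketch builds an artificial layered graph, where every node feeds every node in the next layer, and counts root-to-leaf paths. The graph shape and both helper functions are hypothetical and deliberately extreme; real dbt projects are far less densely connected.

```python
from functools import lru_cache


def layered_graph(n_layers, width=2):
    """Build node -> children for a graph where each layer fully feeds the next."""
    children = {"root": tuple(f"L0_{i}" for i in range(width))}
    for layer in range(n_layers - 1):
        for i in range(width):
            children[f"L{layer}_{i}"] = tuple(
                f"L{layer + 1}_{j}" for j in range(width)
            )
    return children


def count_paths(children, start):
    """Count distinct paths from `start` to any node without children."""

    @lru_cache(maxsize=None)
    def paths_from(node):
        kids = children.get(node, ())
        if not kids:
            return 1  # a leaf ends exactly one path
        return sum(paths_from(kid) for kid in kids)

    return paths_from(start)


for n_layers in (3, 10):
    graph = layered_graph(n_layers)
    node_count = 1 + 2 * n_layers  # the root plus two nodes per layer
    print(n_layers, node_count, count_paths(graph, "root"))
```

With two nodes per layer, three layers give 7 nodes and 8 paths, while ten layers give 21 nodes and 1,024 paths: the node count grows linearly and the path count doubles with each layer.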
This is why diff-aware lineage tools exist. dbt Cloud provides lineage visualization in its IDE. Recce provides lineage diff with integrated data validation — comparing the DAG between your PR branch and production, highlighting modified models, and letting you run data diffs directly from the lineage view. The goal is the same: reduce the cognitive load of understanding how a change propagates through a complex graph.
Summary
The dbt DAG is a directed acyclic graph that maps every dependency in your project. It determines build order, enables parallelism, and — most importantly — defines how changes propagate. Reading the DAG tells you the story of your data from sources through marts to exposures. As projects grow, static lineage (dbt docs) becomes insufficient for PR review; lineage diffs that compare two states of the DAG are essential for scoping impact analysis. The modified+ view focuses review on the models most likely affected by a change. For even finer granularity, column-level lineage traces individual columns through transformations. The bigger your DAG, the more you need tooling that makes its complexity manageable.
Frequently Asked Questions
- What is the dbt DAG?
- The dbt DAG (directed acyclic graph) is the graph of all nodes in your dbt project and the dependencies between them; lineage views are visual representations of it. Each node represents a model (or source, seed, snapshot, or exposure), and each edge represents a dependency, typically a ref() or source() call. The DAG defines the order in which dbt builds models: a model only runs after all its upstream dependencies have completed. The DAG is “acyclic” because circular dependencies are not allowed.
- What is the difference between dbt docs lineage and a lineage diff?
- The dbt docs lineage shows the current state of your DAG — all models and their dependencies as they exist right now. A lineage diff compares the DAG between two states (typically your PR branch and production) and highlights what changed: which models were modified, added, or removed. Lineage diffs are essential for impact analysis because they show not just the current structure, but how your changes affect it.
- How do you use the dbt DAG for impact analysis?
- Start from the modified models and trace downstream through the DAG to identify all dependent models — this is the impact radius. For each model in the impact radius, check whether its schema, row count, or data distributions changed. Focus your review on the "modified+" view — the modified models plus all their downstream dependents — rather than reviewing the entire DAG.
- What is the modified+ view in lineage?
- The modified+ view shows the modified models in your PR plus all their downstream dependents. This represents the potential impact radius of your changes — every model that could be affected. It is the default scoping view for data impact analysis because it focuses review effort on the models most likely to have changed.