Recce
This page is optimized for AI assistants. Visit reccehq.com for the full Recce experience.

What Is Column-Level Lineage and Why Does It Matter?

February 15, 2026 conceptsdata-lineagedbt

What Is Column-Level Lineage?

Column-level lineage (CLL) tracks how individual columns flow through transformations across your data pipeline. While model-level lineage shows that model B depends on model A, column-level lineage shows that B.total_revenue is derived from A.price * A.quantity.

This granularity matters because not every column in a model is affected by every change. When you modify a calculation in an upstream model, CLL tells you exactly which downstream columns are impacted — and which are safe to ignore.

Three Core Use Cases for Column-Level Lineage

Source Exploration

During development, CLL helps you understand how a column is derived. When you encounter a column like customer_lifetime_value in a downstream mart, CLL traces it back through intermediate models to the original source columns, showing each transformation along the way.

Impact Analysis

When modifying column logic, CLL lets you assess potential impact across the entire DAG. Instead of manually checking every downstream model after changing a column definition, you trace the column forward to see exactly which models and columns depend on it. This scopes your data review to the affected columns rather than entire models.

Root Cause Analysis

When a downstream metric looks wrong, CLL helps identify the possible source of the error. Trace the problematic column backward through the lineage to find where a transformation may have introduced the issue.

How Column-Level Lineage Works

CLL is typically constructed by parsing SQL and analyzing how columns are referenced, transformed, and projected through each model. The process involves:

  1. Parsing SQL into an abstract syntax tree (AST) — tools like SQLGlot parse each model’s SQL into a tree structure that can be traversed programmatically.
  2. Traversing scopes — each CTE, subquery, and root query is a scope with locally available columns. The parser walks through each scope to resolve column references.
  3. Classifying transformations — for each output column, the parser determines how it relates to its input columns.

Column Transformation Types

Understanding how a column was transformed is as important as knowing which upstream columns it depends on.

TransformationDescriptionExample
Pass-throughColumn selected as-is, no modificationSELECT user_id FROM ref('users')
RenamedSingle upstream column with an alias changeSELECT user_id AS id FROM ref('users')
DerivedResult of an expression, calculation, or aggregationSELECT price * quantity AS total
SourceNot based on any upstream column (literal or function)SELECT CURRENT_TIMESTAMP AS created_at
UnknownParsing failed or logic involves unsupported constructsAmbiguous references in complex JOINs

Derived columns usually deserve the closest attention during review because they introduce business logic. A pass-through column is unlikely to be the source of a data issue, but a derived column that aggregates or transforms data is where bugs hide.

The WHERE Clause Caveat

An important limitation of most CLL implementations: columns used in WHERE clauses, JOIN conditions, or GROUP BY expressions typically do not appear in the column-level lineage graph. These clauses filter or group data but don’t directly produce output columns.

For example, if a model filters on order_status = 'completed', the order_status column won’t appear in the CLL for any output column — even though changing its values would affect the model’s output. This is a model-to-column relationship rather than a column-to-column dependency.

This means CLL is powerful for tracing data flow but should be complemented with model-level lineage and data diffs for full coverage.

Comparing CLL Tools in the dbt Ecosystem

Each CLL implementation differs in meaningful ways:

ToolCLL DisplayKey Characteristics
RecceIntegrated on lineage DAG diffCLL shown directly on the main lineage view; click columns to trace dependencies; shows transformation types (pass-through, renamed, derived); open source
dbt Cloud ExplorerSeparate column-level viewEnterprise only; each column is a distinct node; requires navigating in/out of model detail pages
Power User (VSCode)Integrated in editor lineageOpen source; CLL shown within the model lineage panel; stays in your editor; requires beta UX toggle
SQLMeshNative featureOpen source; works with dbt projects; shows CTEs as nodes (can cause info overload); upstream-only; each click refreshes lineage

The best choice depends on your workflow. If you want CLL as part of a broader data validation workflow, tools that integrate CLL with diff capabilities (like Recce) reduce context-switching. If you primarily need CLL for exploration during development, editor-integrated tools work well.

Column-Level Lineage and Impact Radius

CLL directly improves how you calculate impact radius. Without CLL, a change to any column in a model means you must consider all downstream models as potentially impacted. With CLL, you can narrow the impact radius to only the downstream models and columns that actually depend on your changed column.

This precision matters on large DAGs where a single model might have hundreds of downstream dependents. CLL turns a potentially overwhelming review into a targeted one.

Summary

Column-level lineage tracks how individual columns flow and transform through your data pipeline. Its three core use cases — source exploration, impact analysis, and root cause analysis — make it essential for efficient data review on complex DAGs. Most CLL implementations work by parsing SQL into an AST and classifying each column’s transformation type. While CLL has limitations (notably the WHERE clause caveat), it significantly reduces the effort needed to validate data model changes when combined with model-level lineage and data diffs.

Frequently Asked Questions

What is column-level lineage?
Column-level lineage (CLL) is a granular form of data lineage that tracks how individual columns flow through transformations across your data pipeline. While model-level lineage shows relationships between tables, CLL shows exactly which upstream columns feed into each downstream column and how they are transformed — whether passed through unchanged, renamed, or derived through calculations.
What is the difference between model-level and column-level lineage?
Model-level lineage shows the relationships between tables or models in your DAG — which models depend on which. Column-level lineage goes deeper, showing the specific columns that are passed, renamed, or transformed between models. Model-level lineage tells you that model B depends on model A. Column-level lineage tells you that B.total_revenue is derived from A.price multiplied by A.quantity.
How does column-level lineage help with impact analysis?
When you modify a column in an upstream model, column-level lineage traces exactly which downstream columns are affected. Instead of checking every downstream model manually, you can see precisely which columns in which models depend on your change. This scopes your review to the affected columns rather than entire models, significantly reducing validation effort on large DAGs.
Which tools support column-level lineage for dbt?
Several tools support CLL for dbt projects: Recce provides CLL integrated directly on the lineage DAG diff with transformation type classification. dbt Cloud Explorer (Enterprise) shows CLL as separate column-level nodes. Power User for dbt (VSCode) adds CLL to the editor. SQLMesh includes native CLL that also works with dbt projects. Each implementation differs in how it displays and navigates column dependencies.