Book a demo

Building Impact Radius #2: The Technical Breakthrough

Continuing from our last blog post about building Impact Radius, we realized we could build an even more powerful solution to answer “what do I actually need to validate to ensure my data is right” using our breaking change analysis and column-level lineage (CLL) features.

Version three: column-level precision

Breaking change analysis and column-level lineage aren’t new concepts. Many data tools offer one or another of these features. However, no one has combined these two features together.

While building breaking change analysis and CLL, we couldn’t help but think, “if we can do breaking change analysis at the column level, could we narrow the impact radius to the column level too?” This changes the workflow from “this model has a breaking change, now validate everything downstream” to “this column changed, now validate only what uses it.”

We call this a partial breaking change: changes that may impact downstream models, depending on whether they use the modified columns.

Here’s why: an example using the previous orders model

SELECT
	 customer_id,
	 order_date,
-- total_amount,
++ total_amount * 0.85 AS total_amount  # Convert USD to EUR
FROM raw_orders

The example modifies only the total_amount column. But we can’t just look at this change in isolation, we need to understand how downstream models use this column.

  1. If a downstream model doesn’t use total_amount, it has no impact.
  2. If a downstream model uses total_amount, we need to know how it uses it to determine understand impact: either the whole model or a just few columns.

Different downstream scenarios of using modified columns:

-- Downstream A: No impact
SELECT
	customers_id,
	COUNT(order_date) AS date_counts
FROM orders
GROUP BY customer_id DESC

-- Downstream B: Column impact
SELECT
	customers_id,
	SUM(total_amount) AS customers_value
FROM orders
GROUP BY customer_id DESC

-- Downstream C: Model impact
SELECT
	customers_id,
	order_date,
	total_amount
FROM orders
WHERE total_amount >= 10000

In these downstream models , we know

  • Downstream A doesn’t use total_amount, so we can skip validating this model.
  • Downstream B uses total_amount in calculations, so we should validate this column, customers_value.
  • Downstream C uses total_amount in WHERE clause, so we should validate the whole model.

The key insight: dependency relationships determine Impact Radius. We need to understand not just what changed, but how downstream models depend on those changes.

When we realized this, we stepped back and systematically mapped out exactly what we’d need to build.

Building blocks for Impact Radius v3

Building Impact Radius v3 required understanding types of dependencies between models and columns. Rather than dive into the technical complexity here, we’ll show you the blocks approach we took.

Here’s how we broke down the problem: Overview of building Impact Radius

Our systematic approach:

  • Dependency Analysis (blue boxes): We list four types of dependency analysis that we needed to understand before building features.
  • Recce Features (yellow boxes): The lineage and lineage diff as our foundation to show Impact Radius.
  • Change Analysis (green boxes): The breaking change analysis that leads us start this journey.
  • Impact Radius (orange boxes): Where users can validate exactly what matters, nothing more, nothing less.

All of these components work together to finally answer: “What do I actually need to validate to ensure my data is right?”

Let’s walk through how we built each piece, starting with the realization that changed everything: not all changes are created equal.

Breaking change analysis

The first breakthrough came when we realized we needed to understand not just what changed, but how it changed. This led us to categorize breaking changes into three types that would drive everything else we built.

  1. Non-breaking change: No impact to downstream since no change in existing columns, e.g. adding a new column.
  2. Partial breaking change: Impact to some, but not all downstream columns since only some columns change.
  3. Breaking change: Impact to the whole downstream model since the whole model changes, e.g. modified WHERE, JOIN, GROUP BY, or ORDER BY.

But when we shipped this as a toggle on the lineage, something felt off. Users expected “Breaking Change Analysis” to show them what was broken, meaning impacted downstream models. Instead, they saw models with code changes highlighted. Lineage view when switch to Breaking Change Analysis

This confusion made sense once we understood the terminology gap. We borrowed this idea from software, but data people understand it differently.

  • In software development: Breaking changes are intentional improvements. You plan them, document them, and manage the transition.
  • In analytics engineering: Breaking changes are unplanned problems. Your dashboard breaks, your pipeline fails, and you need to fix it immediately.

This mismatch between expectation and reality made us realize we were missing a crucial piece: understanding exactly how upstream changes affect downstream models.

Dependency analysis

This confusion led us to our next breakthrough: we needed to map the relationships between models and columns at a much more granular level.

It was straight forward to map out four dependency types.

  1. Model-to-Model
  2. Model-to-Column
  3. Column-to-Column
  4. Column-to-Model

After testing it ourselves, we realized something unintuitive: tracking Column-to-Model dependency is pointless. Any column can get hit by a model-level breaking change upstream, so really, a column’s Column-to-Model link is just the same as the Model-to-Model link for its parent model. Why make life harder?

When viewing the lineage tab on Recce, we used to describe the relations from upstream to downstream, for dependency analysis we needed to describe the relation from downstream to upstream. It’s not easy to discuss these relationships by name only. Internally, we confused ourselves when we used Model-to-Model, Column-to-model to communicate.

This naming confusion taught us something important: there are two directions we view in lineage.

  1. Lineage direction: upstream flows to downstream (left to right)
  2. Dependency direction: downstream depends on upstream (right to left)

When we say Model-to-Model dependency, we mean the downstream model depends on the upstream model, right to left. When we say Column-to-Model dependency, we mean the downstream column depends on the upstream model, right to left.

1. Model-to-Model dependency

This is the dependency most data tools understand. When you reference a model, everything downstream depends on it. If upstream_model changes, downstream_model needs validation.

--- Downstream #1 downstream model depeneds on upstream model
SELECT a, b
FROM {{ ref("upstream_model") }}

Model-to-Model dependency

2. Column-to-Column dependency

This is where it gets interesting. Not every column in a downstream model depends on every column in the upstream model. We can track dependencies at the column level.

Column-to-Column dependency means the downstream column is a direct projection (or transformation) of the upstream column. The downstream column’s value is derived from the upstream column through SELECT operations.

We’ve clarified there are five transformation types to understand how each column is generated or modified.

--- Downstream #2 downstream column depends on upstream column
SELECT
  a,
  b AS b2
FROM {{ ref("upstream_model") }}

Column-to-Column dependency

Here, downstream_model.a depends on upstream_model.a, and downstream_model.b2 depends on upstream_model.b. If upstream_model.b changes, only downstream_model.b2 is affected, not downstream_model.a.

This is the key insight that enables partial breaking changes, some but not all columns are changed. Looking at the column level, we can track impact at each column instead of assuming the entire model is affected.

3. Model-to-Column dependency

Sometimes a single upstream column affects an entire downstream model, even when that column doesn’t appear in the SELECT clause.

Model-to-Column dependency occurs when an upstream column is used in WHERE, JOIN, GROUP BY, or ORDER BY clauses. Changes to this column affect which rows are included in the downstream model, impacting the entire model’s output.

--- Downstream #3 downstream model depeneds on an upstream column
SELECT
  a
FROM {{ ref("upstream_model") }}
WHERE b > 0

Model-to-Column dependency

In this case, downstream_model depends on upstream_model.b even though column b doesn’t appear in the final output. If the values in column b change, the entire downstream model needs validation because different rows might be included or excluded.

Applying this to Recce’s demo PR, you can see customers model depends on stg_orders.status due to using it in the WHERE clause. Demo PR

With these three dependency types mapped out, we finally had the missing pieces.

Impact Radius v3

Armed with both column level breaking change analysis and column level dependency mapping, we could finally answer the original question with surgical precision: “What do I actually need to validate to ensure my data is right?”

Impact Radius shows which downstream models are affected by your changes and helps you analyze impacts at the column level. The scope depends on which modified model you’re analyzing.

Each model has an impact type:

  1. Not impacted: no impact to the whole model.
  2. Partially impacted: some columns in the model are impacted.
  3. Fully impacted: the whole model is impacted.

For example, in this demo, customer_order_pattern has different impact types:

  1. Impact Radius from customers: impact type of customer_order_pattern= fully impacted Impact Radius from customers
  2. Impact Radius from stg_payments: impact type of customer_order_pattern = not impacted Impact Radius from stg_payments
  3. Impact Radius from all changed models: impact type of customer_order_pattern = fully impacted Impact Radius form all changed models

Now that we understand how impact types work and how the same model can have different impacts depending on the source, let’s see this in action.

Use Impact Radius

Let’s walk through real scenarios and answer “what do I actually need to validate to ensure my data is right?”

Example 1: When dependencies create full Impact from partial changes

A simple 2-model pipeline where a column modification in the upstream model affects the entire downstream model.

What Changed:

  • upstream_model: Modified column c (partial breaking change)
    SELECT
       a,
    	 b,
    -- c
    ++ coalesce(c, 0) as c
    FROM {{ ref("raw_model") }}
  • downstream_model: No changes to the model itself
       SELECT
          a,
          COUNT(c) as c2
       FROM {{ ref("upstream_model") }}
       WHERE c > 2
       GROUP BY a

Impact Radius from all changed model (which equals to Impact Radius from upstream_model here)

  • The upstream_model.c is in the impact radius since it’s a partial breaking change.
  • The whole downstream_model is in the impact radius since it’s fully impacted. This is a Model-to-Column dependency: the modified column c is used in the WHERE clause, affecting which rows are included and potentially changing all output data.

When dependencies create full Impact from partial changes

Answer to “What do I actually need to validate?”

Validate the upstream_model.c column and entire downstream_model.

Example 2: When dependencies create no Impact from changes

Another simple 2-model pipeline where a column modification in the upstream model doesn’t affect the downstream model.

What Changed:

  • upstream_model: Modified column c (partial breaking change)
SELECT
   a,
	 b,
-- c
++ coalesce(c, 0) as c
FROM {{ ref("upstream_model") }}
  • downstream_model: No changes to the model itself
	SELECT
      a,
      b
   FROM {{ ref("upstream_model") }}
   WHERE b > 0

Impact Radius from all changed models (which equals to Impact Radius from upstream_model here)

  • The upstream_model.c is in the impact radius since it’s a partial breaking change.
  • The whole downstream_model is not in the impact radius since it’s not impacted. No dependency exists between the modified column c and downstream_model : the downstream model doesn’t use column c.

When dependencies create no Impact from changes

Answer to “What do I actually need to validate?”

Validate only the upstream_model.c column.

Example 3: Dependencies connect multiple change types

A 3-model pipeline showing how dependencies connect different change types, and how a downstream model’s impact type depends on all upstream changes.

What Changed:

  • upstream_model_1: Add a WHERE clause (breaking change)
  SELECT
    a,
    b
  FROM {{ ref("raw_model") }}
++ WHERE b > -1
  • upstream_model_2: Rename a column (partial breaking change)
SELECT
	 a,
-- b
++ b as b2
FROM  {{ ref("upstream_model_1") }}
  • downstream_model: No changes to the model itself
SELECT
  a
FROM {{ ref("upstream_model_2") }}
WHERE b2 > 0

Impact Radius from all changed models,

  • upstream_model_1 is in the impact radius as the source of breaking change
  • upstream_model_2 is in the impact radius since it’s fully impacted due to Model-to-Model dependency from upstream_model_1 (breaking change affects entire downstream model)
  • downstream_model is in the impact radius since it’s fully impacted due to Model-to-Column dependency. It uses column b2 in the WHERE clause, and b2 carries the impact from the upstream breaking change through the dependency chain

Dependencies connect multiple change types

Answer to “What do I actually need to validate?”

Validate the whole upstream_model_1, upstream_model_2 and downstream_model_1 . Especially check the upstream_model_2.b2 since it’s a partial breaking change.

Now that we understand these relationships that make Impact Radius v3 possible, the next challenge was the UX. How can we show these complex concepts, the modified, impacted, dependency and transformation types, on one lineage view?

UX challenges

Building the technical foundation was only half the battle. When applying these ideas to real data, you quickly get a user interface overwhelmed with information. We faced three major UX challenges that threatened to make Impact Radius too complex for users to adopt.

Challenge 1: Visual complexity

We wanted to show Impact Radius clearly by displaying models, columns, and the lines between them. But when you try to show everything at once, it becomes overwhelming.

Impact Radius form all changed models In this lineage view, you can see:

  1. stg_payments and customer_segments are non-breaking changes; customers is a breaking change.
  2. Lines between customers to customer_segments and customer_order_pattern show breaking changes impact both downstream models fully.
  3. Lines between customers.customer_lifetime_value to customer_segments.customer_lifetime_value and customer_segments.value_segments show column-level dependencies.

The lineage becomes too complicated to notice such details at a glance.

Challenge 2: It’s a mixed of modified and impacted

In the above lineage, you see what are modified with breaking change analysis and what are impacted. There are too many lines to view the dependencies, so you need to think about how many nodes you should validate.

Impact Radius from all changed models,

  • Only the stg_payments.coupon_amount is in the impact radius since it’s a new column.
  • The customers ,customer_segments and customer_order_pattern are in the impact radius since customers is a breaking change and affects these two downstream models.
  • The finance_revenue is in the impact radius since it’s a new model.

Answer to “What do I actually need to validate?”

  1. four models: customers, finance_revenue, customer_segments, customer_order_patten. Especially these six columns: customers.customer_lifetime_value, customer_segments.customer_lifetime_value , customer_segments.value_segment, customers.net_customer_lifetime_value, customer_segments.net_customer_lifetime_value, customer_segments.net_value_segment
  2. one column: stg_payments.coupon_amount

It certainly reduces what needs to be validated: It was six models for all modified and downstream models. However, you need to think through the result of breaking change analysis and dependency analysis with the whole DAG.

We know it’s not showing ONLY impacted models and columns, but we think now it’s a balance to provide information for users. We’d like to update when we have more feedback.

Challenge 3: Navigation between views

We realized you needed to drill down from the overview to specific relationships. When you click customers.customer_lifetime_value, the upstream and downstream relations become clear: Column-Level Lineage for customers.customer_lifetime_value

But how do you easily switch between these views? It’s natural to see the Impact Radius of all changed models, click one model or column to zoom in, then zoom out to see all changed models again. The navigation flow was a major challenge.

Our solution: Impact Radius and navigation

After several internal discussions, we agreed on creating a new name “Impact Radius” to represent the areas that show what is impacted, with clear navigation patterns:

  1. Default view: the state:modified+
    The default view shows modified models and their downstream models, like Impact Radius v1. The default view of lineage
  2. Impact Radius button: full precision
    Click “Impact Radius” to see Impact Radius for all changed models. This is Impact Radius v3 with precise column-level impact. Impact Radius from All Changed Models
  3. Right-click for model-specific impact
    Right-click a modified model and choose “Show Impact Radius” to see that model’s specific impact. Impact Radius from customers model
  4. Use column-level linage to view column-specific impact
    In the column-level lineage, you can view the downstream impact of this column. Though you also view the upstream. Column-level linage for customers.customer_lifetime_value
  5. Direct navigation
    Since the lineage can only represent one view at a time, we made direct navigation intuitive:
    • Click a model → see model lineage
    • Click a column → see column lineage Column Lineage for customers.customers_lifetime_value Model lineage of stg_orders
  6. Return navigation
    Click “X” to return to the default view. We’re planning additional solutions to enable returning to previous views (e.g., from Impact Radius of customers back to Impact Radius of all changed models).

There are assumptions behind this navigation design. We’re waiting for user feedback after Impact Radius v3 adoption, and we’ll adjust the UX accordingly. The key principle: progressive disclosure, start simple, drill down when needed, always provide a clear path back.

Impact Radius is live

This is the whole journey of building Impact Radius from v1 to v3: the passion, struggle, confusion, and finally arriving at a satisfying solution to present.

Try it now: Upgrade to Recce v1.10.0 with pip install recce -U, explore our interactive demos to experience Impact Radius, or hop on a call to learn more.

It’s just the beginning. We want user feedback and judgments. (Give us feedback [email protected]) This feature took us months to build, and we know we can make it better to help data teams answer: “What do I actually need to validate to ensure my data is right?”

What’s next

Remember the original problem we set out to solve? Users felt the “wow” moment seeing our default state:modified+ lineage, but then worried about the complexity and cost of validating models.

Impact Radius v3 directly addresses these concerns by showing precisely what needs validation. But, how do you actually use this in your daily workflow?

In our next article, we’ll share examples of how you can take advantage of Impact Radius to solve your data validation problems. We’ll explore the real validation workflows Impact Radius enables and when column-level precision increases confidence in deployments while reducing costly over-validation.

Impact Radius represents our biggest bet on making data validation less costly through precision. The technical foundation is solid, the UX principles are clear, and now we’re excited to see how data teams use it to transform their validation practices.

Stay tuned and enter email below to subscribe our newsletter.
Join our journey to ship correct data with confidence.

Get Recce updates in your inbox

Interested in data best practices and Recce usage tips?