Building Impact Radius #2: The Technical Breakthrough
Continuing from our last blog post about building Impact Radius, we realized we could build an even more powerful solution to answer “what do I actually need to validate to ensure my data is right” using our breaking change analysis and column-level lineage (CLL) features.
Version three: column-level precision
Breaking change analysis and column-level lineage aren’t new concepts. Many data tools offer one or another of these features. However, no one has combined these two features together.
While building breaking change analysis and CLL, we couldn’t help but think, “if we can do breaking change analysis at the column level, could we narrow the impact radius to the column level too?” This changes the workflow from “this model has a breaking change, now validate everything downstream” to “this column changed, now validate only what uses it.”
We call this a partial breaking change: changes that may impact downstream models, depending on whether they use the modified columns.
Here’s why: an example using the previous orders
model
SELECT
customer_id,
order_date,
-- total_amount,
++ total_amount * 0.85 AS total_amount # Convert USD to EUR
FROM raw_orders
The example modifies only the total_amount
column. But we can’t just look at this change in isolation, we need to understand how downstream models use this column.
- If a downstream model doesn’t use
total_amount
, it has no impact. - If a downstream model uses
total_amount
, we need to know how it uses it to determine understand impact: either the whole model or a just few columns.
Different downstream scenarios of using modified columns:
-- Downstream A: No impact
SELECT
customers_id,
COUNT(order_date) AS date_counts
FROM orders
GROUP BY customer_id DESC
-- Downstream B: Column impact
SELECT
customers_id,
SUM(total_amount) AS customers_value
FROM orders
GROUP BY customer_id DESC
-- Downstream C: Model impact
SELECT
customers_id,
order_date,
total_amount
FROM orders
WHERE total_amount >= 10000
In these downstream models , we know
- Downstream A doesn’t use
total_amount
, so we can skip validating this model. - Downstream B uses
total_amount
in calculations, so we should validate this column,customers_value
. - Downstream C uses
total_amount
inWHERE
clause, so we should validate the whole model.
The key insight: dependency relationships determine Impact Radius. We need to understand not just what changed, but how downstream models depend on those changes.
When we realized this, we stepped back and systematically mapped out exactly what we’d need to build.
Building blocks for Impact Radius v3
Building Impact Radius v3 required understanding types of dependencies between models and columns. Rather than dive into the technical complexity here, we’ll show you the blocks approach we took.
Here’s how we broke down the problem:
Our systematic approach:
- Dependency Analysis (blue boxes): We list four types of dependency analysis that we needed to understand before building features.
- Recce Features (yellow boxes): The lineage and lineage diff as our foundation to show Impact Radius.
- Change Analysis (green boxes): The breaking change analysis that leads us start this journey.
- Impact Radius (orange boxes): Where users can validate exactly what matters, nothing more, nothing less.
All of these components work together to finally answer: “What do I actually need to validate to ensure my data is right?”
Let’s walk through how we built each piece, starting with the realization that changed everything: not all changes are created equal.
Breaking change analysis
The first breakthrough came when we realized we needed to understand not just what changed, but how it changed. This led us to categorize breaking changes into three types that would drive everything else we built.
- Non-breaking change: No impact to downstream since no change in existing columns, e.g. adding a new column.
- Partial breaking change: Impact to some, but not all downstream columns since only some columns change.
- Breaking change: Impact to the whole downstream model since the whole model changes, e.g. modified
WHERE
,JOIN
,GROUP BY
, orORDER BY.
But when we shipped this as a toggle on the lineage, something felt off. Users expected “Breaking Change Analysis” to show them what was broken, meaning impacted downstream models. Instead, they saw models with code changes highlighted.
This confusion made sense once we understood the terminology gap. We borrowed this idea from software, but data people understand it differently.
- In software development: Breaking changes are intentional improvements. You plan them, document them, and manage the transition.
- In analytics engineering: Breaking changes are unplanned problems. Your dashboard breaks, your pipeline fails, and you need to fix it immediately.
This mismatch between expectation and reality made us realize we were missing a crucial piece: understanding exactly how upstream changes affect downstream models.
Dependency analysis
This confusion led us to our next breakthrough: we needed to map the relationships between models and columns at a much more granular level.
It was straight forward to map out four dependency types.
- Model-to-Model
- Model-to-Column
- Column-to-Column
- Column-to-Model
After testing it ourselves, we realized something unintuitive: tracking Column-to-Model dependency is pointless. Any column can get hit by a model-level breaking change upstream, so really, a column’s Column-to-Model link is just the same as the Model-to-Model link for its parent model. Why make life harder?
When viewing the lineage tab on Recce, we used to describe the relations from upstream to downstream, for dependency analysis we needed to describe the relation from downstream to upstream. It’s not easy to discuss these relationships by name only. Internally, we confused ourselves when we used Model-to-Model, Column-to-model to communicate.
This naming confusion taught us something important: there are two directions we view in lineage.
- Lineage direction: upstream flows to downstream (left to right)
- Dependency direction: downstream depends on upstream (right to left)
When we say Model-to-Model dependency, we mean the downstream model depends on the upstream model, right to left. When we say Column-to-Model dependency, we mean the downstream column depends on the upstream model, right to left.
1. Model-to-Model dependency
This is the dependency most data tools understand. When you reference a model, everything downstream depends on it. If upstream_model
changes, downstream_model
needs validation.
--- Downstream #1 downstream model depeneds on upstream model
SELECT a, b
FROM {{ ref("upstream_model") }}
2. Column-to-Column dependency
This is where it gets interesting. Not every column in a downstream model depends on every column in the upstream model. We can track dependencies at the column level.
Column-to-Column dependency means the downstream column is a direct projection (or transformation) of the upstream column. The downstream column’s value is derived from the upstream column through SELECT
operations.
We’ve clarified there are five transformation types to understand how each column is generated or modified.
--- Downstream #2 downstream column depends on upstream column
SELECT
a,
b AS b2
FROM {{ ref("upstream_model") }}
Here, downstream_model.a
depends on upstream_model.a
, and downstream_model.b2
depends on upstream_model.b
. If upstream_model.b
changes, only downstream_model.b2
is affected, not downstream_model.a
.
This is the key insight that enables partial breaking changes, some but not all columns are changed. Looking at the column level, we can track impact at each column instead of assuming the entire model is affected.
3. Model-to-Column dependency
Sometimes a single upstream column affects an entire downstream model, even when that column doesn’t appear in the SELECT
clause.
Model-to-Column dependency occurs when an upstream column is used in WHERE
, JOIN
, GROUP BY
, or ORDER BY
clauses. Changes to this column affect which rows are included in the downstream model, impacting the entire model’s output.
--- Downstream #3 downstream model depeneds on an upstream column
SELECT
a
FROM {{ ref("upstream_model") }}
WHERE b > 0
In this case, downstream_model
depends on upstream_model.b
even though column b
doesn’t appear in the final output. If the values in column b
change, the entire downstream model needs validation because different rows might be included or excluded.
Applying this to Recce’s demo PR, you can see customers
model depends on stg_orders.status
due to using it in the WHERE
clause.
With these three dependency types mapped out, we finally had the missing pieces.
Impact Radius v3
Armed with both column level breaking change analysis and column level dependency mapping, we could finally answer the original question with surgical precision: “What do I actually need to validate to ensure my data is right?”
Impact Radius shows which downstream models are affected by your changes and helps you analyze impacts at the column level. The scope depends on which modified model you’re analyzing.
Each model has an impact type:
- Not impacted: no impact to the whole model.
- Partially impacted: some columns in the model are impacted.
- Fully impacted: the whole model is impacted.
For example, in this demo, customer_order_pattern
has different impact types:
- Impact Radius from
customers
: impact type ofcustomer_order_pattern
= fully impacted - Impact Radius from
stg_payments
: impact type ofcustomer_order_pattern
= not impacted - Impact Radius from all changed models: impact type of
customer_order_pattern
= fully impacted
Now that we understand how impact types work and how the same model can have different impacts depending on the source, let’s see this in action.
Use Impact Radius
Let’s walk through real scenarios and answer “what do I actually need to validate to ensure my data is right?”
Example 1: When dependencies create full Impact from partial changes
A simple 2-model pipeline where a column modification in the upstream model affects the entire downstream model.
What Changed:
upstream_model
: Modified columnc
(partial breaking change)SELECT a, b, -- c ++ coalesce(c, 0) as c FROM {{ ref("raw_model") }}
downstream_model
: No changes to the model itselfSELECT a, COUNT(c) as c2 FROM {{ ref("upstream_model") }} WHERE c > 2 GROUP BY a
Impact Radius from all changed model (which equals to Impact Radius from upstream_model
here)
- The
upstream_model.c
is in the impact radius since it’s a partial breaking change. - The whole
downstream_model
is in the impact radius since it’s fully impacted. This is a Model-to-Column dependency: the modified columnc
is used in theWHERE
clause, affecting which rows are included and potentially changing all output data.
Answer to “What do I actually need to validate?”
Validate the upstream_model.c
column and entire downstream_model
.
Example 2: When dependencies create no Impact from changes
Another simple 2-model pipeline where a column modification in the upstream model doesn’t affect the downstream model.
What Changed:
upstream_model
: Modified columnc
(partial breaking change)
SELECT
a,
b,
-- c
++ coalesce(c, 0) as c
FROM {{ ref("upstream_model") }}
downstream_model
: No changes to the model itself
SELECT
a,
b
FROM {{ ref("upstream_model") }}
WHERE b > 0
Impact Radius from all changed models (which equals to Impact Radius from upstream_model
here)
- The
upstream_model.c
is in the impact radius since it’s a partial breaking change. - The whole
downstream_model
is not in the impact radius since it’s not impacted. No dependency exists between the modified columnc
anddownstream_model
: the downstream model doesn’t use columnc
.
Answer to “What do I actually need to validate?”
Validate only the upstream_model.c
column.
Example 3: Dependencies connect multiple change types
A 3-model pipeline showing how dependencies connect different change types, and how a downstream model’s impact type depends on all upstream changes.
What Changed:
upstream_model_1
: Add aWHERE
clause (breaking change)
SELECT
a,
b
FROM {{ ref("raw_model") }}
++ WHERE b > -1
upstream_model_2
: Rename a column (partial breaking change)
SELECT
a,
-- b
++ b as b2
FROM {{ ref("upstream_model_1") }}
downstream_model
: No changes to the model itself
SELECT
a
FROM {{ ref("upstream_model_2") }}
WHERE b2 > 0
Impact Radius from all changed models,
upstream_model_1
is in the impact radius as the source of breaking changeupstream_model_2
is in the impact radius since it’s fully impacted due to Model-to-Model dependency fromupstream_model_1
(breaking change affects entire downstream model)downstream_model
is in the impact radius since it’s fully impacted due to Model-to-Column dependency. It uses columnb2
in theWHERE
clause, andb2
carries the impact from the upstream breaking change through the dependency chain
Answer to “What do I actually need to validate?”
Validate the whole upstream_model_1
, upstream_model_2
and downstream_model_1
. Especially check the upstream_model_2.b2
since it’s a partial breaking change.
Now that we understand these relationships that make Impact Radius v3 possible, the next challenge was the UX. How can we show these complex concepts, the modified, impacted, dependency and transformation types, on one lineage view?
UX challenges
Building the technical foundation was only half the battle. When applying these ideas to real data, you quickly get a user interface overwhelmed with information. We faced three major UX challenges that threatened to make Impact Radius too complex for users to adopt.
Challenge 1: Visual complexity
We wanted to show Impact Radius clearly by displaying models, columns, and the lines between them. But when you try to show everything at once, it becomes overwhelming.
In this lineage view, you can see:
stg_payments
andcustomer_segments
are non-breaking changes;customers
is a breaking change.- Lines between
customers
tocustomer_segments
andcustomer_order_pattern
show breaking changes impact both downstream models fully. - Lines between
customers.customer_lifetime_value
tocustomer_segments.customer_lifetime_value
andcustomer_segments.value_segments
show column-level dependencies.
The lineage becomes too complicated to notice such details at a glance.
Challenge 2: It’s a mixed of modified and impacted
In the above lineage, you see what are modified with breaking change analysis and what are impacted. There are too many lines to view the dependencies, so you need to think about how many nodes you should validate.
Impact Radius from all changed models,
- Only the
stg_payments.coupon_amount
is in the impact radius since it’s a new column. - The
customers
,customer_segments
andcustomer_order_pattern
are in the impact radius sincecustomers
is a breaking change and affects these two downstream models. - The
finance_revenue
is in the impact radius since it’s a new model.
Answer to “What do I actually need to validate?”
- four models:
customers
,finance_revenue
,customer_segments
,customer_order_patten
. Especially these six columns:customers.customer_lifetime_value
,customer_segments.customer_lifetime_value
,customer_segments.value_segment
,customers.net_customer_lifetime_value
,customer_segments.net_customer_lifetime_value
,customer_segments.net_value_segment
- one column:
stg_payments.coupon_amount
It certainly reduces what needs to be validated: It was six models for all modified and downstream models. However, you need to think through the result of breaking change analysis and dependency analysis with the whole DAG.
We know it’s not showing ONLY impacted models and columns, but we think now it’s a balance to provide information for users. We’d like to update when we have more feedback.
Challenge 3: Navigation between views
We realized you needed to drill down from the overview to specific relationships. When you click customers.customer_lifetime_value
, the upstream and downstream relations become clear:
But how do you easily switch between these views? It’s natural to see the Impact Radius of all changed models, click one model or column to zoom in, then zoom out to see all changed models again. The navigation flow was a major challenge.
Our solution: Impact Radius and navigation
After several internal discussions, we agreed on creating a new name “Impact Radius” to represent the areas that show what is impacted, with clear navigation patterns:
- Default view: the
state:modified+
The default view shows modified models and their downstream models, like Impact Radius v1. - Impact Radius button: full precision
Click “Impact Radius” to see Impact Radius for all changed models. This is Impact Radius v3 with precise column-level impact. - Right-click for model-specific impact
Right-click a modified model and choose “Show Impact Radius” to see that model’s specific impact. - Use column-level linage to view column-specific impact
In the column-level lineage, you can view the downstream impact of this column. Though you also view the upstream. - Direct navigation
Since the lineage can only represent one view at a time, we made direct navigation intuitive:- Click a model → see model lineage
- Click a column → see column lineage
- Return navigation
Click “X” to return to the default view. We’re planning additional solutions to enable returning to previous views (e.g., from Impact Radius ofcustomers
back to Impact Radius of all changed models).
There are assumptions behind this navigation design. We’re waiting for user feedback after Impact Radius v3 adoption, and we’ll adjust the UX accordingly. The key principle: progressive disclosure, start simple, drill down when needed, always provide a clear path back.
Impact Radius is live
This is the whole journey of building Impact Radius from v1 to v3: the passion, struggle, confusion, and finally arriving at a satisfying solution to present.
Try it now: Upgrade to Recce v1.10.0 with pip install recce -U
, explore our interactive demos to experience Impact Radius, or hop on a call to learn more.
It’s just the beginning. We want user feedback and judgments. (Give us feedback [email protected]) This feature took us months to build, and we know we can make it better to help data teams answer: “What do I actually need to validate to ensure my data is right?”
What’s next
Remember the original problem we set out to solve? Users felt the “wow” moment seeing our default state:modified+
lineage, but then worried about the complexity and cost of validating models.
Impact Radius v3 directly addresses these concerns by showing precisely what needs validation. But, how do you actually use this in your daily workflow?
In our next article, we’ll share examples of how you can take advantage of Impact Radius to solve your data validation problems. We’ll explore the real validation workflows Impact Radius enables and when column-level precision increases confidence in deployments while reducing costly over-validation.
Impact Radius represents our biggest bet on making data validation less costly through precision. The technical foundation is solid, the UX principles are clear, and now we’re excited to see how data teams use it to transform their validation practices.
Stay tuned and enter email below to subscribe our newsletter.
Join our journey to ship correct data with confidence.