What Is Guided Data Review for dbt Pull Requests?
Why Data PR Reviews Are Uniquely Difficult
Every data engineer knows the feeling: you open a pull request, see a lineage diff with 5+ impacted models, and freeze. Data PR review is fundamentally harder than code review because a single upstream change can cascade through dozens of downstream models, metrics, and dashboards. The reviewer has to answer not just “is this code correct?” but “what did this change do to the actual data?”
Without guidance, reviewers default to one of two bad patterns. They either check everything (wasting hours running unnecessary diffs) or check almost nothing (hoping dbt tests will catch problems). Neither approach scales, and both leave teams exposed to data quality regressions that pass tests but still produce wrong results.
What Is Guided Data Review?
Guided data review is an approach that provides reviewers with an intelligent, context-aware summary of what changed in a data PR, why it matters, and what specific validations they should perform before merging. Rather than leaving reviewers to figure out where to start, guided review meets them where they already work: in the pull request itself.
The concept emerged from a key observation: data teams do not start their review workflow inside a validation tool. They start in the PR. Even teams using open-source Recce had hacked together ways to post a Recce Summary to every PR, showing which checks ran and their results. The problem was that this summary only showed results after someone had already done the work. What reviewers needed was guidance before they started.
How Does Context Engineering Power Guided Review?
The difference between a useful AI review summary and a generic one comes down to context engineering versus simple prompt engineering.
| Approach | Input | Output Quality |
|---|---|---|
| Prompt engineering | PR description, commit messages | Generic summaries that restate the PR description |
| Context engineering | dbt artifacts, metadata dependencies, tool access for data checks, domain knowledge | Specific, data-driven, actionable guidance |
With context engineering, an AI agent receives:
- dbt artifacts: The metadata and dependency graph for both base and PR branches
- Tool access: The ability to run Recce checks such as value diffs, profile diffs, and custom queries
- Analysis capabilities: Impact radius calculation and downstream impact tracing
- Domain knowledge: Understanding of dbt, data warehousing, and analytics engineering patterns
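To make "tool access" concrete, here is a minimal sketch of what generating a value diff might look like: a query that compares the base-branch and PR-branch builds of the same model and counts mismatched values. The schema names, primary-key column, and function shape are illustrative assumptions, not Recce's actual implementation.

```python
def value_diff_sql(model: str, base_schema: str, pr_schema: str,
                   primary_key: str, column: str) -> str:
    """Build a SQL query counting rows where `column` differs between the
    base-branch and PR-branch builds of a model. Schema and column names
    are hypothetical placeholders; IS DISTINCT FROM handles NULLs safely
    on warehouses that support it."""
    return f"""
select
    count(*) as total_rows,
    sum(case when b.{column} is distinct from p.{column}
             then 1 else 0 end) as mismatched_rows
from {base_schema}.{model} b
full outer join {pr_schema}.{model} p
    on b.{primary_key} = p.{primary_key}
""".strip()

print(value_diff_sql("customers", "prod", "dev_pr_123",
                     "customer_id", "customer_lifetime_value"))
```

An agent with this kind of tool can execute the query against the warehouse and fold the mismatch counts into its summary, rather than merely suggesting that the reviewer "check the column".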
This enables the agent to actually perform checks rather than just suggest them. It can tell a reviewer: “The customer_lifetime_value column decreased by 15% because returned orders are now excluded. Here are the downstream models affected.” That is actionable. A prompt-engineered summary saying “this PR modifies CLV calculations” is not.
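Impact-radius calculation can be approximated directly from dbt's `manifest.json`, which includes a `child_map` recording each node's immediate downstream dependents. A minimal breadth-first sketch (the toy graph below is hypothetical; a real run would load the artifact from `target/manifest.json`):

```python
from collections import deque

def downstream_nodes(child_map: dict, changed: set) -> set:
    """Breadth-first walk of dbt's child_map to collect every node
    downstream of the changed models -- a rough impact radius."""
    impacted, queue = set(), deque(changed)
    while queue:
        node = queue.popleft()
        for child in child_map.get(node, []):
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted

# Hypothetical project graph; in practice:
# child_map = json.load(open("target/manifest.json"))["child_map"]
child_map = {
    "model.proj.stg_orders": ["model.proj.orders"],
    "model.proj.orders": ["model.proj.customer_lifetime_value"],
    "model.proj.customer_lifetime_value": [],
}
print(downstream_nodes(child_map, {"model.proj.stg_orders"}))
```

Production tooling also weighs how a node changed (schema vs. logic) when deciding which downstream nodes actually need validation, but the dependency walk above is the starting point.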
Who Benefits from Guided Data Review?
Different team members need different things from a review summary, and this is one of the core challenges guided review addresses.
| Reviewer Persona | What They Need |
|---|---|
| Developer reviewing own work | Confirmation that intended changes look correct and that nothing unexpected slipped in |
| Teammate reviewing a PR | Quick assessment of merge safety, specific checks to run |
| Team lead managing multiple PRs | High-level risk assessment, which PRs need deeper investigation |
| Early-stage team | What new insights does this PR unlock? |
| Mature production team | What could break? What anomalies need investigation? |
Guided review adapts to these personas by providing layered information: a quick yes/no merge recommendation with supporting evidence, plus detailed check results for those who want to dig deeper. This approach connects directly to broader data review best practices that emphasize scoping validation to what actually matters.
How Does Guided Review Fit into the dbt Workflow?
Guided data review integrates at the PR stage of the standard dbt development cycle:
1. Develop: Engineer modifies dbt models locally
2. Create PR: Code changes trigger metadata generation
3. Guided review appears: An AI-assisted summary posts as a PR comment, showing what changed in the data, the impact radius of changes, and recommended validations
4. Validate: Reviewers follow the guidance to run targeted checks
5. Merge: Team merges with confidence that impacts are understood
The key innovation is step 3. Instead of a blank canvas where the reviewer must figure out what to check, they receive a structured starting point grounded in actual data analysis.
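As a rough illustration of what that structured starting point could look like, here is a sketch that renders a review summary as a markdown PR comment. The summary fields and layout are assumptions for illustration, not Recce's actual output format; a CI job would post the resulting string through the Git host's comment API.

```python
def build_review_comment(summary: dict) -> str:
    """Render a guided-review summary as a markdown PR comment.
    The dict shape (recommendation / findings / checks) is an
    illustrative assumption, not a real Recce schema."""
    lines = [
        "## Guided Data Review",
        f"**Merge recommendation:** {summary['recommendation']}",
        "",
        "**What changed in the data:**",
    ]
    lines += [f"- {finding}" for finding in summary["findings"]]
    lines += ["", "**Recommended validations:**"]
    lines += [f"- [ ] {check}" for check in summary["checks"]]
    return "\n".join(lines)

comment = build_review_comment({
    "recommendation": "Review before merging",
    "findings": [
        "customer_lifetime_value decreased ~15% (returned orders now excluded)",
    ],
    "checks": [
        "Value diff on customer_lifetime_value",
        "Profile diff on downstream segment models",
    ],
})
print(comment)
```

The checkbox list doubles as a lightweight validation tracker: reviewers tick items off in the PR thread as they complete each suggested check.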
What Makes Guided Review Different from Checklists and Templates?
Teams have tried several approaches to structure data PR reviews: in-app bubble guides, rule-based suggestion engines, user preference settings, and template-based checklists. These approaches share a common weakness: they are rigid and cannot adapt to the unique context of each PR.
An in-app guide tells you the same three steps regardless of whether your PR touches one staging model or restructures an entire mart layer. A checklist cannot know that this specific change reduces customer lifetime value by 15% and downstream dashboards will show different segment distributions.
Guided review powered by context engineering adapts to each PR because it actually analyzes the change, runs checks against real data, and synthesizes findings based on what the specific change impacts.
Getting Started with Guided Data Review
Recce Cloud ships guided data review as an AI-assisted PR summary. When a PR is created against a dbt project connected to Recce, the agent analyzes the change and posts a comment summarizing what changed, what the data impact looks like, and what the reviewer should validate.
For teams already doing dbt CI checks beyond basic tests, guided review is the natural next step: it turns your CI metadata into actionable review intelligence instead of raw check output that reviewers must interpret on their own.
Frequently Asked Questions
- What is guided data review for dbt?
- Guided data review is an approach that uses context engineering and AI to automatically analyze dbt pull requests, telling reviewers what data changed, why it matters, and what they should validate before merging.
- Why do data PR reviews need guidance?
- Data PR reviews are uniquely difficult because a single upstream code change can impact 5 or more downstream models. Reviewers see a lineage diff but have no systematic way to decide where to start validating, leading to missed issues or wasted time on irrelevant checks.
- What is the difference between prompt engineering and context engineering for data review?
- Prompt engineering gives an LLM only PR descriptions and commit messages. Context engineering provides dbt artifacts, metadata dependencies, tool access to run actual data checks like profile diffs and value diffs, and domain knowledge about data warehousing, producing far more actionable review summaries.
- How does Recce implement guided data review?
- Recce posts an AI-assisted summary as a PR comment that describes what changed in the data, why it matters, and what the reviewer should check. The agent uses context engineering to access dbt artifacts, run Recce checks, calculate impact radius, and trace downstream impacts.