Data Review Best Practices for Modern Data Teams

February 19, 2026 | data-quality, best-practices, dbt

What Is Data Review?

Data review is the practice of systematically validating data model changes before merging them into production. Unlike code review, which examines logic, data review examines the actual output — the rows, columns, and values that downstream consumers depend on.

Modern data teams working with dbt (data build tool) face a core challenge: a single model change can affect every downstream dependency in the DAG — from intermediate models to dashboards and ML features. Data review provides visibility into this blast radius before changes ship.

Why Data Review Matters

Traditional data quality approaches rely on post-deployment monitoring, which catches issues only after they have already affected production dashboards and reports. Data review shifts this left: changes are validated on the PR branch, before they are merged into production.

Core Components of a Data Review Process

1. Impact Analysis

Before reviewing data, understand the scope of change. Impact analysis maps which models are modified and traces their downstream dependencies. This tells reviewers where to focus attention.

Key metrics for impact analysis:

Metric             | What It Measures        | Why It Matters
Modified models    | Direct code changes     | Primary review targets
Downstream models  | Transitive dependencies | Blast radius of the change
Affected exposures | Dashboards, ML features | Business impact visibility
Row count delta    | Production vs. branch   | Data volume changes
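
The "downstream models" metric comes down to a graph traversal over the DAG. Below is a minimal sketch of that idea, not Recce's implementation. The `child_map` dictionary and the model names are hypothetical; in a real dbt project a similar parent-to-children mapping could be derived from the project's manifest.

```python
from collections import deque


def downstream_models(child_map, modified):
    """Return every node reachable from the modified models (transitive children)."""
    seen = set(modified)
    queue = deque(modified)
    while queue:
        node = queue.popleft()
        for child in child_map.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    # Exclude the modified models themselves: they are the primary review targets.
    return seen - set(modified)


# Hypothetical DAG: each key maps a node to its direct children.
child_map = {
    "stg_orders": ["int_orders"],
    "int_orders": ["fct_orders", "fct_revenue"],
    "fct_orders": ["exposure.orders_dashboard"],
    "fct_revenue": [],
    "stg_customers": ["dim_customers"],
}

impacted = downstream_models(child_map, ["stg_orders"])
```

Here a change to `stg_orders` reaches three models and one exposure, while `stg_customers` and `dim_customers` fall outside the blast radius and need no review.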

2. Automated Checks

Automate the repetitive parts of data review, such as schema diffs, row count comparisons, and value distribution analysis.
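
As a rough illustration of what two such checks compute, the sketch below compares a schema and a row count between production and a branch. The function names, column names, and types are assumptions made for the example; they are not part of Recce or dbt.

```python
def schema_diff(base, branch):
    """Compare two {column_name: data_type} mappings."""
    added = {c: t for c, t in branch.items() if c not in base}
    removed = {c: t for c, t in base.items() if c not in branch}
    type_changed = {
        c: (base[c], branch[c])
        for c in base
        if c in branch and base[c] != branch[c]
    }
    return {"added": added, "removed": removed, "type_changed": type_changed}


def row_count_delta(base_rows, branch_rows):
    """Relative row count change versus production; None when production is empty."""
    if base_rows == 0:
        return None
    return (branch_rows - base_rows) / base_rows


# Hypothetical schemas for one model in production and on the PR branch.
base_schema = {"order_id": "integer", "amount": "numeric", "status": "text"}
branch_schema = {"order_id": "bigint", "amount": "numeric", "discount": "numeric"}

diff = schema_diff(base_schema, branch_schema)
delta = row_count_delta(1000, 1200)
```

Returning `None` for an empty production table avoids a division by zero and leaves the decision about new models to the caller.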

3. PR-Level Reporting

Integrate data review results into your pull request workflow. A data review summary posted as a PR comment gives reviewers context without switching tools.
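
A PR comment of this kind is just check results rendered as text. The sketch below shows one possible shape, assuming each result is a dictionary with hypothetical keys `model`, `base_rows`, `branch_rows`, and `schema_changed`. Posting the text to the pull request is left to whatever CI integration the team already uses.

```python
def format_review_summary(results):
    """Render per-model check results as a plain-text summary for a PR comment."""
    lines = ["Data review summary", ""]
    for r in results:
        delta = r["branch_rows"] - r["base_rows"]
        schema = "changed" if r["schema_changed"] else "unchanged"
        lines.append(
            f"- {r['model']}: rows {r['base_rows']} -> {r['branch_rows']} "
            f"({delta:+d}), schema {schema}"
        )
    return "\n".join(lines)


# Hypothetical results for two models touched by a PR.
results = [
    {"model": "fct_orders", "base_rows": 1000, "branch_rows": 1150, "schema_changed": False},
    {"model": "dim_customers", "base_rows": 500, "branch_rows": 495, "schema_changed": True},
]

summary = format_review_summary(results)
```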

Implementing Data Review with Recce

Recce automates data review for dbt projects. The typical workflow:

  1. Developer opens a PR with model changes
  2. CI runs dbt build on the PR branch
  3. Recce compares branch output against the production baseline
  4. Recce posts a diff report as a PR comment
  5. Reviewers approve or request changes based on data impact

Integration with dbt CI/CD

Recce plugs into existing dbt CI pipelines. After dbt build completes, Recce runs its comparison checks and reports results. No changes to your dbt project structure are required.

Best Practices

  1. Review data, not just code: A syntactically correct model can produce wrong results. Always check the output.
  2. Scope reviews to impact radius: Don’t review every model — focus on modified models and their direct downstream dependencies.
  3. Automate the baseline: Use CI to maintain a production baseline that Recce compares against automatically.
  4. Set blocking thresholds: Define what constitutes a blocking data change (e.g., >10% row count change) and enforce it in CI.
  5. Document expected changes: When a PR intentionally changes data output, annotate the expected changes in the PR description.
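
The blocking threshold in practice 4 can be sketched as a small CI gate. This is an illustrative example rather than a Recce feature: the 10% threshold follows the example above, and the `row_counts` mapping of model name to (production rows, branch rows) is a hypothetical input that a CI job would have to collect first.

```python
THRESHOLD = 0.10  # block when the row count changes by more than 10%


def blocking_models(row_counts, threshold=THRESHOLD):
    """Return the models whose relative row count change exceeds the threshold."""
    blocked = []
    for model, (base_rows, branch_rows) in row_counts.items():
        if base_rows == 0:
            # No production baseline to compare against: block only if rows appeared.
            if branch_rows != 0:
                blocked.append(model)
            continue
        if abs(branch_rows - base_rows) / base_rows > threshold:
            blocked.append(model)
    return blocked


def exit_code(row_counts):
    """Return 1 if any model is blocking, 0 otherwise, for use as a CI exit status."""
    return 1 if blocking_models(row_counts) else 0


# Hypothetical row counts: model -> (production rows, branch rows).
row_counts = {
    "fct_orders": (1000, 1150),    # +15%, exceeds the threshold
    "dim_customers": (500, 505),   # +1%, within the threshold
}

blocked = blocking_models(row_counts)
```

A CI step could pass `exit_code(row_counts)` to `sys.exit` so that the job fails whenever a model is blocking. Intentional changes documented in the PR description (practice 5) would then need an explicit override or approval.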

Frequently Asked Questions

What is data review?
Data review is the practice of systematically validating data model changes before merging them into production. It combines automated checks (schema diff, row count comparison, value distribution analysis) with human review to catch data quality regressions early in the development cycle.

How does data review differ from data testing?
Data testing validates that data meets predefined rules (not-null constraints, accepted values, relationship tests). Data review goes further by comparing the actual output of changed models between your development branch and production, surfacing unexpected differences that tests alone would miss.

What tools support automated data review?
Recce is a purpose-built data review tool that integrates with dbt projects. It provides impact analysis, automated diff checks, and PR-level reporting. Other approaches include custom CI scripts, Great Expectations for data validation, and dbt tests for schema-level checks.

How do you integrate data review into CI/CD?
Configure your CI pipeline to run Recce after dbt build completes. Recce compares the PR branch output against the production baseline, generates a data diff report, and posts results as a PR comment. Teams can set blocking rules so PRs with unexpected data changes require explicit approval.