Recce

Why Is My dbt Data Wrong Even When Tests Pass?

February 16, 2026 · data-quality, dbt, best-practices

The False Sense of Security

Your dbt CI pipeline is green. All tests pass: not_null, unique, accepted_values, relationships. You merge the PR. Two days later, a stakeholder messages: “The revenue numbers look wrong.”

This scenario is more common than most data teams admit. Data correctness — whether the data is right for the business — is fundamentally different from data quality, which measures structural integrity like completeness, format, and uniqueness. Data can be high quality and still be wrong.

What dbt Tests Actually Check

dbt’s built-in tests validate structural properties:

| Test Type | What It Checks | What It Misses |
| --- | --- | --- |
| not_null | Column has no null values | Whether the non-null values are correct |
| unique | No duplicate values in a column | Whether the values themselves are right |
| accepted_values | Values fall within a defined set | Whether the distribution across values makes sense |
| relationships | Foreign keys reference valid parents | Whether the join logic produces correct results |

Custom tests and packages like dbt-expectations extend this to statistical checks (e.g., column means within bounds), but they still validate against predefined rules. They cannot catch a bug you didn’t anticipate.
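To make the limitation concrete, here is a minimal sketch in pure Python (table and column names are hypothetical): two result sets both pass not_null- and unique-style checks, but one applies the wrong calculation.

```python
# Minimal sketch: structural checks pass on semantically wrong output.
# Table and column names are hypothetical.

def not_null(rows, col):
    """Pass if no row has a null in `col` (mirrors dbt's not_null)."""
    return all(r[col] is not None for r in rows)

def unique(rows, col):
    """Pass if all values in `col` are distinct (mirrors dbt's unique)."""
    vals = [r[col] for r in rows]
    return len(vals) == len(set(vals))

orders = [
    {"order_id": 1, "amount": 100.0, "tax": 8.0},
    {"order_id": 2, "amount": 250.0, "tax": 20.0},
]

# Correct logic: revenue = amount + tax.
correct = [{"order_id": o["order_id"], "revenue": o["amount"] + o["tax"]} for o in orders]
# Buggy logic: revenue = amount - tax. Structurally identical, semantically wrong.
buggy = [{"order_id": o["order_id"], "revenue": o["amount"] - o["tax"]} for o in orders]

for result in (correct, buggy):
    # Both result sets pass: no nulls, unique keys.
    assert not_null(result, "revenue") and unique(result, "order_id")
```

No predefined rule distinguishes the two outputs; only a comparison against a trusted baseline would.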

Why Semantic Errors Slip Through

Semantic errors are logical mistakes that produce structurally valid but meaningfully wrong data. They pass all tests because the output looks fine at a structural level.

Common examples:

  - A calculation that references the wrong column or aggregation: the output is non-null and unique, but the math is wrong
  - A JOIN condition that fans out or drops rows while referential integrity still holds
  - A filter that silently excludes valid records, so every remaining row passes its tests
  - An upstream change that shifts a column’s meaning without breaking its structure

In each case, the data “passes the test but fails the business.”
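A one-character filter bug illustrates how little it takes. In this hypothetical sketch, `>` where `>=` was intended silently drops a boundary record, and the output still satisfies every structural check:

```python
# Hypothetical example: a one-character filter bug that passes structural tests.
from datetime import date

signups = [
    {"user_id": 1, "signup": date(2026, 1, 1)},
    {"user_id": 2, "signup": date(2026, 1, 15)},
    {"user_id": 3, "signup": date(2026, 2, 1)},
]

cutoff = date(2026, 1, 1)

# Intended: include everyone who signed up on or after the cutoff.
intended = [s for s in signups if s["signup"] >= cutoff]
# Buggy: `>` instead of `>=` silently drops the boundary record.
buggy = [s for s in signups if s["signup"] > cutoff]

# Both outputs have non-null, unique user_ids, so structural tests pass on both.
assert len(intended) == 3
assert len(buggy) == 2  # user 1 vanished; no test fires
```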

A Real-World Example

A data team pushed a change to a core model that fed a reverse ETL pipeline powering marketing automation. The model was fully tested — schema checks, null checks, uniqueness constraints. But the bug was a logical one: an incorrect filter that subtly changed which records were included in a calculation.

The corrupted data reached the experimentation platform and wasn’t discovered for almost a week.

Nothing looked obviously broken at a glance. It only became apparent when someone calculated metrics and noticed patterns that didn’t make sense. The monetary cost was significant, but the loss of trust was immeasurable.

How to Catch What Tests Miss

Closing the gap between testing and correctness requires a different approach: comparing actual data output against a known-good baseline.

Cross-Reference Against Production

Before merging, compare your development environment’s data against production. If historical metrics changed when they shouldn’t have, something is wrong. This is the historical context test — trusted production data serves as your benchmark for correctness.
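A sketch of the historical context test, assuming hypothetical metric values: metrics for closed historical periods should match production exactly, so any drift flags a likely logic change.

```python
# Sketch of a historical-context check: metrics for closed periods should
# match the production baseline exactly. Metric values are hypothetical.

prod_monthly_revenue = {"2025-11": 120_000.0, "2025-12": 135_000.0}
dev_monthly_revenue = {"2025-11": 120_000.0, "2025-12": 131_500.0}

def historical_drift(prod, dev, tolerance=0.0):
    """Return periods where dev diverges from the production baseline."""
    return {
        period: (prod[period], dev.get(period))
        for period in prod
        if dev.get(period) is None or abs(dev[period] - prod[period]) > tolerance
    }

drift = historical_drift(prod_monthly_revenue, dev_monthly_revenue)
# December changed when it shouldn't have; November is untouched.
assert drift == {"2025-12": (135_000.0, 131_500.0)}
```

A nonzero `tolerance` accommodates models where late-arriving data legitimately revises recent periods.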

Use Data Diffs at the Right Granularity

Data diffs compare datasets between environments. Start with cheap structural checks (has the schema changed? did row counts shift?), move to statistical checks (are column distributions still reasonable?), and drill into row-level comparisons only where the signal warrants deeper investigation.
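The tiered progression can be sketched in plain Python (table contents and the primary-key name are hypothetical): run the cheap checks first, and reach for the row-level diff only when something warrants it.

```python
# Sketch of a tiered data diff: cheap checks first, row-level only on signal.
# Table contents are hypothetical.

def schema_diff(prod_rows, dev_rows):
    """Cheapest check: did any columns appear or disappear?"""
    prod_cols = set(prod_rows[0]) if prod_rows else set()
    dev_cols = set(dev_rows[0]) if dev_rows else set()
    return {"added": dev_cols - prod_cols, "removed": prod_cols - dev_cols}

def row_count_diff(prod_rows, dev_rows):
    """Cheap check: did the row count shift?"""
    return len(dev_rows) - len(prod_rows)

def value_diff(prod_rows, dev_rows, key):
    """Expensive row-level diff keyed on a primary key; run only when needed."""
    prod_by_key = {r[key]: r for r in prod_rows}
    return [r for r in dev_rows if prod_by_key.get(r[key]) != r]

prod = [{"id": 1, "total": 10.0}, {"id": 2, "total": 20.0}]
dev = [{"id": 1, "total": 10.0}, {"id": 2, "total": 25.0}]

assert schema_diff(prod, dev) == {"added": set(), "removed": set()}
assert row_count_diff(prod, dev) == 0
changed = value_diff(prod, dev, key="id")  # only row id=2 differs
```

Note that both cheap checks pass here; only the row-level diff surfaces the change, which is why the tiers complement rather than replace each other.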

For the example above, a profile diff would have shown the distribution of the affected CLV (customer lifetime value) column shifting. A value diff would have quantified that 99% of rows changed in that column while every other column matched 100%.
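A value diff that reports per-column match rates, as described above, can be sketched like this (the `clv` column and the data are hypothetical):

```python
# Sketch: per-column match rates between prod and dev, keyed on a primary key.
# The 'clv' column name and data are hypothetical.

def column_match_rates(prod_rows, dev_rows, key):
    """Fraction of keyed rows whose value matches prod, per column."""
    prod_by_key = {r[key]: r for r in prod_rows}
    cols = [c for c in dev_rows[0] if c != key]
    matched = {c: 0 for c in cols}
    compared = 0
    for row in dev_rows:
        base = prod_by_key.get(row[key])
        if base is None:
            continue  # new row in dev; nothing to compare against
        compared += 1
        for c in cols:
            matched[c] += base[c] == row[c]
    return {c: matched[c] / compared for c in cols}

prod = [{"id": i, "clv": 100.0, "region": "us"} for i in range(100)]
dev = [{"id": i, "clv": 100.0 if i == 0 else 250.0, "region": "us"} for i in range(100)]

rates = column_match_rates(prod, dev, key="id")
assert rates == {"clv": 0.01, "region": 1.0}  # 99% of clv values changed
```

A report like this localizes the blast radius of a change to specific columns in one pass.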

Automate Checks on Critical Models

Every project has models where being wrong is expensive — customer-facing tables, revenue metrics, models that feed ML pipelines. Identify these critical models and automate data checks in your CI pipeline so they run on every PR, not just when someone remembers.
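In CI this usually reduces to a small gate script: run the checks on each critical model and fail the build on any finding. A sketch, where the model list and the check function are hypothetical placeholders:

```python
# Sketch of a CI gate: run data checks on critical models and fail the build
# on any finding. Model names and the check function are hypothetical.
import sys

CRITICAL_MODELS = ["fct_revenue", "dim_customers", "ml_features"]

def run_checks(model):
    """Placeholder: return a list of findings for a model (empty = clean)."""
    return []

def main():
    findings = {m: run_checks(m) for m in CRITICAL_MODELS}
    failures = {m: f for m, f in findings.items() if f}
    for model, issues in failures.items():
        print(f"FAIL {model}: {issues}")
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main())
```

The nonzero exit code is what makes the check blocking: the PR cannot merge until the findings are resolved or explicitly accepted.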

Involve Domain Experts Strategically

Not every change needs human review. Focus human attention on changes where the cost of being wrong is high and detection time is slow. For a marketing-critical model, ask: “Would a stakeholder notice if these numbers shifted by 5%?” If the answer is “not until next month’s report,” that model deserves a human-in-the-loop review.

Building a Data Correctness Workflow

A practical workflow combines automated checks with targeted human review:

  1. dbt tests — catch structural issues (they’re still essential)
  2. Automated data diffs in CI — schema, row count, and profile checks on critical models
  3. Manual exploration on high-risk changes — use lineage to scope impact, run targeted diffs, check distributions
  4. PR documentation — record what you checked, what you found, and why the change is safe to merge
  5. Domain review on high-stakes changes — get a second pair of eyes when the business impact is significant
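One way to think about the layering is as an ordered runner: each layer returns findings, and a blocking finding short-circuits the more expensive layers behind it. A sketch, with hypothetical layer functions:

```python
# Sketch of the layered workflow: each layer returns findings; later, more
# expensive layers run only when needed. Layer functions are hypothetical.

def run_layered_review(change, layers):
    """Run check layers in order; stop early on a blocking finding."""
    report = []
    for name, check in layers:
        findings = check(change)
        report.append((name, findings))
        if any(f.get("blocking") for f in findings):
            break  # no point running deeper layers on a known-bad change
    return report

layers = [
    ("dbt tests", lambda c: []),
    ("automated diffs", lambda c: [{"issue": "row count shift", "blocking": True}]),
    ("manual exploration", lambda c: []),
]

report = run_layered_review({"model": "fct_revenue"}, layers)
# Stops after "automated diffs"; manual exploration never runs.
assert [name for name, _ in report] == ["dbt tests", "automated diffs"]
```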

This layered approach follows the data review best practices that catch issues at the right level. Tests are the foundation, but they’re not the whole building.

Summary

dbt tests validate that data meets structural constraints — not that it’s correct for the business. Semantic errors (wrong calculations, incorrect filters, unexpected upstream changes) pass all tests while producing wrong results. To catch these issues, compare development data against production baselines using data diffs, automate checks on critical models in CI, and involve domain experts on high-stakes changes. The goal is not to replace testing but to complement it with data-level validation that catches what tests inherently cannot.

Frequently Asked Questions

Why is my dbt data wrong when all tests pass?
dbt tests validate structural properties — not-null constraints, uniqueness, accepted values, referential integrity. They do not validate whether the data is semantically correct for your business. A logical bug in a calculation, an incorrect JOIN condition, or a filter that silently excludes valid records will pass all structural tests while producing wrong results. These semantic errors require comparing actual data output against a known-good baseline.
What is the difference between data quality and data correctness?
Data quality measures structural integrity and completeness — are columns the right type, are there nulls where there should not be, are values within expected ranges. Data correctness measures whether the data is right for the business — do the numbers make sense, do metrics match reality, would a domain expert agree the output is accurate. Data can be high quality (structurally sound) but fundamentally incorrect (wrong business logic).
How do you catch silent data errors before production?
Compare development data against your production baseline before merging. Use data diffs to check whether metrics changed as expected. Cross-reference new calculations against known-good historical data. Involve domain experts in reviewing high-stakes changes. Automate impact checks on critical models in CI. The key is validating actual data output, not just structural constraints.
What should I check beyond dbt tests in my CI pipeline?
Beyond dbt tests, your CI pipeline should include schema diff (detect unexpected column changes), row count diff (catch data loss or duplication), profile diff (spot statistical distribution shifts), and targeted value diffs on critical models. Tools like Recce can automate these checks via preset configurations that run on every PR, complementing dbt tests with data-level validation.