Why High Quality Data Can Still Be Wrong

When data passes the test, but fails the business

Data quality is often used as a catch-all term for ‘good data’ but, in practice, data quality is concerned more with the structural integrity and completeness of data than with its correctness.

What is data correctness, and why does it matter?

Data correctness refers to whether the data is ‘right’: fit for purpose in the context of real-world requirements and business processes. It’s possible for data to be of high quality (structurally sound and complete) yet fundamentally incorrect.

It’s a scary prospect: data can pass every test and still be ‘wrong’. This is how silent errors make their way into production, and no one notices until it’s too late.

The historical context test for data correctness

Ensuring that data remains semantically correct requires a proactive approach. ‘Remains’ is the key word here. We generally trust historical data because it has stood the test of time; the numbers are accepted as correct. A good benchmark for data correctness is therefore that historical metrics should not change following a data modeling update.
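As a rough illustration, a minimal sketch of this historical check is shown below. It assumes both production and the development build expose the same metric (here a hypothetical monthly revenue table with `month` and `revenue` columns) and flags any historical rows whose values shifted after the change.

```python
# Minimal sketch: compare a historical metric between production (known good)
# and a development build of the same model. Table and column names
# (month, revenue) are hypothetical placeholders for warehouse queries.
import pandas as pd

def historical_metrics_changed(prod: pd.DataFrame, dev: pd.DataFrame,
                               key: str = "month", value: str = "revenue",
                               tolerance: float = 1e-6) -> pd.DataFrame:
    """Return the historical rows whose metric value changed after the update."""
    merged = prod.merge(dev, on=key, suffixes=("_prod", "_dev"), how="inner")
    diff = (merged[f"{value}_prod"] - merged[f"{value}_dev"]).abs()
    return merged[diff > tolerance]

# Example usage with in-memory data standing in for warehouse queries.
prod = pd.DataFrame({"month": ["2024-01", "2024-02"], "revenue": [120_000.0, 135_000.0]})
dev = pd.DataFrame({"month": ["2024-01", "2024-02"], "revenue": [120_000.0, 128_500.0]})

changed = historical_metrics_changed(prod, dev)
if not changed.empty:
    print("Historical metrics changed after the modeling update:")
    print(changed)
```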

Only checking data for structural integrity will come back to bite you.

Firefighting after-the-fact

Here’s a real-world example from a team that now knows the value of ensuring data correctness.

The data team pushed incorrect data into a reverse ETL model that was powering their marketing automation. The data came from a core model that was technically “tested,” but the bug was a logical one, something a standard data test, such as a schema or not-null check, would not have caught. So it slipped through unnoticed.
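To make the failure mode concrete, here is a hedged, hypothetical sketch of how a logical bug can sail past structural tests: a join that fans out rows inflates a revenue figure, yet schema and not-null checks on the result still pass. The table and column names are invented for illustration only.

```python
# Illustrative sketch of a logical bug slipping past structural tests.
# Joining before aggregating duplicates order rows (one row per payment),
# inflating revenue, yet schema and not-null checks still pass.
import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2], "amount": [100.0, 200.0]})
payments = pd.DataFrame({"order_id": [1, 1, 2], "payment_id": [10, 11, 12]})

# Buggy model: the join fans out because order 1 has two payments.
model = orders.merge(payments, on="order_id")

# Structural tests: expected columns exist and no nulls -- both pass.
assert {"order_id", "amount", "payment_id"} <= set(model.columns)
assert model["amount"].notna().all()

# A correctness check against a known-good figure would have caught it:
print(model["amount"].sum())   # 400.0 -- inflated by the join fan-out
print(orders["amount"].sum())  # 300.0 -- the true revenue
```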

Corrupted data was included in key reports

The corrupted data reached the experimentation platform and wasn’t discovered for almost a full week; by that time it had already made its way into key reports.

The process that follows is the stuff of nightmares:

  • They had to halt the pipeline and pause financial data updates.
  • The team had to ask downstream consumers to delete the bad data from their systems and allow it to be re-sent.
  • Fixing the model took a full day.
  • Cleaning up the experimentation platform took several more days.

Nothing looked obviously broken

What made the issue tricky was that nothing looked obviously broken at a glance. It only became apparent when they started calculating metrics and noticed that patterns didn’t make sense.

Immeasurable loss of trust

During this whole process, the experimentation team had to wait before they could continue working. The monetary cost to the business was significant, but for the data team, the loss of trust was immeasurable. The whole situation could have been avoided by simply cross-referencing against known-good production metrics before shipping.

Data Correctness as a way of working

Structural quality and data completeness should never be overlooked; they are the foundation of any data quality framework. But if data correctness issues slip through, they undermine business decisions at the core. In data-critical applications the consequences can be devastating.

To catch these issues before they reach production, data teams need a proactive workflow that goes beyond traditional testing:

  • Cross-referencing new metrics against known-good historical data before shipping.
  • Assessing and validating the impact of changes on downstream data models.
  • Collaborating with stakeholders and analysts to confirm expectations before deployment.

These practices form the basis for ensuring data correctness, and they should be an integral part of your overall data quality assurance strategy.
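To illustrate the second practice in the list above, here is a minimal sketch of downstream impact assessment: given a dependency graph of models, list everything downstream of the models touched by a change so it can be validated before shipping. The graph and model names are hypothetical.

```python
# Minimal sketch of downstream impact assessment over a model dependency graph.
# Model names and dependencies are hypothetical.
from collections import deque

# parent model -> models that read from it
dependencies = {
    "stg_orders": ["fct_orders"],
    "fct_orders": ["rev_etl_marketing", "report_revenue"],
    "rev_etl_marketing": [],
    "report_revenue": [],
}

def downstream_of(changed: set[str]) -> set[str]:
    """Breadth-first walk from the changed models to everything downstream."""
    impacted, queue = set(), deque(changed)
    while queue:
        node = queue.popleft()
        for child in dependencies.get(node, []):
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted

print(downstream_of({"fct_orders"}))
# {'rev_etl_marketing', 'report_revenue'} -- validate these before deploying
```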

Explore data impact with Recce

Recce provides a proactive solution to data correctness by facilitating mindful, contextual workflows for data-change validation. With Recce you can:

  • Compare development data to trusted production data.
  • Understand which models and metrics will be impacted by data model changes.
  • Run correctness tests during development and as part of CI/CD.
  • Curate checklists of data tests for review.
  • Share test results with stakeholders and team members.

Recce ensures your data is correct by helping you catch silent errors before deployment. With Recce in your toolkit, your data team can ship changes with confidence while maintaining velocity.
