Updated October 22, 2025

Data Quality Statistics & Insights From Monitoring +11 Million Tables in 2025

The 6 most common root causes of bad data quality
By Michael Segner

Every day more than 1,000 data quality incidents are resolved in the Monte Carlo data + AI observability platform. Tens of thousands of users are monitoring millions of tables across hundreds of data + AI platforms.

This gives us unique insight into the most common and most effective approaches teams take to deliver reliable data + AI systems. Adopting the best practices behind these data quality statistics is critical to minimizing the significant impact bad data + AI has on the business and on trust.

And it’s a big problem. How big? Well…

The Data Reliability Problem

We dug deep into our data to see how teams can estimate how many significant data reliability issues they are likely to suffer in a year.

As you can imagine, the bigger your environment, the more issues you can anticipate. When we measured this in the past (2020 and 2023), the ratio held remarkably steady at about 1 data quality issue for every 15 tables in your environment per year.

Today, it’s an average of 1 data quality issue for every 10 tables in your environment per year.

Interested in calculating your data downtime, the amount of time in which the business is potentially disrupted due to incomplete or inaccurate data? Combine the ratio above with a statistic from our 2023 survey of 200 data professionals, which found that it takes on average 15 hours or more to resolve an incident once discovered, and you get:

(Tables/10) x 15 hours = total data downtime
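If it helps to make that concrete, here's a minimal back-of-the-envelope sketch in Python. The table count is an illustrative example; the 1-in-10 ratio and 15-hour resolution time come from the figures above.

```python
# Rough annual data downtime estimate:
# ~1 data quality issue per 10 tables per year, ~15 hours to resolve each.

def estimated_annual_downtime_hours(table_count: int,
                                    issues_per_table: float = 1 / 10,
                                    hours_per_incident: float = 15.0) -> float:
    """Return estimated hours of data downtime per year."""
    return table_count * issues_per_table * hours_per_incident

# Example: a 2,000-table environment (illustrative number)
print(estimated_annual_downtime_hours(2_000))  # 3000.0 hours per year
```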

If you want to dive in deeper, check out the calculator we built with Forrester to estimate how much bad data is costing your team.

What causes poor data quality?

Monte Carlo’s Troubleshooting Agent automatically identifies the root cause of data quality incidents. It has been run thousands of times across hundreds of customer environments.

When we dig into that telemetry to examine the most common root causes of poor data quality, it’s clear that the problem isn’t just one thing — it’s an ecosystem of fragility across pipelines, platforms, and human processes. Pipeline execution faults top the list at 26.2%, underscoring how often missed schedules, failed tasks, and broken permissions still derail data reliability.

Real-world variation follows closely behind at 20%, a reminder that not all anomalies are errors; sometimes business or behavioral shifts change the data itself. Intentional changes, including activities like data backfilling, come in at 14.2%. This means that roughly 34% of data quality incidents aren't actually incidents at all. Being able to quickly identify and weed out these cases can make data + AI teams more efficient.

The next tier of causes, ingestion disruptions (16.6%) and platform instability (15.2%), points to the inherent brittleness of enterprise data engineering. From connector outages to compute contention, these issues reveal how dependent teams are on the availability and coordination of cloud and vendor systems.

Schema drift (7.8%) highlights the ongoing tension between agility and control. As teams move fast to update systems or adapt metrics, structure and integrity can break in subtle but impactful ways. In short: data downtime isn’t just about errors — it’s about complexity, and our collective struggle to manage it gracefully.
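To make the arithmetic behind those percentages explicit, here's a small Python sketch using only the figures reported above; the dictionary keys are shorthand labels, not product categories.

```python
# Root-cause shares reported above, as a percentage of incidents
root_causes = {
    "pipeline_execution_faults": 26.2,
    "real_world_variation": 20.0,
    "ingestion_disruptions": 16.6,
    "platform_instability": 15.2,
    "intentional_changes": 14.2,  # e.g., backfills
    "schema_drift": 7.8,
}

# The six categories cover the full distribution
assert abs(sum(root_causes.values()) - 100.0) < 1e-6

# Incidents that reflect expected change rather than broken data
expected_change = root_causes["real_world_variation"] + root_causes["intentional_changes"]
print(f"{expected_change:.1f}% of incidents aren't really incidents")  # 34.2%
```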

All of this creates downtime, and as we all know, poor data reliability costs organizations millions of dollars (right?). The next important question is: how can teams minimize it?

Incident Management (Triage & Resolution)

There are some key principles teams can implement to dramatically improve their time to respond to and resolve incidents.

These include:

  • Not sending too many alerts to a single channel
  • Using the right channels for the right incidents
  • Designating incident owners and empowering power users
  • Having leadership regularly review data health trends with the team

Let’s dive into the data.

Alert Routing

No organization has spent more energy on studying alert fatigue than Monte Carlo. If teams become conditioned not to respond to alerts then the value of your data reliability solution plummets. 

Our data shows the status update rate on alerts drops about 15% once a notification channel receives more than 50 alerts per week. If a channel gets more than 100 alerts per week, the engagement rate drops another 20%.


Optimizing incident response isn't just about sending alerts to different channels, but about sending them to the right channels. Alerts sent to persistent chat channels like Slack have a 4x higher click-through rate than email.

This communication channel is simply more conducive to collaboration, visibility, and accountability. Stop sending alerts to email! If that user is on PTO, you might as well route the alert to a black hole.

What's even more interesting, and caught me a bit by surprise, was that alerts sent to incident management tools like JIRA or OpsGenie had a 10% higher click-through rate than Slack. This makes sense, as these are likely the highest-priority alerts being sent to dedicated, on-call incident management teams.
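To illustrate those two ideas together, volume caps per channel and routing by destination type, here's a hedged Python sketch. The channel names and routing rules are hypothetical, but the 50- and 100-alert breakpoints mirror the engagement data above.

```python
from collections import Counter

WEEKLY_SOFT_CAP = 50    # engagement starts dropping (~15%) past this point
WEEKLY_HARD_CAP = 100   # engagement drops another ~20% past this point

weekly_counts = Counter()  # alerts sent per channel this week

def route_alert(severity: str) -> str:
    """Pick a destination channel for an alert (illustrative logic only)."""
    if severity == "sev1":
        channel = "opsgenie:on-call"        # incident tooling for the highest priority
    elif severity == "sev2":
        channel = "slack:#data-incidents"   # persistent chat for collaboration
    else:
        channel = "slack:#data-alerts"      # lower-priority stream

    weekly_counts[channel] += 1
    if weekly_counts[channel] > WEEKLY_HARD_CAP:
        # Past the hard cap, roll alerts into a digest instead of flooding the
        # channel and conditioning people to ignore it.
        channel = "digest:weekly-summary"
    elif weekly_counts[channel] > WEEKLY_SOFT_CAP:
        print(f"warning: {channel} has passed {WEEKLY_SOFT_CAP} alerts this week")
    return channel

print(route_alert("sev1"))  # opsgenie:on-call
```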

Incident Severity

Speaking of the highest-priority alerts, the way teams designate incident severity changes in interesting ways over time. As teams get started, everything seems to be on fire: 32% of incidents are marked as Sev 1.

The problem with this is that when everything is urgent, nothing is urgent. 

As teams get more mature and start to formally define severity levels, human subjectivity is removed and incidents become more evenly spread across severity levels, with Sev 1s now accounting for 18% of all incidents.
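One way teams remove that subjectivity is to codify the rubric. Here's a minimal sketch of what that might look like in Python; the criteria (consumer count, SLA-bound status) and thresholds are hypothetical illustrations, not definitions pulled from our data.

```python
from enum import Enum

class Severity(Enum):
    SEV1 = 1  # business-critical data down, executive or customer visibility
    SEV2 = 2  # important pipeline degraded, workaround exists
    SEV3 = 3  # low-impact or informational

def classify(impacted_consumers: int, sla_bound: bool) -> Severity:
    """Hypothetical rubric: the criteria and cutoffs here are illustrative."""
    if sla_bound and impacted_consumers >= 10:
        return Severity.SEV1
    if sla_bound or impacted_consumers >= 3:
        return Severity.SEV2
    return Severity.SEV3

print(classify(impacted_consumers=12, sla_bound=True))  # Severity.SEV1
```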

Incident Collaboration

Is data quality a team sport? The answer is absolutely, but the nuance is that, like any other team sport, all-stars or power users can have an outsized impact.

In statistics it's hard to escape the Pareto distribution, or 80/20 rule. We see this in data reliability incident management as well: 20% of users escalate 80% of the alerts to incidents.

Most alerts only have one pair of eyes on them, and the time to resolve was actually longer when more people were involved. This initially surprised me, but it makes sense when you consider that issues that are easier to solve will be closed out immediately by the first responder while more difficult issues will be escalated across the team.

Another counterintuitive finding from our data is that when an incident owner is designated, the average time to respond is about 1.5x faster; however, the time to resolve is slower on average.

Our hypothesis is that designating owners increases accountability and accelerates response times, but owners are typically assigned to more serious incidents, which require more work to reach resolution.

Executive Involvement

Executives, do you want your team to be better at engaging with alerts and responding to issues? One of the most effective ways to do this is to engage with them. 

Team leaders who have deep dashboard sessions reviewing data reliability, quality, and health status lead teams with about 1.5x better status update rates.

Anecdotally, I can also report that the teams with the strongest incident response processes and culture all have strong executive involvement. JetBlue, for example, has a formal process where leaders review each alert on a bi-weekly basis.

Monitoring

One of the most effective ways for data teams to improve their efficiency is to leverage anomaly detection as much as possible.

The average number of updates (touches) for anomaly detection monitors is almost 40% lower than for custom SQL rules or static data validations. I believe this corresponds to the reduced time required to maintain these monitors, since anomaly detection monitors update their thresholds automatically.

Monitor Type              Average Touches
Custom SQL                2.35
Data Validation (test)    2.03
Comparison                1.63
Anomaly Detection         1.33

However, if a team takes the time to create a custom data quality monitor, they are more likely to find its alerts meaningful, as shown by a 30% higher feedback rate compared to anomaly detection monitors.
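To make the distinction concrete, here's a hedged sketch of the two monitor styles in Python: a hand-written SQL rule with a fixed threshold versus an anomaly-style check that derives its threshold from recent history. The table, column, and numbers are made up, and this is not how Monte Carlo implements its monitors under the hood.

```python
import statistics

# Style 1: custom SQL rule. Explicit intent, fixed threshold, manual upkeep.
CUSTOM_SQL_RULE = """
SELECT COUNT(*) AS null_ids
FROM analytics.orders        -- hypothetical table
WHERE order_id IS NULL
"""
NULL_ID_THRESHOLD = 0  # any null ID fails; someone has to maintain this rule

# Style 2: anomaly-style check. The threshold adapts to recent history automatically.
def is_row_count_anomalous(history: list[int], today: int, z_cutoff: float = 3.0) -> bool:
    """Flag today's row count if it sits more than z_cutoff std devs from the recent mean."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0  # avoid divide-by-zero on flat history
    return abs(today - mean) / stdev > z_cutoff

recent = [10_120, 9_980, 10_340, 10_050, 10_210, 9_940, 10_180]
print(is_row_count_anomalous(recent, today=4_300))  # True: likely a volume anomaly
```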

Another interesting insight we can glean from our product data is where teams are placing monitors. Data analysts tend to place monitors at the gold or publishing layer right before consumption. In comparison, data engineers are more likely to place monitors at the bronze or raw layer to validate the data as it lands.

The most effective monitoring strategies are end-to-end. This greatly expedites the root cause analysis process by making it easier to determine the point of origin of an incident. 

Within Monte Carlo, we can see that 34% of monitors are placed in the initial landing layer, 26% in the gold layer, and 50% in the middle of a pipeline.

This makes sense as there are generally more tables within the middle layer of a pipeline and Monte Carlo promotes this behavior by allowing users to automatically deploy freshness, volume, and schema monitors end to end across selected pipelines or data products.
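As a rough picture of what end-to-end placement can look like, here's an illustrative config sketch in Python. The layer names, table names, and monitor mix are hypothetical; freshness, volume, and schema are the monitor types mentioned above, and this is not an actual Monte Carlo configuration format.

```python
# Illustrative end-to-end monitor placement for a single pipeline.
PIPELINE_MONITORS = {
    "bronze (raw landing)": {
        "tables": ["raw.orders_landing"],
        "monitors": ["freshness", "volume", "schema"],      # validate data as it lands
    },
    "silver (transform)": {
        "tables": ["staging.orders_cleaned", "staging.orders_enriched"],
        "monitors": ["freshness", "volume"],                 # catch breakage mid-pipeline
    },
    "gold (publishing)": {
        "tables": ["analytics.orders_daily"],
        "monitors": ["freshness", "volume", "custom_sql"],   # guard what consumers see
    },
}

for layer, config in PIPELINE_MONITORS.items():
    for table in config["tables"]:
        print(f"{layer}: {table} -> {', '.join(config['monitors'])}")
```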

Building a Culture of Reliability With Data + AI Observability

Troubleshooting isn’t just about fixing issues in the moment—it’s about creating the systems, processes, and culture that prevent small glitches from becoming big problems. 

By thoughtfully managing alerts, defining clear severity levels, empowering incident owners, and engaging leadership, teams can dramatically reduce both the frequency and impact of data incidents. 

These teams also need to be equipped with the right tools to get the job done. As we saw from the data above, if they are simply drowned in alerts without any context, engagement rates drop and the business continues to suffer disruption from bad data.

Data + AI observability solutions like Monte Carlo can reduce this disruption by 80% by correlating specific issues to their root cause so they can be fixed quickly. To see how, check out the 90-second video of our Troubleshooting Agent below or set up a time to speak with us.

Our promise: we will show you the product.