When the Model Isn’t the Problem: How Data Gaps Undermine AI Systems
AI quality issues are on the rise, and data + AI leaders are just beginning to feel the pain. One of the most common culprits?
Data quality issues.
At Monte Carlo, we’re no strangers to the impact of data quality—particularly at the scale and complexity of AI applications. However, we recently experienced that impact first-hand—and learned a valuable lesson about the nature of AI reliability in the process.
In this article, we’ll examine an internal case study involving our own troubleshooting agent to demonstrate why data quality is one of the biggest risks to production AI—and why you need end-to-end observability to solve it.
Examining Monte Carlo’s Observability Agents
To help our customers drive trust and adoption for their own AI agents, Monte Carlo is developing a suite of sophisticated agentic systems that accelerate reliability workflows for data + AI teams.
Collectively known as our observability agents, these features currently comprise:
- A monitoring agent that accelerates monitor creation by identifying and recommending new monitors (now in general availability)
- A troubleshooting agent that accelerates root cause analysis by helping data + AI teams understand why a break happened and offering suggestions to resolve it (currently in preview and approaching release)
How Our Agentic Architecture Works
Our troubleshooting agent was built as a multi-level hierarchical agent network. At its core is a centralized “brain”: a high-level reasoning model that integrates information from specialized sub-agents.
These topic-specific specialists work by investigating hypotheses and can deploy their own sub-agents to analyze particular information sources—including pull requests, query changes, data anomalies, and more—to provide a quick overview of events and potential causes.
This process can be understood as follows (a simplified sketch appears after the list):
- Level 2 and 3 agents use function calls to access Monte Carlo’s internal databases and retrieve telemetry data.
- The agents then aggregate, analyze, and relay their findings back to the reasoning model.
- The reasoning model synthesizes the findings to determine leading hypotheses.
- And the central agent presents the user with a comprehensive explanation of all findings as a response.
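To make that structure concrete, here’s a minimal, illustrative sketch of a hierarchical agent network in Python. This is not Monte Carlo’s actual implementation: class names like `SubAgent` and `ReasoningAgent`, the telemetry fetchers, and the confidence scoring are simplified stand-ins for the pattern described above.

```python
# Illustrative sketch only: hypothetical names, not Monte Carlo's implementation.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    source: str        # e.g. "pull_requests", "query_changes", "data_anomalies"
    summary: str
    confidence: float


class SubAgent:
    """Level 2/3 specialist: investigates a single information source."""

    def __init__(self, source: str, fetch: Callable[[], list[dict]]):
        self.source = source
        self.fetch = fetch  # function call into an internal telemetry store

    def investigate(self) -> Finding:
        records = self.fetch()  # retrieve telemetry via the function call
        summary = f"{len(records)} relevant events found in {self.source}"
        return Finding(self.source, summary, confidence=0.8 if records else 0.1)


class ReasoningAgent:
    """Central 'brain': synthesizes sub-agent findings into leading hypotheses."""

    def __init__(self, sub_agents: list[SubAgent]):
        self.sub_agents = sub_agents

    def troubleshoot(self, incident_id: str) -> str:
        findings = [agent.investigate() for agent in self.sub_agents]
        ranked = sorted(findings, key=lambda f: f.confidence, reverse=True)
        report = [f"Incident {incident_id}: leading hypotheses"]
        report += [f"- {f.source}: {f.summary}" for f in ranked]
        return "\n".join(report)  # comprehensive explanation presented to the user


# Hypothetical wiring: one sub-agent per information source.
brain = ReasoningAgent([
    SubAgent("pull_requests", lambda: [{"id": 123, "title": "refactor orders model"}]),
    SubAgent("query_changes", lambda: []),
])
print(brain.troubleshoot("INC-42"))
```

Note the failure mode this structure implies: if a sub-agent’s underlying source returns nothing, the reasoning model simply never sees that evidence.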
A Failed Response Launched a Multi-Day Investigation
Like any ML/AI system, Monte Carlo’s observability agents depend on high-quality data to be successful. If the data is late or erroneous, the output will reflect it.
During our internal testing phase, we deployed our troubleshooting agent to investigate a variety of internal incidents that would test its performance, then solicited our data + AI team for feedback to assess the model’s behavior and accuracy.
In one particular feedback exchange, a data engineer who had researched and resolved a specific incident reported that our model had correctly identified the cascading effects in the data warehouse, but had failed to identify the actual root cause: a code change.
This exchange immediately launched a comprehensive multi-day investigation into the model to discover the source of the problem. During the investigation, we:
- Examined the model’s chain-of-thought reasoning
- Reviewed all calls to sub-agents for errors
- Analyzed the prompt for potential flaws
- And checked the model’s parameter settings for oversights
Unfortunately, after days of investigating and debugging, nothing seemed amiss in the model’s design or execution.
How a Data Gap Led to Incomplete Analysis
The breakthrough came when we discovered that the sub-agent tasked with reviewing code changes on GitHub had never even attempted to analyze the breaking pull request. What was happening here?
As we dug deeper, we realized this wasn’t a model issue at all—the problem lay in our data sources.
An internal data export had silently failed, which meant that the sources being utilized by our models lacked the appropriate context about the pull request. The model couldn’t analyze something it didn’t know existed.
Once we backfilled the missing data and re-ran the model, it promptly identified the culprit, analyzed it, and correctly reported it as the root cause.
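For context, “silently failed” here means the export completed without raising an alert, leaving downstream tables stale and incomplete. Below is a minimal sketch of the kind of freshness and volume check that surfaces this class of failure; the table name, `exported_at` column, and SQLite connection are hypothetical stand-ins for your own warehouse client or an observability platform’s freshness monitors.

```python
# Minimal freshness/volume check for an export table (illustrative sketch).
# Assumes `exported_at` is stored as an ISO 8601 timestamp with a UTC offset,
# and that `table` comes from trusted configuration (not user input).
import sqlite3  # stand-in for your warehouse client
from datetime import datetime, timedelta, timezone

MAX_STALENESS = timedelta(hours=6)


def check_export_freshness(conn: sqlite3.Connection, table: str) -> None:
    last_export, row_count = conn.execute(
        f"SELECT MAX(exported_at), COUNT(*) FROM {table}"
    ).fetchone()

    if row_count == 0 or last_export is None:
        raise RuntimeError(f"{table}: export produced no rows")

    age = datetime.now(timezone.utc) - datetime.fromisoformat(last_export)
    if age > MAX_STALENESS:
        raise RuntimeError(f"{table}: last export was {age} ago, exceeding the freshness SLA")
```

A check like this, wired into the pipeline feeding the agent, could have flagged the gap well before the model produced an incomplete answer.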
So—what can you take away from this? Simple. If you want to deliver reliable agents, you need to invest in more than your models.
Why You Need to Invest in More Than Your Models
As data scientists, we often contend with the problem of “garbage in, garbage out.” But the reality is that what doesn’t go in can be just as dangerous as what does.
Missing data has been plaguing BI reports for years. And in those scenarios, upstream data alerts have served as both early warning signs and effective root cause indicators.
However, the introduction of AI has extended that impact to model processes and generative responses as well. That shift has led many data + AI leaders to mistakenly focus on the models and their monitors in a silo, perpetuating unreliable outputs in the process.
Monitoring Your Models Isn’t Enough
Model monitoring isn’t wrong—it’s certainly a component of reliable AI. The mistake is believing that model monitoring is sufficient to deliver reliable AI in and of itself.
Monitoring your models in a silo falls short for two important reasons:
- Evaluating LLM outputs for accuracy is always difficult, particularly when the model hasn’t explicitly hallucinated and the output still looks plausible.
- Even if you detect an error, the issue can’t always be traced back to the model itself.
The reality is, the ways a data + AI system can break are as complex as the workflows themselves. Sometimes what looks like hallucination could actually be an issue with the data, the system, or something else entirely.
No amount of monitoring your models can uncover that level of complexity, let alone resolve it efficiently.
Building Better Models Isn’t Enough Either
Another common approach taken by AI engineers is to simply build more robust models that can handle some missing data.
While this is also an important safeguard, it falls short for several reasons:
- Inherent limitations of models: Even the most sophisticated models can’t reliably infer information that isn’t there. In the case of our troubleshooting agent, no amount of model improvement could have identified a pull request that wasn’t in the data.
- Cost inefficiency: Building redundancies and complex handling for every possible data gap scenario would make models and agent architectures significantly more complex, more expensive, and more difficult to maintain.
- Ambiguity in real-world scenarios: When data is missing or faulty, a model can either make an inference (potentially introducing errors) or acknowledge the gap. While theoretically functional, neither approach provides the accuracy or the user experience that simply having the correct data would offer.
- Downstream cascading effects: Data issues often create complex patterns of problems that even robust models would struggle to disambiguate without visibility into the original data failure.
Despite all the innovation we’ve seen in the last 3 years, the most efficient and reliable solution continues to be ensuring data quality at the source.
What Can You Do To Improve Your AI Reliability Today?
So, if you can’t rely on the model to deliver more reliable outputs, what can you do?
Here are 5 concrete steps you can take to enhance reliability for your AI applications:
- Implement comprehensive data + AI observability: Deploy monitoring at each stage of the pipeline to detect issues before they reach AI systems, and leverage agent observability to verify that outputs are fit for purpose.
- Observe model outputs: Establish clear connections between AI agent evaluations and model inputs through data lineage, so issues can be traced back to the inputs and the pipelines generating them.
- Establish cross-functional incident response: Create protocols that bring together data engineers, AI/ML engineers, data scientists, and domain experts to resolve incidents promptly.
- Test with synthetic data gaps: Proactively validate how your AI systems behave with missing or incorrect data (see the sketch after this list).
- Document critical data dependencies: Map which data sources are essential for which AI functionalities and prioritize monitoring accordingly.
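As one example of the “synthetic data gaps” step above, here’s a minimal pytest-style sketch. `load_incident_context` and `run_troubleshooting_agent` are hypothetical stand-ins for your own test harness and agent entry point; the idea is to assert that the agent either surfaces a caveat or still reaches a supported answer when a source is empty.

```python
# Illustrative sketch: inject synthetic data gaps and check the agent degrades gracefully.
import pytest


def load_incident_context(incident_id: str) -> dict:
    # In a real test, this would load recorded telemetry for a known incident.
    return {
        "pull_requests": [{"id": 123, "title": "refactor orders model"}],
        "query_logs": [{"query": "SELECT ...", "runtime_s": 42}],
    }


def run_troubleshooting_agent(context: dict) -> dict:
    # Stand-in for the agent under test; replace with your real invocation.
    if not context.get("pull_requests"):
        return {"root_cause": None, "caveats": ["code-change data unavailable"]}
    return {"root_cause": "pull_request:123", "caveats": []}


@pytest.mark.parametrize("missing_source", ["pull_requests", "query_logs"])
def test_agent_handles_missing_sources(missing_source):
    context = load_incident_context("INC-1")
    context[missing_source] = []  # the synthetic data gap

    result = run_troubleshooting_agent(context)

    # The agent should acknowledge the gap or still reach a supported conclusion,
    # rather than silently asserting a confident but incomplete answer.
    assert result["caveats"] or result["root_cause"] is not None
```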
The Path Forward: End-to-End Data + AI Observability
After hundreds of conversations with data + AI leaders, I’m convinced that data quality remains one of the preeminent hurdles to AI implementation—and one of the single greatest problems for data + AI teams to solve in the next 12 months.
That’s why we’re developing a comprehensive solution that monitors the entire data + AI system end-to-end—from data ingestion and transformation to unstructured data quality, model operations, and output monitoring.
Data and AI are no longer two separate systems—they’re one and the same—and if we want our AI applications to be reliable, we’ll need to start managing them that way.
Trustworthy AI always starts with trustworthy data. By leveraging data + AI observability tools to address data reliability as a core component of your AI strategy—instead of an afterthought—you’ll resolve issues faster, drive model performance higher, and build essential trust in your AI applications (and the data + AI teams that support them).
So, what are you waiting for?