Your Data Doesn’t Need To Be AI-Ready
“We need to get our data AI-ready” is far and away the most common refrain we hear from data + AI leaders.
This somewhat amorphous concept has gradually been defined by Gartner and others as a three-step process:
- Migrate remaining databases to the cloud and create a single source of truth
- Govern, model, and document key datasets
- Enforce high levels of data quality

And there are no objections here! There was even a time way back in 2023 when this was just called “good data governance.” The difference now is these steps will be the difference between organizations that thrive, survive, and nosedive in the agentic era.
Unfortunately, many teams are repeating the same mistakes that caused so many data governance initiatives to crash and burn.
Namely, treating AI-readiness as one monolithic, multi-year box that must be checked before they even consider meaningful AI or agent deployments. AI-ready data is a goal, not a gate.
So let’s dive into a few AI-ready pitfalls, as well as the best practices we’ve seen successful data + AI teams use to find success.
Why you can’t afford to wait
As AI-ready initiatives start to creep into their second or third year, the risk of failure expands as well. You can trust your own experience with large monolithic initiatives, or take it from Boston Consulting Group research:
Ten years is a long time to show no improvement. But that’s exactly what’s happened with the ability of companies to implement large-scale tech programs since we last looked in 2015. Of the more than 1,000 companies surveyed in our latest research, two-thirds have mounted large-scale tech programs, which we define as involving more than 3% of the annual tech budget. (See “About Our Research.”) Such programs can involve investments of as much as €1 billion over their duration. But more than two-thirds are not expected to be delivered on time, within budget, or within their planned scope.
Here are a few quick reasons why AI-ready initiatives are no different.
1. It’s never ready
Our colleague has a saying, “data is like fashion, it’s never done, it just evolves.”
You don’t have to reach far back for an example here–just look at the explosion of interest in vector databases in the last two years. There is always a data architecture concept coming and going. We seem to be in a perpetual state of coupling and decoupling, consolidating and federating.

Source: Conscious Decoupling: How Far Is Too Far For Storage, Compute, And The Modern Data Stack?
Since data is always evolving, migrations are a fact of life, governance policies are constantly in flux, and data quality issues are inevitable. Anyone waiting for perfect data is going to be waiting a while, and the first word that comes to our mind regarding C-Suites and AI initiatives is not “patient.”
2. You don’t know what AI-ready means
AI-Readiness, just like governance, is defined by the use case.
Let’s take data quality as an example. A machine learning app may need very fresh but only directionally accurate data, while a finance dashboard may need to be accurate to the penny but reconciled only on a weekly basis.
Yes, there are central standards that can and should be enforced–PII leakage is never a good idea–but the best way to define AI-readiness is not through those central standards, but use case by use case.
And these use case requirements are impossible to fully develop hypothetically in a lab. In other words, to get AI-ready you need to push agents into production before you are “AI-ready.”
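To make the contrast concrete, here is a minimal sketch of readiness expressed as per-use-case checks rather than one global standard. All names and thresholds are hypothetical illustrations, not a prescribed framework:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical sketch: AI-readiness as per-use-case checks.
# The thresholds below are illustrative, not recommendations.

@dataclass
class ReadinessSpec:
    max_staleness: timedelta   # how fresh the data must be
    max_error_rate: float      # tolerated fraction of bad records
    pii_allowed: bool = False  # central standard: PII is opt-in, never default

def is_ready(spec, last_updated, error_rate, contains_pii):
    """Return True if a dataset meets this particular use case's bar."""
    fresh = datetime.now(timezone.utc) - last_updated <= spec.max_staleness
    accurate = error_rate <= spec.max_error_rate
    safe = spec.pii_allowed or not contains_pii
    return fresh and accurate and safe

# An ML recommender tolerates noise but needs fresh data...
ml_spec = ReadinessSpec(max_staleness=timedelta(hours=1), max_error_rate=0.05)
# ...while a finance dashboard inverts the trade-off.
finance_spec = ReadinessSpec(max_staleness=timedelta(days=7), max_error_rate=0.0)
```

The same dataset can pass one spec and fail the other, which is exactly why a single central definition of “ready” falls short.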
For example, for one online retailer it is mission critical that all customer-facing AI outputs are in their brand voice. While that voice was defined up front, ironing out exactly what it meant in production took a few iterations.
Counter-intuitive? Yes. Effective? Also yes.
3. Failure is a good teacher
Whether you are AI-ready or not, your first few AI and agent applications are likely to fail. Pick whatever study you’d like–RAND, MIT, or Wakefield–the reality is most initiatives are not driving the desired value.
This is not unique or a cause for concern. In the early days, most machine learning applications failed as well.
Proactive data + AI leaders need to set expectations with leadership and factor in the price of iterating. Keep in mind that very, very few organizations started with AI teams–they were cobbled together from data science teams and painful lessons learned the hard way.

When we spoke to one director of engineering about their AI initiatives, they candidly said, “It’s not so much the ROI. I think it’s more like we’ve got a bunch of learnings from it. I think it gives us a foothold to build on top of it.”
Failure will accelerate AI initiatives more than the perfect AI-ready data estate.
AI-ready data best practices
So if abstract large-scale initiatives around data governance are not the best way to get AI-ready, what should data + AI teams do? We have a few thoughts.
1. Start with what’s already AI-ready
While all of your data doesn’t need to be AI-ready, you do need some data to be AI-ready in order to have a competitive advantage and reduce your models’ propensity to hallucinate.
And chances are excellent you already have some, even without enacting that three-year, five-point plan from the consulting agency that became experts in AI yesterday.
Your customer-facing data is likely the most effective place to start. This data should (emphasis on should) already have higher standards of governance and reliability in place. Those reliability standards should mirror the reliability standards of your product.
As a bonus, the data within your product already has validated value and distribution, meaning adoption is more likely. Just take a look at some of the most successful agent deployments we have seen in the last year:
- Digests customer specific data within the platform and makes recommendations for how auto-dealers can better manage and promote their inventory.
- Recommends travel destinations from a few keywords and past customer history within the platform.
- Forecasts and makes sophisticated financial recommendations based on signals collected within the platform.
We also learned this lesson ourselves when our own agent became customer-facing. Troubleshooting Agent is an AI agent that helps troubleshoot data issues. Whenever users get an alert from Monte Carlo, the agent can automatically work through hundreds of hypotheses and highlight the ones that are most likely to have caused the issue.
It will consider changes in the data, system issues (e.g. Airflow or dbt failures) and code changes when analyzing an issue, and will automatically traverse your lineage to identify the root cause.
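The core idea behind lineage-based root-cause analysis can be sketched simply: walk upstream from the alerted table and surface ancestors that also show anomalies. This is an illustrative toy, not Monte Carlo’s actual implementation; the lineage graph and anomaly signals here are invented for the example:

```python
from collections import deque

def upstream_suspects(lineage, anomalies, alerted_table):
    """Breadth-first search upstream through `lineage`
    ({table: [upstream tables]}), returning anomalous ancestors
    ordered by distance from the alerted table."""
    suspects, seen = [], {alerted_table}
    queue = deque([alerted_table])
    while queue:
        table = queue.popleft()
        for parent in lineage.get(table, []):
            if parent in seen:
                continue
            seen.add(parent)
            if parent in anomalies:  # e.g. failed dbt run, schema change
                suspects.append(parent)
            queue.append(parent)
    return suspects

# Hypothetical lineage: a dashboard fed by cleaned orders and FX rates.
lineage = {
    "revenue_dashboard": ["orders_clean"],
    "orders_clean": ["orders_raw", "fx_rates"],
    "orders_raw": ["ingest_job"],
}
anomalies = {"fx_rates": "stale feed", "ingest_job": "Airflow failure"}
```

Ranking candidates by graph distance is one simple heuristic; a production agent would weigh many more signals, such as the timing of code changes and system failures.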

The agent references context already surfaced within the Monte Carlo UI. Barr likes to joke that we’ve been building the Troubleshooting Agent for 6 years.
As a result, it has been a highly adopted workflow. And while there were a few early hiccups in development, the data powering the agent is highly reliable and among the most AI-ready data we had at our disposal.
2. Pilot and platform in parallel
Moving forward with a minimally viable product is not revolutionary advice, but it is still worth repeating.
OpenAI explicitly talks about the value of iteration in their Practical Guide to Building AI Agents, saying “While it’s tempting to immediately build a fully autonomous agent with complex architecture, customers typically achieve greater success with an incremental approach.”
The other failure mode is the opposite side of the coin. We’ve also spoken with dozens of data teams that have dozens, even hundreds, of agents stuck in development or low-level production.

It’s smart to hedge your bets and accelerate learnings, but at a certain point you need to productionize and platformatize.
Teams that have made the most progress in these areas are those that are developing their platforms and pilots in parallel (but not in isolation). In other words, they are moving forward with agents that may be developed using their own architectures and frameworks, but they are also leveraging learnings to develop their future platform and beginning to transition agents towards these shared structures.
As a VP of Engineering and ML Ops told us, “It would be crazy to have a completely different set of infrastructure for every [product]. As the platform organization, we’re providing a set of common frameworks and tools that support patterns within AI…and how it integrates with the rest of our systems.”
We don’t know exactly what the future of data + AI will be, but we do know what it won’t be. The future is not every agent with its own unique and siloed pipelines, embeddings, and orchestration frameworks.
A national media company is already ahead of the game here for example. They have consolidated their AI input and output pipelines within their BigQuery warehouse. They are leveraging shared, multi-use embeddings for use cases such as article tagging and audience segmentation. When the time comes, pilots grow up and become production-grade with the infrastructure, governance, and monitoring processes to match.
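The shared-embedding pattern can be sketched in a few lines: embed each article once into a shared store, then let separate use cases (tagging, audience matching) consume the same vectors instead of each maintaining its own embedding pipeline. The toy character-count embedding below is a deterministic stand-in for whatever model a platform team would actually standardize on:

```python
import numpy as np

def embed(text, dim=8):
    # Toy deterministic embedding; a real platform would call one
    # shared model endpoint here instead.
    v = np.zeros(dim)
    for i, ch in enumerate(text.lower()):
        v[(ord(ch) + i) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def cosine(a, b):
    return float(a @ b)

# One shared table of article embeddings, computed once...
articles = {"a1": "rate cuts", "a2": "playoff recap"}
shared = {aid: embed(text) for aid, text in articles.items()}

# ...consumed by two different use cases.
def tag(aid, tag_vectors):
    """Use case 1: pick the closest tag for an article."""
    return max(tag_vectors, key=lambda t: cosine(shared[aid], tag_vectors[t]))

def nearest_article(reader_vector):
    """Use case 2: match a reader profile to an article."""
    return max(shared, key=lambda aid: cosine(shared[aid], reader_vector))
```

The design point is that `shared` is computed once and governed centrally, so every new use case starts from the same vectors rather than spinning up its own siloed pipeline.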
Yes, there is some tech debt that needs to be resolved in this process, but such is life.
3. Do the tough stuff sooner rather than later
The point of this article is not to say “building reliable well governed systems is hard and takes too long, just go ahead and skip it.”
If anything the message is the opposite. Organizations can’t wait until their estate is perfect, but they also need to roll up their sleeves and do the work now before it’s too late. This means investing in people, processes, and technology.
The teams that invested in their data foundations were the teams best positioned to take advantage of the AI era. The teams that invest in their data + AI foundations will be the best positioned to take advantage of whatever is next. Because whatever the future holds, we’re certain that it involves well governed, reliable systems.