
Conventional wisdom holds that you need to fix your data management shortcomings before succeeding with AI. But that may no longer be true, according to some tech execs, who see the potential to apply generative AI's grasp of language to fix data management issues at the same time you're building AI apps.
Rahul Pathak, the vice president of data and AI go-to-market at AWS, considers himself an old-school data guy, the kind who would never recommend taking shortcuts in order to show success on paper. So when he suggests that GenAI might allow you to jump ahead in your data management capabilities and get results faster, you might want to take notice.
“We were in a world where you would have to serialize your way through this, where you would have to get the data house in order, then you would have to build the app that sits on top of the data,” Pathak says. “I think you can actually change this process a little bit, where you can start to unlock your data for AI almost immediately, using well-governed, secure MCP endpoints and state of the art models. [They] can really help you unlock that data almost in place that can then start to help you light up the AI applications.”
Not all AI use cases are the same, obviously. Some use cases may require data to be collected, cleansed, and prepared before it touches an AI algorithm. But when it comes to running inference workloads on pre-trained models, staging the data may not be feasible, in which case a federated approach would be in order. The good news is that Model Context Protocol (MCP) covers up a lot of data sins that previously might have required a considerable amount of atonement (not to mention data management pain and dollars).
“I’m somewhat of an old school data person at this point, but you can think of the MCP server as almost like a federated query,” Pathak said. “The model lets you get the data. It’s somewhat schema resilient. And then the knowledge base and the index is almost like a materialized view. And so in that combination, you can get the data much faster. And the intelligence in the models does augment the capabilities of the data engineer and data scientist in a way that really allows them to move much faster than we could before.”
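Pathak's analogy can be sketched in plain Python: an MCP-style server federates several data sources behind one query interface, while a result cache plays the role of the materialized view he describes. This is a hypothetical illustration of the idea, not the MCP SDK; every class and method name here is made up.

```python
# Hypothetical sketch of Pathak's analogy: MCP server ~ federated query,
# knowledge base/index ~ materialized view. These are NOT real MCP SDK APIs.

class FederatedEndpoint:
    """Wraps one backing data source behind a uniform, schema-resilient query."""
    def __init__(self, name, rows):
        self.name = name
        self.rows = rows  # stand-in for a real database or API

    def query(self, **filters):
        # Tolerate missing keys rather than failing on schema drift
        return [r for r in self.rows
                if all(r.get(k) == v for k, v in filters.items())]

class MCPStyleServer:
    """Federates endpoints behind one interface; the cache acts like a
    materialized view, computed once and reused for repeat queries."""
    def __init__(self, endpoints):
        self.endpoints = {e.name: e for e in endpoints}
        self._view_cache = {}

    def query(self, source, **filters):
        key = (source, tuple(sorted(filters.items())))
        if key not in self._view_cache:  # "materialized view" behavior
            self._view_cache[key] = self.endpoints[source].query(**filters)
        return self._view_cache[key]

server = MCPStyleServer([
    FederatedEndpoint("orders", [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}]),
    FederatedEndpoint("telemetry", [{"line": "A", "rate": 40}]),
])
print(server.query("orders", region="EU"))  # [{'id': 1, 'region': 'EU'}]
```

The point of the sketch is the shape of the system: the model reaches data where it lives through a uniform endpoint, and the cached index gives it fast, repeatable access without a prior ETL project.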
Pathak had a real-world example in the form of a manufacturing company that wanted to use generative AI to speed up production. The company had reams of telemetry data already collected, but it was proving to be difficult and time-consuming to extract the knowledge out of that telemetry data to apply it to the factory line.
The solution was to use the natural language processing (NLP) capabilities of GenAI to extract the pertinent pieces of data out of the telemetry data. Those insights were then fed into traditional machine learning optimization models. On the back end, GenAI was used again to generate the instructions that told the operators how to modify their process to speed up production.
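The three-stage flow described above can be sketched as a pipeline. All of the functions below are illustrative stubs standing in for the real components; in production, the first and third stages would call an LLM and the middle stage a trained optimization model.

```python
# Hypothetical sketch of the three-stage pipeline: GenAI extraction ->
# classical ML optimization -> GenAI-generated operator instructions.
# All functions are stand-in stubs, not real APIs.

def extract_insights(telemetry):
    """Stage 1 (GenAI stand-in): pull the pertinent records out of raw telemetry."""
    return [r for r in telemetry if r["defect_rate"] > 0.05]

def optimize(insights):
    """Stage 2 (traditional ML stand-in): find the line most worth fixing."""
    return max(insights, key=lambda r: r["defect_rate"])

def write_instructions(target):
    """Stage 3 (GenAI stand-in): turn the result into operator guidance."""
    return (f"Slow line {target['line']} and inspect tooling "
            f"(defect rate {target['defect_rate']:.0%}).")

telemetry = [
    {"line": "A", "defect_rate": 0.02},
    {"line": "B", "defect_rate": 0.09},
    {"line": "C", "defect_rate": 0.07},
]
print(write_instructions(optimize(extract_insights(telemetry))))
# Slow line B and inspect tooling (defect rate 9%).
```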
“It’s that kind of integration that allows us to move much faster than we could before,” Pathak said. “Because otherwise you’d have a big data and ETL and kind of data munging project that you’d have to do to get that telemetry off and usable quickly. And we can do that much, much faster now. So that’s a big unlock.”
Another proponent of skipping the big data management project and jumping straight into GenAI projects is PromptQL. The company developed a GenAI-based query tool that allows users to begin querying their data immediately, without going through the time-consuming process of building a semantic layer.
A semantic layer is still important, the folks at PromptQL say, because it serves to translate a business’ specific terms and metrics into the technical table names that the tool needs to serve accurate queries. But the big difference is that PromptQL advocates building the semantic layer as you go, and customizing it over time thanks to feedback from users. Spending months or years on a big-bang data management project is a path to endless POCs and ultimately failure, they say.
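A build-as-you-go semantic layer can be sketched as a simple term-to-table mapping that logs unresolved business terms for later review, rather than requiring the full mapping up front. This is an assumed illustration of the pattern, not PromptQL's implementation; the term and table names are made up.

```python
# Hypothetical sketch of an incrementally built semantic layer.
# Business terms map to physical table/column names; terms the layer
# doesn't know yet are queued as feedback instead of failing hard.

class SemanticLayer:
    def __init__(self):
        self.terms = {}        # business term -> physical name
        self.unresolved = []   # feedback queue for terms users asked about

    def resolve(self, term):
        if term in self.terms:
            return self.terms[term]
        self.unresolved.append(term)  # capture the gap for later curation
        return term                   # fall back to the raw term for now

    def learn(self, term, physical_name):
        """Incorporate user feedback: teach the layer a new mapping."""
        self.terms[term] = physical_name

layer = SemanticLayer()
layer.learn("net revenue", "fct_orders.net_rev_usd")  # example mapping
print(layer.resolve("net revenue"))  # fct_orders.net_rev_usd
print(layer.resolve("churn"))        # churn (unknown, queued for review)
print(layer.unresolved)              # ['churn']
```

The design choice mirrors the PromptQL argument: start answering queries immediately, and let real user questions drive which mappings get added, instead of front-loading a months-long modeling project.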
The high failure rate of early AI projects is the elephant in the room. The recent MIT study that found 95% of GenAI projects never get out of the trial stage has people on edge. With trillions of dollars being invested in acquiring speedy GPUs, massive storage arrays, and huge AI data centers, some very wealthy institutions are placing some big bets on AI.
Smaller companies with fewer resources have to be much smarter about how they attack the AI opportunity. The good news is that GenAI’s capability to grasp language can be employed in a multitude of ways, including using it to understand how data is modeled, which can potentially allow you to, if not skip the data management stage, at least tackle it at the same time that you’re building your first AI project.
“These aren’t sequential steps anymore,” Pathak says. “I think that’s a big paradigm shift for a lot of companies that are dealing with legacy data challenges, which, frankly, we’ve been dealing with since we had more than one table in the database.”
“I think what generative AI has done and what’s different now,” he says, “is that it’s really given us some superpowers to achieve these things.”
This article first appeared on our sister publication, HPCwire.