The progression from AI experimentation to production-grade enterprise deployment has consistently stumbled over a fundamental challenge – data architecture. While organizations invest heavily in models and use cases, the underlying data infrastructure often becomes the limiting factor that prevents AI initiatives from scaling beyond proof-of-concept stages.
An industry roundtable featuring technical leaders from IBM, Snowflake, and Salesforce highlighted how zero copy architecture is emerging as a critical enabler for enterprise AI transformation. The discussion revealed that traditional approaches to data unification – characterized by extensive copying, movement, and reconstruction of datasets – are fundamentally incompatible with the requirements of agentic AI systems.
The data afterthought problem
Edward Calvesbert, Vice President of Product Management for IBM’s watsonx platform, identified a pattern that resonates across enterprise AI implementations:
The biggest challenge for gen AI initiatives and progressing them from experimentation to production is really having data as an afterthought. So much of the initial focus is on the models, of course, and on the use cases, which is critical as well. But data is somewhat of an afterthought to that conversation.
This points to a deeper architectural problem. Many AI projects begin with retrieval-augmented generation (RAG) systems built on small, curated knowledge bases – technical documentation or internal policies that are universally accessible within an organization. The simplicity of these initial implementations masks the complexity that emerges when projects attempt to incorporate additional content types, confidential information, and enterprise-scale datasets. Calvesbert explained:
When you start getting into additional types of content, additional types of questions and answers, potentially confidential information within the organization, the data management problem just gets a lot more complicated.
The challenge intensifies when similarity search, which works adequately for limited document sets, has to operate across millions of documents.
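To make that scale problem concrete, here is a minimal sketch of the brute-force similarity search that underpins many early RAG pilots. The embeddings are random stand-ins rather than output from a real embedding model, and the linear scan it performs is exactly what stops holding up once the corpus grows to millions of documents.

```python
# Minimal sketch of the brute-force similarity search behind early RAG pilots.
# Embeddings here are random stand-ins; a real system would use an embedding model.
import numpy as np

rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(10_000, 384))          # one vector per document chunk
doc_embeddings /= np.linalg.norm(doc_embeddings, axis=1, keepdims=True)

def top_k(query_vec: np.ndarray, k: int = 5) -> np.ndarray:
    """Exact cosine-similarity search: fine for thousands of chunks,
    but a linear scan like this does not scale to millions of documents,
    where approximate nearest-neighbour indexes (HNSW, IVF, etc.) take over."""
    query_vec = query_vec / np.linalg.norm(query_vec)
    scores = doc_embeddings @ query_vec
    return np.argsort(scores)[::-1][:k]

print(top_k(rng.normal(size=384)))
```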
Zero copy architecture represents a fundamental shift from data gravity – where analytics must move to where data resides – to data fluidity, where data can be accessed in place without physical movement or duplication. Saptarshi Mukherjee, Director of Product Management at Snowflake, outlined why this architectural approach has become essential for agentic AI systems:
The foundational shift that is taking place with agentic AI is that when human beings looked at data and made decisions for the business, there was a limited cognitive space and capacity in an enterprise. With agentic AI, agents are able to perform workflows 24/7, looking at data much more expansively.
This expanded data access creates new security and governance challenges. Unlike human analysts who operate within cognitive limitations, AI agents can potentially access vast enterprise datasets continuously. Zero copy architecture addresses this by ensuring that “source systems are on point to make sure there is governance and security in place, there are RBAC models and access control that these agents have to adhere to.”
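As a rough illustration of what adhering to source-system controls can look like for structured data, the sketch below checks an agent's request against a role-based policy before any rows are returned. The agent name, roles, and policy entries are invented for the example.

```python
# Illustrative sketch: an agent's data request checked against the source
# system's role-based access controls before any rows are returned.
# The agent name, roles, resources and policy table are invented for illustration.
AGENT_ROLES = {"service-optimizer-agent": {"analyst"}}

POLICY = {  # (role, resource) -> allowed actions, defined by the source system
    ("analyst", "crm.cases"): {"read"},
    ("admin", "crm.cases"): {"read", "write"},
}

def authorize(agent: str, resource: str, action: str) -> bool:
    """Return True only if one of the agent's roles grants the requested action."""
    return any(action in POLICY.get((role, resource), set())
               for role in AGENT_ROLES.get(agent, set()))

assert authorize("service-optimizer-agent", "crm.cases", "read")
assert not authorize("service-optimizer-agent", "crm.cases", "write")
```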
The enterprise AI challenge extends beyond mere data access to encompass business context preservation and real-time responsiveness. Mukherjee emphasized that AI solutions require more than raw data:
They need to understand the business context. When you’re building a RAG application, when you’re building an AI solution that is very specific to an industry, you’re looking at both data and the metadata.
Reconstructing this metadata in systems designed purely for AI consumption creates bottlenecks that zero copy architecture aims to eliminate. The approach enables business context and metadata to propagate alongside data without requiring reconstruction in downstream systems.
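One way to picture this is a record that carries its business context with it as it is accessed across platforms, so the downstream AI system never has to rebuild metadata that already exists at the source. The sketch below is purely illustrative; the field names are assumptions, not any vendor's schema.

```python
# Illustrative only: a retrieval record that carries its business context with it,
# so downstream AI systems do not rebuild metadata that already exists upstream.
from dataclasses import dataclass, field

@dataclass
class GovernedRecord:
    payload: dict                        # the data itself
    source_system: str                   # where it lives; read in place, not copied
    semantic_description: str            # business meaning, e.g. from a shared catalog
    classification: str                  # e.g. "confidential"
    acl: set = field(default_factory=set)  # who may see it, as defined at the source

record = GovernedRecord(
    payload={"case_id": "C-1042", "status": "open"},
    source_system="crm",
    semantic_description="Customer support case, one row per open ticket",
    classification="confidential",
    acl={"support-team", "service-optimizer-agent"},
)
```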
Real-time data access has also taken on new significance since agentic AI has hurtled onto the scene. Traditional business intelligence operated on batch processing cycles where 12-hour data latency was acceptable for dashboard and reporting use cases. Agentic systems, however, must “react to enterprise events as they are taking place,” requiring sub-second response times for many workflows.
Governance in a zero copy world
The governance implications of zero copy architecture represent both an opportunity and a challenge for enterprise implementations. Narinder Singh, VP of Product Management at Salesforce, highlighted how zero copy preserves data ownership and control:
With zero copy, the benefits come with respect to the data ownership and control and managing the lifecycle of the data, and having always access to the latest data.
This preservation of source system governance eliminates the need to reconstruct security models in downstream analytics platforms. As Mukherjee explained through a practical example involving automobile service center optimization:
We had to preserve and propagate the ACLs for the RAG applications, to honor the governance and security model. What zero copy does is an AI agent in Snowflake, for example, is accessing data in Salesforce. We don’t have to reconstruct that security and the governance model.
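For unstructured content, the equivalent move is filtering retrieved chunks by the ACLs propagated from the source system, rather than rebuilding a separate security model in the RAG layer. A minimal, invented example:

```python
# Sketch of ACL filtering at retrieval time: the permissions attached to each chunk
# come from the source system, so the RAG layer never rebuilds its own security model.
# Chunk contents and group names are invented for illustration.
chunks = [
    {"text": "Warranty policy for brake repairs...", "acl": {"service-advisors"}},
    {"text": "Internal margin targets per region...", "acl": {"finance"}},
]

def retrieve_for(principal_groups: set, candidates: list) -> list:
    """Keep only the chunks the requesting user or agent is entitled to see."""
    return [c for c in candidates if c["acl"] & principal_groups]

print(retrieve_for({"service-advisors"}, chunks))  # the finance-only chunk is filtered out
```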
However, zero copy implementation must address the intersection of different governance paradigms. Calvesbert noted the complexity that arises when unstructured data governance – typically organized around document-level permissions – meets structured data policies designed to maximize access while protecting specific sensitive elements.
The practical implementation of zero copy architecture relies heavily on open standards and interoperability protocols. The roundtable participants emphasized the role of formats like Apache Iceberg and emerging communication protocols in enabling cross-platform data access without duplication.
Calvesbert positioned these developments within a broader architectural vision: “I really think it’s definitional Lake House architecture. I really think it communicates the value proposition of open table formats.” The benefits extend beyond technical implementation to address cost, compliance, and agility requirements that drive enterprise technology decisions.
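In practice, “open table formats plus a shared catalog” can be as simple as pointing a client at the catalog and scanning the table where it already lives. The sketch below assumes pyiceberg and a REST catalog; the URI, warehouse, namespace, and column names are placeholders rather than details from the roundtable.

```python
# Hedged sketch (assumes pyiceberg and a shared REST catalog; the URI, warehouse,
# namespace and column names below are placeholders, not a specific vendor's setup).
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "shared",
    **{"uri": "https://catalog.example.com", "warehouse": "s3://lakehouse-demo"},
)

# The table is read in place from object storage; nothing is copied or re-ingested.
table = catalog.load_table("service.cases")
open_cases = table.scan(
    row_filter="status = 'open'",
    selected_fields=("case_id", "status", "region"),
).to_arrow()
print(open_cases.num_rows)
```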
The Model Context Protocol (MCP) emerged as a significant discussion point, though the panelists cautioned against viewing it as a complete solution. “If you look at how a lot of the data vendors are implementing MCP, it’s really just a different protocol for traditional kind of JDBC and SQL,” Calvesbert observed. The real value lies in semantic enrichment and use case-specific data aggregation that enables large language models to identify and utilize appropriate tools within defined workflow contexts.
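The distinction is easier to see side by side. The two tool descriptions below are plain dictionaries rather than a real MCP server definition, and every name and field is invented: the first simply re-exposes SQL, while the second gives a model enough semantic and workflow context to choose the right tool.

```python
# Illustrative contrast (plain dictionaries, not a real MCP SDK; all names invented):
# a thin SQL passthrough versus a semantically enriched, use-case-specific tool.
sql_passthrough_tool = {
    "name": "run_sql",
    "description": "Execute an arbitrary SQL statement.",  # little for an LLM to reason with
    "parameters": {"query": "string"},
}

enriched_tool = {
    "name": "open_service_cases_by_region",
    "description": (
        "Returns open automobile service-centre cases for one region, "
        "joined with warranty status. Use when triaging regional backlogs."
    ),
    "parameters": {"region": "string", "max_age_days": "integer"},
}
```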
Enterprise zero copy implementations face several common misconceptions that can derail projects. Singh identified a prevalent assumption that data unification alone solves AI readiness:
Often, the common misconception I’ve seen is that we help solve all of the problems, make the data AI ready, and we should be able to get to the outcomes that we want to achieve.
This approach can create additional data silos rather than eliminating them, particularly when organizations pursue centralized data lake strategies without addressing fundamental issues around quality, consistency, and semantic harmonization. The result can be “more data chaos in terms of having multiple copies of data or having multiple ways in which agents are able to look at the data.”
Where to begin?
For organizations beginning their zero copy journey, the panelists emphasized the importance of establishing clear architectural principles before pursuing incremental implementations. Calvesbert stressed the value of articulating a clear destination:
The most important thing is to have a clear vision of the destination… a good time horizon, five plus years.
This architectural vision should encompass commodity cloud object storage, open data and table formats, shared metadata catalogs, and governance systems that can be centrally designed but locally enforced. These principles provide a framework for evaluating incremental decisions across different lines of business and projects.
Singh advocated for a use case-driven approach:
Always start small and grow from there… starting with a specific use case, around which you layer your data strategy.
This methodology allows organizations to build foundational capabilities while demonstrating tangible business value.
Zero copy architecture represents more than a technical optimization – it embodies a fundamental shift in how enterprises architect their data infrastructure for AI-driven operations. The technology aims to resolve the tension between data accessibility and governance that has constrained enterprise AI implementations.
As Mukherjee concluded:
Unification, propagation of context. These are very foundational principles… You’re going to solve a problem that’s going to require these different data products to come together, to bring your AI solutions to life.
The success of zero copy implementations will ultimately depend on organizations’ ability to balance technical innovation with practical governance requirements, ensuring that agentic AI systems can access the data they need while operating within appropriate security and compliance boundaries.
My take
Zero copy architecture represents more than a technical evolution – it embodies the maturation of enterprise AI from experimental curiosity to operational necessity. The roundtable discussion highlighted a critical disconnect: while enterprises rush to deploy generative AI capabilities, many are discovering that their data infrastructure was designed for human-scale analysis, not 24/7 autonomous agents operating across enterprise datasets.
The governance challenge is particularly acute. Traditional data management assumes human oversight and cognitive limitations that naturally constrain access patterns. Agentic systems demolish these assumptions, creating new requirements for granular, context-aware access controls that can operate at machine speed while preserving business intent.
What’s most interesting about the zero copy approach is its recognition that data architecture and AI strategy are inseparable. Organizations that continue to treat data infrastructure as a secondary consideration will find themselves building impressive AI demonstrations that cannot scale beyond proof-of-concept stages. This roundtable indicates that data foundations need to be structured for agentic consumption from the outset, which could be a daunting prospect for companies that are reluctant to change the way they think about data.