
How Delphi stopped drowning in data and scaled up with Pinecone

By Advanced AI Editor | August 21, 2025 | 8 Mins Read

Delphi, a two-year-old San Francisco AI startup named after the ancient Greek oracle, was facing a thoroughly 21st-century problem: its "Digital Minds" — interactive, personalized chatbots modeled after an end-user and meant to channel their voice based on their writings, recordings, and other media — were drowning in data.

Each Delphi can draw from any number of books, social feeds, or course materials to respond in context, making each interaction feel like a direct conversation. Creators, coaches, artists and experts were already using them to share insights and engage audiences.

But each new upload of podcasts, PDFs or social posts to a Delphi added complexity to the company’s underlying systems. Keeping these AI alter egos responsive in real time without breaking the system was becoming harder by the week.

Thankfully, Delphi found a solution to its scaling woes in managed vector database darling Pinecone.

Open source only goes so far

Delphi’s early experiments relied on open-source vector stores. Those systems quickly buckled under the company’s needs. Indexes ballooned in size, slowing searches and complicating scale.

Latency spikes during live events or sudden content uploads risked degrading the conversational flow.

Worse, Delphi’s small but growing engineering team found itself spending weeks tuning indexes and managing sharding logic instead of building product features.

Pinecone’s fully managed vector database, with SOC 2 compliance, encryption, and built-in namespace isolation, turned out to be a better path.

Each Digital Mind now has its own namespace within Pinecone. This ensures privacy and compliance, and narrows the search surface area when retrieving knowledge from its repository of user-uploaded data, improving performance.

A creator’s data can be deleted with a single API call. Retrievals consistently come back in under 100 milliseconds at the 95th percentile, accounting for less than 30 percent of Delphi’s strict one-second end-to-end latency target.
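As a rough illustration of that isolation model (not Delphi's actual schema), a minimal sketch with the Pinecone Python SDK might look like the following; the index name, namespace, vector values, and metadata fields are hypothetical.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("digital-minds")  # hypothetical index name

# Each Digital Mind writes only into its own namespace, so one
# creator's content never mixes with another's.
index.upsert(
    vectors=[
        {
            "id": "podcast-ep12-chunk-004",
            "values": [0.1, 0.2, 0.3],  # placeholder embedding values
            "metadata": {"source": "podcast-ep-12", "text": "…chunk text…"},
        }
    ],
    namespace="creator-123",
)

# Deleting a creator's entire dataset is a single namespace-scoped call.
index.delete(delete_all=True, namespace="creator-123")
```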

“With Pinecone, we don’t have to think about whether it will work,” said Samuel Spelsberg, co-founder and CTO of Delphi, in a recent interview. “That frees our engineering team to focus on application performance and product features rather than semantic similarity infrastructure.”

The architecture behind the scale

At the heart of Delphi’s system is a retrieval-augmented generation (RAG) pipeline. Content is ingested, cleaned, and chunked; then embedded using models from OpenAI, Anthropic, or Delphi’s own stack.

Those embeddings are stored in Pinecone under the corresponding creator's namespace. At query time, Pinecone retrieves the most relevant vectors in milliseconds, and they are passed to a large language model to produce a response.
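To make that query path concrete, here is a hedged sketch using the Pinecone and OpenAI Python clients; the model names, namespace, metadata field, and prompt wording are illustrative assumptions rather than Delphi's production code.

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI(api_key="YOUR_OPENAI_KEY")
pc = Pinecone(api_key="YOUR_PINECONE_KEY")
index = pc.Index("digital-minds")  # hypothetical index name


def answer(question: str, namespace: str) -> str:
    # 1. Embed the user's question.
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=question,
    ).data[0].embedding

    # 2. Retrieve the most relevant chunks from this creator's namespace.
    results = index.query(
        vector=embedding,
        top_k=5,
        namespace=namespace,
        include_metadata=True,
    )
    context = "\n\n".join(
        (match.metadata or {}).get("text", "") for match in results.matches
    )

    # 3. Feed the retrieved context plus the question to the LLM.
    completion = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer in the creator's voice, using only the provided context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content
```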

This design allows Delphi to maintain real-time conversations without blowing its latency and cost budgets.

As Jeffrey Zhu, VP of Product at Pinecone, explained, a key innovation was moving away from traditional node-based vector databases to an object-storage-first approach.

Instead of keeping all data in memory, Pinecone dynamically loads vectors when needed and offloads idle ones.

“That really aligns with Delphi’s usage patterns,” Zhu said. “Digital Minds are invoked in bursts, not constantly. By decoupling storage and compute, we reduce costs while enabling horizontal scalability.”

Pinecone also automatically tunes algorithms depending on namespace size. Smaller Delphis may only store a few thousand vectors; others contain millions, derived from creators with decades of archives.

Pinecone adaptively applies the best indexing approach in each case. As Zhu put it, “We don’t want our customers to have to choose between algorithms or wonder about recall. We handle that under the hood.”

Variance among creators

Not every Digital Mind looks the same. Some creators upload relatively small datasets — social media feeds, essays, or course materials — amounting to tens of thousands of words.

Others go far deeper. Spelsberg described one expert who contributed hundreds of gigabytes of scanned PDFs, spanning decades of marketing knowledge.

Despite this variance, Pinecone’s serverless architecture has allowed Delphi to scale beyond 100 million stored vectors across 12,000+ namespaces without hitting scaling cliffs.

Retrieval remains consistent, even during spikes triggered by live events or content drops. Delphi now sustains about 20 queries per second globally, supporting concurrent conversations across time zones with zero scaling incidents.

Toward a million digital minds

Delphi’s ambition is to host millions of Digital Minds, a goal that would require supporting at least five million namespaces in a single index.

For Spelsberg, that scale is not hypothetical but part of the product roadmap. “We’ve already moved from a seed-stage idea to a system managing 100 million vectors,” he said. “The reliability and performance we’ve seen gives us confidence to scale aggressively.”

Zhu agreed, noting that Pinecone’s architecture was specifically designed to handle bursty, multi-tenant workloads like Delphi’s. “Agentic applications like these can’t be built on infrastructure that cracks under scale,” he said.

Why RAG still matters and will for the foreseeable future

As context windows in large language models expand, some in the AI industry have suggested RAG may become obsolete.

Both Spelsberg and Zhu push back on that idea. “Even if we have billion-token context windows, RAG will still be important,” Spelsberg said. “You always want to surface the most relevant information. Otherwise you’re wasting money, increasing latency, and distracting the model.”

Zhu framed it in terms of context engineering — a term Pinecone has recently used in its own technical blog posts.

“LLMs are powerful reasoning tools, but they need constraints,” he explained. “Dumping in everything you have is inefficient and can lead to worse outcomes. Organizing and narrowing context isn’t just cheaper—it improves accuracy.”

As covered in Pinecone’s own writings on context engineering, retrieval helps manage the finite attention span of language models by curating the right mix of user queries, prior messages, documents, and memories to keep interactions coherent over time.

Without this, windows fill up, and models lose track of critical information. With it, applications can maintain relevance and reliability across long-running conversations.
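As an illustration of that curation step, here is a small sketch of packing retrieved chunks and recent messages into a fixed token budget; the budget size, the whitespace-based token estimate, and the ordering heuristic are simplifying assumptions.

```python
def build_context(retrieved_chunks: list[str], recent_messages: list[str],
                  budget_tokens: int = 4000) -> str:
    """Pack the newest messages and highest-ranked chunks into a token budget.

    Token counts are approximated by word counts; a real system would use
    the target model's tokenizer.
    """
    def estimate_tokens(text: str) -> int:
        return len(text.split())

    parts: list[str] = []
    used = 0

    # Keep the most recent conversation turns so the dialogue stays coherent.
    for message in reversed(recent_messages):
        cost = estimate_tokens(message)
        if used + cost > budget_tokens:
            break
        parts.insert(0, message)
        used += cost

    # Fill the remaining budget with retrieved chunks, best matches first.
    for chunk in retrieved_chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            continue
        parts.append(chunk)
        used += cost

    return "\n\n".join(parts)
```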

From Black Mirror to enterprise-grade

When VentureBeat first profiled Delphi in 2023, the company was fresh off raising $2.7 million in seed funding and drawing attention for its ability to create convincing “clones” of historical figures and celebrities.

CEO Dara Ladjevardian traced the idea back to a personal attempt to reconnect with his late grandfather through AI.

Today, the framing has matured. Delphi emphasizes Digital Minds not as gimmicky clones or chatbots, but as tools for scaling knowledge, teaching, and expertise.

The company sees applications in professional development, coaching, and enterprise training — domains where accuracy, privacy, and responsiveness are paramount.

In that sense, the collaboration with Pinecone represents more than just a technical fit. It is part of Delphi’s effort to shift the narrative from novelty to infrastructure.

Digital Minds are now positioned as reliable, secure, and enterprise-ready — because they sit atop a retrieval system engineered for both speed and trust.

What’s next for Delphi and Pinecone?

Looking forward, Delphi plans to expand its feature set. One upcoming addition is "interview mode," where a Digital Mind can ask questions of the person it models to fill knowledge gaps.

That lowers the barrier to entry for people without extensive archives of content. Meanwhile, Pinecone continues to refine its platform, adding capabilities like adaptive indexing and memory-efficient filtering to support more sophisticated retrieval workflows.

For both companies, the trajectory points toward scale. Delphi envisions millions of Digital Minds active across domains and audiences. Pinecone sees its database as the retrieval layer for the next wave of agentic applications, where context engineering and retrieval remain essential.

“Reliability has given us the confidence to scale,” Spelsberg said. Zhu echoed the sentiment: “It’s not just about managing vectors. It’s about enabling entirely new classes of applications that need both speed and trust at scale.”

If Delphi continues to grow, millions of people will be interacting day in and day out with Digital Minds — living repositories of knowledge and personality, powered quietly under the hood by Pinecone.
