Advanced AI News
VentureBeat AI

Positron believes it has found the secret to take on Nvidia in AI inference chips — here’s how it could benefit enterprises

By Advanced AI Editor | July 29, 2025 | 8 min read

As demand for large-scale AI deployment skyrockets, the lesser-known, private chip startup Positron is positioning itself as a direct challenger to market leader Nvidia by offering dedicated, energy-efficient, memory-optimized inference chips aimed at relieving the industry’s mounting cost, power, and availability bottlenecks.

“A key differentiator is our ability to run frontier AI models with better efficiency—achieving 2x to 5x performance per watt and dollar compared to Nvidia,” said Thomas Sohmers, Positron co-founder and CTO, in a recent video call interview with VentureBeat.

Obviously, that’s good news for big AI model providers, but Positron’s leadership contends the chips are useful to a much wider range of enterprises as well, including those that use AI models in their own workflows rather than offering them as services to customers.

“We build chips that can be deployed in hundreds of existing data centers because they don’t require liquid cooling or extreme power densities,” pointed out Mitesh Agrawal, Positron’s CEO and former chief operating officer of AI cloud inference provider Lambda, in the same interview.

Venture capitalists and early users seem to agree.

Positron yesterday announced an oversubscribed $51.6 million Series A funding round led by Valor Equity Partners, Atreides Management and DFJ Growth, with support from Flume Ventures, Resilience Reserve, 1517 Fund and Unless.

As for Positron’s early customer base, that includes both name-brand enterprises and companies operating in inference-heavy sectors. Confirmed deployments include the major security and cloud content networking provider Cloudflare, which uses Positron’s Atlas hardware in its globally distributed, power-constrained data centers, and Parasail, via its AI-native data infrastructure platform SnapServe.

Beyond these, Positron reports adoption across several key verticals where efficient inference is critical, such as networking, gaming, content moderation, content delivery networks (CDNs), and Token-as-a-Service providers.

These early users are reportedly drawn in by Atlas’s ability to deliver high throughput and lower power consumption without requiring specialized cooling or reworked infrastructure, making it an attractive drop-in option for AI workloads across enterprise environments.

Entering a challenging market where models are shrinking and efficiency is rising

But Positron is also entering a challenging market. The Information just reported that rival buzzy AI inference chip startup Groq — where Sohmers previously worked as Director of Technology Strategy — has reduced its 2025 revenue projection from $2 billion+ to $500 million, highlighting just how volatile the AI hardware space can be.

Even well-funded firms face headwinds as they compete for data center capacity and enterprise mindshare against entrenched GPU providers like Nvidia, not to mention the elephant in the room: the rise of more efficient, smaller large language models (LLMs) and specialized small language models (SLMs) that can even run on devices as small and low-powered as smartphones.

Yet Positron’s leadership is for now embracing the trend and shrugging off the possible impacts on its growth trajectory.

“There’s always been this duality—lightweight applications on local devices and heavyweight processing in centralized infrastructure,” said Agrawal. “We believe both will keep growing.”

Sohmers agreed, stating: “We see a future where every person might have a capable model on their phone, but those will still rely on large models in data centers to generate deeper insights.”

Atlas is an inference-first AI chip

While Nvidia GPUs helped catalyze the deep learning boom by accelerating model training, Positron argues that inference — the stage where models generate output in production — is now the true bottleneck.

Its founders call it the most under-optimized part of the “AI stack,” especially for generative AI workloads that depend on fast, efficient model serving.

Positron’s solution is Atlas, its first-generation inference accelerator built specifically to handle large transformer models.

Unlike general-purpose GPUs, Atlas is optimized for the unique memory and throughput needs of modern inference tasks.

The company claims Atlas delivers 3.5x better performance per dollar and up to 66% lower power usage than Nvidia’s H100, while also achieving 93% memory bandwidth utilization—far above the typical 10–30% range seen in GPUs.
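
To put the utilization figure in context, batch-1 decode throughput on any accelerator is roughly bounded by how fast the weights can be streamed from memory. The back-of-envelope sketch below shows how utilization translates into tokens per second; the model size, precision, and bandwidth numbers are illustrative assumptions, not published Atlas or H100 specifications.

```python
# Rough decode-throughput model for memory-bound inference.
# All numbers are illustrative assumptions, not vendor specifications.

def decode_tokens_per_sec(params_billions: float,
                          bytes_per_param: float,
                          bandwidth_tb_s: float,
                          utilization: float) -> float:
    """Batch-1 decode reads (approximately) the full weight set per token,
    so throughput ~= effective memory bandwidth / weight bytes."""
    weight_bytes = params_billions * 1e9 * bytes_per_param
    effective_bandwidth = bandwidth_tb_s * 1e12 * utilization
    return effective_bandwidth / weight_bytes

# Hypothetical 70B-parameter model at 8-bit weights on 3 TB/s of raw bandwidth.
for utilization in (0.30, 0.93):  # typical GPU range vs. the figure Positron claims
    tps = decode_tokens_per_sec(70, 1.0, 3.0, utilization)
    print(f"{utilization:.0%} utilization: ~{tps:.0f} tokens/s per model replica")
```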

From Atlas to Titan, supporting multi-trillion parameter models

Atlas launched just 15 months after Positron’s founding, on only $12.5 million in seed capital, and is already shipping and in production.

The system supports up to 0.5 trillion-parameter models in a single 2kW server and is compatible with Hugging Face transformer models via an OpenAI API-compatible endpoint.
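
In practice, an OpenAI API-compatible endpoint means existing client code keeps working once it is pointed at a different base URL. Here is a minimal sketch using the standard OpenAI Python SDK; the endpoint URL, key, and model ID are placeholders, not documented Positron values.

```python
# Hypothetical client call against an OpenAI API-compatible Atlas endpoint.
# The base_url, api_key, and model name are placeholders for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="https://atlas.example.internal/v1",  # assumed endpoint, not a real URL
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",  # any served Hugging Face model ID
    messages=[{"role": "user", "content": "Why is transformer inference memory-bound?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```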

Positron is now preparing to launch its next-generation platform, Titan, in 2026.

Built on custom-designed “Asimov” silicon, Titan will feature up to two terabytes of high-speed memory per accelerator and support models up to 16 trillion parameters.

Today’s frontier models sit in the hundreds of billions to single-digit trillions of parameters, and newer models such as OpenAI’s GPT-5 are presumed to reach into the multiple trillions. Larger models are currently thought to be required to reach artificial general intelligence (AGI), AI that outperforms humans at most economically valuable work, and superintelligence, AI that exceeds humans’ ability to understand and control it.
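
For a rough sense of scale, weights alone dominate the memory budget at these sizes. The sketch below counts only weight storage and ignores KV cache and activations; the precision choices are assumptions, not Positron specifications.

```python
# Back-of-envelope: how many 2 TB accelerators does a multi-trillion-parameter
# model need just to hold its weights? Precision choices are assumptions.

def accelerators_for_weights(params_trillions: float,
                             bytes_per_param: float,
                             memory_per_accelerator_tb: float = 2.0) -> float:
    weight_tb = params_trillions * bytes_per_param  # 1e12 params * bytes / 1e12 = TB
    return weight_tb / memory_per_accelerator_tb

# A 16-trillion-parameter model: ~16 TB of weights at 8-bit, ~32 TB at 16-bit.
for bits in (8, 16):
    count = accelerators_for_weights(16, bits / 8)
    print(f"{bits}-bit weights: ~{count:.0f} accelerators at 2 TB each")
```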

Crucially, Titan is designed to operate with standard air cooling in conventional data center environments, avoiding the high-density, liquid-cooled configurations that next-gen GPUs increasingly require.

Engineering for efficiency and compatibility

From the start, Positron designed its system to be a drop-in replacement, allowing customers to use existing model binaries without code rewrites.

“If a customer had to change their behavior or their actions in any way, shape or form, that was a barrier,” said Sohmers.

Sohmers explained that instead of building a complex compiler stack or rearchitecting software ecosystems, Positron focused narrowly on inference, designing hardware that ingests Nvidia-trained models directly.

“CUDA mode isn’t something to fight,” said Agrawal. “It’s an ecosystem to participate in.”

This pragmatic approach helped the company ship its first product quickly, validate performance with real enterprise users, and secure significant follow-on investment. In addition, its focus on air cooling versus liquid cooling makes its Atlas chips the only option for some deployments.

“We’re focused entirely on purely air-cooled deployments… all these Nvidia Hopper- and Blackwell-based solutions going forward require liquid cooling… The only place you can put those racks is in data centers that are being newly built now in the middle of nowhere,” said Sohmers.

All told, Positron’s ability to execute quickly and capital-efficiently has helped distinguish it in a crowded AI hardware market.

Memory is what you need

Sohmers and Agrawal point to a fundamental shift in AI workloads: from compute-bound convolutional neural networks to memory-bound transformer architectures.

Whereas older models demanded high FLOPs (floating-point operations), modern transformers require massive memory capacity and bandwidth to run efficiently.

While Nvidia and others continue to focus on compute scaling, Positron is betting on memory-first design.

Sohmers noted that with transformer inference, the ratio of compute to memory operations flips to near 1:1, meaning that boosting memory utilization has a direct and dramatic impact on performance and power efficiency.
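
That near-1:1 ratio falls out of the arithmetic intensity of batch-1 decode, where each layer reduces to a matrix-vector product: every weight is read from memory once and used for roughly two floating-point operations. A minimal sketch with illustrative dimensions:

```python
# Arithmetic intensity (FLOPs per byte moved) of a batch-1 matrix-vector product,
# the dominant operation in transformer decode. Dimensions are illustrative.

def gemv_arithmetic_intensity(rows: int, cols: int, bytes_per_weight: float) -> float:
    flops = 2 * rows * cols                       # one multiply + one add per weight
    bytes_moved = rows * cols * bytes_per_weight  # each weight streamed once
    return flops / bytes_moved

# An 8192x8192 projection in FP16 lands at ~1 FLOP per byte, so throughput is set
# by memory bandwidth rather than peak compute.
print(gemv_arithmetic_intensity(8192, 8192, 2.0))  # -> 1.0
```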

With Atlas already outperforming contemporary GPUs on key efficiency metrics, Titan aims to take this further by offering the highest memory capacity per chip in the industry.

At launch, Titan is expected to offer an order-of-magnitude increase over typical GPU memory configurations — without demanding specialized cooling or boutique networking setups.

U.S.-built chips

Positron’s production pipeline is proudly domestic. The company’s first-generation chips were fabricated in the U.S. using Intel facilities, with final server assembly and integration also based domestically.

For the Asimov chip, fabrication will shift to TSMC, though the team is aiming to keep as much of the rest of the production chain in the U.S. as possible, depending on foundry capacity.

Geopolitical resilience and supply chain stability are becoming key purchasing criteria for many customers — another reason Positron believes its U.S.-made hardware offers a compelling alternative.

What’s next?

Agrawal noted that Positron’s silicon targets not just broad compatibility but maximum utility for enterprise, cloud, and research labs alike.

While the company has not named any frontier model providers as customers yet, he confirmed that outreach and conversations are underway.

Agrawal emphasized that selling physical infrastructure based on economics and performance—not bundling it with proprietary APIs or business models—is part of what gives Positron credibility in a skeptical market.

“If you can’t convince a customer to deploy your hardware based on its economics, you’re not going to be profitable,” he said.
