Advanced AI News
VentureBeat AI

New vision model from Cohere runs on two GPUs, beats top-tier VLMs on visual tasks

By Advanced AI Editor | August 1, 2025 | 5 Mins Read

The growth of Deep Research features and other AI-powered analysis has spurred more models and services that aim to simplify that process and read more of the documents businesses actually use.

Canadian AI company Cohere is banking on its models, including a newly released visual model, to make the case that Deep Research features should also be optimized for enterprise use cases. 

The company has released Command A Vision, a visual model specifically targeting enterprise use cases and built on its Command A model. The 112-billion-parameter model can “unlock valuable insights from visual data, and make highly accurate, data-driven decisions through document optical character recognition (OCR) and image analysis,” the company says.

“Whether it’s interpreting product manuals with complex diagrams or analyzing photographs of real-world scenes for risk detection, Command A Vision excels at tackling the most demanding enterprise vision challenges,” the company said in a blog post. 

This means Command A Vision can read and analyze the most common types of images enterprises need: graphs, charts, diagrams, scanned documents and PDFs. 

@cohere just dropped Command A Vision on @huggingface

Designed for enterprise multimodal use cases: interpreting product manuals, analyzing photos, asking about charts…

A 112B dense vision-language model with SOTA performance – check out the benchmark metrics in… pic.twitter.com/ORMfM5f8cF

— Jeff Boudier (@jeffboudier) July 31, 2025

Since it’s built on Command A’s architecture, Command A Vision requires two or fewer GPUs, just like the text model. The vision model also retains the text capabilities of Command A, allowing it to read words in images, and it understands at least 23 languages. Cohere said that, unlike other models, Command A Vision reduces the total cost of ownership for enterprises and is fully optimized for retrieval use cases for businesses.
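To put the two-GPU claim in perspective, here is a rough weight-memory estimate. It is only a sketch: the 80 GB accelerator size and the numeric formats are assumptions for illustration and do not come from Cohere, and the calculation ignores activations and the KV cache.

```python
# Back-of-the-envelope estimate of weight memory only (no activations, no KV cache).
PARAMS = 112e9  # parameter count quoted in the article

# Assumptions for illustration; neither the formats nor the GPU size is from the article.
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}
GPU_MEMORY_GB = 80
NUM_GPUS = 2


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return the memory needed to hold the weights, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def fits(params: float, bytes_per_param: float,
         num_gpus: int = NUM_GPUS, gpu_memory_gb: float = GPU_MEMORY_GB) -> bool:
    """True if the weights alone fit in the combined memory of the GPUs."""
    return weight_memory_gb(params, bytes_per_param) <= num_gpus * gpu_memory_gb


for name, nbytes in BYTES_PER_PARAM.items():
    size = weight_memory_gb(PARAMS, nbytes)
    print(f"{name}: {size:.0f} GB of weights, fits on {NUM_GPUS} GPUs: {fits(PARAMS, nbytes)}")
```

Under these assumptions, the weights would need some form of reduced precision to fit on two such devices; the article does not say which setup Cohere uses.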

How Cohere is architecting Command A

Cohere said it followed a LLaVA architecture to build its Command A models, including the visual model. This architecture turns visual features into soft vision tokens, which can be divided into different tiles.

These tiles are passed into the Command A text tower, “a dense, 111B parameters textual LLM,” the company said. “In this manner, a single image consumes up to 3,328 tokens.”
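The 3,328-token ceiling is consistent with a simple tile-based budget. The sketch below assumes 256 tokens per tile and at most 12 tiles plus one thumbnail of the whole image. These numbers are assumptions, chosen because they multiply out to the quoted figure; the article does not state them, and `image_token_count` is a hypothetical helper.

```python
# Assumed tiling parameters (not stated in the article).
TOKENS_PER_TILE = 256  # soft vision tokens produced per tile
MAX_TILES = 12         # upper bound on tiles cut from one image
THUMBNAIL_TILES = 1    # one downscaled overview of the full image


def image_token_count(num_tiles: int) -> int:
    """Return the number of vision tokens one image would consume.

    num_tiles is how many tiles the image would be cut into before
    the cap is applied; the thumbnail is always added on top.
    """
    if num_tiles < 1:
        raise ValueError("an image needs at least one tile")
    tiles = min(num_tiles, MAX_TILES)
    return (tiles + THUMBNAIL_TILES) * TOKENS_PER_TILE


print(image_token_count(4))   # a small image
print(image_token_count(12))  # the largest budget under these assumptions
```

With these values the maximum is (12 + 1) × 256 = 3,328 tokens, which matches the quote. Other tile and token combinations could produce the same total.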

Cohere said it trained the visual model in three stages: vision-language alignment, supervised fine-tuning (SFT) and post-training reinforcement learning from human feedback (RLHF).

“This approach enables the mapping of image encoder features to the language model embedding space,” the company said of the alignment stage. “In contrast, during the SFT stage, we simultaneously trained the vision encoder, the vision adapter and the language model on a diverse set of instruction-following multimodal tasks.”
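One way to picture the three stages is as a schedule of which components receive gradient updates. In the sketch below, the SFT row follows the quote above. The alignment row, which trains only the adapter, is an assumption based on how LLaVA-style models are commonly trained. The RLHF row is left unspecified because the article gives no detail. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

COMPONENTS: FrozenSet[str] = frozenset(
    {"vision_encoder", "vision_adapter", "language_model"}
)


@dataclass(frozen=True)
class Stage:
    name: str
    # Components updated in this stage; None means the article does not say.
    trainable: Optional[FrozenSet[str]]


STAGES: List[Stage] = [
    # Assumption: only the adapter is trained while it learns the mapping.
    Stage("vision-language alignment", frozenset({"vision_adapter"})),
    # From the quote: encoder, adapter and language model trained together.
    Stage("supervised fine-tuning", COMPONENTS),
    # Not described in the article.
    Stage("RLHF", None),
]


def frozen_components(stage: Stage) -> Optional[FrozenSet[str]]:
    """Return the components kept fixed in a stage, or None if unknown."""
    if stage.trainable is None:
        return None
    return COMPONENTS - stage.trainable


for stage in STAGES:
    frozen = frozen_components(stage)
    label = "unspecified" if frozen is None else (sorted(frozen) or "nothing")
    print(f"{stage.name}: frozen = {label}")
```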

Visualizing enterprise AI 

Benchmark tests showed Command A Vision outperforming other models with similar visual capabilities. 

Cohere pitted Command A Vision against OpenAI’s GPT-4.1, Meta’s Llama 4 Maverick, Mistral’s Pixtral Large and Mistral Medium 3 in nine benchmark tests. The company did not say whether it tested the model against Mistral’s OCR-focused API, Mistral OCR.

It enables agents to securely see inside your organization’s visual data, unlocking the automation of tedious tasks involving slides, diagrams, PDFs, and photos. pic.twitter.com/iHZnUWekrk

— cohere (@cohere) July 31, 2025

Command A Vision outscored the other models in tests such as ChartQA, OCRBench, AI2D and TextVQA. Overall, Command A Vision had an average score of 83.1%, compared with 78.6% for GPT-4.1, 80.5% for Llama 4 Maverick and 78.3% for Mistral Medium 3.
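The gaps between these averages are easier to compare as percentage-point margins. The short sketch below uses only the four averages reported above; Pixtral Large is left out because the article gives no average for it. `margins_over` is a hypothetical helper.

```python
# Average benchmark scores (percent) as reported in the article.
AVERAGE_SCORES = {
    "Command A Vision": 83.1,
    "Llama 4 Maverick": 80.5,
    "GPT-4.1": 78.6,
    "Mistral Medium 3": 78.3,
}


def margins_over(baseline: str, scores: dict) -> dict:
    """Return the baseline's lead over every other model, in percentage points."""
    base = scores[baseline]
    return {
        name: round(base - score, 1)
        for name, score in scores.items()
        if name != baseline
    }


for model, margin in margins_over("Command A Vision", AVERAGE_SCORES).items():
    print(f"Command A Vision leads {model} by {margin} points")
```

These are differences between averages across nine benchmarks, so they say nothing about how any single test turned out.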

Most large language models (LLMs) these days are multimodal, meaning they can generate or understand visual media such as photos or videos. Enterprises, however, rely more on graphical documents such as charts and PDFs, and extracting information from these unstructured data sources often proves difficult.

With Deep Research on the rise, the importance of bringing in models capable of reading, analyzing and even downloading unstructured data has grown.

Cohere also said it’s offering Command A Vision as an open-weights model, in hopes that enterprises looking to move away from closed or proprietary models will start using its products. So far, there is some interest from developers.

Very impressed at its accuracy extracting hand handwritten notes from an image!

— Adam Sardo (@sardo_adam) July 31, 2025

Finally, an AI that won’t judge my terrible doodles.

— Martha Wisener (@martwisener) August 1, 2025
