Advanced AI News
Cohere

How to Build a Multilingual Large Language Model

By Advanced AI Editor | October 3, 2025 | 6 Mins Read


At SlatorCon Silicon Valley 2025, Cohere’s Multilingual Team Lead Kelly Marchisio delivered one of the best-received presentations of the day: an accessible, behind-the-scenes look at how to build a multilingual large language model (LLM).

Marchisio put delegates in the shoes of a machine learning engineer at the frontier of AI innovation, grappling with questions such as: How do we build the best-performing LLM? How do we bake in multilinguality from the start? And how do we ensure the model interacts in ways that are genuinely useful?

Foundational model builder Cohere released its Command A flagship model in March 2025, followed by a specialized translation model, Command A Translate, in August 2025. The timing was fortuitous, allowing Marchisio to use the SlatorCon stage to lift the lid on the life cycle of the LLM’s development.

LLMs are often built as English-centric systems, and only later retrofitted with stronger multilingual capabilities. Cohere takes a different approach. “Multilinguality is core to what we do. We think about making our models multilingual throughout the entire training process,” Marchisio explained.

The idea is to pre-train the LLM across a range of languages so it can deliver strong multilingual performance in capabilities such as question answering, translation, and summarization.

Multilinguality brings unique challenges, however. The first and most obvious is deciding which languages to include. It’s not yet possible to support all languages in a single model, given constraints in size and data, and practical choices have to be made. Cohere selected 23 languages “to support the variety of languages that are used in global business contexts,” Marchisio said.

Feeding the LLM

Next up was one of AI’s most well-known challenges: obtaining huge amounts of training data. Marchisio’s team created a “training mixture” from public sources, annotator-created data, and synthetic generation. 

Another hurdle for the Cohere team was tokenization. An LLM’s training text must be split into trainable tokens, whether words, characters, or bytes. A tokenizer handles this splitting process, but unless it is optimized for all target languages, it can create major imbalances in how many tokens each language needs.

Sharing an example with the audience, Marchisio showed how the same phrase might be split into 11 tokens in English but 21 in Hindi. The consequences are not trivial. For users, more tokens mean higher costs, if billing is “per token.” For providers, more tokens increase compute time, making the model slower and more expensive to run.
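To make the imbalance concrete, here is a minimal sketch. It assumes a naive byte-level tokenizer with no learned merges, which is not Cohere’s tokenizer; the sample sentences and the function names `byte_token_count` and `cost_ratio` are illustrative assumptions. Only the 11 and 21 token counts come from the talk.

```python
def byte_token_count(text: str) -> int:
    """Token count under a naive byte-level tokenizer: one token per UTF-8 byte."""
    return len(text.encode("utf-8"))


def cost_ratio(tokens: int, baseline_tokens: int) -> float:
    """How many times more tokens (and per-token cost) than the baseline."""
    return tokens / baseline_tokens


# Hypothetical sample sentences, not the phrase shown in the talk.
english = "Hello, how are you?"
hindi = "नमस्ते, आप कैसे हैं?"

# Devanagari code points take three bytes each in UTF-8, so the Hindi
# sentence needs more tokens under this naive scheme.
print(byte_token_count(english), byte_token_count(hindi))

# The figures quoted in the talk: 11 tokens in English, 21 in Hindi.
print(round(cost_ratio(21, 11), 2))
```

With the talk’s figures, the Hindi phrase costs roughly 1.9 times as much as the English one under per-token billing. A tokenizer trained on a balanced multilingual corpus narrows this gap by learning merges for frequent sequences in each language.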

With the tokenizer optimized, pre-training of Command A could begin. In this stage, data is distributed across servers, and attention mechanisms map relationships between tokens. The process is lengthy, stretching over many months.

At the Frontier

While Command A was pre-training, the Cohere team did not stay idle. Instead, they used this as the perfect moment to turn their energies toward open research questions, of which there are many.

“Because we are at the frontier of multilingual AI research, we face unanswered questions daily,” Marchisio pointed out. 

One such question is “language confusion.” Marchisio invited the audience to imagine being a Korean user who types a math question in Korean into an LLM, hits enter, and gets the answer in English.

“This is a real example that I have seen in the wild, and if you’re frequently a user of LLMs outside of English, you have probably come across this type of error,” Marchisio said.

LLMs may show language confusion at the line level, for example switching between Spanish and English for a line or two, or at the word level, peppering words from one language into a paragraph written in another. It is, Marchisio noted, “a pretty jarring user experience.”

The result of this exploration was a paper that named the problem, established a benchmark for evaluating these types of failures, and identified mitigation techniques.
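A crude way to catch the Korean-to-English case described above is to compare the dominant writing system of the prompt and the response. The sketch below is an illustrative heuristic and not the benchmark from Cohere’s paper; the function names are assumptions, and it relies on the first word of each character’s Unicode name as a rough script label. It cannot tell apart languages that share a script, such as Spanish and English.

```python
import unicodedata
from collections import Counter
from typing import Optional


def dominant_script(text: str) -> Optional[str]:
    """Return the most frequent script label among the letters in text."""
    counts: Counter = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                # e.g. "HANGUL SYLLABLE GA" -> "HANGUL"
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


def is_script_confused(prompt: str, response: str) -> bool:
    """Flag a response whose dominant script differs from the prompt's."""
    p, r = dominant_script(prompt), dominant_script(response)
    return p is not None and r is not None and p != r


prompt = "2 더하기 2는 얼마인가요?"
print(is_script_confused(prompt, "The answer is 4."))
print(is_script_confused(prompt, "답은 4입니다."))
```

Running the same check per line of a response would give a rough signal for line-level confusion; detecting word-level mixing within one script needs a proper language identifier.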

Efficiency at Work

Cohere, which added USD 100m to its latest funding round in September 2025, builds LLMs for enterprise, so the practical realities of deployment are a major focus.

Command A is available via API, but when users need a local or private deployment, hardware becomes a much bigger consideration.

“Given the diversity of our customers worldwide, we observe variations in compute capabilities across different regions, and there is a need for Cohere to be very flexible on efficiency,” Marchisio said.

One way to make models easier to run on less powerful hardware is “quantization,” which stores the model’s numbers at lower precision and so effectively shrinks the model. But, as Marchisio explained, “nothing in life is free, so there is a cost to quantization.”
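As a rough illustration of where that cost comes from, the sketch below applies symmetric 8-bit quantization to a handful of made-up weights and measures the rounding error after restoring them. The scheme, the weight values, and the function names are assumptions chosen for brevity; the article does not describe which quantization methods Cohere studied.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights; a real model has billions of them.
weights = [0.12, -0.5, 0.33, 0.0071]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Each restored weight is off by at most half a quantization step.
errors = [abs(w - r) for w, r in zip(weights, restored)]
print(quantized, max(errors))
```

Each weight now fits in one byte instead of four (for 32-bit floats), at the price of a small per-weight error. Whether those errors add up to a visible quality drop is the question the team investigated.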

The team set out to explore what that cost looked like, focusing on how quantization affects quality across languages.

Their results challenged the prevailing view, largely shaped by automated benchmarks, that the effect of quantization is negligible. In fact, they showed that this process does cause quality loss that is noticeable to humans, with non-Latin script languages and complex tasks most affected.

Polishing the Model

With pre-training finally complete, Command A was ready for post-training, the stage where an LLM goes beyond being a mere text predictor and begins to interact in more useful, natural, and human-like ways.

The model was first shown examples of inputs and desired outputs so that it learned to respond usefully. The Cohere team then took a distinctive approach, performing multiple rounds of “expert model” training.

The first round deepened the model’s skill specialization. “We had different teams that focused on different types of skills (coding, multilinguality, safety, instruction following) who tried to build their best ‘expert’ [version of the] model for that skill,” Marchisio said. “Then we merged the results to get a strong all-rounder model.”

In the second round, the same teams each improved the model’s helpfulness by giving feedback on which responses were most useful for their skill focus, then all the resulting models were again merged into one.
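The article does not say how the expert models were merged. One common technique is linear weight averaging, where each parameter of the merged model is a weighted mean of the same parameter across the experts. The sketch below shows that technique under the assumption that all experts share one architecture; the parameter names, values, and the `merge_experts` helper are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Params = Dict[str, List[float]]


def merge_experts(
    experts: Sequence[Params],
    weights: Optional[Sequence[float]] = None,
) -> Params:
    """Average parameters across experts; uniform weights by default."""
    if not experts:
        raise ValueError("need at least one expert")
    if weights is None:
        weights = [1.0 / len(experts)] * len(experts)
    if len(weights) != len(experts):
        raise ValueError("one weight per expert is required")

    merged: Params = {}
    for name in experts[0]:
        # Group the i-th value of this parameter from every expert.
        columns = zip(*(expert[name] for expert in experts))
        merged[name] = [
            sum(w * v for w, v in zip(weights, column)) for column in columns
        ]
    return merged


# Hypothetical two-parameter "experts" for illustration only.
coding_expert = {"layer.w": [1.0, 2.0]}
multilingual_expert = {"layer.w": [3.0, 4.0]}
print(merge_experts([coding_expert, multilingual_expert]))
```

Non-uniform weights let a team favor one skill over another, for example `merge_experts([coding_expert, multilingual_expert], weights=[0.75, 0.25])`.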

A “polishing” stage involved jumping back and forth between a range of training techniques: labeled data, human-ranked answers, and real-time human judgments on usability. This produced several “finished” model versions.

Finally, the team ran an assessment to crown the best possible model. It began with “dogfooding,” in which users inside the company gave feedback on real-world tasks, and was followed by formal human evaluations. The winning version became Command A.

“And to bring it full circle, the multilingual team carried out additional refinements on top to create Command A Translate,” Marchisio concluded.

So, what’s next? Marchisio said the cycle continues: training, tackling open problems, and applying new insights, all part of the ongoing work of advancing large language models. She pointed to multimodality, multilingual agents, and language consistency as three key focus areas. “We continue to think about these questions every day,” Marchisio said.


