What Is an LLM and How Does It Work?

By Advanced AI Bot | April 18, 2025

The latest generation of large language models (LLMs) has taken the world by storm with its apparently amazing “superpowers.” Whether you’ve used DeepSeek, Gemini, Perplexity, or Claude, you’ve almost certainly wondered, “How did they do that?”

Artificial intelligence guru Andrej Karpathy has produced one of the best tech videos I’ve ever watched. It’s not for the faint of heart, but over 3.5 hours he leads anyone with a basic understanding of neural networks to a working knowledge of how modern base LLMs, “chat-based” LLMs, and “reasoning” LLMs are constructed:

Karpathy’s video contains a lot to unpack, but this article walks through some of the major concepts.

Transformer Networks

Almost a decade ago, so-called transformer networks started to appear. The idea is that if you train a neural network on many sequences of characters or words (more technically, those are turned into tokens), it can begin to predict the following character or word.
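
To make the idea concrete, here is a toy, character-level illustration of the training objective (my sketch, not Karpathy’s code): count which character tends to follow each character in a training text, then predict the most frequent follower. A transformer does the same job with a learned neural network over long token contexts instead of a lookup table.

from collections import Counter, defaultdict

# Toy "language model": learn which character most often follows each
# character in the training text.
text = "the cat sat on the mat. the dog saw the cat."
follows = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    follows[prev][nxt] += 1

def predict_next(ch):
    # Return the most frequent follower seen in training.
    return follows[ch].most_common(1)[0][0]

print(predict_next("t"))  # prints 'h', learned mostly from "the"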

OpenAI’s GPT-2, released in 2019, was an early and famous example of a transformer network. As an experiment that year, I trained it on a few thousand ET articles to see how well it could generate them on its own.

Since then, GPUs have become much faster, and models have become much larger. So subsequent generations of traditional transformer models can be trained on much larger sources of data—and rely on longer inputs to generate better and more types of results.

Cost curve for training GPT-2. Advances in computer performance have taken the cost of training GPT-2 down from $40,000 to less than $1,000, and currently as low as $100. Credit: Andrej Karpathy

The leap from a model like GPT-2 to a modern system like ChatGPT builds on the basic transformer model with additional layers and types of training, which has led to extraordinary results.

Now, the traditional transformer training is called pre-training. For a large LLM, it includes crawling much of the internet and using the sequences of words (actually tokens) that it finds there to make a model of what word (token) is most likely to follow a sequence of input tokens.
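
For a sense of what tokens actually look like, here is a quick sketch using the open-source tiktoken library, which implements the byte-pair-encoding tokenizers used by OpenAI’s GPT models (install with pip install tiktoken):

import tiktoken

# Encode a sentence into integer token IDs, then show the text
# fragment each ID stands for.
enc = tiktoken.get_encoding("gpt2")
tokens = enc.encode("Large language models predict tokens.")
print(tokens)
print([enc.decode([t]) for t in tokens])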

Given enough time, money, and electricity, the result is a powerful model that can provide plausible answers to some questions by calculating the most likely subsequent words.

At this point, the model is really good at regurgitating what it read on the internet, but it doesn’t (yet) know how to go beyond that. Next, we’ll look at what can go wrong, and at the technologies that can turn a pre-trained model into a state-of-the-art LLM.

Preventing Hallucinations

We’ve all seen instances where an LLM makes up facts out of thin air. This was especially true of “traditional” LLMs that relied exclusively on a transformer model. Since the LLM doesn’t really “know” everything, it makes sense that it will invent answers.

More recent models have a strategy for minimizing hallucinations. Basically, the model is asked a slate of questions multiple times. If it provides different answers each time, it is “told” that it doesn’t know those answers. It then manages to “learn” the types of questions it can’t answer from memory.
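
In pseudocode, that consistency check looks something like the sketch below. Here ask_model is a hypothetical stand-in for sampling an answer from the LLM at a nonzero temperature; it is not a real API.

def flag_unknown_questions(questions, ask_model, trials=5):
    # Ask each question several times; if the sampled answers disagree,
    # emit a training example that teaches the model to admit ignorance.
    training_examples = []
    for q in questions:
        answers = {ask_model(q) for _ in range(trials)}
        if len(answers) > 1:
            training_examples.append((q, "I'm sorry, I don't know."))
    return training_examples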

This is where tools can come in. The data that models are trained on is not only finite but also has a specific cutoff date. So using another strategy, typically called a tool, can help the model get additional or better results. Early versions of systems like ChatGPT were limited to the information they were trained on, but current versions know how to search the web when they determine they need additional information. Sometimes that works, but if the query stumps the web, we can still get a nonsensical answer.
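
The control flow of a search tool looks roughly like the sketch below; model_step and web_search are hypothetical stand-ins, and real systems signal tool calls with special tokens rather than plain text prefixes.

def answer_with_tools(question, model_step, web_search, max_calls=3):
    # Let the model either answer or request a web search, feeding
    # the search results back into its context until it can answer.
    context = question
    for _ in range(max_calls):
        reply = model_step(context)
        if reply.startswith("SEARCH:"):
            query = reply[len("SEARCH:"):].strip()
            context += "\n[search results] " + web_search(query)
        else:
            return reply
    return "I couldn't find a reliable answer."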

Self-Awareness—Or Not

Models are frequently asked questions along the lines of “Who made you?”, which can lead to suspicious results. For example, at one point, DeepSeek was “outed” for saying it was created by OpenAI. However, what is actually happening is that the model is looking for the most common answer to that question on the web, which is OpenAI, and returning it.

One powerful way to teach models to avoid these mistakes is to feed them base context information. As Karpathy so elegantly puts it, the model itself is similar to our total memory, while its context is more like our current working memory. So a model might have some question-and-answer strings preloaded in its context that include basic facts about the model.
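
Here is what such preloaded context might look like, shown in the chat-message style popularized by OpenAI’s API; the model and company names below are made up for illustration.

# Identity facts live in the context (working memory), not in the
# model's weights (long-term memory). Names here are hypothetical.
preloaded_context = [
    {"role": "system",
     "content": "You are ExampleGPT, built by Example Labs. "
                "Your knowledge cutoff is June 2024."},
    {"role": "user", "content": "Who made you?"},
    {"role": "assistant", "content": "I was built by Example Labs."},
]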

‘Chat’ Training: Learning How to Converse

It’d be pretty easy to miss the technology transition from, say, GPT-2 to ChatGPT. Since models seem to get named pretty randomly, and features are often poorly described, the large leap in how they behave is easy to overlook.

The big advance is the next round of training for the models—one that provides “conversational” input to them, such as questions asked by a user and ideal answers provided in response. This training provides another layer on top of the “simplistic” word generator.
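
Under the hood, those conversations are flattened into the same kind of token stream the base model already understands, with special markers showing who is speaking. The sketch below uses the “ChatML”-style markers some models adopt; the exact special tokens vary from model to model.

def serialize(conversation):
    # Flatten a conversation into one training string; each turn is
    # wrapped in markers so the model learns turn boundaries.
    parts = []
    for turn in conversation:
        parts.append(f"<|im_start|>{turn['role']}\n{turn['content']}<|im_end|>")
    return "\n".join(parts)

print(serialize([
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 = 4."},
]))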

Human Labeling

Currently, training a model to decide what’s useful conversation takes a large amount of human labeling. Not the billions of words needed to pretrain the model, but enough that companies are making a living providing and automating this service.

This is a tiny excerpt of the hundreds of pages a human labeler might be given to help them author conversations that are useful for training an LLM. Credit: Andrej Karpathy

Why Models Can Be Stupid

It’s common to wonder how models can solve complex problems yet get simple questions wrong, such as which of two numbers is larger or whether water freezes at 0°C.

The key to understanding that is to realize that LLMs see the world as a series of tokens. They don’t actually have an intuitive understanding of numbers as mathematical constructs.
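
You can see this directly with a tokenizer. With the GPT-2 encoding, for instance, a decimal number splits into several tokens (exact splits vary by tokenizer):

import tiktoken

enc = tiktoken.get_encoding("gpt2")
for s in ["9.9", "9.11"]:
    # Show the text fragment behind each token ID.
    print(s, "->", [enc.decode([t]) for t in enc.encode(s)])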

One illustrative example that Karpathy cites in his video is an older model being asked whether 9.9 is larger than 9.11. That’s trivial for us, but some models stated that 9.11 was larger. A paper analyzing the issue determined that because Bible verses are numbered so that 9.11 follows 9.9, those models concluded that 9.11 was the larger number.

This is where another clever tool comes in. “Use code” tells a model that has been taught to write and run code to essentially “show its work” by writing a program. For math problems, the program often gets the correct answer even when the equivalent plain-language query might fool the model. Here is the result when we asked ChatGPT to use code to answer the question of which number is greater:

a = 9.9
b = 9.11

if a > b:
    print(f"{a} is greater than {b}")
elif a < b:
    print(f"{b} is greater than {a}")
else:
    print(f"{a} and {b} are equal")

It then runs it for us and provides the output:
9.9 is greater than 9.11

Another reason is that the initial data input to the model is a bit like our own memory: It can be hazy. That’s a major reason a user-provided prompt, or “context,” can generate much more useful responses.

‘Thinking’ Models: Reinforcement Learning

All of the above steps create some excellent models. But they are trapped in their immediate analysis of a problem. Borrowing a page from successful reinforcement learning systems like AlphaGo, creators of LLMs have begun to allow them to improve their own results by doing multiple trials of their answers and evaluating them.

LLMs do reinforcement learning essentially the way a student works through the practice problems in a textbook over and over, improving their answers each time. Credit: Andrej Karpathy
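
A minimal sketch of that loop follows, for problems with verifiable answers such as math exercises. Here sample_solution and is_correct are hypothetical stand-ins for sampling an attempt from the model and checking it against the known result.

def self_improve(problems, sample_solution, is_correct, attempts=16):
    # Try each problem many times, keep only verified solutions, and
    # return them as new training examples reinforcing what worked.
    new_training_data = []
    for p in problems:
        tries = [sample_solution(p) for _ in range(attempts)]
        good = [t for t in tries if is_correct(p, t)]
        if good:
            new_training_data.append((p, min(good, key=len)))
    return new_training_data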

This technique is already being used in some proprietary models from OpenAI and others. But DeepSeek blew it wide open with its R1 model by publishing a paper on how the technique works and making the model publicly available (you can now find it hosted on many sites and available for download).

Learning How to Respond to Subjective Questions

Reinforcement Learning (RL) has proven to be an impressive approach for solving problems with empirical solutions—like AlphaGo learning to beat the world’s best human player by playing against itself. But that approach isn’t very useful for subjective queries like “tell me a joke” or “write me a poem.” Those require human judgment.

The naive approach to training a model on queries that require creativity would be to have humans create great jokes, poems, and so on, and feed them to the model. Unfortunately, people are not generally great creators of jokes and poems.

But people are much better at judging the quality of a poem or a joke than they are at creating one. So human-based RL on subjective topics relies on many humans scoring the quality of jokes, poems, and other common subjective responses.
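
In common practice (the InstructGPT-style recipe, for example), those human rankings are used to train a separate “reward model” that learns to predict which response a person would prefer. A standard pairwise (Bradley-Terry) loss for that looks like this sketch in PyTorch, with made-up scores:

import torch
import torch.nn.functional as F

# Made-up reward-model scores for the human-preferred and the
# human-rejected response to each of two prompts.
reward_chosen = torch.tensor([1.3, 0.2])
reward_rejected = torch.tensor([0.4, -0.1])

# Pairwise preference loss: pushes preferred responses to score higher.
loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
print(loss)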

Science That Looks Like Magic

Even knowing something about how modern LLMs are built, trained, and run, the output I can get from them often still seems magical. As a result, it is tempting to conclude that they have some kind of superpower that comes from their massive neural networks. Whatever you think about that, hopefully you now at least have an understanding of what goes on underneath the surface, or “behind the curtain,” if you prefer.

Thanks to Phil Z. for inspiring me to watch Andrej’s video and write this article.


