Advanced AI News
ChatGPT in 4 Hours: Karpathy’s Old Tricks, 8,000 Lines of Hand-Written Code

By Advanced AI Editor | October 14, 2025 | 7 Mins Read


“This is one of the most insane pieces of work I’ve ever written.” With those words, Andrej Karpathy, former director of AI at Tesla and a founding member of OpenAI, released his latest open-source project, a repository named nanochat. At the time of writing, the project has already passed 7.9k stars on GitHub.

GitHub repository: https://github.com/karpathy/nanochat

Reportedly, unlike Karpathy’s earlier nanoGPT repository, which covered only pre-training, nanochat is a minimalist end-to-end training/inference toolchain built from scratch. It can be used to build a simplified ChatGPT clone, and the whole thing lives in a single cohesive codebase with very few dependencies.

A model trained in half a day for $100 beats GPT-2

“The best ChatGPT that $100 can buy” is how Karpathy described nanochat in his announcement. With nanochat, you simply launch a cloud GPU server and run a script; in as little as 4 hours, you can chat with your own trained large language model (LLM) through a ChatGPT-like web interface.

Specifically, the project covers the following stages:

  • Train a tokenizer using a new Rust implementation
  • Pre-train a Transformer-architecture LLM on the FineWeb dataset and evaluate the CORE score across multiple metrics
  • Run mid-training (midtrain) on the SmolTalk user–assistant dialogue dataset, a multiple-choice dataset, and a tool-use dataset
  • Run instruction fine-tuning (SFT) on the chat model and evaluate it on world-knowledge multiple choice (ARC-E/C, MMLU), math problems (GSM8K), and code tasks (HumanEval)
  • Optionally train the model with reinforcement learning (RL) on GSM8K using the “GRPO” algorithm
  • Run efficient inference in an engine with a KV cache, supporting simple prefill/decode and tool use (a Python interpreter in a lightweight sandbox), with interaction through the command-line interface (CLI) or a ChatGPT-like web UI
  • Automatically generate a Markdown “report card” that summarizes the whole pipeline and presents its metrics in a “gamified” way
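The optional RL stage above uses “GRPO,” whose defining trick is a group-relative baseline: each sampled answer’s reward is compared against the other answers for the same prompt. A minimal sketch of that idea (a simplification for illustration, not nanochat’s actual implementation):

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantage: reward minus the group mean, scaled by the
    group's standard deviation. This is the core idea behind "GRPO";
    a simplified sketch, not nanochat's code."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# For one GSM8K prompt, sample a group of answers and score each 1 (correct)
# or 0 (wrong): correct answers get positive advantage, wrong ones negative.
adv = grpo_advantages([1, 0, 1, 0])
```

Because the baseline comes from the group itself, no separate value network is needed, which keeps the RL stage small.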

According to Karpathy, even at a cost of about $100 (roughly 4 hours of training on an 8×H100 node), nanochat can train a conversational, simplified ChatGPT clone that writes stories and poems and answers simple questions. After about 12 hours of training, the model can surpass GPT-2 on the CORE metric.

On GitHub, Karpathy walks through the detailed process of rapidly training the best ChatGPT-style model that $100 can buy.

Detailed technical steps: https://github.com/karpathy/nanochat/discussions/1

Raising the budget to about $1,000 (roughly 41.6 hours of training) makes the model noticeably more coherent: it can solve simple math problems, handle code tasks, and take multiple-choice tests. For example, a depth-30 model trained for 24 hours (a FLOP budget comparable to GPT-3 Small at 125 million parameters, about 1/1000 of GPT-3) scores over 40 on MMLU, over 70 on ARC-Easy, and over 20 on GSM8K.
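The article’s two price points line up with a simple back-of-envelope check, assuming a rental rate of roughly $3 per H100 GPU-hour (the rate is my assumption, not stated in the article):

```python
# Back-of-envelope check of the quoted budgets, assuming ~$3/H100 GPU-hour
# (an assumed rental rate, not a figure from the article or the repo).
GPUS = 8                # one 8xH100 node
USD_PER_GPU_HOUR = 3.0  # assumption

cost_speedrun = GPUS * 4.0 * USD_PER_GPU_HOUR    # ~4 h run
cost_d30 = GPUS * 41.6 * USD_PER_GPU_HOUR        # ~41.6 h run
```

Under that assumed rate, the 4-hour run comes to about $96 and the 41.6-hour run to about $998, consistent with the quoted $100 and $1,000 figures.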

Karpathy’s goal is to fold this complete “strong baseline” stack into one logically coherent, minimal, readable, highly hackable, and forkable repository. “nanochat will be the capstone project of the LLM101n course (still under development). I think it also has the potential to grow into a research harness or a benchmarking tool, just like nanoGPT before it.”

He noted that the project is by no means final: it has been neither fully tuned nor optimized for performance. Still, the overall skeleton is complete enough to publish on GitHub, and every downstream module can be improved further by the community. In fact, Karpathy says, nanochat still has plenty of low-hanging optimization fruit.

8,000 lines of hand-written code: “agents were of no help”

The entire project totals only about 8,000 lines of code, yet Karpathy emphasized that “the code structure is quite clear.” Moreover, he wrote essentially the whole repository by hand, using only Tab autocompletion.

“I tried using Claude or Codex agents to help several times, but the results were extremely poor and in the end they were of no use. Perhaps this repo’s code style and functionality stray too far from the distribution of code in their training data,” Karpathy said.

On nanochat’s model architecture, Karpathy said it is broadly similar to Llama but simpler, and it also borrows some design ideas from modded-nanoGPT (an improved fork of nanoGPT).

He tried to pin down a reliable baseline architecture for models at this scale:

  • Dense Transformer (no sparse structure)
  • Rotary embeddings for position encoding, and no other positional encoding
  • QK norm (normalizing the query vectors Q and key vectors K)
  • Untied weights for the embedding and unembedding layers
  • Normalization applied to token embeddings
  • relu² activation in the multi-layer perceptron (MLP)
  • Root Mean Square Normalization (RMSNorm) without learnable parameters
  • No biases in the linear layers
  • Multi-Query Attention (MQA)
  • Logit softcap (bounding the logit range to stabilize training)
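Three of the list items above are small enough to sketch directly. These are hedged numpy illustrations of the general techniques; nanochat’s real implementations are PyTorch modules, and the softcap value of 15.0 is an illustrative choice, not taken from the repo:

```python
import numpy as np

def rmsnorm(x, eps=1e-6):
    # RMSNorm with no learnable parameters: divide each vector by its
    # root-mean-square, nothing else to train.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def relu2(x):
    # relu^2 MLP activation: the square of ReLU.
    return np.square(np.maximum(x, 0.0))

def softcap(logits, cap=15.0):
    # Logit softcap: tanh squashing keeps logits inside (-cap, cap).
    # cap=15.0 is an assumed example value.
    return cap * np.tanh(logits / cap)
```

Each is a one-liner, which is much of the point: the baseline favors components with no extra state to tune.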

nanochat’s optimizer combines Muon and AdamW, a choice largely inspired by modded-nanoGPT. Karpathy reportedly keeps one to-do item here: remove the Muon dependency by tuning Adam’s learning rates (for example, giving different modules their own learning rates), though he has not yet put much energy into it.
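That to-do amounts to running plain AdamW with a per-group learning rate (and weight decay). A generic single-step sketch of AdamW, not nanochat’s code, with the group-specific `lr` passed in explicitly:

```python
import numpy as np

def adamw_step(p, g, state, lr, betas=(0.9, 0.95), eps=1e-8, wd=0.0):
    """One decoupled-weight-decay Adam (AdamW) update. Giving each parameter
    group its own lr/wd is the module-specific tuning the to-do describes;
    the hyperparameter values here are illustrative assumptions."""
    b1, b2 = betas
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * g          # first-moment EMA
    state["v"] = b2 * state["v"] + (1 - b2) * g * g      # second-moment EMA
    m_hat = state["m"] / (1 - b1 ** state["t"])          # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    # Weight decay is decoupled from the gradient and scaled by the group lr.
    return p - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * p)

# E.g., an embedding group might get a larger lr than the matrix weights:
state = {"t": 0, "m": 0.0, "v": 0.0}
p_new = adamw_step(1.0, 1.0, state, lr=0.01)
```

The first step moves the parameter by roughly `lr` in the negative gradient direction, since bias correction makes both moment estimates equal to the raw gradient statistics.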

Netizens: run it once, claim the title of machine learning engineer

Beyond GitHub, the newly released nanochat has also drawn plenty of attention on social platforms.

“I’ve always loved the nano series of projects! This minimalist end-to-end training/inference toolchain will surely have a profound impact on many machine learning learners and researchers,” one netizen said.

Another netizen said, “For me personally, this repository is excellent study material, whether for understanding the low-level Rust-based deep learning implementation or, more fundamentally, Python deep learning development.” He also asked, “If everyone can now train their own large language models (LLMs) with minimal effort using this repo, wouldn’t the technological moat of companies like Anthropic and OpenAI shrink? After all, there are plenty of excellent engineers in the market, and given sufficient resources they could well train more powerful models.”

Someone else pointed out, “I think this repository’s largest audience is researchers. Many people have ideas for improving LLMs, but turning those ideas into a complete implementation takes enormous effort, and the results are uncertain. Now we have a ready-made pipeline we can use for experiments directly. What used to be a daydream of ‘what if we could do this?’ becomes a practical plan of ‘I can try this idea next weekend.’”

Some netizens even joked, “After running this, I’ll definitely add the title of ‘Machine Learning Engineer’ to my resume.”

Reference links:

https://x.com/karpathy/status/1977755427569111362

https://github.com/karpathy/nanochat

This article is from the WeChat official account “AI Frontline”, compiled by Hua Wei, and published by 36Kr with authorization.


