
FILE PHOTO: Renowned AI researcher and former OpenAI scientist Andrej Karpathy has said in a post on X that he’s “bearish on reinforcement learning” in the long term.
| Photo Credit: Reuters
Famed AI researcher and former OpenAI scientist Andrej Karpathy said in an X post that he’s “bearish on reinforcement learning” in the long term, as it will turn out to be inefficient and hard to design. Mr. Karpathy, who was one of OpenAI’s founding members and worked on the GPT-4 model, said he believes new learning methods, similar to how humans think, will eventually replace reinforcement learning.
“Personally, and in the long-term, I am bullish on environments and agentic interactions but I am bearish on reinforcement learning specifically,” he said. He expressed doubt that humans use reinforcement learning for most intellectual tasks, except for “some motor tasks.”
“Humans use different learning paradigms that are significantly more powerful and sample efficient and that haven’t been properly invented and scaled yet, though early sketches and ideas exist,” he added.
As progress in the current flagship large language models slows, there has been a resurgence of reinforcement learning, a machine learning training technique used to build AI models.
Mr. Karpathy noted that past AI training techniques like reading text and imitating examples will continue to exist, but the future will be in letting models live in environments and learn by interacting with each other.
Published – August 29, 2025 02:09 pm IST