Artificial Intelligence & Machine Learning
,
Next-Generation Technologies & Secure Development
Experts Aim to Probe How AI Models Reason, and Why It Matters

As an artificial intelligence model narrates what it claims is its own thoughts, it’s tempting to believe that we finally know what it is thinking. But researchers from AI enterprise giants that make these models caution that this glimpse into machine reasoning could be fleeting, and that much more needs to be understood before labelling it as true transparency.
See Also: OnDemand Webinar | Trends, Threats and Expert Takeaways: 2025 Global IR Report Insights
A coalition of scientists from OpenAI, Google DeepMind and Anthropic called for a systematic investigation into monitoring the so-called chains-of-thought, or CoTs, that underpin modern AI reasoning models. These models include OpenAI’s o3 and DeepSeek’s R1, which are designed to tackle complex tasks by breaking them down step by step, similar to how a human might jot notes to solve a problem.
The researchers in their paper described CoT monitoring as an additional safety measure for frontier AI, offering an unusual window into how AI agents make decisions. They also warned that the current level of visibility into these processes could diminish over time. The authors encouraged the research community and AI developers to make use of “CoT monitorability” while it exists and to study how it might be preserved as models advance.
Chains-of-thought have become a central feature of reasoning models, which are increasingly integral to the ambitions of companies building AI agents. By revealing the intermediate steps a model uses to produce an answer, CoT monitoring offers a potential means to assess whether a model is reasoning safely or drifting into unintended behavior. But it’s unclear what makes this transparency robust and what might undermine it, researchers said (see: A Peek Into How AI ‘Thinks’ – and Why It Hallucinates).
The paper asked developers to explore what factors influence CoT monitorability, including whether interventions, architecture changes or optimization techniques could reduce transparency or reliability. The authors warned that CoT monitoring might be fragile and advised against changes that could degrade the clarity of a model’s reasoning process.
Among those endorsing the call to action were OpenAI Chief Research Officer Mark Chen, Safe Superintelligence CEO Ilya Sutskever, Nobel laureate Geoffrey Hinton, Google DeepMind co-founder Shane Legg, xAI safety adviser Dan Hendrycks and Thinking Machines co-founder John Schulman. The first authors include contributors from the U.K. AI Safety Institute and Apollo Research, with additional signatures from researchers affiliated with Amazon, Meta and UC Berkeley.
The position paper comes at a time when leading labs are racing to outdo each other in building more capable AI agents, or models that can plan, reason and act autonomously across tasks. In September, OpenAI previewed its first AI reasoning model, o1. In the following months, Google, DeepMind, xAI and Anthropic introduced competitors that demonstrated similar or superior performance on several benchmarks.
But the rapid improvements in performance have not necessarily translated into a deeper understanding of how these systems arrive at their conclusions, the paper authors said.
Anthropic has particularly invested heavily in interpretability research. Earlier this year, CEO Dario Amodei announced a commitment to crack open the black box of AI models within the next few years and said the company would expand funding and research into interpretability. He also called on OpenAI and Google DeepMind to increase their efforts in the same area.
Earlier findings from Anthropic suggest that CoTs may not always be a fully reliable reflection of how models reach their answers. The position paper says that chains-of-thought could be influenced by prompting methods or external factors, potentially creating a misleading impression of transparency. OpenAI researchers have said that with further study, CoT monitoring could eventually serve as a practical way to track alignment and safety (see: AI Hijacked: New Jailbreak Exploits Chain-of-Thought).
There is fierce competition in the industry at the moment to recruit researchers capable of advancing AI reasoning models. Meta has reportedly been offering compensation packages in the million-dollar range to lure talent away from Anthropic, OpenAI and Google DeepMind. Many of the most sought-after researchers are those specializing in the systems that the paper seeks to make more transparent.
The stakes are high for companies signing the paper. As AI agents become more capable, the pressure to show that they behave predictably and safely will likely intensify, and without clear methods to monitor their reasoning, assurances about safety could remain just empty words.
The authors described their publication as an effort to raise awareness and draw more attention to research on CoT monitoring. They wrote that the purpose of the paper is to signal-boost the topic and encourage the field to prioritize it, while acknowledging that further work is essential.