
(Stock-Asso/Shutterstock)
In October, Stanford will host a new conference that puts AI at the center of science and engineering. Called Agents4Science, the event requires that every paper be generated by AI, reviewed first by AI systems, and even presented by AI using synthetic voices.
It’s been described as the first conference where artificial intelligence must serve as both author and reviewer, a format that challenges long-standing norms in academic publishing. The aim is not only to showcase what AI can produce across disciplines but also to examine its shortcomings in a transparent setting.
The conference is the brainchild of James Zou, a computer scientist at Stanford who studies how humans and machines can collaborate in research. His recent projects have explored whether large-scale automation can help accelerate discovery, including a “Virtual Lab” where AI agents proposed and tested potential treatments for emerging COVID-19 strains. With Agents4Science, Zou is extending that approach. The goal is AI tools that assist researchers by taking on more of the process themselves, from idea to output.
Zou’s interest in scientific collaboration began during his PhD at Harvard, when he stepped away from computer science to spend a year in a genomics lab. That experience highlighted how difficult it can be for researchers from different fields to communicate. He later became convinced that large language models (LLMs) might be better at bridging those gaps.

(Shutterstock)
This idea led him to create the Virtual Lab. As the system began producing publishable results, Zou ran into a roadblock. Even when AI systems had generated key ideas, designed experiments, and written paper drafts, there was no formal way to acknowledge their contribution. Most journals rejected the idea of naming AI as an author, regardless of its role. Some conferences accepted AI-assisted research but insisted that only people could claim authorship.
That resistance from publishers helped shape the idea for Agents4Science. Zou began planning the event earlier this year, speaking with researchers across disciplines and working out how to design a conference where AI would not just assist with science but take the lead. The idea quickly drew attention from others who had faced similar questions about how far AI assistance should go and how to share credit when machines do more of the work.
That interest helped shape the structure of the event itself. Every paper submitted to Agents4Science must be written primarily by an AI system, with humans allowed to contribute only in supporting roles. Submissions are required to include a clear explanation of how the AI worked, what tools it used, and how key decisions were made.
The organizers are casting a wide net. Submissions are welcome from any field where AI can advance scientific discovery, including biology, chemistry, physics, engineering, and computer science.
Reviewers will also be AI systems, with each paper evaluated independently by multiple LLMs to reduce bias and provide a range of perspectives. These reviews will follow the standard NeurIPS conference template, scoring papers on originality, clarity, and significance. After this AI-led first round, a panel of human experts will evaluate the highest-ranked submissions. All reviews, AI prompts, and author disclosures will be made public, offering researchers a transparent view into how machine-led peer review unfolds.
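The organizers have not published the review pipeline itself, but a minimal sketch makes the format easier to picture. The Python sketch below shows one way several LLMs could independently score a paper on the criteria named above, with the scores aggregated for ranking. The `query_reviewer_model` helper and the JSON response format are hypothetical stand-ins, not anything Agents4Science has specified; only the three criteria come from the description above.

```python
import json
import statistics

# Hypothetical stand-in for a call to a hosted LLM. The actual models and
# API used by Agents4Science are not public; wire this to any chat API.
def query_reviewer_model(model: str, prompt: str) -> str:
    raise NotImplementedError("replace with a call to your LLM provider")

# Review criteria taken from the NeurIPS-style rubric described above.
CRITERIA = ["originality", "clarity", "significance"]

def build_prompt(paper_text: str) -> str:
    rubric = ", ".join(CRITERIA)
    return (
        f"You are a peer reviewer. Score the paper below from 1 to 10 on "
        f"{rubric}, and reply only with a JSON object whose keys are those "
        f"criteria.\n\nPAPER:\n{paper_text}"
    )

def review_paper(paper_text: str, models: list[str]) -> dict:
    """Collect one independent review per model, then aggregate."""
    prompt = build_prompt(paper_text)
    reviews = []
    for model in models:
        # Each model sees only the paper and the rubric, never another
        # model's review, mirroring the independent first round.
        reviews.append(json.loads(query_reviewer_model(model, prompt)))
    # Median per criterion gives a simple aggregate for ranking papers
    # before the human panel looks at the top of the list.
    aggregate = {c: statistics.median(r[c] for r in reviews) for c in CRITERIA}
    return {"per_model": reviews, "aggregate": aggregate}
```

Any aggregation rule would do here; the details that matter from the organizers’ description are that each model’s pass is independent and that every prompt and review is published afterward.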
For human researchers, that openness could prove just as valuable as the findings themselves. By making the full pipeline visible, from how the AI formed its conclusions to how other models judged its output, the conference offers a rare look at how machine reasoning actually plays out. It gives scientists a way to evaluate the process as well as the results, and to begin building standards for how AI should be used in scientific work.

(mechichi/Shutterstock)
Some scientists are still unsure about handing the lab coat over to machines. They point out that AI models, even the best ones, can still fumble facts, miss context, or come up with answers that sound convincing but fall apart under close inspection. It’s not just about mistakes.
There’s a deeper worry that machines, no matter how fast or well-read, might not be able to reason through problems the way a trained human can. When the stakes are high, like in medicine or climate research, that kind of judgment matters.
There is also the question of what this all means for early-career researchers. If AI starts doing more of the heavy lifting, where does that leave the people who are still learning the ropes? Some argue that science is not just about results. It is about the process and the long hours of figuring things out. That struggle helps build expertise, and without it, new researchers may never develop the judgment and skills their predecessors gained through experience.
Whether AI is ready to take the lead in science remains an open question. Some see it as a path to faster and broader discovery, while others worry about what might be lost along the way. What is clear is that science and engineering research is no longer a purely human pursuit. As machines move from tool to teammate, the challenge will be to ensure they elevate the work rather than replace the people behind it. The future of science may depend on how that balance is struck.