Google DeepMind published a paper describing their AlphaEvolve coding agent. AlphaEvolve uses LLMs to discover and optimize algorithms across a range of domains, including hardware design, data center operations, and AI training.
AlphaEvolve uses an ensemble of LLMs, including both Gemini Flash and Gemini Pro, to generate and evolve programs that solve a user-defined problem; the user must supply an evaluation function that returns a set of scalar metrics. Google has applied AlphaEvolve to several problems in mathematics, engineering, and computer science. In one example, AlphaEvolve discovered a more efficient algorithm for multiplying 4×4 matrices. Google also applied it to more than 50 problems in mathematics; AlphaEvolve re-discovered the state-of-the-art solution for 75% of them and found improved solutions for 20%. According to Google,
While AlphaEvolve is currently being applied across math and computing, its general nature means it can be applied to any problem whose solution can be described as an algorithm, and automatically verified. We believe AlphaEvolve could be transformative across many more areas such as material science, drug discovery, sustainability and wider technological and business applications.
The key idea in AlphaEvolve is to use LLMs to generate and evolve code. The system maintains a database of candidate programs it has generated, which it feeds as context to an LLM along with prompts describing how to evolve them. Newly generated programs that score well on the evaluation function are stored back in the database, and the loop repeats, gradually improving the best solution found.
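The loop described above can be sketched in miniature. In this toy version, candidates are parameter vectors rather than programs, and the "LLM" is replaced by a random mutation function; the names (`evaluate`, `mutate_with_llm`, `evolve`) and all specifics are illustrative assumptions, not details from the paper.

```python
import random

# Toy stand-in for AlphaEvolve's evolutionary loop. Candidates here are
# lists of integers instead of programs, and mutate_with_llm() is a random
# mutation rather than a real LLM call. Hypothetical names throughout.

TARGET = [3, 1, 4, 1, 5]  # hidden optimum the evaluator scores against

def evaluate(candidate):
    """User-supplied function returning a scalar metric (higher is better)."""
    return -sum(abs(a - b) for a, b in zip(candidate, TARGET))

def mutate_with_llm(parents):
    """Stand-in for prompting an LLM with parent candidates as context."""
    child = list(random.choice(parents))
    i = random.randrange(len(child))
    child[i] += random.choice([-1, 1])  # small random edit
    return child

def evolve(seed, generations=200, pool_size=10, rng_seed=0):
    random.seed(rng_seed)
    database = [seed]                      # database of candidate "programs"
    for _ in range(generations):
        child = mutate_with_llm(database)  # generate a new candidate
        database.append(child)
        # keep only the best-scoring candidates as context for future steps
        database.sort(key=evaluate, reverse=True)
        database = database[:pool_size]
    return database[0]

best = evolve(seed=[0, 0, 0, 0, 0])
```

The real system evolves source code and uses richer prompts and program databases, but the control flow is the same: sample stored candidates, ask the model for a variation, score it, and retain the strongest.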
AlphaEvolve Architecture. Image Source: AlphaEvolve Whitepaper
Besides solving mathematical problems, Google has used AlphaEvolve to improve its own datacenter operations, developing a new heuristic function for Google's Borg task orchestrator. AlphaEvolve's solution outperformed one discovered via deep reinforcement learning, allowing Google to recover 0.7% of its worldwide compute resources. AlphaEvolve also improved kernel tiling and FlashAttention operations in Google's AI training stack, yielding speedups of 23% and 32%, respectively.
Users in a Hacker News discussion thread were generally positive about AlphaEvolve and brought up Google’s recent AI track record:
People often forget that Google was behind Mu Zero, which IMO is the most important AI paper of the decade, not the Transformer one, because they effectively showed how models can learn how to search.
Writing on X, Simon Frieder, an AI researcher at the University of Oxford, chided DeepMind for its pattern of not fully open-sourcing its code:
DeepMind, even though they make sure all their releases are interesting scientifically, has a slightly spotty history of releasing full public code. For example, AlphaFold2 was released, but without training scripts. AlphaGeometry turned out to contain bugs. In both cases, open-source replacements were devised: OpenFold in the first case, and Newclid in the second…Because of this history, it could be that hidden bugs may be contained in AlphaEvolve, which do not make me trust the results it gives. In some cases, it will probably be easy to verify that the result it outputs is correct, but not in all cases. Note that this is different from LLM hallucinations, as here we have an automatic evaluator on which AlphaEvolve relies.
Although the model is not publicly available, academic researchers can apply for early access to AlphaEvolve.