Advanced AI News
The Unlikely Reasonableness of AI-Augmented HPC

By Advanced AI Editor | July 11, 2025 | 12 Mins Read


Editor’s note: This article from my colleague Doug Eadline at HPCwire explains how HPC researchers are pairing established simulation codes with large AI models to slash runtimes and widen scientific reach in protein folding, weather prediction, and molecular design. Anchoring this work are the same pillars that matter to every AI practitioner: clean data, rigorous validation, and powerful compute. It’s becoming clear that the next advance in applied AI will lean on HPC infrastructure and on the culture of reproducibility that has guided supercomputing for decades. New groups such as the Trillion Parameter Consortium invite AI specialists, laboratory scientists, and industry engineers to collaborate on models that respect physics and scale to the largest machines. – Jaime Hampton, Managing Editor, AIwire.

For many HPC practitioners, the game is played as follows. In general, science develops models of the world using differential equations. These equations can be solved or estimated to produce a picture of how the model changes with time, for example, in weather prediction. In quantum mechanics, integral calculus is used to predict various energy levels of atoms and molecules.

Central to all of these methods is a model based on theory or first principles (fundamental physical rules) that reflect how nature behaves. The final arbiter is, of course, nature, and the models, depending on various factors, provide different levels of accuracy. Some models are remarkably good and often require large amounts of computing time to traverse all the mathematics.

Kinematic quantities of a classical particle of mass m: position r, velocity v, acceleration a. (Source: Wikipedia)

High performance computing has progressed in this manner since its inception. With the introduction of the portable FORTRAN programming standard, developers could focus on developing and improving their computational model instead of programming for various machine nuances and differences. Collectively, these models are known as “ModSim” (Models and Simulation) and continue to drive the HPC market to bigger and faster machines.

Various supercomputing designs have been developed to run ModSim codes. From the first vector processors to parallel clusters and massively parallel GPUs, HPC is known to leverage any available hardware or software to increase model size and/or performance.

The advent of large-scale AI modeling has altered this tried-and-true HPC computation formula. Large AI models can be trained on ModSim data to produce “data models” that accurately solve traditional mathematical models in far less time, without requiring the solution of the underlying physics.

This unlikely conclusion is both remarkable and, in a sense, blasphemous in the eyes of traditional HPC practitioners. How can processes bounded by the laws of physics be modeled with “just data”?

The Grammar of Physics

Putting aside the quest for AGI (Artificial General Intelligence), consider how current GenAI Large Language Models (LLMs) operate. By sampling large amounts of text data, they learn statistical relationships between tokens (words) in the English language. (The analysis works for other languages as well, and most models are based on scraping English language content from the internet.) As is commonly known, LLMs use these relationships to complete sentences, paragraphs, and even books based on user prompts. For instance, an LLM might develop the sentence:

Bring an umbrella because tomorrow it will (blank)

There is a high probability that, based on the learned model, the next word will be “rain,” or “drizzle”, or “storm,” or some other word or phrase that is related to the word rain. The choice depends on the temperature setting of the LLM; a low temperature means choosing the most probable, and a high temperature means randomly choosing one of the candidate words. A low temperature also means the answer to the same prompt will be almost identical, while a high temperature will provide different responses. If set too high, completely random responses will result. Temperature settings can influence hallucinations (i.e., wrong words or phrases) in LLMs.
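The temperature mechanism described above can be sketched in a few lines. This is a minimal illustration, not any vendor's actual implementation: logits are divided by the temperature before the softmax, so a low temperature concentrates probability mass on the top token while a high temperature flattens the distribution. The tokens and scores are made up for the umbrella example.

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Sample a token index from raw model scores (logits).

    Dividing by the temperature sharpens (T < 1) or flattens (T > 1)
    the resulting softmax distribution.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index according to the softmax probabilities.
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical candidate completions for "...tomorrow it will ___"
tokens = ["rain", "drizzle", "storm", "banana"]
logits = [4.0, 2.5, 2.0, -3.0]  # made-up scores; "rain" is most likely

random.seed(0)
print(tokens[sample_next_token(logits, temperature=0.1)])  # low T: nearly always "rain"
```

At a temperature of 2.0 the same call returns "drizzle" or "storm" a substantial fraction of the time, which is exactly the variability that makes responses sound less mechanical; push the temperature higher still and even "banana" starts to appear.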

The effectiveness of LLMs is due to their ability to recognize a relationship structure in the English language. There is a certain grammar or structure without which language would be impossible. The grammar of language is flexible and provides multiple combinations of ways to say the same thing, which is why the temperature in LLMs is an effective way to make responses more human-sounding. (e.g., we can even understand Yoda from Star Wars.)

One area of language that has a more restricted grammar is computer software. Programming languages have very specific structures and are limited to a set of basic words or operations. Like language, they still allow many different paths to the same result, but unlike the response to a typical LLM prompt, computer programs can be automatically checked for accuracy, and wrong results are easily filtered out.
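This automatic filtering is the key practical difference: a generated program can be run against test cases and rejected mechanically. A toy filter (all names hypothetical) might look like:

```python
def passes_tests(candidate_fn, cases):
    """Reject a candidate program on any wrong answer or crash."""
    for args, expected in cases:
        try:
            if candidate_fn(*args) != expected:
                return False
        except Exception:
            # A crashing candidate is filtered out, not debugged.
            return False
    return True

# Two model-generated candidates for "absolute value":
good = lambda x: x if x >= 0 else -x
bad = lambda x: x  # plausible-looking, but wrong for negative inputs

cases = [((3,), 3), ((-5,), 5), ((0,), 0)]
print(passes_tests(good, cases), passes_tests(bad, cases))  # True False
```

No human judgment is required to discard the wrong candidate, which is why code generation tolerates model errors far better than free-form prose does.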

The sciences, including physics, chemistry, and biology, also have a structure or grammar that is ultimately determined by scientific laws, for instance, Newton’s Laws of Motion or Schrödinger’s Equation in Quantum Mechanics. The grammar imposed by the mathematics underlying scientific models is often stricter than that of human language.

Even the study of chaos (e.g., fluid flow) has a grammar or structure associated with it. Chaotic systems were once considered intractable, characterized by random states of disorder. There are, however, underlying patterns, interconnections, constant feedback loops, repetition, self-similarity, fractals, and self-organization at play in chaotic behavior.
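The logistic map is a standard one-line illustration of this point (a generic textbook example, not drawn from the article): trajectories are wildly sensitive to initial conditions, yet every iterate obeys the same simple rule and stays confined to a bounded set — structure, not arbitrary noise.

```python
def logistic_map(x, r=3.9):
    """One step of the logistic map x -> r*x*(1-x), a textbook chaotic system."""
    return r * x * (1.0 - x)

# Two trajectories that start a hair apart diverge rapidly (chaos)...
a, b = 0.200000, 0.200001
max_gap = 0.0
for _ in range(30):
    a, b = logistic_map(a), logistic_map(b)
    max_gap = max(max_gap, abs(a - b))
print(f"max gap after 30 steps: {max_gap:.3f}")  # far larger than the 1e-6 start

# ...yet every iterate obeys the same simple rule and stays in [0, 1]:
x = 0.2
for _ in range(1000):
    x = logistic_map(x)
    assert 0.0 <= x <= 1.0  # bounded and self-similar, not arbitrary noise
```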

Adherence to physical laws is what provides a grammar for relationships in physical systems. Using AI training, this grammar shapes relationships between aspects of a physical system, all of which can be learned by the model. Because these models are numeric rather than text-based, they are often referred to as Large Quantitative Models (LQMs). This learning is similar to how an LLM will define a word by its relationship to other words in the text corpus.

The Proof is in the Computing

Perhaps the biggest success to date has been the results of Alphabet’s (Google’s) DeepMind AlphaFold, which utilized AI to determine how proteins fold based on the initial peptide chain (as defined by a cell’s DNA sequence). Computing the possible protein configurations using traditional ModSim methods was (and still is) considered a computationally difficult problem due to the extremely large number of possible combinations (types of folding). AlphaFold was trained on existing protein data and limits the search by eliminating unlikely structures; it has become the de facto method for determining protein structures (or at least eliminating unlikely structures). The authors of AlphaFold, Demis Hassabis and John Jumper of Google DeepMind, shared one half of the 2024 Nobel Prize in Chemistry, awarded “for protein structure prediction.” A similar open-source tool, OpenFold, is also available to the scientific community, which uses the same AI-augmented approach to accelerate ModSim calculations.

There are many other examples of AI-augmented HPC beyond protein folding. As described in the HPCwire article "Aurora AI-Driven Atmosphere Model is 5,000x Faster Than Traditional System," Microsoft, the developer of the Aurora model (not to be confused with the Argonne Aurora supercomputer), trained it on previous weather data (calculated and measured), and it produces predictions about 5,000 times faster than the numerical Integrated Forecasting System. The accuracy of the Aurora data model (as compared with other ModSim results and actual weather) is equal to or better than that of traditional numerical models, and it is "tunable" by increasing the data set diversity and the model size.

Recently, Berkeley Lab, in conjunction with Meta, released Open Molecules 25 (OMol25) and the Universal Model for Atoms (UMA) for public use. Open Molecules is a collection of more than 100 million 3D molecular snapshots whose properties have been calculated using density functional theory (DFT). DFT is an incredibly powerful (and computationally expensive) tool for modeling precise details of atomic interactions, allowing scientists to predict the force on each atom and the energy of the system, which in turn dictate the molecular motion and chemical reactions that determine larger-scale properties, such as how the electrolyte reacts in a battery or how a drug binds to a receptor to prevent disease. Using traditional DFT molecular dynamics results to train machine learning models can provide molecular predictions of the same caliber, but 10,000 times faster than the traditional DFT molecular dynamics numerical approach.
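The train-once, query-cheaply pattern can be sketched with stand-ins: an "expensive" analytic pair potential plays the role of a DFT call, and a lookup-plus-interpolation table plays the role of the trained model. (Real surrogates such as UMA are large neural networks; this toy only shows the shape of the workflow.)

```python
import bisect

def expensive_energy(r):
    """Stand-in for a costly first-principles calculation (here, a
    Lennard-Jones pair potential; a real DFT call could take hours)."""
    return 4.0 * ((1.0 / r) ** 12 - (1.0 / r) ** 6)

# Step 1: generate training data with the expensive solver (done once).
train_r = [0.9 + 0.01 * i for i in range(200)]  # separations 0.9 .. 2.89
train_e = [expensive_energy(r) for r in train_r]

def surrogate_energy(r):
    """Cheap data model: linear interpolation over the training table."""
    i = bisect.bisect_right(train_r, r) - 1
    i = max(0, min(i, len(train_r) - 2))  # clamp to the table's interior
    t = (r - train_r[i]) / (train_r[i + 1] - train_r[i])
    return (1 - t) * train_e[i] + t * train_e[i + 1]

# Step 2: the surrogate answers new queries without re-running the physics.
r = 1.1234
print(expensive_energy(r), surrogate_energy(r))  # nearly identical values
```

The economics mirror the DFT case: all the physics cost is paid up front to build the training set, after which each query is essentially free.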

How do We Know Our Answer is Correct?

Being skeptical of AI makes sense. Keep in mind that the term “AI” encompasses a wide range of methodologies and does not have a strict definition per se. Different incarnations of AI methods may utilize technology that enables computers to simulate human learning, comprehension, problem-solving, decision-making, creativity, and autonomy. AI applications can range from basic statistical supervised learning models to massive LLMs provided by companies like OpenAI, Google, Meta, and others.

The bigger models and claims of AGI are under constant scrutiny. Whether it is playing chess (poorly) due to the lack of a “world view” or failing to solve the classic AI puzzle, the Towers of Hanoi, beyond memorized solutions, the latest and greatest LLMs still have some soft spots. In addition, LLM hallucinations are not without consequence, as indicated by the growing number of hallucinated legal citations submitted as part of court documents (someone is not checking their work).

These concerns are valid for any form of AI and include issues such as over- or underfitting data, feature generation, data provenance, and more. The key difference between LLMs and scientific models is the reliance on a physics grammar vs. a language grammar. As good scientists know, computational results always need to be verified against the real world.

The only way to gauge the accuracy of any computed value is to compare it to a physical system. For instance, many atomic and chemical properties can be computed by ModSim programs. Part of the solution might be geometry and/or energy levels that can be verified by comparing them to existing (or measured) spectroscopic information. Reality always wins.
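In code, that final check is nothing more than a tolerance comparison against measurement. A sketch with made-up numbers (the 2% tolerance is an arbitrary choice, and the frequencies are illustrative, not real spectroscopy):

```python
import math

def validated(computed, measured, rel_tol=0.02):
    """Accept a computed property only if it matches measurement
    within a chosen relative tolerance (2% here, an arbitrary choice)."""
    return math.isclose(computed, measured, rel_tol=rel_tol)

# Hypothetical vibrational frequency (cm^-1): model output vs. spectroscopy.
print(validated(computed=2170.0, measured=2143.0))  # ~1.3% off -> accepted
print(validated(computed=2350.0, measured=2143.0))  # ~9.7% off -> rejected
```

Whether the comparison is a one-liner like this or a full verification workflow, the principle is the same: the measured value, not the model, has the final say.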

In the DFT example mentioned above, verification of results is critical. The reduction in run times provided by data models will undoubtedly lead to increased use of DFT-based methods. A recent paper entitled "How to verify the precision of density-functional-theory implementations via reproducible and universal workflows," with forty-five authors, indicates the importance placed on verification of both ModSim and AI-augmented HPC methods.

AI for Science is Different

One common misconception about AI is that it will replace existing processes and systems. While this may be true in other sectors, and historically has been the case for computers in general, HPC numerical ModSim methods are an integral part of the new AI data models being developed. Indeed, accurate data are necessary to train HPC-AI models. The HPC sector has a remarkable advantage over the enterprise sector because it can create its own model data using established ModSim methods. Furthermore, this data can be tailored to the specific type of model training required. For instance, if a specific class of molecule is needed, examples can be generated and used to train a model for that particular case.

In addition, science and, by extension, HPC have requirements not found in the enterprise sector, including reproducibility, openness, collaboration, and documentation (as seen in research papers). Information creation and data flow are very different in the scientific sector.

To be clear, the speed-up offered by AI-augmented HPC is not necessarily a “free lunch.” The computational resources needed to train the model may offset the speed gains of the data model; however, this depends on how specific or general the trained model is.

Where is All This Headed?

The synergistic nature of traditional ModSim results and data-based AI models, along with the necessary big data management methodologies, has created a virtuous cycle of data discovery that will accelerate scientific discovery. As indicated in the figure below, a cycle can develop that builds on each past discovery loop. Consider each step in the figure:

  1. Scientific Research and HPC: Grand-challenge science requires HPC capability and has the capacity to generate a very high volume of ModSim data.
  2. Data Feeds AI Models: Data management is critical. High volumes of data must be managed, cleaned, curated, archived, sourced, and stored.
  3. “Data” Models Improve Research: Armed with insights from the data, AI models/LLMs/LQMs analyze patterns, learn from examples, and make predictions. HPC systems are required for training, inference, and predicting new data for Step 1.
  4. Lather, Rinse, Repeat.
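The cycle can be caricatured as a loop with toy stand-ins for each stage (all function names are hypothetical, and the "model" is just a lookup table standing in for an LQM):

```python
def run_modsim(inputs):
    """Step 1: expensive first-principles simulation (toy: square the input)."""
    return [x * x for x in inputs]

def curate(raw):
    """Step 2: data management — drop invalid records before training."""
    return [(x, y) for x, y in raw if y is not None]

def train_surrogate(pairs):
    """Step 3 (toy): 'learn' a lookup table standing in for a real model."""
    return dict(pairs)

inputs = [1.0, 2.0, 3.0]
model = train_surrogate(curate(list(zip(inputs, run_modsim(inputs)))))

# Step 4: the model suggests new cases, which seed the next ModSim round.
new_inputs = [x + 0.5 for x in model]  # "lather, rinse, repeat"
print(sorted(new_inputs))  # [1.5, 2.5, 3.5]
```

Each pass through the loop enlarges the curated data set, which in turn improves the model that proposes the next round of simulations.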

The opportunity for AI-augmented science has not gone unnoticed. The Trillion Parameter Consortium (TPC) has been established to address the unique needs of AI and science. As has been outlined, the needs of scientific discovery are very different from those of enterprise organizations. In particular, open data and processes are necessary for scientific progress. The TPC is an open community, open to all scientists and engineers interested in leveraging AI methods for HPC and science, including programming, agentic systems, AI-augmented models, and reporting.

The next “all-hands meeting” will take place from July 28 to 31, 2025, in San Jose, California. This worldwide gathering and exhibition will bring together AI leaders from industry, academia, national laboratories, the vendor community, funding agencies, and VCs to develop best practices for utilizing AI for scientific discovery and engineering at scale. It will include expert speakers, hackathons, tutorials, working groups, and a growing community. It is not too late to register.

To learn more about TPC25, see the HPCwire article, Eight Key Questions About the Trillion Parameter Consortium (TPC) and TPC25 Event.

This article first appeared on our sister publication, HPCwire.
