Industry Applications

Harnessing LLMs for Scientific Computing

By Advanced AI Bot | June 3, 2025


Large language models (LLMs) have developed dramatically in the past few years, with applications ranging from text processing to predicting virus variants. As these models grow increasingly massive (some now comprising trillions of parameters), there is a growing need for strategies to make them less costly and more effective for scientific uses such as code translation, visualization, compression, privacy protection, and prediction.

Researchers in the Mathematics and Computer Science (MCS) division at the U.S. Department of Energy’s Argonne National Laboratory have addressed this need in several ways. 

Converting Code 

A key problem in science is converting legacy Fortran codes to C++. Although Fortran is highly performant, its support for heterogeneous platforms is inferior to that of C++. Manual translation has been the typical approach, but the process is nontrivial: it requires extensive knowledge of both languages and can be extremely labor intensive.

“This is where CodeScribe shines,” said Anshu Dubey, a senior computational scientist and lead PI for the research.

CodeScribe is a new tool that combines user supervision with chat completion — a technique that uses structured conversations to craft the most effective “prompts” that produce the desired output. To enhance the process, CodeScribe leverages emerging generative AI technologies. First, it maps the project structure by indexing subroutines, modules, and functions across various files. Next, it generates a draft of the C++ code for a given Fortran source file. The generated results are reviewed, and errors are addressed manually by the developer or sent for regeneration by updating the original prompt.
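The index–draft–review loop described above can be sketched as follows. This is an illustrative outline, not the actual CodeScribe implementation; the `llm` and `review` callables are hypothetical stand-ins for a chat-completion API and a human reviewer.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class ProjectIndex:
    """Step 1: map subroutines, modules, and functions to their source files."""
    units: dict = field(default_factory=dict)  # unit name -> file path

def translate_file(fortran_src: str, index: ProjectIndex,
                   llm: Callable[[str], str],
                   review: Callable[[str], Optional[str]],
                   max_rounds: int = 3) -> str:
    """Steps 2-3: draft C++ for one Fortran file, then regenerate whenever
    the reviewer reports an error, by amending the original prompt."""
    known = ", ".join(sorted(index.units)) or "none"
    prompt = (f"Known program units: {known}.\n"
              f"Translate this Fortran source to C++:\n{fortran_src}")
    draft = llm(prompt)
    for _ in range(max_rounds):
        error = review(draft)          # None means the draft passed review
        if error is None:
            return draft
        prompt += f"\nThe previous draft had this problem: {error}. Regenerate."
        draft = llm(prompt)
    return draft                       # best effort after the iteration budget
```

The key design point is that the prompt accumulates reviewer feedback across rounds, so each regeneration sees the full history of reported problems.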

“CodeScribe automates many aspects, but human expertise remains essential for the final review,” said Akash Dhruv, an assistant computational scientist and primary developer of the new tool.

CodeScribe was motivated by scientists’ desire to convert MCFM — a Monte Carlo code that simulates particle interactions observed at the Large Hadron Collider — so the code would be interoperable with other high-energy physics codes and libraries. The researchers used several generative AI models for the MCFM code conversion, each with distinct parameter counts and capabilities. While GPT-4o emerged as the most effective model in this context (see Fig. 1), the results also revealed opportunities for optimization, particularly concerning the manual review and testing processes associated with such translations.

Figure 1: Schematic of the workflow for the LLM-based code conversion process. Steps in blue are managed using CodeScribe, while steps in red are manual and require developer intervention.

In ongoing work, the researchers are applying CodeScribe to other applications. For example, they are using CodeScribe to build GPU compatibility between the Flash-X open-source multiphysics simulation software and the AMReX framework for block-structured adaptive mesh refinement applications. The researchers envision CodeScribe as a valuable tool that empowers developers in scientific computing to leverage generative AI effectively.

For further information, see A. Dhruv and A. Dubey, “Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing,” accepted at the Platform for Advanced Scientific Computing (PASC) conference.

Making LLMs More Manageable 

Another major challenge facing LLMs is making them accessible with significantly reduced computational resources. Pruning has emerged as an important compression strategy to enhance memory and computational efficiency, but traditional global pruning has been impractical for LLMs because of scalability issues.

Researchers from Emory University and Argonne have developed SparseLLM to address this challenge. This innovative method decomposes the global pruning process into multiple local optimization subproblems coordinated by auxiliary variables. The researchers introduced an alternating optimization strategy in which some subproblems are optimized while others are kept fixed; the process is then repeated with a different subset. For the optimization, they leveraged sparsity-aware algorithms to optimize both the pruning mask selection and weight reconstruction simultaneously, ensuring minimal performance degradation (see Fig. 2).
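The alternating strategy can be illustrated with a toy sketch, using magnitude-based pruning over flat weight lists; the real SparseLLM operates on transformer layers with sparsity-aware solvers and auxiliary variables, which are only hinted at in the comments below.

```python
def prune_layer(weights, sparsity):
    """Local subproblem (simplified): zero out the `sparsity` fraction of
    weights with smallest magnitude in one layer."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def alternating_prune(layers, sparsity, rounds=2, block=2):
    """Optimize blocks of layer subproblems in turn while the rest stay
    fixed, repeating over several rounds."""
    layers = [list(layer) for layer in layers]
    for _ in range(rounds):
        for start in range(0, len(layers), block):
            for i in range(start, min(start + block, len(layers))):
                layers[i] = prune_layer(layers[i], sparsity)
            # In SparseLLM, auxiliary variables would be updated here to keep
            # the just-pruned block consistent with its frozen neighbors.
    return layers
```

Working on one block at a time is what avoids holding the full global optimization problem in memory, which is the source of the scalability gain.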

Figure 2: SparseLLM decomposes the global pruning of LLMs into manageable subproblems by leveraging the chain of modules and auxiliary variables.

“By not optimizing all the variables at the same time, we can achieve more scalable training while reducing the computational cost,” said Kibaek Kim, a computational mathematician and one of the developers of SparseLLM.

For further information, see the paper by Guangji Bai, Yijiang Li, Kibaek Kim, and Liang Zhao, “Towards Global Pruning for Pre-trained Language Models,” arXiv:2402.17946; poster at NeurIPS 2024.

Reasoning with LLMs

Whether LLMs can reason has become a highly debated issue. Some studies cite achievements in multistep planning and prediction as demonstrating LLMs’ reasoning capabilities; others argue that “true reasoning” goes beyond LLMs’ ability to recognize patterns and apply logical rules. In a recent study, researchers from Argonne and the University of Pennsylvania joined the debate by focusing on a new aspect: LLM token biases when solving logical problems.

They introduced a hypothesis-testing framework to evaluate multiple commercial and open-source LLMs. They applied tests on matched problem pairs to detect performance shifts when logically irrelevant tokens, such as names or quantifiers, were altered. The results showed that many state-of-the-art LLMs fail to generalize logical reasoning across minor perturbations, suggesting they often rely on superficial token patterns rather than formal logical reasoning (see Fig. 3).
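A minimal version of such a hypothesis test might look like this. It is an illustrative sketch, not the authors' code; the exact sign test on discordant pairs is one common choice for matched-pair comparisons, and the `model` callable is a hypothetical stand-in for an LLM query.

```python
from math import comb

def mcnemar_exact_p(b: int, c: int) -> float:
    """Exact two-sided sign test on discordant pairs: b = correct-to-wrong
    flips after perturbation, c = wrong-to-correct flips."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def token_bias_test(model, pairs, alpha=0.05):
    """pairs: (original_prompt, perturbed_prompt, gold_label) triples, where
    the perturbation changes only logically irrelevant tokens. A genuine
    reasoner should answer both versions the same way."""
    b = c = 0
    for orig, pert, gold in pairs:
        ok_orig, ok_pert = model(orig) == gold, model(pert) == gold
        if ok_orig and not ok_pert:
            b += 1
        elif ok_pert and not ok_orig:
            c += 1
    p = mcnemar_exact_p(b, c)
    return {"flips_to_wrong": b, "flips_to_right": c,
            "p_value": p, "biased": p < alpha and b > c}
```

A significant excess of correct-to-wrong flips is evidence that the model keys on surface tokens rather than the underlying logic.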

Figure 3: Experimental results in which the perturbed problems add the names of trustworthy news agencies and universities to alter the narratives of syllogisms. LLMs tend to falsely believe that these narratives are more trustworthy and hence ignore the logical fallacy in them.

“We demonstrated statistically that apparent reasoning success may stem from token bias rather than actual understanding,” said Tanwi Mallick, an assistant computer scientist. “The study provides new insights into the reliability of LLMs and opens avenues for future work on ways to improve LLMs’ logical reasoning ability.”

For further information, see the paper by Bowen Jiang, Yangxinyu Xie, Zhuoqun Hao, Xiaomeng Wang, Tanwi Mallick, Weijie J. Su, Camillo J. Taylor, and Dan Roth, “A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners,” in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp. 4722–4756.

Visualizing with an LLM

Creating scientific visualizations is challenging: it consumes considerable time and requires expertise in data analysis and visualization. Four researchers from Argonne’s MCS Division have proposed a new approach: synthetic software generation using an LLM. To this end, they have developed an AI assistant, ChatVis, that allows the user to specify a chain of analysis/visualization operations in natural language.

ChatVis generates a Python script for the desired operations and iterates until the script executes correctly, prompting the LLM to revise the script as needed. Moreover, the LLMs are not trained on esoteric visualization operations; instead, ChatVis enables commonly available LLMs, such as ChatGPT, to generate correct visualizations without retraining or fine-tuning.
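The generate-execute-revise loop can be sketched as follows. This is an illustrative outline, not the actual ChatVis implementation; `llm` is a hypothetical stand-in for a chat-completion call.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable

def generate_until_runs(request: str, llm: Callable[[str], str],
                        max_iters: int = 5) -> str:
    """Ask the LLM for a script implementing the requested operations, run
    it, and feed any error output back into the prompt until it succeeds."""
    prompt = f"Write a Python script that performs: {request}"
    for _ in range(max_iters):
        script = llm(prompt)
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(script)
            path = f.name
        try:
            result = subprocess.run([sys.executable, path],
                                    capture_output=True, text=True, timeout=60)
        finally:
            os.unlink(path)
        if result.returncode == 0:
            return script          # the script executed cleanly
        prompt += f"\nThe script failed with:\n{result.stderr}\nPlease revise it."
    raise RuntimeError("no working script within the iteration budget")
```

Execution success is only a necessary condition; in the real system the resulting visualization is still compared against the intended output, as in Fig. 4.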

“ChatVis employs a friendly human-centric natural-language interface,” said Orcun Yildiz, an assistant computer scientist. “Domain scientists, managers, administrators, and decision-makers who are not visualization experts can now generate their own high-quality visualizations.”

The Argonne team compared visualizations generated by five state-of-the-art LLMs with and without ChatVis. With ChatVis, they generated all five visualizations successfully; without ChatVis, the best LLM could generate only one of the five cases correctly. Figure 4 shows how closely ChatVis matched the ground truth.

Figure 4: Generated images for volume rendering. ChatVis produced a screenshot identical to the ground truth except for a different color palette because the user prompt did not specify one.

For the full study, see the paper by Tanwi Mallick, Orcun Yildiz, David Lenz, and Tom Peterka, “ChatVis: Automating Scientific Visualization with a Large Language Model,” in SC24-W: Workshops of the International Conference for High-Performance Computing, Networking, Storage and Analysis, pp. 49–55.

Pruning and Privacy with LLMs 

The enormous size of LLMs imposes high computational and storage demands. Pruning can reduce model size, but most methods assume public access to the data, hindering their use in privacy-sensitive applications.

Researchers at Argonne and Emory University have now proposed the first federated learning framework designed specifically for pruning LLMs over distributed data silos. Called FedSpaLLM, the new framework enables clients to prune their models locally based on private datasets, each owned by a different institution. The approach allows collaboration without sharing the raw data; only the model updates (e.g., weights or parameters) are shared, thus ensuring data privacy.

The researchers introduced three innovations in FedSpaLLM: a specialized aggregation function to handle sparse model updates, an adaptive mask expansion technique to ensure that the global model meets the target sparsity, and a layer sampling strategy that allows clients to prune subsets of model layers based on their computational resources, enabling personalized pruning and reducing communication costs.
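Two of these ideas, mask-aware aggregation and mask expansion, can be sketched in simplified form. The sketch uses toy flat weight lists and magnitude heuristics; it is not the paper's actual algorithm.

```python
def aggregate_sparse(client_weights):
    """Specialized aggregation: average each position only over the clients
    that kept it, so clients that pruned a weight do not drag it toward zero."""
    n = len(client_weights[0])
    global_w = []
    for j in range(n):
        kept = [cw[j] for cw in client_weights if cw[j] != 0.0]
        global_w.append(sum(kept) / len(kept) if kept else 0.0)
    return global_w

def expand_mask(weights, target_sparsity):
    """Adaptive mask expansion: zero additional small-magnitude weights
    until the global model meets the target sparsity."""
    zeros = sum(1 for w in weights if w == 0.0)
    need = int(len(weights) * target_sparsity) - zeros
    if need <= 0:
        return list(weights)
    nonzero = sorted((i for i, w in enumerate(weights) if w != 0.0),
                     key=lambda i: abs(weights[i]))
    drop = set(nonzero[:need])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]
```

Because only these (sparse) weight updates travel between clients and the server, the private calibration data never leaves its silo.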

Extensive experiments show the efficacy of the approach. The key metric used to evaluate the models was “perplexity,” which is well suited to assessing the accuracy of compression methods and thus the performance of the compressed models.

“Perplexity seems an odd name, but it reflects the confidence the model has in making a prediction,” said Yijiang Li, a postdoctoral appointee. “Lower perplexity means greater confidence.”
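Concretely, perplexity is the exponential of the average negative log-likelihood the model assigns to the observed tokens, as a minimal computation shows:

```python
from math import exp, log

def perplexity(token_probs):
    """Perplexity over a sequence, given the probability the model assigned
    to each observed token. A uniform guess over k options gives k."""
    nll = -sum(log(p) for p in token_probs) / len(token_probs)
    return exp(nll)
```

A model that assigns probability 1 to every observed token has perplexity 1; assigning 0.25 everywhere (a four-way guess) gives perplexity 4, matching the intuition of "confidence" in the quote above.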

As shown in Fig. 5, FedSpaLLM consistently outperforms “standalone” client models in achieving lower perplexity. Random pruning (not shown) suffers from perplexity an order of magnitude higher than that of the standalone models and FedSpaLLM.

Figure 5: Performance improvements in the metric perplexity from FedSpaLLM over client “standalone” models.

In general, as the target sparsity increases, the perplexity advantage of the global FedSpaLLM model over the standalone client models grows.

The results highlight the benefits of FedSpaLLM as a promising solution for resource-constrained applications where privacy is critical.

For further information, see the preprint by Guangji Bai, Yijiang Li, Zilinghan Li, Liang Zhao, and Kibaek Kim, “FedSpaLLM: Federated Pruning of Large Language Models,” to appear in NAACL 2025.

Originally posted by Argonne National Lab MCS, reprinted here with permission.
