Coding assistants like GitHub Copilot and Codeium are already changing software engineering. Based on existing code and an engineer’s prompts, these assistants can suggest new lines or whole chunks of code, serving as a kind of advanced autocomplete.
At first glance, the results are fascinating. Coding assistants are already changing the work of some programmers and transforming how coding is taught. But a question remains: Is this kind of generative AI just a glorified help tool, or can it actually bring substantial change to a developer’s workflow?
At Advanced Micro Devices (AMD), we design and develop CPUs, GPUs, and other computing chips. But much of what we do is develop the low-level software that integrates operating systems and other customer software seamlessly with our own hardware. In fact, about half of AMD engineers are software engineers, which is not uncommon for a company like ours. Naturally, we have a keen interest in understanding the potential of AI for our software-development process.
To understand where and how AI can be most helpful, we recently conducted several deep dives into how we develop software. What we found was surprising: The kinds of tasks coding assistants are good at—namely, busting out lines of code—are actually a very small part of the software engineer’s job. Our developers spend the majority of their efforts on a range of tasks that include learning new tools and techniques, triaging problems, debugging those problems, and testing the software.
Even for the coding copilots’ bread-and-butter task of writing code, we found that the assistants offered diminishing returns: They were very helpful for junior developers working on basic tasks, but not that helpful for more senior developers who worked on specialized tasks.
To use artificial intelligence in a truly transformative way, we concluded, we couldn’t limit ourselves to just copilots. We needed to think more holistically about the whole software-development life cycle and adapt whatever tools are most helpful at each stage. Yes, we’re working on fine-tuning the available coding copilots for our particular code base, so that even senior developers will find them more useful. But we’re also adapting large language models to perform other parts of software development, like reviewing and optimizing code and generating bug reports. And we’re broadening our scope beyond LLMs and generative AI. We’ve found that using discriminative AI—AI that categorizes content instead of generating it—can be a boon in testing, particularly in checking how well video games run on our software and hardware.
The author and his colleagues trained a combination of discriminative and generative AI to play video games and look for artifacts in the way images are rendered on AMD hardware, which helps the company find bugs in its firmware code.
In the short term, we aim to implement AI at each stage of the software-development life cycle. We expect this to give us a 25 percent productivity boost over the next few years. In the long term, we hope to go beyond individual assistants for each stage and chain them together into an autonomous software-development machine—with a human in the loop, of course.
Even as we go down this relentless path to implement AI, we realize that we need to carefully review the possible threats and risks that the use of AI may introduce. Equipped with these insights, we’ll be able to use AI to its full potential. Here’s what we’ve learned so far.
The potential and pitfalls of coding assistants
GitHub research suggests that developers can double their productivity by using GitHub Copilot. Enticed by this promise, we made Copilot available to our developers at AMD in September 2023. After half a year, we surveyed those engineers to determine the assistant’s effectiveness.
We also monitored the engineers’ use of GitHub Copilot and grouped users into one of two categories: active users (who used Copilot daily) and occasional users (who used Copilot a few times a week). We expected that most developers would be active users. However, we found that the number of active users was just under 50 percent. Our software review found that AI provided a measurable increase in productivity for junior developers performing simpler programming tasks. We observed much lower productivity increases with senior engineers working on complex code structures. This is in line with research by the management consulting firm McKinsey & Co.
When we asked the engineers about the relatively low Copilot usage, 75 percent of them said they would use Copilot much more if the suggestions were more relevant to their coding needs. This doesn’t necessarily contradict GitHub’s findings: AMD software is quite specialized, and so it’s understandable that a standard AI tool like GitHub Copilot, which is trained on publicly available data, wouldn’t be that helpful.
For example, AMD’s graphics-software team develops low-level firmware to integrate our GPUs into computer systems, low-level software to integrate the GPUs into operating systems, and software to accelerate graphics and machine learning operations on the GPUs. All of this code provides the base for applications, such as games, video conferencing, and browsers, to use the GPUs. AMD’s software is unique to our company and our products, and the standard copilots aren’t optimized to work on our proprietary data.
To overcome this issue, we will need to train assistants on internal datasets and develop specialized tools focused on AMD use cases. We are now training a coding assistant in-house on AMD use cases and hope this will improve both adoption among developers and the resulting productivity. But the survey results made us wonder: How much of a developer’s job is writing new lines of code? To answer this question, we took a closer look at our software-development life cycle.
Inside the software-development life cycle
AMD’s software-development life cycle consists of five stages.
We start with a definition of the requirements for the new product, or a new version of an existing product. Then, software architects design the modules, interfaces, and features to satisfy the defined requirements. Next, software engineers work on development, the implementation of the software code to fulfill product requirements according to the architectural design. This is the stage where developers write new lines of code, but that’s not all they do: They may also refactor existing code, test what they’ve written, and subject it to code review.
Next, the test phase begins in earnest. After writing code to perform a specific function, a developer writes a unit or module test—a program to verify that the new code works as required. In large development teams, many modules are developed or modified in parallel. It’s essential to confirm that any new code doesn’t create a problem when integrated into the larger system. This is verified by an integration test, usually run nightly. Then, the complete system is run through a regression test to confirm that it works as well as it did before new functionality was included, a functional test to confirm old and new functionality, and a stress test to confirm the reliability and robustness of the whole system.
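To make that concrete, here is a minimal sketch of a unit test written in Python with the pytest framework. The function under test, clamp_brightness, is a hypothetical example, not actual AMD code:

```python
import pytest

def clamp_brightness(value: int) -> int:
    """Clamp a brightness level to the valid 0-100 range."""
    return max(0, min(100, value))

@pytest.mark.parametrize("raw, expected", [
    (50, 50),    # in-range value passes through unchanged
    (-10, 0),    # below range clamps to the minimum
    (250, 100),  # above range clamps to the maximum
])
def test_clamp_brightness(raw, expected):
    assert clamp_brightness(raw) == expected
```

Each test case encodes one requirement, so a regression in the function shows up as a failing case the developer can act on.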
Finally, after the successful completion of all testing, the product is released and enters the support phase.
The standard release of a new AMD Adrenalin graphics-software package takes an average of six months, followed by a less-intensive support phase of another three to six months. We tracked one such release to determine how many engineers were involved in each stage. The development and test phases were by far the most resource intensive, with 60 engineers involved in each. Twenty engineers were involved in the support phase, 10 in design, and five in definition.
Because development and testing required more hands than any of the other stages, we decided to survey our development and testing teams to understand what they spend time on from day to day. We found something surprising yet again: Even in the development and test phases, developing and testing new code collectively take up only about 40 percent of the developer’s work.
The other 60 percent of a software engineer’s day is a mix of things: About 10 percent of the time is spent learning new technologies, 20 percent on triaging and debugging problems, almost 20 percent on reviewing and optimizing the code they’ve written, and about 10 percent on documenting code.
Many of these tasks require knowledge of highly specialized hardware and operating systems, which off-the-shelf coding assistants just don’t have. This review was yet another reminder that we’ll need to broaden our scope beyond basic code autocomplete to significantly enhance the software-development life cycle with AI.
AI for playing video games and more
Generative AI models, such as large language models and image generators, are getting a lot of airtime these days. We have found, however, that an older style of AI, known as discriminative AI, can provide significant productivity gains. While generative AI aims to create new content, discriminative AI categorizes existing content, such as identifying whether an image is of a cat or a dog, or identifying a famous writer based on style.
We use discriminative AI extensively in the testing stage, particularly in functionality testing, where the behavior of the software is tested under a range of practical conditions. At AMD, we test our graphics software across many products, operating systems, applications, and games.
For example, we trained a set of deep convolutional neural networks (CNNs) on an AMD-collected dataset of over 20,000 “golden” images—images that don’t have defects and would pass the test—and 2,000 distorted images. The CNNs learned to recognize visual artifacts in the images and to automatically submit bug reports to developers.
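To give a flavor of what such a classifier involves, here is a heavily simplified sketch in PyTorch. The architecture, layer sizes, and class labels are illustrative assumptions, not our production network:

```python
# Illustrative binary "golden vs. distorted" image classifier.
import torch
import torch.nn as nn

class ArtifactDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # pool to one value per channel
        )
        self.classifier = nn.Linear(32, 2)  # classes: golden, distorted

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.classifier(x)

model = ArtifactDetector()
frame = torch.randn(1, 3, 224, 224)  # stand-in for one rendered frame
logits = model(frame)
print(logits.argmax(dim=1))  # 0 = golden, 1 = distorted
```

A frame the model flags as distorted can then be attached to an automatically generated bug report.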
We further boosted test productivity by combining discriminative AI and generative AI to play video games automatically. There are many elements to playing a game, including understanding and navigating screen menus, navigating the game world and moving the characters, and understanding game objectives and actions to advance in the game.
While no two games are the same, this is basically how it works for action-oriented games: A game usually starts with a text screen for choosing options. We use generative AI large vision models to read the text on the screen, navigate and configure the menus, and start the game. Once a playable character enters the game, we use discriminative AI to recognize relevant objects on the screen, understand where friendly or enemy nonplayer characters may be, and direct each character in the right direction or perform specific actions.
To navigate the game, we use several techniques: generative AI to read and understand in-game objectives, for example, and discriminative AI to interpret mini-maps and terrain features. Generative AI can also be used to predict the best strategy based on all the collected information.
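Putting the pieces together, the control loop looks roughly like the following Python sketch. Every helper here is a hypothetical stub: in a real system, read_screen_text would wrap a large vision model and detect_objects a discriminative object detector:

```python
def read_screen_text(frame):
    # Generative AI: a vision model reads menus and objectives.
    return "Defeat the nearest enemy"  # stubbed output

def detect_objects(frame):
    # Discriminative AI: an object detector finds relevant entities.
    return [{"label": "enemy", "position": (120, 80)}]  # stubbed output

def choose_action(objective, objects):
    # In practice an LLM could be prompted with the objective and the
    # detections to pick an action; here a hard-coded rule stands in.
    if any(obj["label"] == "enemy" for obj in objects):
        return "attack"
    return "move_forward"

def play_step(frame):
    objective = read_screen_text(frame)
    objects = detect_objects(frame)
    return choose_action(objective, objects)

print(play_step(frame=None))  # -> "attack"
```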
Overall, using AI in the functional testing stage reduced manual test efforts by 15 percent and increased how many scenarios we can test by 20 percent. But we believe this is just the beginning. We’re also developing AI tools to assist with code review and optimization, problem triage and debugging, and more aspects of code testing.
For review and optimization, we’re creating specialized tools for our software engineers by fine-tuning existing generative AI models with our own code base and documentation. We’re starting to use these fine-tuned models to automatically review existing code for complexity, coding standards, and best practices, with the goal of providing humanlike code review and flagging areas of opportunity.
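As a rough illustration of what automated review might look like, the following Python sketch sends a snippet to a model through an OpenAI-compatible API. The model name is a hypothetical placeholder for a fine-tuned internal model, not a real endpoint:

```python
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

def review_code(snippet: str) -> str:
    response = client.chat.completions.create(
        model="amd-codereview-ft",  # hypothetical fine-tuned model
        messages=[
            {"role": "system",
             "content": "Review this code for complexity, coding "
                        "standards, and best practices. Flag issues."},
            {"role": "user", "content": snippet},
        ],
    )
    return response.choices[0].message.content

print(review_code("def add(a,b): return a+b"))
```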
Similarly, for triage and debugging, we analyzed what kinds of information developers require to understand and resolve issues. We then developed a new tool that automates the retrieval and processing of triage and debug information. By feeding a series of prompts with relevant context into a large language model, the tool analyzes that information and suggests the next step in the workflow most likely to reveal the root cause of the problem; a simplified sketch of this flow appears below. We also plan to use generative AI to create unit and module tests for a specific function in a way that’s integrated into the developer’s workflow.
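Here is that triage sketch; the model name, helper function, and prompt structure are illustrative assumptions, not our internal tool:

```python
from openai import OpenAI

client = OpenAI()

def fetch_logs(bug_id: str) -> str:
    # Hypothetical stand-in for automated log retrieval.
    return "driver timeout in display pipeline at 02:14:07"

def suggest_next_step(bug_report: str, bug_id: str) -> str:
    prompt = (
        f"Bug report:\n{bug_report}\n\n"
        f"Relevant logs:\n{fetch_logs(bug_id)}\n\n"
        "Suggest the single most promising next step to isolate "
        "the root cause."
    )
    response = client.chat.completions.create(
        model="amd-triage-ft",  # hypothetical fine-tuned model
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```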
These tools are currently being developed and piloted in select teams. Once we reach full adoption and the tools are working together and seamlessly integrated into the developer’s environment, we expect overall team productivity to rise by more than 25 percent.
Cautiously toward an integrated AI-agent future
The promise of a 25 percent productivity gain does not come without risks. We’re paying particular attention to several ethical and legal concerns around the use of AI.
First, we’re cautious about violating someone else’s intellectual property through AI suggestions. Any generative AI software-development tool is necessarily trained on a large collection of data, usually source code, much of it open source. Any AI tool we employ must respect and correctly use third-party intellectual property, and it must not output content that violates that intellectual property. Filters and protections are needed to mitigate this risk.
Second, we’re concerned about the inadvertent disclosure of our own intellectual property when we use publicly available AI tools. For example, certain generative AI tools may take your source-code input and incorporate it into their larger training datasets. If such a tool is publicly available, it could expose your proprietary source code or other intellectual property to others using the tool.
Third, it’s important to be aware that AI makes mistakes. In particular, LLMs are prone to hallucinations, or providing false information. Even as we off-load more tasks to AI agents, we’ll need to keep a human in the loop for the foreseeable future.
Lastly, we’re concerned about possible biases that the AI may introduce. In software-development applications, we must ensure that the AI’s suggestions don’t create unfairness and that generated code stays within the bounds of human ethical principles and doesn’t discriminate in any way. This is another reason a human in the loop is imperative for responsible AI.
Keeping all these concerns front of mind, we plan to continue developing AI capabilities throughout the software-development life cycle. Right now, we’re building individual tools that can assist developers in the full range of their daily tasks—learning, code generation, code review, test generation, triage, and debugging. We’re starting with simple scenarios and slowly evolving these tools to be able to handle more-complex scenarios. Once these tools are mature, the next step will be to link the AI agents together in a complete workflow.
The future we envision looks like this: When a new software requirement comes along, or a problem report is submitted, AI agents will automatically find the relevant information, understand the task at hand, generate relevant code, and test, review, and evaluate the code, cycling over these steps until the system finds a good solution, which is then proposed to a human developer.
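In skeletal form, that loop might look like the following sketch, in which every component is a stub standing in for a real AI agent or test harness:

```python
def generate_code(task, feedback):
    # Stub for a coding agent; a real agent would condition on feedback.
    return f"# candidate code for: {task}, revised per {feedback}"

def run_tests(code):
    return {"passed": True, "details": "all tests green"}  # stub harness

def review(code):
    return {"approved": True, "comments": []}  # stub review agent

def solve(task, max_iterations=5):
    feedback = None
    for _ in range(max_iterations):
        code = generate_code(task, feedback)
        tests = run_tests(code)
        verdict = review(code)
        if tests["passed"] and verdict["approved"]:
            return code  # proposed to a human developer for sign-off
        feedback = (tests["details"], verdict["comments"])
    raise RuntimeError("no solution found; escalate to a human developer")
```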
Even in this scenario, we will need software engineers to review and oversee the AI’s work. But the role of the software developer will be transformed. Instead of programming the software code, we will be programming the agents and the interfaces among agents. And in the spirit of responsible AI, we—the humans—will provide the oversight.