At ICML 2025, the Tencent Hunyuan team presented research revealing how floating-point precision shapes the effectiveness of AI model training. The work not only points to new gains in training efficiency but also offers significant reference points for hardware design and the optimization of AI applications. The findings appear in the paper arXiv:2501.02423v3, which interested readers can explore in depth.
Floating Point Precision Optimization: The ‘Golden Ratio’ of AI Training
The core of the research lies in finding the optimal precision configuration for floating-point quantization training: reducing computational cost while preserving model performance. Training large AI models requires massive numerical computation, and floating-point numbers are the primary way computers represent real values. The research team ran extensive experiments analyzing how exponent and mantissa configurations affect model performance. They found that exponent bits contribute slightly more to model performance than mantissa bits, which contrasts with traditional views. The results indicate that, within a limited bit budget, allocating somewhat more bits to the exponent yields better outcomes. For instance, with a total of 4 bits, the optimal configuration is 2 exponent bits and 1 mantissa bit; with 8 bits, 4 exponent bits and 3 mantissa bits; and with 16 bits, 8 exponent bits and 7 mantissa bits (the remaining bit in each case is the sign). This finding gives hardware manufacturers an essential reference for designing more efficient AI training chips.
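To make the exponent/mantissa trade-off concrete, here is a minimal Python sketch that rounds values to a toy floating-point grid and measures the resulting error. It is an illustration of the trade-off only, not the paper's quantizer: it ignores subnormals, NaN/infinity encodings, and block-wise scaling, and the formats compared are chosen for contrast rather than taken from the paper's experiments.

```python
import numpy as np

def quantize(x, e_bits, m_bits):
    """Round x to the nearest value of a toy float format with e_bits
    exponent and m_bits mantissa bits (sign bit implied). Subnormals,
    infinities, and NaNs are ignored -- an illustration, not a bit-exact
    model of any real FP4/FP8 format."""
    bias = 2 ** (e_bits - 1) - 1                      # standard exponent bias
    e_min, e_max = 1 - bias, 2 ** e_bits - 2 - bias   # usable exponent range
    e = np.clip(np.floor(np.log2(np.abs(x))), e_min, e_max)
    step = 2.0 ** (e - m_bits)                        # grid spacing at exponent e
    max_val = (2 - 2.0 ** -m_bits) * 2.0 ** e_max     # largest representable value
    return np.clip(np.round(x / step) * step, -max_val, max_val)

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)                          # stand-in for weights/activations
for e_bits, m_bits in [(2, 1), (3, 0), (4, 3), (5, 2)]:
    mse = np.mean((x - quantize(x, e_bits, m_bits)) ** 2)
    print(f"E{e_bits}M{m_bits}: quantization MSE = {mse:.4e}")
```

More exponent bits widen the dynamic range the grid covers but coarsen its resolution; more mantissa bits refine the grid but make it clip sooner. That tension is exactly what the optimal E/M splits above resolve.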
The ‘Critical Point’ of Data Scale and the Capybara Scaling Law
The research team also uncovered an interesting phenomenon: in low-precision training, more training data does not necessarily lead to better results. They introduced the concept of a 'critical data size': once the amount of training data exceeds this value, model performance may actually decline. This differs from high-precision training, where adding data typically keeps improving performance. Through mathematical derivation, the team obtained an exact formula for this critical data size and found that the critical point is pushed later as model scale increases, training precision rises, or quantization block size shrinks. To quantify these findings, the team proposed the Capybara Scaling Law, a formula that predicts AI model performance under any combination of model size, data volume, exponent bits, mantissa bits, and quantization block size. Its core idea is to combine the traditional Chinchilla scaling law with a precision-dependent component, so that performance in low-precision training can be predicted more accurately.
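The article does not reproduce the formula, but the qualitative behavior follows from a generic form in which a precision-induced penalty grows with data volume. The sketch below is an assumed illustrative form, not the paper's actual equation: here N is model size, D is data volume, and k stands for a precision penalty coefficient that shrinks as exponent bits E or mantissa bits M rise, as block size B shrinks, or as the model grows.

```latex
% Illustrative form (assumed, not the paper's exact law):
% Chinchilla-style terms plus a precision penalty that grows with D.
L(N, D) = \frac{n}{N^{\alpha}} + \frac{d}{D^{\beta}} + k \, D^{\gamma},
\qquad k = k(E, M, B, N)

% The critical data size is where more data stops helping:
\frac{\partial L}{\partial D}
  = -\beta d \, D^{-\beta - 1} + \gamma k \, D^{\gamma - 1} = 0
\;\Longrightarrow\;
D_{\mathrm{crit}} = \left( \frac{\beta d}{\gamma k} \right)^{\frac{1}{\beta + \gamma}}
```

Under this form, a smaller penalty coefficient k (higher precision, smaller blocks, larger N) pushes D_crit outward, matching the trends the article describes.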
Optimal Allocation of Computational Budget and Industry Impact
The research team also explored how to allocate resources optimally under a fixed computational budget. They found that aggressive quantization (such as FP8 or even FP4) during the early stages of training quickly brings the model to a good level of convergence; as data volume grows and 'knowledge density' rises, gradually raising the training precision to BF16 or even FP32 helps maintain the best training results (a minimal sketch of such a schedule appears below). Furthermore, when model size, data volume, and precision are optimized jointly, the most cost-effective precision consistently falls between 4 and 8 bits across a wide range of computational budgets; in other words, 4- to 8-bit precision training achieves the best cost-effectiveness. The results have far-reaching implications for the AI industry: better training strategies for model developers, chip-design guidelines for hardware manufacturers, and recommendations for research institutions and companies seeking the best results within limited budgets. By optimizing quantization strategies, more research institutions and small companies can train high-quality AI models with fewer computational resources, promoting the popularization and development of AI technology.
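As a concrete reading of that staged strategy, here is a minimal Python sketch of a precision schedule keyed to training progress. The stage boundaries (30% and 80%) and the tokens-based trigger are invented for illustration; the paper argues for the direction of the schedule, not these specific numbers.

```python
def precision_schedule(tokens_seen: int, total_tokens: int) -> str:
    """Return the numeric format to train in at this point of the run.

    Follows the staged strategy described above: aggressive low precision
    early for cheap convergence, then higher precision as the data's
    'knowledge density' grows. The 30%/80% thresholds are illustrative
    assumptions, not values from the paper."""
    progress = tokens_seen / total_tokens
    if progress < 0.30:
        return "fp8"     # early: fast, cheap steps dominate
    if progress < 0.80:
        return "bf16"    # middle: preserve finer-grained gradients
    return "fp32"        # late: highest-fidelity final updates

# Example: where the format switches over a 1-trillion-token run
total = 1_000_000_000_000
for seen in (0, 250 * 10**9, 500 * 10**9, 900 * 10**9):
    print(f"{seen / total:>4.0%} of tokens -> {precision_schedule(seen, total)}")
```

In a real training stack the returned label would select, for example, an FP8 compute path or a BF16 autocast context; that wiring is framework-specific and omitted here.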
What new breakthroughs do you think might further enhance the efficiency of AI model training in the future?