Paper Page - Reparameterized LLM Training Via Orthogonal Equivalence Transformation

POET is a reParameterized training algorithm using Orthogonal Equivalence Transformation to optimize neurons in large language models, ensuring stable training and improved generalization.

AI-generated summary

While large language models (LLMs) are driving the rapid advancement of
artificial intelligence, effectively and reliably training these large models
remains one of the field’s most significant challenges. To address this
challenge, we propose POET, a novel reParameterized training algorithm that
uses Orthogonal Equivalence Transformation to optimize neurons. Specifically,
POET reparameterizes each neuron with two learnable orthogonal matrices and a
fixed random weight matrix. Because of its provable preservation of spectral
properties of weight matrices, POET can stably optimize the objective function
with improved generalization. We further develop efficient approximations that
make POET flexible and scalable for training large-scale neural networks.
Extensive experiments validate the effectiveness and scalability of POET in
training LLMs.
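
The reparameterization described in the abstract can be illustrated with a small NumPy sketch. It forms an effective weight as `R @ W0 @ P`, where `W0` is a fixed random matrix and `R`, `P` are orthogonal, and then checks that the singular values of `W0` are unchanged. The abstract does not say how the orthogonal matrices are parameterized, so the Cayley transform used here is an assumption chosen for illustration, and the names `cayley`, `poet_weight`, `r_params`, and `p_params` are hypothetical rather than taken from the authors' code.

```python
import numpy as np


def cayley(params):
    """Map an unconstrained square matrix to an orthogonal one.

    Assumption: the Cayley transform is used here only as one standard way
    to obtain an orthogonal matrix from free parameters.
    """
    skew = params - params.T            # skew-symmetric: skew.T == -skew
    eye = np.eye(params.shape[0])
    # (I + skew) is invertible for any real skew-symmetric matrix,
    # and (I + skew)^{-1} (I - skew) is orthogonal.
    return np.linalg.solve(eye + skew, eye - skew)


def poet_weight(w0, r_params, p_params):
    """Effective weight R @ W0 @ P with W0 fixed and R, P orthogonal."""
    return cayley(r_params) @ w0 @ cayley(p_params)


rng = np.random.default_rng(0)
out_dim, in_dim = 6, 4

# Fixed random weight matrix: never updated in this sketch.
w0 = rng.standard_normal((out_dim, in_dim))

# Stand-ins for the learnable parameters behind the two orthogonal matrices.
r_params = 0.1 * rng.standard_normal((out_dim, out_dim))
p_params = 0.1 * rng.standard_normal((in_dim, in_dim))

w = poet_weight(w0, r_params, p_params)

# Orthogonal factors on either side leave the singular values unchanged.
sv_fixed = np.linalg.svd(w0, compute_uv=False)
sv_effective = np.linalg.svd(w, compute_uv=False)
spectrum_preserved = np.allclose(sv_fixed, sv_effective)
print(spectrum_preserved)
```

In an actual training loop only the parameters behind `R` and `P` would receive gradient updates, so the spectrum of the effective weight would stay equal to that of `W0` throughout. This sketch does not cover the efficient approximations the abstract mentions.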
