How does an LLM understand the meaning of ‘wRiTe’ when its building blocks—the individual character tokens ‘w’, ‘R’, ‘i’—have no semantic content? This simple question challenges a foundational assumption of modern language models: that meaning must live in learned token embeddings.
Our paper argues that high-level meaning is not contained in embeddings but is constructed by the Transformer architecture itself. We demonstrate this by replacing standard trainable embeddings with a completely frozen layer derived from the raw visual structure of Unicode glyphs. These non-semantic vectors are fixed before training even begins and never updated.
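The idea can be sketched in a few lines. Here, hand-drawn 3×5 bitmaps stand in for real rasterized Unicode glyphs (an illustrative assumption; the paper's actual rendering pipeline may differ): each character maps to a fixed vector derived purely from its visual form, never touched by training.

```python
# Minimal sketch of a frozen, non-semantic glyph embedding.
# The bitmaps below are crude hand-drawn stand-ins for rendered Unicode
# glyphs, chosen only to illustrate the mechanism.

GLYPH_BITMAPS = {
    # rows of a 3x5 pixel grid; '1' = ink, '0' = background
    "w": ["101", "101", "101", "111", "101"],
    "W": ["101", "101", "101", "101", "111"],  # visually distinct from "w"
    "i": ["010", "000", "010", "010", "010"],
}

def glyph_embedding(ch: str) -> list[float]:
    """Flatten the character's bitmap into a fixed 15-dim vector."""
    rows = GLYPH_BITMAPS[ch]
    return [float(px) for row in rows for px in row]

# The vector is determined entirely by glyph shape, before any training:
emb_w = glyph_embedding("w")
```

Note that ‘w’ and ‘W’ get different vectors purely because their glyphs differ, which is exactly the kind of non-semantic, form-only signal the frozen layer provides.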
The result is striking: our models not only converge but consistently outperform otherwise identical architectures trained with standard learnable embeddings on reasoning benchmarks. This points to a core development principle we call Induction: instead of forcing a model to learn all of its knowledge at once, we give it simple, immutable rules (the visual form of characters) and let it build complexity on top of them.
It’s the difference between trying to freeze an entire lake instantly and letting a solid sheet of ice form layer by layer. It’s the power of a locomotive moving an entire train by first overcoming the inertia of a single car.
This foundational discovery unlocks a powerful new methodology. In this paper we demonstrate the practical payoff: merging expert models like LEGO bricks and “growing” powerful AI systems incrementally.
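One way to see why a shared frozen layer helps with merging: because every expert starts from the identical, immutable embedding, their trainable weights live in a compatible parameter space. The toy sketch below combines two experts by simple weight averaging; the parameter names and the averaging rule are illustrative assumptions, not the paper's exact merge procedure.

```python
# Toy sketch of a "LEGO-style" merge of two expert models that share a
# frozen embedding layer. Parameters are plain lists for illustration.

def merge_experts(expert_a: dict, expert_b: dict, frozen_keys: set) -> dict:
    """Average trainable parameters; copy frozen (shared) ones unchanged."""
    merged = {}
    for name in expert_a:
        if name in frozen_keys:
            # Frozen layers are identical across experts by construction.
            merged[name] = expert_a[name]
        else:
            merged[name] = [(a + b) / 2
                            for a, b in zip(expert_a[name], expert_b[name])]
    return merged

frozen = {"embedding"}
expert_a = {"embedding": [1.0, 0.0], "mlp": [2.0, 4.0]}
expert_b = {"embedding": [1.0, 0.0], "mlp": [0.0, 2.0]}
merged = merge_experts(expert_a, expert_b, frozen)
```

The design point the sketch illustrates is that the frozen layer acts as a common interface: only the layers above it need to be reconciled when two experts are combined.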
This two-part work presents a blueprint for a more modular, efficient, and scalable future for AI.