How does an LLM understand the meaning of ‘wRiTe’ when its building blocks—the individual character tokens ‘w’, ‘R’, ‘i’—have no semantic content? This simple question challenges a foundational assumption of modern language models: that meaning must live in learned token embeddings.
Our paper argues that high-level meaning is not contained in embeddings but is constructed by the Transformer architecture itself. We demonstrate this by replacing standard trainable embeddings with a completely frozen layer derived from the raw visual structure of Unicode glyphs. These non-semantic vectors are fixed before training even begins and never updated.
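The idea can be sketched in a few lines. Here, hand-drawn 3×5 bitmaps stand in for real rasterized Unicode glyphs (an illustrative assumption; the paper's actual rendering pipeline may differ): each character maps to a fixed vector derived purely from its visual form, never touched by training.

```python
# Minimal sketch of a frozen, non-semantic glyph embedding.
# The bitmaps below are crude hand-drawn stand-ins for rendered Unicode
# glyphs, chosen only to illustrate the mechanism.

GLYPH_BITMAPS = {
    # rows of a 3x5 pixel grid; '1' = ink, '0' = background
    "w": ["101", "101", "101", "111", "101"],
    "W": ["101", "101", "101", "101", "111"],  # visually distinct from "w"
    "i": ["010", "000", "010", "010", "010"],
}

def glyph_embedding(ch: str) -> list[float]:
    """Flatten the character's bitmap into a fixed 15-dim vector."""
    rows = GLYPH_BITMAPS[ch]
    return [float(px) for row in rows for px in row]

# The vector is determined entirely by glyph shape, before any training:
emb_w = glyph_embedding("w")
```

Note that ‘w’ and ‘W’ get different vectors purely because their glyphs differ, which is exactly the kind of non-semantic, form-only signal the frozen layer provides.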
The result is striking: our models not only converge but consistently outperform otherwise identical architectures trained with standard learnable embeddings on reasoning benchmarks. This points to a core development principle we call Induction: instead of forcing a model to learn all of its knowledge at once, we give it simple, immutable rules (the visual form of characters) and let it build complexity on top of them.
It’s the difference between trying to freeze an entire lake instantly and letting a solid sheet of ice form layer by layer. It’s the power of a locomotive moving an entire train by first overcoming the inertia of a single car.
This foundational discovery unlocks a powerful new methodology. In this paper we demonstrate the practical payoff: merging expert models like LEGO bricks and “growing” powerful AI systems incrementally.
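One way to see why a shared frozen layer helps with merging: because every expert starts from the identical, immutable embedding, their trainable weights live in a compatible parameter space. The toy sketch below combines two experts by simple weight averaging; the parameter names and the averaging rule are illustrative assumptions, not the paper's exact merge procedure.

```python
# Toy sketch of a "LEGO-style" merge of two expert models that share a
# frozen embedding layer. Parameters are plain lists for illustration.

def merge_experts(expert_a: dict, expert_b: dict, frozen_keys: set) -> dict:
    """Average trainable parameters; copy frozen (shared) ones unchanged."""
    merged = {}
    for name in expert_a:
        if name in frozen_keys:
            # Frozen layers are identical across experts by construction.
            merged[name] = expert_a[name]
        else:
            merged[name] = [(a + b) / 2
                            for a, b in zip(expert_a[name], expert_b[name])]
    return merged

frozen = {"embedding"}
expert_a = {"embedding": [1.0, 0.0], "mlp": [2.0, 4.0]}
expert_b = {"embedding": [1.0, 0.0], "mlp": [0.0, 2.0]}
merged = merge_experts(expert_a, expert_b, frozen)
```

The design point the sketch illustrates is that the frozen layer acts as a common interface: only the layers above it need to be reconciled when two experts are combined.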
This two-part work presents a blueprint for a more modular, efficient, and scalable future for AI.