Paper Page - Dynamic Chunking For End-to-End Hierarchical Sequence Modeling

Hierarchical networks replace traditional tokenization pipelines by dynamically learning segmentation strategies, achieving better performance and scalability across various languages and modalities.

Despite incredible progress in language models (LMs) in recent years, largely
resulting from moving away from specialized models designed for specific tasks
to general models based on powerful architectures (e.g. the Transformer) that
learn everything from raw data, pre-processing steps such as tokenization
remain a barrier to true end-to-end foundation models. We introduce a
collection of new techniques that enable a dynamic chunking mechanism, which
automatically learns content- and context-dependent segmentation strategies
jointly with the rest of the model. Incorporating this into
an explicit hierarchical network (H-Net) allows replacing the (implicitly
hierarchical) tokenization-LM-detokenization pipeline with a single model
learned fully end-to-end. When compute- and data-matched, an H-Net with one
stage of hierarchy operating at the byte level outperforms a strong Transformer
language model operating over BPE tokens. Iterating the hierarchy to multiple
stages further increases its performance by modeling multiple levels of
abstraction, demonstrating significantly better scaling with data and matching
a token-based Transformer of twice its size. H-Nets pretrained on English show
significantly increased character-level robustness, and qualitatively learn
meaningful data-dependent chunking strategies without any heuristics or
explicit supervision. Finally, the H-Net’s improvement over tokenized pipelines
is further increased in languages and modalities with weaker tokenization
heuristics, such as Chinese and code, or DNA sequences (nearly 4x improvement
in data efficiency over baselines), showing the potential of true end-to-end
models that learn and scale better from unprocessed data.
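
The abstract does not say how chunk boundaries are predicted, so the following is only an illustrative sketch of what content-dependent chunking over raw bytes could look like, not the paper's method. It assumes a boundary is scored by the cosine dissimilarity between adjacent position vectors and cut at a fixed threshold. The names `toy_embed`, `boundary_probs`, and `chunk` are hypothetical, and `toy_embed` is a hand-written stand-in for what would, in an end-to-end model, be a learned encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def toy_embed(byte):
    """Hypothetical stand-in for a learned byte encoder: one-hot by coarse class."""
    ch = chr(byte)
    if ch.isalpha():
        return (1.0, 0.0, 0.0)
    if ch.isdigit():
        return (0.0, 1.0, 0.0)
    return (0.0, 0.0, 1.0)


def boundary_probs(vectors):
    """Assumed scoring rule: p_t = (1 - cos(v_t, v_{t-1})) / 2.

    The first position always opens a chunk, so it gets probability 1.0.
    """
    probs = [1.0]
    for prev, cur in zip(vectors, vectors[1:]):
        probs.append(0.5 * (1.0 - cosine(cur, prev)))
    return probs


def chunk(data, threshold=0.5):
    """Split bytes wherever the boundary score reaches the threshold."""
    if not data:
        return []
    probs = boundary_probs([toy_embed(b) for b in data])
    chunks, start = [], 0
    for i, p in enumerate(probs):
        if i > 0 and p >= threshold:
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks


print(chunk(b"abc 123"))
```

In this toy setup, adjacent bytes of the same class score 0 and stay in one chunk, while a change of class scores 0.5 and starts a new one. A learned model would replace both the embedding and the fixed threshold with trained components, which is what lets the segmentation depend on context rather than on a hand-written rule.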
