The Dragon Hatchling: The Missing Link Between The Transformer And Models Of The Brain - Takara TLDR

The relationship between computing systems and the brain has served as
motivation for pioneering theoreticians since John von Neumann and Alan Turing.
Uniform, scale-free biological networks, such as the brain, have powerful
properties, including the ability to generalize over time. The lack of this
ability is the main barrier for Machine Learning on the path to Universal
Reasoning Models.
We introduce ‘Dragon Hatchling’ (BDH), a new Large Language Model
architecture based on a scale-free biologically inspired network of $n$
locally-interacting neuron particles. BDH couples strong theoretical
foundations and inherent interpretability without sacrificing Transformer-like
performance.
BDH is a practical, performant state-of-the-art attention-based state space
sequence learning architecture. In addition to being a graph model, BDH admits
a GPU-friendly formulation. It exhibits Transformer-like scaling laws:
empirically, BDH rivals GPT-2 performance on language and translation tasks at
the same number of parameters (10M to 1B) and with the same training data.
BDH can be represented as a brain model. The working memory of BDH during
inference relies entirely on synaptic plasticity with Hebbian learning using
spiking neurons. We confirm empirically that specific, individual synapses
strengthen whenever BDH hears or reasons about a specific concept
while processing language inputs. The neuron interaction network of BDH is a
graph of high modularity with heavy-tailed degree distribution. The BDH model
is biologically plausible, explaining one possible mechanism which human
neurons could use to achieve speech.
BDH is designed for interpretability. Activation vectors of BDH are sparse
and positive. We demonstrate monosemanticity in BDH on language tasks.
Interpretability of state, which goes beyond interpretability of neurons and
model parameters, is an inherent feature of the BDH architecture.
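
To make the idea of Hebbian working memory over sparse, positive activations concrete, here is a minimal sketch of a generic textbook Hebbian rule. It is not the BDH update itself, which the abstract does not spell out. The names `relu`, `hebbian_step`, and `read_out`, and the `decay` and `lr` parameters, are assumptions introduced only for illustration.

```python
def relu(values):
    """Clamp negatives to zero, giving sparse, non-negative activations."""
    return [max(0.0, v) for v in values]


def hebbian_step(state, pre, post, decay=0.9, lr=1.0):
    """Return the synaptic state after one Hebbian update.

    state[i][j] is the strength of the synapse from neuron i to neuron j.
    A synapse grows only when both of its endpoint neurons are active
    ("neurons that fire together wire together"); all synapses decay.
    The decay and lr values are illustrative assumptions.
    """
    return [
        [decay * state[i][j] + lr * pre[i] * post[j] for j in range(len(post))]
        for i in range(len(pre))
    ]


def read_out(state, pre):
    """Propagate presynaptic activity through the current synaptic state."""
    return [
        sum(pre[i] * state[i][j] for i in range(len(pre)))
        for j in range(len(state[0]))
    ]


# Three neurons, no synaptic state yet.
state = [[0.0] * 3 for _ in range(3)]
pre = relu([1.0, -2.0, 0.0])    # only neuron 0 is active
post = relu([0.0, 0.5, -1.0])   # only neuron 1 is active
state = hebbian_step(state, pre, post)
recalled = read_out(state, pre)
```

After one step, only the synapse from neuron 0 to neuron 1 is non-zero, so the state can be inspected synapse by synapse. This is the sense in which a state held in synapses, rather than in a dense vector, lends itself to interpretation.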
