Multi-layer perceptrons (MLPs) conventionally follow a narrow-wide-narrow
design where skip connections operate at the input/output dimensions while
processing occurs in expanded hidden spaces. We challenge this convention by
proposing wide-narrow-wide (Hourglass) MLP blocks where skip connections
operate at expanded dimensions while residual computation flows through narrow
bottlenecks. This inversion leverages higher-dimensional spaces for incremental
refinement while maintaining computational efficiency through parameter-matched
designs. Implementing Hourglass MLPs requires an initial projection to lift
input signals to expanded dimensions. We propose that this projection can
remain fixed at random initialization throughout training, enabling efficient
training and inference implementations. We evaluate both architectures on
generative tasks over popular image datasets, characterizing
performance-parameter Pareto frontiers through systematic architectural search.
Results show that Hourglass architectures consistently achieve superior Pareto
frontiers compared to conventional designs. As parameter budgets increase,
optimal Hourglass configurations favor deeper networks with wider skip
connections and narrower bottlenecks, a scaling pattern distinct from
conventional MLPs. Our findings suggest reconsidering skip connection placement
in modern architectures, with potential applications extending to Transformers
and other residual networks.
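
The following is a minimal PyTorch sketch of the idea described above, not the
paper's exact configuration: the block names (HourglassBlock, HourglassMLP),
dimension arguments (wide_dim, narrow_dim), GELU activation, and linear output
head are illustrative assumptions. It shows the wide-narrow-wide residual path
and the lifting projection kept frozen at random initialization.

```python
import torch
import torch.nn as nn


class HourglassBlock(nn.Module):
    """Wide-narrow-wide residual block: the skip connection operates at the
    expanded (wide) dimension while computation flows through a narrow
    bottleneck."""

    def __init__(self, wide_dim: int, narrow_dim: int):
        super().__init__()
        self.down = nn.Linear(wide_dim, narrow_dim)  # wide -> narrow
        self.up = nn.Linear(narrow_dim, wide_dim)    # narrow -> wide
        self.act = nn.GELU()                         # activation choice is an assumption

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Incremental refinement in the wide space via the bottleneck path.
        return h + self.up(self.act(self.down(h)))


class HourglassMLP(nn.Module):
    """Stack of Hourglass blocks preceded by a lifting projection to the wide
    dimension; the projection stays frozen at its random initialization."""

    def __init__(self, in_dim: int, out_dim: int,
                 wide_dim: int, narrow_dim: int, depth: int):
        super().__init__()
        self.lift = nn.Linear(in_dim, wide_dim)
        for p in self.lift.parameters():
            p.requires_grad_(False)  # fixed random projection, never trained
        self.blocks = nn.ModuleList(
            [HourglassBlock(wide_dim, narrow_dim) for _ in range(depth)]
        )
        self.head = nn.Linear(wide_dim, out_dim)  # output head is an assumption

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.lift(x)          # lift input to the expanded dimension
        for block in self.blocks:
            h = block(h)          # skip connections live at wide_dim
        return self.head(h)


# Example forward pass on flattened 28x28 images; a parameter-matched
# comparison against a conventional MLP would tune wide_dim, narrow_dim,
# and depth to equalize trainable parameter counts.
model = HourglassMLP(in_dim=784, out_dim=784,
                     wide_dim=512, narrow_dim=64, depth=6)
out = model(torch.randn(8, 784))  # -> shape (8, 784)
```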