#ai #machinelearning #attention
Convolutional Neural Networks have dominated image processing for the last decade, but transformers are quickly replacing traditional models. This paper proposes a fully attentional model for images that combines learned positional embeddings with axial attention. The resulting model competes with CNNs on image classification and achieves state-of-the-art results on several image segmentation benchmarks.
OUTLINE:
0:00 – Intro & Overview
4:10 – This Paper’s Contributions
6:20 – From Convolution to Self-Attention for Images
16:30 – Learned Positional Embeddings
24:20 – Propagating Positional Embeddings through Layers
27:00 – Traditional vs Position-Augmented Attention
31:10 – Axial Attention
44:25 – Replacing Convolutions in ResNet
46:10 – Experimental Results & Examples
Paper:
Code:
My Video on BigBird:
My Video on ResNet:
My Video on Attention:
Abstract:
Convolution exploits locality for efficiency at a cost of missing long range context. Self-attention has been adopted to augment CNNs with non-local interactions. Recent works prove it possible to stack self-attention layers to obtain a fully attentional network by restricting the attention to a local region. In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions. This reduces computation complexity and allows performing attention within a larger or even global region. In companion, we also propose a position-sensitive self-attention design. Combining both yields our position-sensitive axial-attention layer, a novel building block that one could stack to form axial-attention models for image classification and dense prediction. We demonstrate the effectiveness of our model on four large-scale datasets. In particular, our model outperforms all existing stand-alone self-attention models on ImageNet. Our Axial-DeepLab improves 2.8% PQ over bottom-up state-of-the-art on COCO test-dev. This previous state-of-the-art is attained by our small variant that is 3.8x parameter-efficient and 27x computation-efficient. Axial-DeepLab also achieves state-of-the-art results on Mapillary Vistas and Cityscapes.
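The key idea in the abstract, factorizing 2D self-attention into two 1D self-attentions, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses identity projections instead of learned query/key/value matrices and omits the paper's position-sensitive terms, purely to show the height-then-width factorization that brings the cost from O((HW)²) down to O(HW·(H+W)).

```python
# Hypothetical, simplified sketch of axial attention over an (H, W, C) feature map.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axis_attention(x, axis):
    # Attend along one spatial axis only. x: (H, W, C).
    x = np.moveaxis(x, axis, 0)          # (L, M, C): attend over L, per slice M
    q, k, v = x, x, x                    # identity projections for the sketch
    scores = np.einsum('lmc,nmc->mln', q, k) / np.sqrt(x.shape[-1])
    out = np.einsum('mln,nmc->lmc', softmax(scores), v)
    return np.moveaxis(out, 0, axis)     # restore original layout

def axial_attention(x):
    # Height pass then width pass: each pixel aggregates its full row and column,
    # so two 1D passes approximate global 2D context.
    return axis_attention(axis_attention(x, 0), 1)

x = np.random.randn(4, 6, 8)             # H=4, W=6, C=8 feature map
y = axial_attention(x)
print(y.shape)                           # -> (4, 6, 8)
```

Since each attention pass is a convex combination over one axis, a constant input passes through unchanged, which is a handy sanity check for the factorization.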
Authors: Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
Links:
YouTube:
Twitter:
Discord:
BitChute:
Minds:
Parler:
LinkedIn:
If you want to support me, the best thing to do is to share out the content 🙂
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar:
Patreon:
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n