Image GPT: Generative Pretraining from Pixels (Paper Explained)

BERT and GPT-2/3 have shown how powerful generative modeling can be as a pre-training objective for downstream tasks such as classification. For images, however, pre-training is usually done with supervised or self-supervised objectives. This paper investigates how far you can get by applying the principles of NLP pre-training to images.

OUTLINE:
0:00 – Intro & Overview
2:50 – Generative Models for Pretraining
4:50 – Pretraining for Visual Tasks
7:40 – Model Architecture
15:15 – Linear Probe Experiments
24:15 – Fine-Tuning Experiments
30:25 – Conclusion & Comments

Paper:

Blog:
Code:

Abstract:
Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. We train a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels, we find that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification. On CIFAR-10, we achieve 96.3% accuracy with a linear probe, outperforming a supervised Wide ResNet, and 99.0% accuracy with full finetuning, matching the top supervised pre-trained models. An even larger model trained on a mixture of ImageNet and web images is competitive with self-supervised benchmarks on ImageNet, achieving 72.0% top-1 accuracy on a linear probe of our features.
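
The core idea in the abstract, treating an image as a flat sequence and predicting each pixel from the ones before it, can be illustrated with a small NumPy sketch. This is not the authors' code: the function names, the 4x4 image, and the palette size of 16 are hypothetical choices for illustration, and a uniform "model" stands in for the Transformer.

```python
import numpy as np

def image_to_sequence(image):
    """Flatten an (H, W) grid of discrete pixel tokens in raster order.

    After this step the 2D structure is gone: the model only sees a 1D sequence.
    """
    return np.asarray(image).reshape(-1)

def next_pixel_pairs(sequence):
    """Build autoregressive training pairs: position t is used to predict t + 1."""
    return sequence[:-1], sequence[1:]

def autoregressive_nll(logits, targets):
    """Mean negative log-likelihood of the targets.

    logits has shape (T, V): one score per vocabulary entry at each position.
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

rng = np.random.default_rng(0)
vocab_size = 16                      # hypothetical palette size
image = rng.integers(0, vocab_size, size=(4, 4))

seq = image_to_sequence(image)       # 16 tokens, row by row
inputs, targets = next_pixel_pairs(seq)

# Stand-in for a trained model: equal scores for every token at every position.
uniform_logits = np.zeros((len(targets), vocab_size))
loss = autoregressive_nll(uniform_logits, targets)
print(seq.shape, inputs.shape, loss)
```

With uniform predictions the loss equals log(vocab_size), which is the baseline any trained model has to beat. In the linear-probe setting the abstract mentions, the pretrained network would be frozen and only a linear classifier trained on its features.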

Authors: Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever

