Mixup: Beyond Empirical Risk Minimization (Paper Explained)

Neural networks often draw hard class boundaries in high-dimensional space, which makes them very brittle. Mixup is a technique that linearly interpolates between pairs of training inputs and their labels at training time, and it yields much smoother and more regular class boundaries.
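For reference, the interpolation described above can be written out as follows. This is a restatement in the usual notation, not a quote from the video. Here (x_i, y_i) and (x_j, y_j) are two training examples drawn at random, the y are one-hot label vectors, and α is the hyperparameter that controls how strongly pairs get mixed. The worked numbers at the end are purely illustrative.

```latex
% ERM: minimize the average loss on the n training points only
R_{\mathrm{ERM}}(f) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big)

% mixup: train instead on virtual examples built from random pairs
\tilde{x} = \lambda x_i + (1 - \lambda)\, x_j, \qquad
\tilde{y} = \lambda y_i + (1 - \lambda)\, y_j, \qquad
\lambda \sim \mathrm{Beta}(\alpha, \alpha), \quad \alpha \in (0, \infty)

% worked example: y_i = (1, 0), y_j = (0, 1), \lambda = 0.7
\tilde{y} = 0.7\,(1, 0) + 0.3\,(0, 1) = (0.7,\ 0.3)
```

A small α keeps most draws of λ close to 0 or 1, so the mixed examples stay near real training points. A larger α mixes pairs more evenly.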

OUTLINE:
0:00 – Intro
0:30 – The problem with ERM
2:50 – Mixup
6:40 – Code
9:35 – Results

Abstract:
Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.
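The sketch below shows how the "convex combinations of pairs of examples and their labels" from the abstract could look in code. It is a minimal pure-Python version, not the authors' implementation, and it rests on a few assumptions. Targets are one-hot vectors. One λ ~ Beta(α, α) is drawn per batch. Each example's partner comes from shuffling the same batch. The helper names `one_hot`, `mix`, `mixup_batch`, and `soft_cross_entropy` are hypothetical, and the default `alpha=0.2` is only an illustrative value.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def one_hot(label: int, num_classes: int) -> Vector:
    """Encode an integer class label as a one-hot vector."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def mix(a: Sequence[float], b: Sequence[float], lam: float) -> Vector:
    """Element-wise convex combination lam * a + (1 - lam) * b."""
    return [lam * ai + (1.0 - lam) * bi for ai, bi in zip(a, b)]


def mixup_batch(
    inputs: Sequence[Sequence[float]],
    targets: Sequence[Sequence[float]],
    alpha: float = 0.2,  # hypothetical default, chosen for illustration
    rng: Optional[random.Random] = None,
) -> Tuple[List[Vector], List[Vector], float]:
    """Mix every example with a randomly chosen partner from the same batch.

    A single lam ~ Beta(alpha, alpha) is drawn for the whole batch;
    alpha <= 0 disables mixing (lam = 1 returns the batch unchanged).
    """
    if rng is None:
        rng = random.Random()
    lam = rng.betavariate(alpha, alpha) if alpha > 0 else 1.0
    partner = list(range(len(inputs)))
    rng.shuffle(partner)  # partner[i] is the example mixed into example i
    mixed_x = [mix(inputs[i], inputs[j], lam) for i, j in enumerate(partner)]
    mixed_y = [mix(targets[i], targets[j], lam) for i, j in enumerate(partner)]
    return mixed_x, mixed_y, lam


def soft_cross_entropy(logits: Sequence[float], target: Sequence[float]) -> float:
    """Cross-entropy between softmax(logits) and a (possibly soft) target."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for t, z in zip(target, logits))


if __name__ == "__main__":
    x = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.5], [0.5, 2.0]]
    y = [one_hot(c, 2) for c in (0, 1, 0, 1)]
    mx, my, lam = mixup_batch(x, y, alpha=0.2, rng=random.Random(0))
    print("lambda:", lam)
    print("mixed targets:", my)
```

Cross-entropy is linear in the target vector. Training on the mixed target ỹ therefore gives the same loss as λ·CE(y_i) + (1 − λ)·CE(y_j). That is why many implementations keep the integer labels of both partners and mix the two losses rather than the label vectors.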

Authors: Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz

