Scalable Extraction of Training Data from (Production) Language Models (Paper Explained)

#chatgpt #privacy #promptengineering

Researchers extracted large amounts of training data from ChatGPT simply by asking it to repeat a single word over and over, which causes the model to diverge from its usual chat behavior and start emitting memorized text.
Why does this happen? And how much of its training data does a model like this really memorize verbatim?

OUTLINE:
0:00 – Intro
8:05 – Extractable vs Discoverable Memorization
14:00 – Models leak more data than previously thought
20:25 – Some data is extractable but not discoverable
25:30 – Extracting data from closed models
30:45 – Poem poem poem
37:50 – Quantitative membership testing
40:30 – Exploring the ChatGPT exploit further
47:00 – Conclusion

Paper:

Abstract:
This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT. Existing techniques from the literature suffice to attack unaligned models; in order to attack the aligned ChatGPT, we develop a new divergence attack that causes the model to diverge from its chatbot-style generations and emit training data at a rate 150x higher than when behaving properly. Our methods show practical attacks can recover far more data than previously thought, and reveal that current alignment techniques do not eliminate memorization.
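To make the two ideas in the abstract concrete, here is a minimal sketch of (1) finding the point where a "repeat this word" response stops repeating and diverges, and (2) checking whether the diverged text shares a verbatim window with some reference corpus. This is an illustration only, not the authors' pipeline: the function names are hypothetical, tokens are approximated by whitespace-separated words rather than model tokens, the window length `n` is a free parameter chosen by the caller, and the reference corpus is just a small in-memory list of strings.

```python
def split_at_divergence(response: str, word: str) -> tuple[int, str]:
    """Count the leading repetitions of `word` and return the text after them.

    Assumption: whitespace-separated words stand in for model tokens, and
    trailing punctuation on a repetition (e.g. "poem,") is ignored.
    """
    tokens = response.split()
    target = word.lower()
    count = 0
    for tok in tokens:
        if tok.strip(".,;:!?\"'").lower() != target:
            break
        count += 1
    return count, " ".join(tokens[count:])


def ngram_windows(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """All contiguous windows of n tokens (empty if the text is shorter than n)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def has_verbatim_overlap(candidate: str, reference_docs: list[str], n: int) -> bool:
    """True if any n-token window of `candidate` appears verbatim in a reference doc.

    Assumption: the reference corpus fits in memory; a large-scale check
    would need an index structure instead of a Python set.
    """
    reference_windows: set[tuple[str, ...]] = set()
    for doc in reference_docs:
        reference_windows |= ngram_windows(doc.split(), n)
    return any(w in reference_windows for w in ngram_windows(candidate.split(), n))
```

For example, calling `split_at_divergence(response, "poem")` on a model response separates the repeated prefix from whatever follows, and `has_verbatim_overlap(tail, corpus, n=...)` then flags the tail as a memorization candidate only if it matches the corpus exactly over a full window. Requiring an exact match over a long window is what keeps common phrases from counting as memorized.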

Authors: Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramèr, Katherine Lee

If you want to support me, the best thing to do is to share out the content 🙂

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
