#openai #science #gpt3
OpenAI’s newest model, DALL·E, shows remarkable abilities in generating high-quality images from arbitrary text descriptions. As with GPT-3, the range of applications and the diversity of outputs are astonishing, given that this is a single model trained on a purely autoregressive task. This model is a significant step towards combining text and images in future AI applications.
OUTLINE:
0:00 – Introduction
2:45 – Overview
4:20 – Dataset
5:35 – Comparison to GPT-3
7:00 – Model Architecture
13:20 – VQ-VAE
21:00 – Combining VQ-VAE with GPT-3
27:30 – Pre-Training with Relaxation
32:15 – Experimental Results
33:00 – My Hypothesis about DALL·E’s inner workings
36:15 – Sparse Attention Patterns
38:00 – DALL·E can’t count
39:35 – DALL·E can’t do global ordering
40:10 – DALL·E renders different views
41:10 – DALL·E is very good at texture
41:40 – DALL·E can complete a bust
43:30 – DALL·E can do some reflections, but not others
44:15 – DALL·E can do cross-sections of some objects
45:50 – DALL·E is amazing at style
46:30 – DALL·E can generate logos
47:40 – DALL·E can generate bedrooms
48:35 – DALL·E can combine unusual concepts
49:25 – DALL·E can generate illustrations
50:15 – DALL·E sometimes understands complicated prompts
50:55 – DALL·E can pass part of an IQ test
51:40 – DALL·E probably does not have geographical / temporal knowledge
53:10 – Reranking dramatically improves quality
53:50 – Conclusions & Comments
Blog:
Links:
TabNine Code Completion (Referral):
YouTube:
Twitter:
Discord:
BitChute:
Minds:
Parler:
LinkedIn:
If you want to support me, the best thing to do is to share out the content 🙂
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar:
Patreon:
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n