
Google’s brand-new “nano banana” image editor, all the talk of X, truly is amazing. Thanks to scaling (and some other tricks), the graphics are terrific, and the ability to have it edit photos that you upload is genuinely cool. Hell, you can even use Nano to make fun of me:

But the thing is, Garius Marcus Criticus (AKA the Imperator) was right: scaling can only take you so far.
For example, with respect to image generation systems, what the aforementioned Marcus said (as far back as 2022, often together with Ernest Davis) was that parts and wholes would pose problems in systems that rely on statistics without deep world knowledge. (The challenges that compositionality poses were also a major theme of Marcus and Davis’s 2019 book Rebooting AI.)
And guess what, surprise, surprise, even the latest models still struggle with compositionality, as a grad student at UC Santa Barbara, Kenan Tang, quickly noted, in one of the first skeptical looks at nano:

Tang, who has a more scientific spirit than Unutmaz, was also quick to note that these problems are not new. They are persistent:

Over the last few years, graphics have gotten better and better. World models and comprehension less so.
Since I have made this point so many times in so many ways in so many previous essays, I won’t belabor it. Suffice it to say that I was able to break nano banana literally on my first try:

You don’t have to have a PhD in mechanical engineering to spot some of the errors. (It’s also not so hot on recumbent bikes.)
For all the hype, Nano Banana is not all that much better at this sort of thing than ChatGPT-5:

Nor much better than what Davis and I found last December:

They all stink.
§
To be sporting, I gave Nano a second try on a different challenge, riffing on another example Davis and I had tried last year (“Draw a picture of a rabbit with four ears”), but slightly revising the details to avoid any problem-specific training.

ChatGPT, by the way, also continues to struggle with the basic facts of life:

§
Scaling works, but only to a degree, and only on some aspects of intelligence and not others. Graphics in these systems do continue to get better, regularly. And now they can generate videos, and not just stills.
But it’s all still just an extended form of mimicry, not something deeper, as the persistent failures with parts and wholes keep showing us.
Beauty is only skin deep.
Gary Marcus felt chuffed this morning to read this: “For … Marcus [who has] … endured what he diplomatically calls ‘an unbelievable amount of shit’ for his contrarian views … the irony is particularly delicious. He now finds himself vindicated as the very company he’s criticized adopts his language of caution and scaled-back expectation.” You can hear us discussing the weird sociology of contemporary AI here.