A year later, OpenAI demonstrated GPT-2, built by feeding a very large language model vast amounts of text from the web. This requires a huge amount of computing power, costing millions of dollars by some estimates, and considerable engineering skill, but it seems to unlock a new level of understanding in the machine. GPT-2 and its successor GPT-3 can often generate paragraphs of coherent text on a given subject.
“What’s surprising about these large language models is how much they know about how the world works simply from reading all the stuff that they can find,” says Chris Manning, a professor at Stanford who specializes in AI and language.
But GPT and its ilk are essentially very talented statistical parrots. They learn how to re-create the patterns of words and grammar that are found in language. That means they can blurt out nonsense, wildly inaccurate facts, and hateful language scraped from the darker corners of the web.
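The "statistical parrot" idea can be illustrated, in deliberately miniature form, with a bigram model: count which word tends to follow which, then generate text by repeatedly sampling a plausible next word. (The toy corpus and function names below are invented for illustration; real language models learn far richer patterns with neural networks, but the underlying principle of re-creating observed word patterns is the same.)

```python
import random
from collections import defaultdict

# A toy corpus standing in for "all the stuff they can find" on the web.
corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog . the dog chased the cat ."
).split()

# Record which words follow which: the pattern-copying idea reduced
# to adjacent word pairs (a bigram model).
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def generate(start, length=8, seed=0):
    """Emit text by repeatedly sampling a word seen to follow the last one."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        candidates = follows.get(out[-1])
        if not candidates:
            break
        out.append(rng.choice(candidates))
    return " ".join(out)

print(generate("the"))
```

The output is locally fluent but has no grounding in meaning: the model will happily assert whatever its training text made statistically likely, which is exactly why these systems can blurt out confident nonsense.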
Amnon Shashua, a professor of computer science at the Hebrew University of Jerusalem, is the cofounder of another startup building an AI model based on this approach. He knows a thing or two about commercializing AI, having sold his last company, Mobileye, which pioneered using AI to help cars spot things on the road, to Intel in 2017 for $15.3 billion.
Shashua’s new company, AI21 Labs, which came out of stealth last week, has developed an AI algorithm, called Jurassic-1, that demonstrates striking language skills in both English and Hebrew.
In demos, Jurassic-1 can generate paragraphs of text on a given subject, dream up catchy headlines for blog posts, write simple bits of computer code, and more. Shashua says the model is more sophisticated than GPT-3, and he believes that future versions of Jurassic may be able to build a kind of common-sense understanding of the world from the information they gather.
Other efforts to re-create GPT-3 reflect the world’s—and the internet’s—diversity of languages. In April, researchers at Huawei, the Chinese tech giant, published details of a GPT-like Chinese language model called PanGu-alpha (written as PanGu-α). In May, Naver, a South Korean search giant, said it had developed its own language model, called HyperCLOVA, that “speaks” Korean.
Jie Tang, a professor at Tsinghua University, leads a team at the Beijing Academy of Artificial Intelligence that developed another Chinese language model called Wudao (meaning “enlightenment”) with help from government and industry.
The Wudao model is considerably larger than any other, meaning that its simulated neural network is spread across more cloud computers. Increasing the size of the neural network was key to making GPT-2 and GPT-3 more capable. Wudao can also work with both images and text, and Tang has founded a company to commercialize it. “We believe that this can be a cornerstone of all AI,” Tang says.
Such enthusiasm seems warranted by the capabilities of these new AI programs, but the race to commercialize such language models may also move more quickly than efforts to add guardrails or limit misuses.
Perhaps the most pressing worry about AI language models is how they might be misused. Because the models can churn out convincing text on a subject, some people worry that they could easily be used to generate bogus reviews, spam, or fake news.
“I would be surprised if disinformation operators don’t at least invest serious energy experimenting with these models,” says Micah Musser, a research analyst at Georgetown University who has studied the potential for language models to spread misinformation.