The generative AIs to know from GPT-3 to VALL-E

There’s a new AI on the block, and it can mimic someone’s voice from just a short audio clip of them speaking. If it sounds like there are a lot of wacky AIs out there right now that can generate things, including both images and words, you’re right! And because it can get confusing, we wrote you a quick guide. Here are some of the most prominent AIs to surface over the past 12 months.

VALL-E

The latest entrant, VALL-E, is a new AI from Microsoft researchers that can generate a full model of someone’s voice from a three-second seed clip. It was trained on over 60,000 hours of English-language speech from more than 7,000 speakers and works by turning the contents of the seed clip into discrete components through a process called tokenization, which breaks the audio down into smaller units called tokens. The AI’s neural network then predicts what the other tokens required to make a full model would sound like, based on the few it has from the short clip. The results, which you can check out on the VALL-E website, are pretty astounding.
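To make the idea of audio tokens a little more concrete, here is a toy sketch of turning a waveform into discrete integers. It is not how VALL-E actually does it: this example assumes a much older and simpler scheme, mu-law quantization, and the function names and the made-up sine-wave "clip" are purely illustrative.

```python
import math

def mu_law_encode(sample: float, levels: int = 256) -> int:
    """Map one audio sample in [-1.0, 1.0] to an integer token in [0, levels - 1]."""
    mu = levels - 1
    clipped = max(-1.0, min(1.0, sample))
    # Compress the amplitude logarithmically so quiet sounds get finer resolution.
    compressed = math.copysign(
        math.log1p(mu * abs(clipped)) / math.log1p(mu), clipped
    )
    # Shift from [-1, 1] to [0, mu] and round to the nearest integer token.
    return int(round((compressed + 1.0) / 2.0 * mu))

def mu_law_decode(token: int, levels: int = 256) -> float:
    """Approximately invert mu_law_encode, returning a sample in [-1.0, 1.0]."""
    mu = levels - 1
    compressed = 2.0 * token / mu - 1.0
    return math.copysign(
        math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed
    )

def tokenize(waveform):
    """Turn a list of samples into a list of discrete tokens."""
    return [mu_law_encode(s) for s in waveform]

# Hypothetical seed "clip": eight samples of a 440 Hz tone at 8 kHz (an assumption
# for illustration only; a real three-second clip has tens of thousands of samples).
sample_rate = 8000
clip = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(8)]
tokens = tokenize(clip)
reconstructed = [mu_law_decode(t) for t in tokens]
```

Each sample becomes one of 256 possible tokens, and decoding the tokens gives back a close approximation of the original sound. A model can then be trained to predict which tokens should come next, which is the general flavor of what the paragraph above describes.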

Because of the obvious deepfake uses for an AI model like VALL-E, Microsoft hasn’t released it to the public. (Microsoft has previously invested in OpenAI, the company behind DALL-E and ChatGPT, and is reportedly in talks to invest billions more.) Still, it shows the kind of things these generative AIs are capable of with even the smallest seed.

DALL-E 2

OpenAI’s DALL-E 2 arguably kicked off the latest AI craze when it was announced in April 2022. It can create original images from a text prompt, whether you want something realistic or totally out there. It can even expand the boundaries of existing artwork with a technique called outpainting.

The best thing about DALL-E 2 is that it’s free for anyone to try. In your first month, you get 50 credits, each of which allows you to generate four image variations from a single text prompt. After that, you get 15 free credits per month.

Stable Diffusion

While OpenAI controls access to DALL-E 2, Stability AI took a different approach with its image generator, Stable Diffusion: it made it open source. Anyone can download Stable Diffusion and create incredibly realistic-looking images and imaginative artworks using a reasonably powerful laptop.

Because it’s open source, other companies have also been able to use Stable Diffusion to launch generative AI tools. The biggest name here is Lensa’s Magic Avatars. With the smartphone app, you can upload 10 to 20 photos, which are used to train a custom Stable Diffusion model that then generates dozens of offbeat artistic avatars.

Midjourney

The other big name in image generation, Midjourney, is still in beta and only accessible through Discord. Its algorithm has improved a lot over the past year. Personally, I find the images created by its current model, Version 4, the most compelling and naturalistic of the popular image generators. Unfortunately, accessing it through Discord is a weird hurdle, especially when compared to Stable Diffusion or DALL-E 2.

GPT-3

OpenAI’s Generative Pre-trained Transformer 3 or GPT-3 language model was actually released in 2020, but it has made headlines in the past couple of months with the release of ChatGPT, a chatbot that anyone can use. Its answers to a variety of questions and prompts are often accurate and, in many cases, indistinguishable from something written by a human. It’s started serious conversations about how colleges will detect plagiarism going forward (maybe with an AI-finding AI). Plus, it can write funny poems.

While ChatGPT is by far the most visible instance of GPT-3 out in the world, the model also powers other AI tools. Of all the generative AIs on this list, we at PopSci suspect it’s the one you will hear a lot more about in the coming months.

Codex

OpenAI’s GPT-3 isn’t just good at generating silly songs and short essays; it also has the capacity to help programmers write code. A GPT-3-based model called Codex is able to generate code in a dozen languages, including JavaScript and Python, from natural language prompts. On the demo page, you can see a short video of a browser game being made without a single line of code being written. It’s pretty impressive!

And Codex is already out in the wild: GitHub Copilot uses it to automatically suggest full chunks of code. It’s like autocomplete on steroids.
