A guide to the internet’s favorite generative AIs
VALL-E is just the latest example. Here's what to know about DALL-E 2, GPT-3, and more.
There’s a new AI on the block, and it can mimic someone’s voice from just a short audio clip of them speaking. If it sounds like there are a lot of wacky AIs out there right now that can generate things, including both images and words, you’re right! And because it can get confusing, we wrote you a quick guide. Here are some of the most prominent AIs to surface over the past 12 months.
The latest entrant, VALL-E, is a new AI from Microsoft researchers that can generate a full model of someone’s voice from a three-second seed clip. It was trained on over 60,000 hours of English-language speech from more than 7,000 speakers. It works by turning the contents of the seed clip into discrete components through a process called tokenization, which breaks the audio down into smaller units called tokens. The AI’s neural network then predicts what the other tokens required to make a full model would sound like, based on the few it has from the short clip. The results, which you can check out on the VALL-E website, are pretty astounding.
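For the curious, here is a toy sketch of what "turning audio into tokens" can mean. It is not how VALL-E does it: the real system relies on a learned neural audio codec, whereas this example simply rounds each sample of a waveform to one of a few fixed levels. The function names and the `num_levels` parameter are made up for illustration.

```python
import math


def tokenize_waveform(samples, num_levels=16):
    """Map each sample in [-1.0, 1.0] to an integer token in [0, num_levels - 1].

    Toy uniform quantization, assumed here for illustration only.
    """
    tokens = []
    for s in samples:
        # Clip out-of-range samples so every token stays within the vocabulary.
        clipped = max(-1.0, min(1.0, s))
        # Rescale [-1, 1] to [0, num_levels - 1] and round to the nearest level.
        tokens.append(round((clipped + 1.0) / 2.0 * (num_levels - 1)))
    return tokens


def detokenize(tokens, num_levels=16):
    """Turn tokens back into approximate sample values in [-1.0, 1.0]."""
    return [t / (num_levels - 1) * 2.0 - 1.0 for t in tokens]


# One cycle of a sine wave stands in for a (very) short audio clip.
samples = [math.sin(2 * math.pi * i / 8) for i in range(8)]
tokens = tokenize_waveform(samples)
approx = detokenize(tokens)

print(tokens)
print([round(a, 2) for a in approx])
```

The point is that once sound is a sequence of integers, a model can treat it much like a sequence of words and guess which tokens are likely to come next.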
Because of the obvious deepfake uses for an AI model like VALL-E, Microsoft hasn’t released it to the public. (Microsoft has previously invested in OpenAI, the company behind DALL-E and ChatGPT, and is reportedly in talks to invest billions more.) Still, it shows the kind of things these generative AIs are capable of with even the smallest seed.
OpenAI’s DALL-E 2 arguably kicked off the latest AI craze when it was announced in April 2022. It can create original images from a text prompt, whether you want something realistic or totally out there. It can even expand the boundaries of existing artwork with a technique called outpainting.
The best thing about DALL-E 2 is that it’s free for anyone to try. In your first month, you get 50 credits, each of which lets you generate four image variations from a single text prompt. After that, you get 15 free credits per month.
While OpenAI controls access to DALL-E 2, Stability AI took a different approach with its image generator, Stable Diffusion: it made it open source. Anyone can download Stable Diffusion and create incredibly realistic-looking images and imaginative artworks using a reasonably powerful laptop.
Because it’s open source, other companies have also been able to use Stable Diffusion to launch generative AI tools. The biggest name here is Lensa’s Magic Avatars. With the smartphone app, you can upload 10 to 20 photos, which are used to train a custom Stable Diffusion model that then generates dozens of offbeat artistic avatars.
The other big name in image generation, Midjourney, is still in beta and only accessible through a Discord channel. Its algorithm has improved a lot over the past year. Personally, I find the images created by its current model, Version 4, the most compelling and naturalistic of the popular image generators. Unfortunately, accessing it through Discord is a weird hurdle, especially compared to Stable Diffusion or DALL-E 2.
OpenAI’s Generative Pre-trained Transformer 3, or GPT-3, language model was actually released in 2020, but it has made headlines in the past couple of months with the release of ChatGPT, a chatbot that anyone can use. Its answers to a variety of questions and prompts are often accurate and, in many cases, indistinguishable from something written by a human. It has started serious conversations about how colleges will detect plagiarism going forward (maybe with an AI-finding AI). Plus, it can write funny poems.
While ChatGPT is by far the most obvious instance of GPT-3 out in the world, the model also powers other AI tools. Of all the generative AIs on the list, at PopSci we suspect it’s the one you will hear a lot more about in the coming months.