The generative AIs to know from GPT-3 to VALL-E

There’s a new AI on the block, and it can mimic someone’s voice from just a short audio clip of them speaking. If it sounds like there are a lot of wacky AIs out there right now that can generate things, including both images and words, you’re right! And because it can get confusing, we wrote you a quick guide. Here are some of the most prominent AIs to surface over the past 12 months.

VALL-E

The latest entrant, VALL-E, is a new AI from Microsoft researchers that can generate a full model of someone’s voice from a three-second seed clip. It was trained on over 60,000 hours of English-language speech from more than 7,000 speakers. It works by turning the contents of the seed clip into discrete components through a process called tokenization, which breaks the audio down into smaller units called tokens. The AI’s neural network then predicts what the other tokens required to make a full model would sound like, based on the few it has from the short clip. The results, which you can check out on the VALL-E website, are pretty astounding.
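
To make the idea of tokenization a little more concrete, here is a minimal sketch of turning a waveform into discrete tokens. It is a toy illustration only: VALL-E's actual audio tokenizer is not described here, and the function names, the 256-level bucket count, and the synthetic three-second sine-wave "clip" are all assumptions made for the example.

```python
import math

def tokenize_waveform(samples, levels=256):
    """Map each sample in [-1.0, 1.0] to an integer token in [0, levels - 1]."""
    tokens = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clamp to the valid range
        tokens.append(min(levels - 1, int((s + 1.0) / 2.0 * levels)))
    return tokens

def detokenize(tokens, levels=256):
    """Map each token back to the centre of its bucket."""
    return [(t + 0.5) / levels * 2.0 - 1.0 for t in tokens]

# Hypothetical three-second "seed clip": a 220 Hz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
clip = [math.sin(2 * math.pi * 220 * n / SAMPLE_RATE)
        for n in range(3 * SAMPLE_RATE)]

tokens = tokenize_waveform(clip)
restored = detokenize(tokens)
max_error = max(abs(a - b) for a, b in zip(clip, restored))

print(len(tokens))   # one token per audio sample
print(tokens[:8])    # the first few discrete tokens
print(max_error)     # small: at most half a bucket width
```

Each sample ends up as one of a fixed set of integers, which is the kind of discrete sequence a neural network can learn to continue. A real system would use a far more compact learned representation than one token per sample.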

Because of the obvious deepfake uses for an AI model like VALL-E, Microsoft hasn’t released it to the public. (Microsoft has previously invested in OpenAI, which owns DALL-E and ChatGPT, and is reportedly in talks to invest billions more.) Still, it shows the kind of things these generative AIs are capable of with even the smallest seed.

DALL-E 2

OpenAI’s DALL-E 2 arguably kicked off the latest AI craze when it was announced in April 2022. It can create original images from a text prompt, whether you want something realistic or totally out there. It can even expand the boundaries of existing artwork with a technique called outpainting.

The best thing about DALL-E 2 is that it’s free for anyone to try. In your first month, you get 50 credits, each of which lets you generate four image variations from a single text prompt. After that, you get 15 free credits per month.
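
As a quick back-of-the-envelope check on what those free credits add up to, here is a small sketch. It assumes every free credit is spent in the month it is granted; the function and constant names are made up for illustration.

```python
IMAGES_PER_CREDIT = 4      # each credit yields four image variations
FIRST_MONTH_CREDITS = 50   # free credits in the first month
MONTHLY_CREDITS = 15       # free credits in each later month

def free_images(months):
    """Total images from free credits over the given number of months."""
    if months <= 0:
        return 0
    credits = FIRST_MONTH_CREDITS + MONTHLY_CREDITS * (months - 1)
    return credits * IMAGES_PER_CREDIT

print(free_images(1))   # 200 images in the first month
print(free_images(2))   # 260 images after two months
print(free_images(12))  # 860 images over a full year
```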

Stable Diffusion

While OpenAI controls access to DALL-E 2, Stability AI took a different approach with its image generator, Stable Diffusion: it made it open source. Anyone can download Stable Diffusion and create incredibly realistic-looking images and imaginative artworks using a reasonably powerful laptop.

Because it’s open source, other companies have also been able to use Stable Diffusion to launch generative AI tools. The biggest name here is Lensa’s Magic Avatars. With the smartphone app, you can upload 10 to 20 photos, which are used to train a custom Stable Diffusion model that then generates dozens of offbeat artistic avatars.

Midjourney

The other big name in image generation, Midjourney, is still in beta and only accessible through a Discord channel. Its algorithm has improved a lot over the past year. Personally, I find the images created by its current model, Version 4, the most compelling and naturalistic of the popular image generators. Unfortunately, accessing it through Discord is a weird hurdle, especially when compared to Stable Diffusion or DALL-E 2.

GPT-3

OpenAI’s Generative Pre-trained Transformer 3 or GPT-3 language model was actually released in 2020, but it has made headlines in the past couple of months with the release of ChatGPT, a chatbot that anyone can use. Its answers to a variety of questions and prompts are often accurate and, in many cases, indistinguishable from something written by a human. It’s started serious conversations about how colleges will detect plagiarism going forward (maybe with an AI-finding AI). Plus, it can write funny poems.

While ChatGPT is by far the most obvious instance of GPT-3 out in the world, it also powers other AI tools. Of all the generative AIs on the list, we at PopSci suspect it’s the one you will hear a lot more about in the coming months.

Codex

OpenAI’s GPT-3 isn’t just good at generating silly songs and short essays; it also has the capacity to help programmers write code. A model called Codex is able to generate code in a dozen languages, including JavaScript and Python, from natural language prompts. On the demo page, you can see a short video of a browser game being made without a single line of code being written. It’s pretty impressive! And Codex is already out in the wild: GitHub Copilot uses it to automatically suggest full chunks of code. It’s like autocomplete on steroids.
