An Exclusive Look Inside the Super-Secret AI Startup Viv
At the Microsoft Build developers’ conference in late March, CEO Satya Nadella stood on stage and proclaimed, “Bots are the new apps.” He was introducing the company’s new bot-building platform, which he promised would “infuse intelligence about us and context into our computers.” Wittingly or not, Nadella was also setting the stage for one of Silicon Valley’s biggest battles in decades.
Within two months, Facebook and Google announced intelligent assistants, joining the ranks of Apple’s Siri, Amazon’s Alexa, and Microsoft’s Cortana (to which Nadella promised upgrades at Build). Just last week Apple announced plans to open the Siri SDK to third-party developers, and most assume that a significant Siri announcement (perhaps even an Amazon Echo competitor) is coming at Apple’s Worldwide Developers Conference later this month.
Launched in the midst of the fray was Viv, the highly anticipated AI assistant from one of Silicon Valley’s most secretive start-ups, also called Viv. At TechCrunch Disrupt in early May, co-founder and CEO Dag Kittlaus debuted the app, using Viv’s natural language capabilities to do nifty things like check the weather and send a Venmo payment. At the Pioneers Festival in Europe last week, Viv co-founder Adam Cheyer, who built Siri with Kittlaus before they sold it to Apple, pushed Viv harder. One convoluted request from Cheyer: “Get me a window seat on a one-way nonstop flight from JFK to SFO three days after next Friday.” Viv pulled up options in less than a second.
The crowd was duly impressed. Even so, it’s tempting to dismiss Viv as just another start-up among tech giants. That would be a mistake.
Within Silicon Valley, natural language recognition is regarded as the next frontier of user interface. Just as the rise of browsers and search 25 years ago reshaped the flow of information and money across the Web, so too could artificial intelligence and natural language recognition. The thinking goes that if we could speak to machines as we would to a friend, we would use them to do more than exchange information: We’d use them as proxies to act on that information.
In tech circles, some are already calling this model ‘conversation as a service.’ And while that is tech jargon run amok, it’s true that whoever controls that conversation will control the future Web, or at least a big chunk of it. So who will it be?
That very much remains to be seen. Much to the chagrin of most start-up founders, true paradigm shifts are rare in consumer technology. They occur when a company builds a user-friendly system or product flexible enough to match a broad range of demands to a diverse supply of products and services. Consider Amazon, and Apple’s iPhone and App Store, exhibits one and two.
In the world of digital assistants, that sort of paradigm shift seems a ways off. The technology has reached the utilitarian stage (What’s the weather today?) and the novelty stage (Can you beatbox for me?). But more often than not, users find themselves pressing up against an AI assistant’s limitations rather than discovering its possibilities. For example, Siri can help you make a dinner reservation. That’s great, but try getting her to change that reservation 30 minutes later from four diners to six, or from 7:45 to 8:15. Not a chance.
To understand why, think of most assistants as highly sophisticated phone trees. Someone on the product team identifies a potential use case for the assistant and then tasks a small army of coders with serving that use case in the best way. Repeat tens of thousands of times. That’s not a bad system, but it’s not a particularly scalable one either. People ask a lot of crazy stuff, and they ask it in all sorts of crazy ways. How big an encyclopedia of tasks and commands can any one team build?
Which brings us to Viv. The company is pretty guarded about its technology and its beta partners, but it is clearly different from its competition. First, it does not use a phone-tree approach. Viv generates code dynamically. It uses natural language recognition to register user intent, and then it writes an executable command in response, tying together different integrated services as it goes. It also remembers those exchanges. Remember that dinner reservation? Viv will, and it will go back and change it if you like.
Second, Viv will be an open system. In theory, it will be the one assistant to work across different platforms, devices, and telecom services. This stands in contrast to Siri, Cortana, Alexa, and Facebook M, which for the most part act within their walled gardens. Google’s proposed assistant sounds a bit closer to Viv, in that it might cut across platforms and devices, but we’ll have to wait until later this year to see.
Third, Kittlaus says Viv will throw open its doors to third-party developers. Just as the App Store served as a catalyst for iPhone app development, Kittlaus says an open SDK could do the same for Viv. Granted, Amazon has already done a good job of this, while Facebook, Apple, and Google say they are planning to do the same thing. Still, Kittlaus says, Viv will promise developers even more freedom. Software or hardware developers will be able to integrate their services with Viv, but they’ll also be able to integrate Viv into their own products or services. It’s doubtful you’ll be talking to Siri in a lamppost anytime soon (whether you’d want to is another question entirely).
For now, the company is still being pretty choosy about beta partners. It has about 50 right now: think payment, food, and flower delivery services, ticketing agencies, music services, automakers, and so on. But Kittlaus is adamant that at launch, likely late this year or early next, Viv will be open to the multitudes.
When you add up those attributes, Viv has some clear advantages over other assistants. Kittlaus said to me, “We’ve spent three-and-a-half years working through problems the other guys don’t even know they have yet.” Yet no one, Kittlaus included, is blind to the significant hurdles that lie ahead.
Viv’s competition consists of the world’s largest technology companies (plus other formidable start-ups, such as Hound). This should shake even the stoutest of Silicon Valley hearts. Also, those big companies already command something Viv desperately needs: an audience. Without an audience, developers won’t make cool new stuff for Viv. But without cool new stuff, no one will use Viv. The only way through that catch-22 is to make a product that’s orders of magnitude better than the competition. That’s very, very difficult. Then again, there are not a lot of Lycos or AltaVista fans out there today, so hope springs eternal.
Perhaps for those reasons, Viv has been extremely reluctant to talk to the press. Kittlaus has done only a handful of interviews since founding Viv, and at the Pioneers Festival, he turned down every request but ours.
The conversation started out guarded, but as time wore on, Kittlaus could not contain his enthusiasm for a product he sees as head and shoulders above the competition. Before long, he had his phone out (“I really shouldn’t be doing this, but…”) and was running me through demos on the fly.
Seated in a loud and crowded room in Vienna (so 3G, not LTE), Kittlaus ran Viv through its paces. He kept things pretty basic: What was the weather like the day I was born? (Sweltering with a few thunderstorms, just as my mother said.) Could it book a one-way flight from Vienna to SFO for the next day? (Yes, for $447 one-way: “Shall I purchase it for you?”) It failed only once: It could not retrieve the weather for another person’s birthday, probably because those records had not been filed.
The demos were cool, and they would have been even better if we’d done something like buy the plane ticket and then change it a few moments later. But perhaps more important, Kittlaus opened up about his broader vision for Viv. It is grand. Kittlaus says he means to do much more than introduce yet another intelligent assistant into the world. He’d like Viv to become a new standard, like WiFi or USB. It could stand as a symbol for intelligent conversation with anything, and, if the stars align, it could redefine our relationship to machines in ways few have yet imagined.
PS: Can you walk me through the core principles that guide Viv and, in your mind, set it apart from other assistants?
DK: We have four guiding principles: one assistant, personalization, any device, and every service. Think of them as a set of rules that we believe, in aggregate, will raise today’s fun, semi-useful, and a little gimmicky assistants into something much more. One assistant just means that you want to have a single entity that knows you. That leads to personalization. Any device just means you won’t have to start over every time you talk to something new, which is nice. And every service means opening Viv up so that third-party developers can create services for it or teach it new things. That last one is the single most important thing we’re doing to put the assistant on par with browsers and local apps.
PS: You helped start Siri and you’ve got a deep understanding of how most assistants work. How does Viv pick up where others leave off?
DK: Every platform like Viv today basically has a product manager who figures out what the new feature should be. Then a developer codes (in most cases hard-codes) a command: “If someone asks about this, do that.” The problem is you can’t scale that to the world.
What Viv does is allow third-party developers to go in (almost like a Wikipedia for AI) and use a set of pretty simple tools to build whatever they want. That allows Viv to scale from the dozens of things that other assistants can do today to thousands, tens of thousands, to who-knows-how-many capabilities.
So that’s the big leap forward. Viv lets thousands of people from around the world teach it new things simultaneously.
PS: You’ve mentioned that your secret sauce is Viv’s ability to write its own commands. In effect, it’s a program that writes itself. How will that make for a better experience?
DK: So the simple explanation is that in every other system like it, the individual developer needs to tell the machine step by step what to do. The standard approach is to teach an AI something and then train and train and train it to get better and better. You can’t do that when there are tens of thousands of services, particularly when many of them are working together in ways you never anticipated. Being able to recognize a user’s intent through natural speech and dynamically write commands in response is one of our big leaps forward. That capability, paired with lots of services, could transform what you use an assistant for and how often you depend on it. That is the big thing.
PS: If that’s your big advantage, what’s the big disadvantage? And how do you think you can overcome it?
DK: So we have a two-sided marketplace model, and that’s a chicken-and-egg proposition. We need a user base, and we need developers to create services. Also, we’re competing against companies that already have a billion users.
A lot of people have already gotten involved, even before we’ve developed a large user base. Now it’s up to us to get distribution going and create additional draw for developers. Many of them have lots of ideas and see potential business models. We are inundated, completely inundated, with interested parties.
PS: You said you’re making a consumer version. Why?
DK: Yeah, we are. Though Viv will hopefully be in a lot of places, we’re also building a consumer app that will go far beyond most of the other systems. The goal is to create a consumer benefit from day one. Then we just need to keep pressing on with creating long-term incentives for developers to participate before we have 100 million users lined up.
PS: When will we be able to download the consumer version?
DK: When it’s ready…
PS: You’ve said you’d like to see Viv become a new standard. What do you mean?
DK: Yeah, I’d like to see Viv as a new utility, right there with WiFi and Bluetooth. It should be how you talk to the world.
PS: What’s life in a Viv-centric world like?
DK: Vastly simplified interaction with everything. I was speaking to a guy last week who’s in his mid-sixties. He was asking me what I was doing. I was explaining it, and he interrupted me. He said, “Oh, all this techie stuff, I just don’t get it.” I said, “The whole idea behind this is that you don’t have to get it anymore. All you have to do is talk to it.”
Language is this innate, natural interaction mode for all of us. The more things you can talk to (in situations and scenarios that make sense), the simpler things get to operate or interact with. What we’re talking about is a world where you can talk to everything and have your assistant do everything.
PS: In that perfect world, what could I talk to?
DK: Well, anything that makes sense. Anything that today has a complex interaction. You’ll be talking to your car; people are already starting to do that. In the US, we waste a billion hours of commute time each year. Imagine sitting in your car doing your Christmas shopping, talking to your assistant about different presents. Imagine ordering food so it arrives just after you get back to your house. You’re also going to talk to your mirror in the morning when you’re getting ready for work. They have the displays ready; they should be out soon. With Viv in your mirror, you could pull up headlines about topics you’re interested in. You could look at images. You know, ‘Pan up on that one a little bit. Zoom in right there.’ I mean, who knows? There are medical applications, too. It’s all open territory.
PS: What’s the craziest application someone has approached you with?
DK: I had a guy tell me today that the first thing he wants to do is create an application that would allow anyone to write programs in natural language, no actual coding necessary. That’s pretty far out.
PS: Could Viv ever become a true companion?
DK: Without a doubt. I think that intelligent assistants (and hopefully Viv is the winner in this game) will be so commonplace that our kids will ask us how we ever got along without them. My kids ask me how I went to college without a laptop. You’re going to have your digital right-hand man, so to speak, and it’s going to be so common for you to delegate the mundane tasks of your life that you can’t imagine life before it. That is the long-term goal. With Viv, we’re going to finish what we started.