Samsung’s new digital assistant, Bixby, tries to push past voice recognition toward true AI

The Galaxy S8 smartphone can see, listen, and learn.

Samsung's Bixby Virtual assistant
Bixby can be summoned by voice or with the press of a physical button on the new Galaxy S8 and Galaxy S8+. Samsung

Using voice commands with a smartphone is nothing new, but Samsung’s new digital assistant, Bixby, goes beyond voice recognition, incorporating deep learning and expanded visual search so that it feels more like a real digital assistant living in your device.

Bixby draws immediate comparisons to Apple’s Siri and Google’s Assistant, but while we often think of those as simply voices, Samsung says Bixby has its own card-based visual interface to convey information. Voice is just one part of the equation. The other aspects are vision (like using the camera to scan a QR code, find out the cost of a book based on its cover, or translate text), reminders, and recommendations. Bixby is the umbrella term for those four smart functions.

Sriram Thodla, a senior director at Samsung focusing on intelligence and the internet of things, introduced Bixby to the public during the Galaxy S8 and S8+ announcement event on Wednesday. “Bixby understands context,” he said. “It knows what’s happening on your screen.”

For example, you can ask it to take a screenshot of what you’re doing, then send that image to a contact. This kind of complex request spanning multiple apps and services has proved problematic for digital assistants in the past.

Galaxy S8 and S8 Plus
Samsung’s new flagship phones, the Galaxy S8 and S8 Plus. The Bixby button is on the left side of the device. Samsung

“We say Bixby is an intelligent user interface,” Mok Oh, a vice president for services strategy at Samsung, said in an interview at a press event on Monday.

Oh touted Bixby’s completeness, meaning that if an app is Bixby-enabled, anything you can do with touch could also be done through voice. For example, you could ask Bixby to switch the display language on your phone to another language, and Bixby should make it so. The assistant is also “cognitively forgiving,” Oh said, so it should cope with ambiguity in requests.

Oh went on to highlight the phone’s photo app, called Gallery, and the thousands of different combinations of tasks a user could do within it. There are countless ways a user could ask for an image to be cropped or edited, and Bixby should be able to handle that.

“In many ways we apply deep learning technology,” to Bixby, Oh said. One aspect of that is that Bixby will give users a thumbs-up or thumbs-down option after it has handled a request, to let Bixby know how it did, and help it learn. “Actually, we apply learning in many, many different aspects of our whole technology stack for this,” he added.

That thumbs-up or thumbs-down function is critical for virtual agents like Bixby, Alex Rudnicky, a research professor of computer science at Carnegie Mellon University who focuses on speech, said. “You need some kind of a reinforcement that basically allows the system to learn—basically understand the connection between what the user wants, and what actually happens,” he said. “Realistically, the agent’s going to make a lot of mistakes.”

Amazon’s Alexa app has a similar function, asking the user if it did what they wanted.

In addition to its listening abilities, Bixby can also see into the real world. Using the S8’s built-in camera, Bixby can detect objects in a scene and search for information about them, as well as about related products. Of course, it will also allow you to buy them from Samsung’s partners. Siri doesn’t currently offer this feature, and Google Assistant does, often with mixed results, but this type of augmented reality-style interaction is a logical step for AI as a personal assistant.

For the visual search, Samsung has tapped a variety of partner companies like Amazon for shopping, Foursquare for location-specific functions (Thodla used an example in which he took a picture of New York’s iconic Flatiron building and got information about it, as well as good food options in the area), and Google Translate for interpreting signs in different languages.

Samsung Bixby Home Screen
The Bixby home screen shows curated information from various apps depending on what it thinks will be most relevant at a given place or time. Samsung

Finally, Bixby should also pick up on situational patterns, Oh said; if you usually make a phone call when you’re driving home from work, the assistant might pick up on that and then recommend you do so. Thodla also touched on that point during the device’s unveiling, saying that Bixby might suggest an Uber if it notices you usually take one at a certain time.

Bixby has a home screen of its own that it curates with information it has learned about typical usage. Information from various apps is displayed on cards, and those it finds most relevant are pushed to the top. So, in the morning it prioritizes things like weather and email, while at night it might push social media to the top. All of this, however, can change as Bixby gets feedback from the user.

In general, it is pattern recognition and situational awareness, in which a virtual assistant makes suggestions, that separates simple voice recognition and connectivity skills from artificial intelligence, according to Kris Bondi, the chief marketing officer at Neura. Neura makes an AI engine that focuses on personalization and identifying context and moments in a user’s life.