Yesterday, Google Research unveiled two new projects it’s been working on with a table tennis-playing robot. The Robotics team at Google taught a robot arm to play 300+ shot rallies with other people and return serves with the precision of “amateur humans.” While this might not sound that impressive given how bad some people are at table tennis, the same techniques could be used to train robots to perform other “dynamic, high acceleration tasks” that require close human-robot interaction.
Table tennis is an interesting task for robots to learn because of two complementary properties: It requires fast, precise movements, yet it is a structured game played in a fixed and predictable environment. The learning algorithm the robot relies on to make decisions has to work hard to get good, but the confines of a table tennis table limit how much of the world it has to contend with. It also helps that table tennis requires two parties: to train, the robot can play against another robot (or a simulation) or an actual human. All this makes it a great setup for exploring human-robot interaction and reinforcement learning techniques (where the robot learns by doing).
Google engineers designed two separate projects using the same robot: Iterative-Sim2Real, which will be presented at CoRL later this year, and GoalsEye, which will be presented at IROS next week. Iterative-Sim2Real is the program that trained the robot to play 300-shot cooperative rallies with humans, while GoalsEye allows it to return serves to a specific target point on the table with amateur human-like precision.
Iterative-Sim2Real is an attempt to overcome the “chicken and egg problem” of teaching machines to mimic human behaviors. The research team explains that if you don’t have a good robot policy (a set of rules for the robot) to begin with, then you can’t collect high-quality data on how people will interact with it. But without a human behavior model to start with, you can’t come up with the robot policy in the first place. One alternative is to train robots exclusively in the real world. However, this process is “often slow, cost-prohibitive, and poses safety-related challenges, which are further exacerbated when people are involved.” In other words, it takes a long time and people can get hurt by robot arms swinging table tennis bats around.
Iterative-Sim2Real sidesteps this problem by using a very simple model of human behavior as a starting point and then training the robot both in simulation and with a human in the real world. After each iteration, both the human behavior model and the robot policy are refined. In tests with five human subjects, the robot trained with Iterative-Sim2Real outperformed an alternative approach called sim-to-real plus fine-tuning. It had significantly fewer rallies that ended in fewer than five shots, and its average rally length was 9 percent longer.
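To make the back-and-forth concrete, here is a toy sketch of that alternating loop in Python. It is not Google's implementation: the one-dimensional "policy," the `HumanModel` class, the noise level, and every function name are hypothetical stand-ins chosen only to show the shape of the idea, which is to train against the current human model, gather real-world data, refit the human model, and repeat.

```python
import random
from dataclasses import dataclass


@dataclass
class HumanModel:
    """Toy human behavior model: where the player tends to send the ball (1-D, metres)."""
    mean_return: float


def train_in_sim(policy: float, human: HumanModel, steps: int = 50, lr: float = 0.2) -> float:
    """Nudge the robot's 1-D paddle position toward where the modeled human hits."""
    for _ in range(steps):
        policy += lr * (human.mean_return - policy)
    return policy


def collect_real_returns(true_mean: float, n: int, rng: random.Random) -> list[float]:
    """Hypothetical real-world session: noisy observations of the actual player.

    In a real system this data would be gathered while playing against the
    current robot policy; here it is simply sampled around a fixed value.
    """
    return [rng.gauss(true_mean, 0.05) for _ in range(n)]


def refit_human_model(observations: list[float]) -> HumanModel:
    """Refine the human model from the newly observed returns."""
    return HumanModel(mean_return=sum(observations) / len(observations))


def iterative_sim2real(true_mean: float, iterations: int = 3, seed: int = 0):
    rng = random.Random(seed)
    human = HumanModel(mean_return=0.0)  # deliberately crude starting model
    policy = 0.0
    errors = []  # policy error measured after each round of simulated training
    for _ in range(iterations):
        policy = train_in_sim(policy, human)                                   # 1. train in simulation
        errors.append(abs(policy - true_mean))
        human = refit_human_model(collect_real_returns(true_mean, 20, rng))    # 2. play for real, refit
    policy = train_in_sim(policy, human)  # final pass against the latest human model
    return policy, human, errors


if __name__ == "__main__":
    policy, human, errors = iterative_sim2real(true_mean=0.4)
    print(policy, human, errors)
```

In this toy, the first round trains against the crude starting model and so leaves the policy far from the real player; once the human model has been refit on real data, later rounds pull the policy much closer.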
GoalsEye, on the other hand, set out to tackle a different set of training problems and taught the robot to return the ball to an arbitrary location such as “the back left corner” or “just over the net on the right side.” Imitation learning, where a robot develops a play strategy derived from human performance data, is hard to conduct in high-speed settings. So many variables affect how a human hits a ping pong ball that tracking everything a robot would need to learn from is practically impossible. Reinforcement learning is typically good for these situations but can be slow and sample inefficient, especially at the start. (In other words, it takes a lot of repetitions to develop a fairly limited play strategy.)
GoalsEye attempts to overcome both sets of issues by starting with a “small, weakly-structured, non-targeted data set” that lets the robot learn the basics of what happens when it hits a ping pong ball, and then letting it self-practice until it can hit the ball precisely to specific points. After being trained on the initial 2,480 demonstrations, the robot was able to return a ball to within 30 centimeters (~1 foot) of the target only 9 percent of the time. But after self-practicing for ~13,500 shots, it was accurate 43 percent of the time.
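For a sense of what those percentages measure, the following Python sketch computes the share of returns that land within a given radius of their target. The function name, the coordinate convention (x, y in metres on the table surface), and the sample numbers are assumptions made for illustration; the article does not describe how Google's team scored the shots beyond the 30-centimeter threshold.

```python
import math


def hit_rate(landings, targets, radius_m=0.30):
    """Fraction of shots whose landing point is within radius_m of its target.

    landings and targets are equal-length sequences of (x, y) points in metres.
    """
    if len(landings) != len(targets):
        raise ValueError("landings and targets must have the same length")
    if not landings:
        raise ValueError("need at least one shot")
    hits = sum(
        1
        for (lx, ly), (tx, ty) in zip(landings, targets)
        if math.hypot(lx - tx, ly - ty) <= radius_m
    )
    return hits / len(landings)


if __name__ == "__main__":
    # Made-up example: four shots aimed at the same target point.
    targets = [(0.0, 0.0)] * 4
    landings = [(0.1, 0.1), (1.0, 1.0), (0.5, 0.2), (0.0, 0.35)]
    print(hit_rate(landings, targets))
```

Only the first of the four made-up shots lands inside the 30-centimeter circle, so the sketch reports a hit rate of 0.25.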
While teaching robots to play games might seem trivial, the research team contends that solving these kinds of training problems with table tennis has potential real-world applications. Iterative-Sim2Real allows robots to learn from interacting with humans, while GoalsEye shows how robots can learn from unstructured data and self-practice in a “precise, dynamic setting.” Worst-case scenario: If Google’s big goals don’t pan out, at least they could build a robot table tennis coach.