Robots Learning New Tasks from YouTube Videos

By watching a person make a mixed drink, a robot trained on NVIDIA GPUs at the University of Maryland can copy those actions. The robot, a two-armed industrial machine, watched a person mix a drink by pouring liquid from several bottles into a jug, then grasped the bottles in the correct order and poured the right quantities into the jug itself.

“We call it a ‘robot training academy,’” says Yezhou Yang, a graduate student in the Autonomy, Robotics and Cognition Lab at the University of Maryland. “We ask an expert to show the robot a task, and let the robot figure out most parts of sequences of things it needs to do, and then fine-tune things to make it work.”

The approach involves training a computer system to associate specific robot actions with video footage showing people performing various tasks. A recent paper from the group, for example, shows that a robot can learn how to pick up different objects using two different systems by watching thousands of instructional YouTube videos. One system learns to recognize different objects; another identifies different types of grasp.
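The two-system idea can be sketched in miniature. The paper's actual models are convolutional networks trained on YouTube footage; the toy below substitutes random linear classifiers, made-up object and grasp labels, and a hypothetical `actions_from_video` helper, purely to illustrate how per-frame (object, grasp) predictions from two separate recognizers could be collapsed into an action sequence for the robot.

```python
import numpy as np

# Hypothetical label sets; the paper's actual classes differ.
OBJECTS = ["bottle", "jug", "glass"]
GRASPS = ["power", "precision", "rest"]

rng = np.random.default_rng(0)

# Stand-ins for the two trained networks: each maps a frame's
# feature vector to class scores via a linear layer + softmax.
W_obj = rng.standard_normal((4, len(OBJECTS)))
W_grasp = rng.standard_normal((4, len(GRASPS)))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify_frame(features):
    """Run both recognizers on one frame's features and return
    the (object, grasp) pair they jointly predict."""
    obj = OBJECTS[int(np.argmax(softmax(features @ W_obj)))]
    grasp = GRASPS[int(np.argmax(softmax(features @ W_grasp)))]
    return obj, grasp

def actions_from_video(frames):
    """Collapse per-frame predictions into a deduplicated action
    sequence: grasp object X using grasp type Y, in order."""
    seq = []
    for f in frames:
        pred = classify_frame(f)
        if not seq or seq[-1] != pred:
            seq.append(pred)
    return seq

frames = rng.standard_normal((5, 4))
print(actions_from_video(frames))
```

With random weights the labels are meaningless, of course; the point is only the pipeline shape: two independent classifiers per frame, merged into one ordered list of commands the robot can execute.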

Two Tesla K40 and two TITAN GPUs powered the cocktail-making application, with two additional TITAN GPUs used for neural network training and scene reconstruction.

The researchers are talking to several manufacturing companies, including an electronics business and a carmaker, about adapting the technology for use in factories.

Read more on MIT Technology Review >>