Robot learns to cook on YouTube

By Naomi White / 21 Jan 2015

Scientists at the University of Maryland and NICTA, Australia, are working on ways for robots to learn how to cook by watching YouTube videos. The research team is developing a system that would allow the robot to learn for itself. The trick is to find a way for robots to study human actions and then translate these observations into commands that the machine is able to carry out.

Unlike special videos made in a lab to support an experiment, those found on YouTube and other services are unpredictable – for example, they differ in background and lighting. Sophisticated image recognition is required, as well as techniques that will allow the robot to break down the observed actions to an abstract “atomic” level.

The scientists have achieved this using a pair of recognition modules based on Convolutional Neural Networks (CNNs). A CNN is built from artificial neurons, which are mathematical functions that imitate living neurons. These artificial neurons are hooked together to form an artificial neural network. These networks act like the human visual system, using overlapping neural connections to study images. This overlap provides high resolution and makes the information taken from the image very resistant to distortion as it’s translated from one form to another.
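As a rough illustration of the ideas in the paragraph above, here is a minimal Python sketch of a single artificial neuron and of the overlapping "sliding window" operation that gives convolutional networks their name. It is a toy written for this article, not the researchers' code: the function names, the 4x4 image and the 2x2 kernel are all assumptions chosen for illustration.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of inputs passed through a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def convolve2d(image, kernel):
    """Slide a small kernel over the image; neighbouring windows overlap."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            # Each output value looks at a kh x kw patch of the image.
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# Toy 4x4 "image" with a vertical edge between the dark and bright halves.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Illustrative 2x2 kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = convolve2d(image, kernel)
```

In this toy case the feature map is non-zero only in the column where the edge sits. A real CNN stacks many such filters and learns their values from data rather than having them written by hand.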

The Maryland/NICTA system has two CNN visual recognition modules. One looks at the hands of the cook in the video and works out what sort of grasp is being used. The other determines how the hand and the object it is holding are moving. By breaking down and analysing these movements, the robot deduces how it can use them to complete its own tasks.

The robot looks for one of six basic types of grasp and studies how they are used and how they change over time in the video sequence. It can then decide which manipulator to use to replicate the grasp. It also identifies the object being grasped, such as a knife, an apple, a bowl, a salmon or a pot.

The next step is to determine which of ten common cooking scenarios, such as cutting, pouring, spreading, chopping or peeling, is being carried out. The system then identifies a much larger group of actions that make up the scenario, breaks them down and determines how to duplicate them in a useful sequence called a “sentence”. In this way, it can turn jobs into manageable actions that the robot can perform.
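To make the idea of a “sentence” more concrete, the sketch below shows one way per-frame recognition results (grasp, object, action) might be collapsed into an ordered list of robot commands. This is a hypothetical simplification for illustration only: the grasp labels, the manipulator names, the `Observation` class and the `to_sentence` function are all invented here and are not taken from the Maryland/NICTA system.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    grasp: str    # label from the grasp-recognition module (hypothetical labels)
    obj: str      # recognised object, e.g. "knife"
    action: str   # recognised action, e.g. "cutting"

# Hypothetical mapping from a grasp label to a robot manipulator.
MANIPULATOR_FOR_GRASP = {
    "power": "parallel_gripper",
    "precision": "fingertip_gripper",
}

def to_sentence(observations):
    """Turn per-frame observations into an ordered list of commands,
    merging consecutive frames that describe the same step."""
    sentence = []
    for obs in observations:
        tool = MANIPULATOR_FOR_GRASP.get(obs.grasp, "default_gripper")
        command = f"{tool}: grasp {obs.obj}, then {obs.action}"
        if not sentence or sentence[-1] != command:
            sentence.append(command)
    return sentence

# Three video frames: the first two show the same step.
frames = [
    Observation("power", "knife", "cutting"),
    Observation("power", "knife", "cutting"),
    Observation("precision", "bowl", "pouring"),
]
plan = to_sentence(frames)
```

Here `plan` contains two commands, one per distinct step, in the order they were observed. The real system reportedly works out a much richer “grammar” of actions than this flat list.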

So what?

The researchers say that in future they will work on refining the classifications. They will also look at how to use the identified grasps to predict actions, allowing a more detailed analysis as they work out the “grammar” of actions. As robots develop the ability to “watch and learn”, their potential to perform complex tasks will also grow, with implications for jobs, services and domestic life.


Gizmag (5 January 2015). Robot learns to cook by watching Youtube. 

IFL Science (5 January 2015). Robot learns to cook watching Youtube videos

Business 2 Community (6 January 2015). Cooking robot learns from watching Youtube videos. 
