Machine Learning and Robotics



Updated November 2, 2022

You’re reading an excerpt of Making Things Think: How AI and Deep Learning Power the Products We Use, by Giuliano Giacaglia.

Will robots inherit the earth? Yes, but they will be our children.
Marvin Minsky*

Robots cannot yet operate reliably in people’s homes and labs, nor can they reliably manipulate and pick up objects.* If we are to have robots in our day-to-day lives, we need robots that can robustly detect, localize, handle, and move objects, and change the environment the way we want. We need robots that can pick up coffee cups, serve us, peel bananas, or simply walk around without tripping or hitting walls. The problem is that human surroundings are complex, and robots today cannot pick up most objects. If you ask a robot to pick up something it has never seen before, it almost always fails. To pick up an unfamiliar object, a robot must solve several difficult problems.

For example, if you ask a robot to pick up a ruler, the robot first needs to determine which object is a ruler, then where it is, and finally where to put its gripper based on that information. Or, if you want a robot to pick up a cup of coffee, the robot must decide where to grasp it. If the gripper picks up the cup from the bottom edge, it might tip over and spill. So, robots need to grasp different objects at different points.

Amazon Picking Challenge

The ultimate challenge for Amazon is to build robots that do all the picking, packing, and shipping in its warehouses,* and the company is working steadily toward that ambitious goal.* To that end, Amazon created the annual Amazon Picking Challenge, held from 2015 to 2017, in which teams from around the world competed to build robots that excel at picking up objects. It was the go-to competition for picking up and handling different objects. Amazon chose the items, and teams spent months optimizing their robots for the task. Unfortunately, none of the robots could handle objects outside the original set, meaning that they were overfitted to the competition items and could not generalize beyond their training data.

Figure: The robot that Stefanie Tellex programmed to pick up objects.

In Amazon’s challenge, each team created a robot with cameras and processors, which had to pick up items and place them in a specified location. The teams competed in front of an audience for prize money. One competition included 25 items for the robots to retrieve, including a rubber duck, a bag of balls, and a box of Oreo cookies. The teams had 20 minutes to fetch and package as many items as possible from an Amazon shelving unit.*

Some teams used claws, but most used a suction system attached to a vacuum. The task is hard because robots are competing with the human hand, which has 27 degrees of freedom, and the human brain, which recognizes a vast range of objects. But each year, teams performed better and better. At some point, instead of humans doing the tedious work of picking, packing, and shipping packages, robots will do it 24/7, delivering packages cheaper and faster.

Stefanie Tellex

To solve the problem of handling different objects, one approach was to make them identifiable to the robot by adding QR codes on top of the objects so that the robot knows exactly which object it is and how to handle it. This approach, however, does not work in the real world.


Figure: Professor Stefanie Tellex.*

To solve this problem, Professor Stefanie Tellex at Brown University works on a completely different approach. She makes robots learn on their own how to manipulate new objects by automating the process of learning how to pick them up. Robots need to learn how to pick up items just like humans do so that they do not need to study an object before they can pick it up. In other words, robots need to learn how to pick up new objects with high precision and high sensitivity (or recall).

Tellex built a system that lets a robot learn about a new object it is handed. To do that, she created a light field perception system, which transforms the images the robot captures with its camera into a projection of the object, allowing the robot to pick it up. The system creates a synthetic camera in software to render an orthographic projection of the item. Tellex’s robot moves its arm above and around the object, taking multiple pictures with its camera and measuring depth with its infrared sensor. By combining these views, the robot can create an image of the object from any possible angle. The idea behind the technique is that the robot figures out not only the intensity of the light reaching the camera but also the direction of individual rays. This makes it possible to build a 3D model of the object and the scene. The robot is then able to detect and localize the object to within two millimeters, which is the limit of the camera’s resolution.
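The orthographic projection described above can be illustrated with a small sketch. This is not Tellex’s implementation; it is a minimal example under stated assumptions. The function name `orthographic_top_down`, the `(x, y, z, intensity)` sample format, and the 2 mm grid cell are all hypothetical choices made for illustration. The point it shows is that an orthographic view ignores perspective: where a sample lands in the image depends only on its x and y position, not on how far it was from the camera.

```python
from collections import defaultdict


def orthographic_top_down(points, cell_size):
    """Project (x, y, z, intensity) samples straight down onto a grid.

    Hypothetical sketch: samples gathered from many arm positions that fall
    into the same grid cell are averaged, and the tallest z value seen in
    each cell is kept as a simple height map.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    heights = {}
    for x, y, z, intensity in points:
        # Orthographic: the cell depends on x and y only, never on z.
        cell = (int(x // cell_size), int(y // cell_size))
        sums[cell] += intensity
        counts[cell] += 1
        heights[cell] = max(heights.get(cell, z), z)
    image = {cell: sums[cell] / counts[cell] for cell in sums}
    return image, heights


# Three samples in millimeters; the first two fall into the same 2 mm cell.
samples = [
    (1.0, 1.0, 100.0, 0.2),
    (1.5, 1.2, 300.0, 0.4),
    (5.0, 1.0, 50.0, 1.0),
]
image, heights = orthographic_top_down(samples, cell_size=2.0)
```

Averaging is only one way to merge overlapping samples; a real system would also have to register the camera poses before the samples could share a coordinate frame, which this sketch takes as given.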

Then, it tries different grasps to lift the item. Once it has a grip, the robot plays with the object to learn more about it and shakes it to make sure the grip is secure. When successful, the robot has learned to pick up a new item. After this learning experience, it can robustly manipulate this object. Not only that, but the robot can repeat this learning process over and over again with different objects. Together with the light field perception system, Tellex’s group uses reinforcement learning to train robots to pick up unfamiliar objects even when lighting conditions are challenging. The robot learns by trying different grips and reinforcing behavior that seems to produce positive results. This allows Tellex’s robot to pick up objects in normally challenging situations, like grabbing a fork from a sink with running water, which would be extremely tricky to program manually. But all of this robotics development would not happen without training data or an operating system.
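The idea of trying different grips and reinforcing the ones that work can be sketched as a simple trial-and-error loop. The example below is an assumption-laden illustration, not the method used by Tellex’s group: it uses an epsilon-greedy strategy, and the names `learn_best_grasp` and `try_grasp`, the grasp labels, and the parameter values are all hypothetical.

```python
import random


def learn_best_grasp(try_grasp, grasp_ids, attempts, epsilon=0.2, seed=0):
    """Pick grasps by trial and error, favoring those that have succeeded.

    try_grasp(grasp_id) is a hypothetical callback that returns True when
    the object was lifted and held, False otherwise.
    """
    rng = random.Random(seed)
    successes = {g: 0 for g in grasp_ids}
    trials = {g: 0 for g in grasp_ids}

    def success_rate(g):
        return successes[g] / trials[g] if trials[g] else 0.0

    for _ in range(attempts):
        untried = [g for g in grasp_ids if trials[g] == 0]
        if untried:
            grasp = untried[0]                 # try every grasp at least once
        elif rng.random() < epsilon:
            grasp = rng.choice(grasp_ids)      # explore: pick a random grasp
        else:
            grasp = max(grasp_ids, key=success_rate)  # exploit the best so far
        trials[grasp] += 1
        if try_grasp(grasp):
            successes[grasp] += 1              # reinforce what worked

    return max(grasp_ids, key=success_rate), trials


# Toy stand-in for the physical robot: only the "handle" grasp ever holds.
best, trials = learn_best_grasp(
    try_grasp=lambda g: g == "handle",
    grasp_ids=["rim", "handle", "base"],
    attempts=30,
)
```

In this toy setup the loop settles on the only grasp that ever succeeds. On a real robot each call to `try_grasp` would be a physical attempt, so the number of attempts the robot can afford matters far more than it does here.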

Datasets for Robotics

So that robots do not have to manipulate and learn how to grip a new object every time they see one, Tellex created a database of objects that robots would typically grasp.* She created the Million Object Challenge, which accelerated the field by collecting and sharing data about these objects. People do not usually take pictures of door handles but instead take photos of more interesting things or selfies, so Tellex had to create a specific dataset for her needs.

Think of this as ImageNet for robots. The idea is that a robot can learn from this huge database, and when the robot is in a new environment, it will already know how to grip every object based on the data gathered by other robots. Tellex and her group have already collected data for around 200 items, including a plastic boat and a rubber duck, and other scientists can contribute their robot’s data to the project. Tellex’s goal is to build a library with one million different objects so that eventually, robots can identify any object in front of them and pick it up.

Researchers from Princeton and Stanford University, led by PhD student Angela Dai and Professor Thomas Funkhouser, created a dataset, ScanNet, that includes 3D views of thousands of scenes and millions of annotated objects like couches and lamps.* They created this dataset by scanning around 1,500 scenes using an iPad with an infrared depth sensor similar to the one in the Microsoft Kinect. The resulting dataset is one order of magnitude larger than the second biggest dataset. Google’s AI research laboratory already uses this dataset to train its robots in simulations so that they can learn how to pick objects out of a bucket. Datasets like ScanNet matter because deep learning algorithms need large numbers of labeled examples to learn from.

At the University of California, Berkeley, researchers also built a dataset comprising more than 1,000 objects with information about their 3D shape, visual appearance, and the physics of grasping them.* With such a dataset, the same researchers built robots that succeed 98% of the time at picking up objects and shaking them in mid-air without dropping them. This is a much higher success rate than previous attempts achieved. The results came, in part, from training the robot’s software in a 3D simulation before deploying it. The simulation-trained models were then used successfully in the physical world.

Deep Learning and Robotics

When performing tasks, robots still look robotic and their actions are clunky because they follow a sense-plan-act paradigm.* That means that at every moment the robot interacts with the world, it must observe the world, create a model of what it senses, form a plan based on that model, and then execute it. The old approach solved this problem modularly and tended not to work in cluttered environments, which are the norm in the real world. Perception is often imprecise, so the models are often wrong and need to change.
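The sense-plan-act cycle can be sketched in a few lines. The example below is a deliberately simplified, hypothetical illustration in one dimension: the function `sense_plan_act`, its parameters, and the `sense` callback are assumptions made for this sketch and do not come from any particular robot stack.

```python
def sense_plan_act(position, goal, sense, max_steps=100, step_size=1.0,
                   tolerance=0.5):
    """Move a 1D robot toward goal by repeating sense, plan, act.

    sense(position) is a hypothetical sensor model that returns where the
    robot believes it is. Returns the list of true positions visited.
    """
    history = [position]
    for _ in range(max_steps):
        observed = sense(position)            # sense: build a model of the world
        error = goal - observed
        if abs(error) <= tolerance:
            break                             # the model says the goal is reached
        move = max(-step_size, min(step_size, error))  # plan: one bounded step
        position += move                      # act: execute the plan
        history.append(position)
    return history


# With a perfect sensor, the robot stops close to the goal.
accurate = sense_plan_act(0, 3.2, sense=lambda p: p)

# With a sensor that reads 1.0 too high, the robot believes it has arrived
# and stops early, well short of the goal.
biased = sense_plan_act(0, 3.2, sense=lambda p: p + 1.0)
```

The second run shows the weakness described above: the plan is only as good as the model built from perception, so a systematic sensing error leaves the robot confidently in the wrong place.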

To solve this problem and make robots move faster and be more reflexive, Google uses deep learning for its models, training the neural networks using a reinforcement learning algorithm, so that robots can act quickly. Google first trained their robots to imitate human behavior by observing human demonstrations of the intended action.* They built a robot that, for example, could pour from a cup after less than 15 minutes of observing humans performing this task from different viewpoints.

Google is also working with robot arms to make them learn how to grasp. The company created a reinforcement learning algorithm that trains the deep learning model the robot uses to grip objects.* One experiment used seven robots running for a total of 800 robot hours over the course of four months. With that data in hand, Google trained simulations of the robot with 10 GPUs and many CPUs, processing around 580,000 grasp attempts.*

The learned model gave the robot arms a 96% success rate in 700 trials on previously unseen objects. Google showed that with deep learning and reinforcement learning, it is possible to train robots to grasp unknown objects successfully. Google is not the only company building deep learning models for robots; other research institutes, like OpenAI, have also done it successfully.*

Self-Driving Cars

Whether you think you can, or you think you can’t, you’re right.*

Many companies currently build technology for autonomous cars, and others are just entering the field. The three most transformative players in the space are Tesla, Google’s Waymo, and George Hotz’s Comma.ai. Each of these companies tackles the problem with a very different approach. In some ways, self-driving cars are robots that require solving both hardware and software problems. A self-driving car needs to sense its surrounding environment with cameras, radar, or other instruments. Its software needs to understand what is around the car, know its physical location, and plan the next steps it needs to take to reach its destination.

