You’re reading an excerpt of Making Things Think: How AI and Deep Learning Power the Products We Use, by Giuliano Giacaglia. Purchase the book to support the author and the ad-free Holloway reading experience. You get instant digital access, plus future updates.
Robots in the Industry
The Master created humans first as the lowest type, most easily formed. Gradually, he replaced them by robots, the next higher step, and finally he created me to take the place of the last humans.
Isaac Asimov, I, Robot*
When people talk about artificial intelligence, they often think of mobile robots. But in computer science, AI is the field focused on developing the "brains" not only of such robots but of any computer system that acts to achieve goals. The robots described here do not use any of the deep learning models we discussed previously. Instead, they run hand-coded software.
In Florida, a small crowd watches a competition in which robots race to complete a series of objectives faster and more precisely than their opponents. One robot examines a door with its sensors—cameras and lasers—to decide what to do next in order to open it. Using its robotic arm, it slowly pushes the door open and moves to the other side. The team responsible for the robot cheers as it completes one of the tasks.
This story might sound like science fiction from a distant future, but the US Defense Advanced Research Projects Agency (DARPA) organized that competition, the DARPA Robotics Challenge (DRC), in December 2013. Boston Dynamics created Atlas, the robot that opened the door, but many other robots attempted the tasks as well, with the development teams that programmed them watching eagerly.* The DRC's goal was for robots to perform jobs independently in situations dangerous to humans, like a nuclear power plant failure. The competition tested the robots' agility, sensing, and manipulation capabilities. At first glance, the tasks seem straightforward—walking over terrain, opening doors—but they are difficult for robots. The most challenging assignment was walking over an uneven surface, because it is hard for robots to stay balanced. Most of the robots failed to complete many of the tasks, either because they malfunctioned or because the job was too hard. Atlas completed the most tasks of any competitor.
DARPA program manager Gill Pratt said of the prototype, "A 1-year-old child can barely walk, a 1-year-old child falls down a lot, this is where we are right now."* Boston Dynamics revealed Atlas on July 11, 2013. At its first public appearance, the New York Times called it "a striking example of how computers are beginning to grow legs and move around in the physical world," describing the robot as "a giant—though shaky—step toward the long-anticipated age of humanoid robots."*
Boston Dynamics has the bold goal of making robots that surpass animals in mobility, dexterity, and perception. By building machines with dynamic movement and balance, it aims for robots that can go almost anywhere, on any terrain on Earth, and that can manipulate objects, hold them steady, and carry them without dropping them. The company is steadily approaching those goals: Atlas continues to improve with lighter hardware, more capabilities, and better software.
Figure: The second version of Atlas.
Atlas was much more advanced than early robots like Shakey, built at the Stanford Research Institute in the 1960s. But Boston Dynamics wanted to improve its robot, so it designed a second version—Atlas, The Next Generation. The company first showed it in a YouTube video in February 2016, in which it walked on snow. Subsequent videos showed Atlas doing a backflip and jumping over a log lying in the grass.*
To build this updated version, Boston Dynamics used 3D printing to make parts of the robot more closely resemble an animal's. For example, its upper leg is printed as a single piece, with hydraulic pathways, actuators, and filters all embedded inside it—something impossible before 3D printing. Engineers designed the structure using knowledge of Atlas's loads and behaviors, based on data from the original Atlas robots' interactions with the environment, supplemented by software simulations. With this 3D-printing technique, Boston Dynamics transformed what was once a big, bulky, and slow robot weighing around 375 pounds into a much slimmer version at 165 pounds.*
Boston Dynamics is not focused solely on humanoid robots; it is developing other forms as well. It has two robotic dogs, Spot and SpotMini.* Like Atlas, the dogs can enter areas unsafe for humans in order to clear out the space. Using cameras, the dogs examine the terrain, assess the elevation of the floor, and figure out where they can step and how to climb to another region.* These machines continue to become more agile and less clunky; the latest version dances to Bruno Mars's hit song "Uptown Funk." I believe this is only the beginning of the robotic revolution. Spot and other robots may end up in our everyday lives.
Giants like Amazon have been working on robots to increase productivity. At an Amazon warehouse, small robots help the online retail giant's packers.* These automated machines cruise around the warehouse floor, delivering shelves full of items to humans, who then pick, pack, and ship the items without taking more than a couple of steps.
Figure: A Kiva robot in an Amazon warehouse.
This automation is a considerable change for Amazon, where humans used to select and pack items themselves with only the help of conveyor belts and forklifts. With the introduction of Kiva Systems’ robots, the Amazon warehouse processes completely changed. Now, humans stand in a set location, and robots move around the warehouse, alleviating most of the manual labor.
This change occurred when Amazon acquired Mick Mountz's Kiva Systems for $775M in 2012.* After years working on business processes at Webvan, a now-defunct e-commerce startup, Mountz realized that one reason for its downfall was the high cost of order fulfillment.* In 2001, after the dot-com bubble burst, Webvan filed for bankruptcy and later became part of Amazon. Mountz saw a better way to handle orders inside warehouses and started Kiva Systems with the help of robotics experts.
In a typical warehouse, humans fill orders by wandering through rows of shelves, often carrying portable radio-frequency scanners to locate products. Computer systems and conveyor belts sped things up, but only to a point. With the help of robots, however, workers at Amazon process items three times faster and no longer need to search for products. When an order comes into Amazon.com, a robot drives around a grid of shelves, locates the correct shelf, lifts it onto its back, and delivers it to a human worker.* The person completes the process by picking the item, packing it, and shipping it. To avoid human error, a red laser flashes on the item so the worker knows exactly what to pick up. The robot then returns the shelf to the grid, and as soon as it pulls away, another shelf arrives, so the worker is always busy.
The Robot Operating System
To function, robots need an operating system that distills high-level instructions down to the hardware, just as standard computers need software to communicate with their hard drives and displays. Robots must pass information to components like arms, cameras, and wheels. In 2007, Scott Hassan, an early Google engineer who previously worked with Larry Page and Sergey Brin, started Willow Garage to advance robotics. The team developed the Robot Operating System (ROS) for its own robots, one of which was the Personal Robot 2 (PR2). Ultimately, Willow Garage shared the open-source operating system with other companies before closing its doors in 2014.*
The PR2 had two strong arms that performed delicate tasks like turning a page in a book. It contained pressure sensors in the arms as well as stereo cameras, a light detection and ranging (LIDAR) sensor, and inertial measurement sensors.* These sensors provided data for the robot to navigate in complex environments. Willow Garage developed ROS to understand the signals from these sensors as well as to control them.
Figure: Personal Robot 2.
ROS included a middleware layer, which passed messages between software written by developers and the hardware, as well as software for object recognition and many other tasks.* It provided a standard platform for programming different hardware and a growing array of packages that gave robots new capabilities, including libraries and algorithms for vision, navigation, and manipulation.
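At its core, ROS routes messages between independent programs ("nodes") over named topics, so a sensor driver and a navigation planner never need to know about each other. The toy sketch below, in plain Python rather than the real ROS API, illustrates that publish/subscribe idea; the topic name and message contents are invented:

```python
# Toy sketch of ROS-style publish/subscribe middleware (not the real ROS API).
# Nodes exchange messages over named topics; the middleware routes them,
# decoupling sensor drivers from the software that consumes their data.

class ToyMiddleware:
    def __init__(self):
        self._subscribers = {}  # topic name -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers.get(topic, []):
            callback(message)

bus = ToyMiddleware()
readings = []

# A "navigation node" listens to laser scans without knowing who produces them.
bus.subscribe("/scan", lambda msg: readings.append(msg))

# A "LIDAR driver node" publishes range data on the same topic.
bus.publish("/scan", {"ranges_m": [1.2, 0.9, 3.4]})

print(readings[0]["ranges_m"])  # → [1.2, 0.9, 3.4]
```

The decoupling is the point: swapping in a different laser driver, or adding a second subscriber such as a mapping node, requires no changes to the existing nodes.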
ROS enabled hobbyists and researchers to more easily develop applications on top of hardware. With ROS, robots play instruments, control high-flying acrobatic machines, walk, and fold laundry.* Today, ROS is developed and extended by many other organizations, including self-driving car companies. The newest version, ROS 2.0, adds capabilities such as real-time control and support for coordinating multiple robots. As these systems improve, we may eventually have robots performing our house-cleaning chores.
Machine Learning and Robotics
Will robots inherit the earth? Yes, but they will be our children.
Marvin Minsky*
Robots cannot yet operate reliably in people's homes and labs, nor manipulate and pick up objects dependably.* If we are to have robots in our day-to-day lives, we must create robots that can robustly detect, localize, and manipulate objects, changing the environment the way we want. We need robots that can pick up coffee cups, serve us, peel bananas, or simply walk around without tripping or hitting walls. The problem is that human surroundings are complex, and robots today cannot pick up most objects. If you ask a robot to pick up something it has never seen before, it almost always fails. To accomplish that goal, it must solve several difficult problems.
For example, if you ask a robot to pick up a ruler, the robot first needs to determine which object is a ruler, where it is, and finally, calculate where to put its gripper based on that information. Or, if you want a robot to pick up a cup of coffee, the robot must decide where to pick it up. If the gripper picks up the cup from the bottom edge, it might tip over and spill. So, robots need to pick up different objects from different locations.
Amazon Picking Challenge
Amazon's ultimate goal is to build robots that do all the picking, packing, and shipping in its warehouses,* and it is working steadily toward that ambition.* To that end, Amazon created the annual Amazon Picking Challenge, held from 2015 to 2017, in which teams from around the world competed to make robots excel at picking up objects. It became the go-to competition for picking up and handling different objects. Amazon chose the items, and teams spent months optimizing their robots for the task. Unfortunately, none of the programmed robots could handle objects outside the original parameters: the robots were overfitted to the training items and incapable of generalizing beyond them.
Figure: The robot that Stefanie Tellex programmed to pick up objects.
In Amazon’s challenge, each team created a robot with cameras and processors, which had to pick up items and place them in a specified location. The teams competed in front of an audience for prize money. One competition included 25 items for the robots to retrieve, including a rubber duck, a bag of balls, and a box of Oreo cookies. The teams had 20 minutes to fetch and package as many items as possible from an Amazon shelving unit.*
Some teams used claws, but most used a suction system attached to a vacuum. The challenge lies in the fact that the human hand has 27 degrees of freedom, and our brain recognizes numerous objects. But each year, teams performed better and better. At some point, instead of humans doing the tedious work of picking, packing, and shipping packages, robots will do it 24/7, delivering packages cheaper and faster.
To solve the problem of handling different objects, one approach was to make them identifiable to the robot by adding QR codes on top of the objects so that the robot knows exactly which object it is and how to handle it. This approach, however, does not work in the real world.
To solve this problem, Professor Stefanie Tellex at Brown University took a completely different approach: she makes robots teach themselves to manipulate new objects by automating the process of learning how to pick them up. Just as humans do, robots need to learn to pick up items without having to study each object first—in other words, to grasp new objects with high precision and high sensitivity (or recall).
Tellex built a system that lets a robot learn to handle an object it has never seen. To do that, she created a light view perception technique, which transforms the images the robot captures into an orthographic projection of the object, allowing the robot to pick it up. The system uses software to create a synthetic camera that renders this projection. Tellex's robot moves its arm above and around the object, taking multiple pictures with its camera and measuring depth with its infrared sensor. By combining these different views, the robot can render an image of the object from any possible angle. The idea behind the technique is that the robot figures out not only the intensity of the light reaching the camera but also the direction of the individual rays, making it possible to build a 3D model of the object and the scene. The robot can then detect and localize the object to within two millimeters, the limit of the camera's resolution.
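The geometry behind such a synthetic orthographic camera can be sketched simply: back-project each depth pixel through a pinhole camera model into a 3D point, then collapse the points into a top-down grid that keeps the surface nearest the camera. The intrinsics and depth values below are invented for illustration; Tellex's actual system is far more sophisticated.

```python
# Toy sketch of a synthetic orthographic camera (all numbers invented):
# back-project depth pixels to 3D points, then flatten them to a grid.

FX = FY = 2.0  # toy focal lengths (pixels)
CX = CY = 1.0  # toy principal point (center of a 3x3 image)

def back_project(u, v, depth):
    """Pinhole model: pixel (u, v) at depth z maps to 3D point (x, y, z)."""
    x = (u - CX) * depth / FX
    y = (v - CY) * depth / FY
    return (x, y, depth)

def orthographic_height_map(points, cell):
    """Keep, per (x, y) grid cell, the depth of the point nearest the camera."""
    grid = {}
    for x, y, z in points:
        key = (round(x / cell), round(y / cell))
        grid[key] = min(grid.get(key, z), z)
    return grid

# A tiny 3x3 depth image in meters; the center pixel is a raised object.
depth_image = [[1.0, 1.0, 1.0],
               [1.0, 0.8, 1.0],
               [1.0, 1.0, 1.0]]
points = [back_project(u, v, depth_image[v][u])
          for v in range(3) for u in range(3)]
top_down = orthographic_height_map(points, cell=0.5)
print(top_down[(0, 0)])  # → 0.8, the object's top surface
```

Unlike the original camera image, the flattened grid is free of perspective distortion, which is what makes it useful for deciding where a gripper should go.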
Then, it tries different grasps to lift the item. Once gripped, the robot plays with the object to learn more about it and shakes it to make sure the grip is secure. When successful, the robot has learned to pick up a new item. After this learning experience, it can robustly manipulate this object. Not only that, but the robot can perform this learning process over and over again with different objects. Together with the light view perception system, Tellex’s group uses reinforcement learning to train robots to pick up unfamiliar objects even when lighting conditions are challenging. The robot learns by trying different grips and reinforcing behavior that seems to produce positive results. This allows Tellex’s robot to pick up objects in normally challenging situations, like grabbing a fork from a sink with running water, which would be extremely tricky to program manually. But all of this robotics development would not happen without training data or an operating system.
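The trial-and-error loop described above can be caricatured as a multi-armed bandit: each candidate grasp is an arm, and a grasp that survives the shake test gets reinforced. This is only an illustrative sketch with made-up grasp names and success probabilities, not Tellex's actual algorithm.

```python
# Illustrative epsilon-greedy sketch of learning a grasp by trial and error.
# The candidate grasps and their success probabilities are invented.
import random

random.seed(0)

TRUE_SUCCESS = {"rim": 0.9, "handle": 0.6, "base": 0.2}  # hypothetical

counts = {g: 0 for g in TRUE_SUCCESS}
successes = {g: 0 for g in TRUE_SUCCESS}

def estimate(g):
    # Optimistic initial value encourages trying each grasp at least once.
    return successes[g] / counts[g] if counts[g] else 1.0

def choose_grasp(epsilon=0.1):
    if random.random() < epsilon:              # explore a random grasp
        return random.choice(list(TRUE_SUCCESS))
    return max(TRUE_SUCCESS, key=estimate)     # exploit the best grasp so far

for _ in range(500):                           # repeated pick-and-shake trials
    grasp = choose_grasp()
    counts[grasp] += 1
    if random.random() < TRUE_SUCCESS[grasp]:  # did the grip survive a shake?
        successes[grasp] += 1

print(max(TRUE_SUCCESS, key=estimate))         # the grasp the robot settles on
```

After a few hundred simulated trials, the grasp with the highest underlying success rate dominates the robot's choices, which is the essence of reinforcing behavior that produces positive results.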
Datasets for Robotics
So that robots do not have to relearn how to grip each new object every time they see one, Tellex created a database of objects that robots would typically grasp.* Her Million Object Challenge accelerated the field by collecting and sharing data on these objects. People do not usually take pictures of door handles—they photograph more interesting things, or take selfies—so Tellex had to build a dataset specific to her needs.
Think of this as ImageNet for robots. The idea is that a robot can learn from this huge database, and when the robot is in a new environment, it will already know how to grip every object based on the data gathered by other robots. Tellex and her group have already collected data for around 200 items, including a plastic boat and a rubber duck, and other scientists can contribute their robot’s data to the project. Tellex’s goal is to build a library with one million different objects so that eventually, robots can identify any object in front of them and pick it up.
Researchers from Princeton and Stanford University, led by PhD student Angela Dai and Professor Thomas Funkhouser, created ScanNet, a dataset that includes 3D views of thousands of scenes and millions of annotated objects such as couches and lamps.* They built it by scanning around 1,500 scenes using an iPad fitted with an infrared depth sensor similar to the Microsoft Kinect. The resulting dataset is an order of magnitude larger than the next biggest of its kind. Google's AI research laboratory already uses ScanNet to train its robots in simulation to pick objects out of a bucket. Datasets like this are extremely important for deep learning algorithms.
At the University of California, Berkeley, researchers built a dataset of more than 1,000 objects with information on their 3D shape, visual appearance, and the physics of grasping them.* With it, they built robots that can pick up objects and shake them in mid-air without dropping them 98% of the time—a much higher success rate than previous attempts. The result came, in part, from training the robot's software in a 3D simulation before deploying it; the simulation-trained models transferred successfully to the physical world.
Deep Learning and Robotics
When performing tasks, robots still look robotic and their actions clunky because they follow a sense-plan-act paradigm.* For every moment the robot interacts with the world, it must observe the world, create a model of what it senses, form a plan based on that model, and then execute it. This older, modular approach tended not to work in cluttered environments, which are the norm in the real world: perception is often imprecise, so the models are often wrong and must change.
To solve this problem and make robots faster and more reflexive, Google uses deep learning models trained with reinforcement learning so that its robots can act quickly. Google first trained its robots to imitate human behavior by observing human demonstrations of the intended action.* It built a robot that, for example, could pour from a cup after less than 15 minutes of watching humans perform the task from different viewpoints.
Google is also working with robot arms, teaching them how to grasp. It created a reinforcement learning algorithm that trains the deep learning model inside the robot to grip objects.* One experiment used seven robots that accumulated a total of 800 robot-hours over four months. With that information, Google trained simulations of the robot on 10 GPUs and many CPUs, processing around 580,000 grasp attempts.*
The learned model gave the robot arms a 96% success rate in 700 trials on previously unseen objects. Google showed that with deep learning and reinforcement learning, it is possible to train robots to grasp unknown objects successfully. Google is not the only company building deep learning models for robots; other research institutes, like OpenAI, have also done it successfully.*
Self-Driving Cars
Whether you think you can, or you think you can’t, you’re right.*
Many companies currently build technology for autonomous cars, and others are just entering the field. The three most transformative players in the space are Tesla, Google's Waymo, and George Hotz's Comma.ai, each tackling the problem with a very different approach. In some ways, self-driving cars are robots that require solving both hardware and software problems. A self-driving car needs to identify its surrounding environment with cameras, radar, or other instruments, and its software needs to understand what is around the car, know its physical location, and plan the next steps to reach its destination.
Tesla, founded by Martin Eberhard and Marc Tarpenning in 2003, is known as the Apple of cars because of its revolutionary design and outside-the-box thinking.* Tesla develops its cars from first principles, from the air conditioning system with its perpendicular vents to the design of its chassis and suspension. With this innovation, the Tesla Model 3 is the safest car in the world,* followed by the Tesla Model S and Model X.* But Tesla is not only innovative with its hardware; it also invests heavily in its Autopilot technology.
In 2014, Tesla quietly installed several pieces of hardware to increase the safety of their vehicles—12 ultrasonic sensors, a forward-facing camera, a front radar, a GPS, and digitally controlled brakes.* A few months later, they released a technology package for an additional $4,250 to enable the use of the sensors. In a rapid release streak, Tesla launched features in the upcoming months, and a year later, rolled out its first version of the Autopilot—known as Tesla Version 7.0—to 60,000 cars.
Autopilot gave drivers features like steering within a lane, changing lanes, and automatic parking. Other companies, including Mercedes, BMW, and GM, already offered some of these capabilities, but self-steering was a giant leap toward autonomy, released suddenly, overnight, as a software update. Tesla customers were delighted, posting videos on the internet of the software "driving" their Teslas hands-free.
Tesla makes both the software and the hardware for its cars, enabling it to release new features and update its software over the air (OTA). Because it has shipped cars with the hardware needed for self-driving since 2014, Tesla has a widely distributed test fleet. Other players, like Google and GM, have only small fleets of cars with the required hardware.
From the introduction of the Tesla hardware package until November 2018,* a total of 50 months, Tesla accrued around 1 billion miles driven with the newest hardware.* Not only that, but the Tesla servers store the data these cars accumulate so that the Autopilot team can make changes to its software based on what it learns. At the time of this writing, Tesla had collected around 5.5 million miles of data per day for its newest system, taking only around four hours to gather 1 million miles. For comparison, Waymo has the next most data with about 10 million miles driven in its lifetime. In two days, Tesla acquires more data from its cars than Waymo has in its lifetime.
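A quick back-of-the-envelope check makes the scale of that gap concrete, using the rounded figures quoted above:

```python
# Back-of-the-envelope check of the fleet-data figures quoted above.
tesla_miles_per_day = 5_500_000     # miles of data per day, as cited
waymo_lifetime_miles = 10_000_000   # Waymo's lifetime total, as cited

# How long does Tesla need to match Waymo's lifetime mileage?
print(round(waymo_lifetime_miles / tesla_miles_per_day, 1))  # → 1.8 days

# How long does Tesla take to gather 1 million miles?
print(round(24 / (tesla_miles_per_day / 1_000_000), 1))      # → 4.4 hours
```

Both results line up with the text: a bit under two days to match Waymo's lifetime total, and roughly four hours per million miles.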
This data collection rate increases as more cars hit the streets, and Tesla has been speeding up its production pace. Yet even though Tesla has accumulated more miles than its competitors,* when it tested its self-driving capability with the California Department of Motor Vehicles (DMV)—the state agency that regulates vehicle registration—Tesla reported a much higher count of disengagements than its competitors.*
Disengagement counts are a metric the average person can use to compare autonomous systems.* They provide a rough measure of how often the car's system fails badly enough that the test driver must take over. The metric is only a proxy for performance because it ignores variables that affect driving, like weather, and how and where the failures occurred. A rise in disengagements could mean a major problem exists, or simply that the company is testing its software in more challenging settings, such as a city.
At the end of 2015, Tesla's numbers showed that it was far behind its competitors: normalized as miles per disengagement, Tesla's software performed roughly 1,000 times worse than Waymo's. But Tesla continues to hone its system year after year, and it has an advantage over other carmakers: it can update the system over the air, improving it without selling new cars or servicing existing ones.
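Miles per disengagement is just a ratio, so the normalization is straightforward. The sketch below uses clean illustrative numbers consistent with the rough figures in the text (Waymo at about 1,000 miles per disengagement, and Tesla's 2015 software roughly 1,000 times worse), not official DMV data:

```python
# Normalizing disengagement reports to miles per disengagement.
# The figures are illustrative round numbers, not official DMV data.

def miles_per_disengagement(miles_driven, disengagements):
    return miles_driven / disengagements

waymo = miles_per_disengagement(1_000_000, 1_000)   # ~1,000 miles each
tesla_2015 = miles_per_disengagement(1_000, 1_000)  # ~1 mile each

print(waymo / tesla_2015)  # → 1000.0, the gap described above
```

Because the raw mileage and disengagement totals differ wildly between companies, this ratio is the only sensible way to put them on one axis—which is exactly what the comparison figure below does.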
Figure: Comparing miles per disengagement.*
Waymo's self-driving fleet has the fewest disengagements per mile, but even this metric does not yet approach human performance. Waymo averages about 1 disengagement per 1,000 miles. If we count a human "disengagement" as an accident occurring while a human is driving, then humans have roughly 100 times fewer disengagements than Waymo's self-driving software.
But Tesla has another advantage: It has a large fleet of cars enabled for testing its newest self-driving car software update. This technology enables Tesla to develop software in-house and release it in shadow mode for millions of miles before releasing the software to the public.* Shadow mode allows Tesla to silently test its algorithms in customers’ cars, which provides the company with an abundant testbed of real-world data.
LIDAR, or light detection and ranging—a portmanteau of light and radar—is a sensor similar to radar.* LIDAR maps physical space by bouncing laser beams off objects. Radar cannot see much detail, and cameras perform poorly in low light or glare,* so LIDAR lets a car "see" its surroundings in much more detail than other sensors can. The problem with LIDAR is that it performs poorly in several weather conditions, including fog, rain, and snow.*
Unlike other companies, Tesla bets that they can run a self-driving car that performs better than a human without a LIDAR hardware device.
Another problem is cost: LIDAR units originally started at around $75K, although they are now considerably cheaper,* and the hardware is bulky, resembling a KFC bucket.* LIDAR helps autonomous cars build a 3D model of the world around them as part of simultaneous localization and mapping (SLAM). Tesla's bet against the device rests on a simple argument: humans drive with their eyes alone, so cars equipped only with cameras should, in principle, be able to perform as well as humans. Meanwhile, Tesla continues to improve its software and lower its disengagement rate without it.
A Tesla vehicle running the Autopilot software ran into a tractor-trailer in June 2016 after the software failed to detect the trailer against a bright sky, resulting in the death of the driver. According to some, LIDAR could have prevented that accident. Since then, Tesla has added radar to its cars for such situations. Mobileye, a provider of some of the underlying software, parted ways with Tesla over the fatality, believing Tesla had been too bullish in rolling its software out to the masses and that more testing was needed to ensure everyone's safety. Unfortunately, fatalities with self-driving software will always occur, just as with human drivers. Over time, the technology will improve and disengagement rates will decrease; I predict a time when cars drive more safely than humans, but deaths will inevitably still occur.
Before the split, Tesla used Mobileye software to detect cars, people, and other objects in the street. Afterward, Tesla had to develop its Autopilot 2 package from scratch, building new software to recognize objects and act on them. It took two years to return to parity with the old system, but once Tesla caught up, it quickly moved past the initial features.
For example, the newest Tesla Autopilot software, version 9.0, has the largest vision neural network ever trained.* Tesla based it on Inception, Google's famous vision neural network architecture, but Tesla's version is ten times larger and has five times the number of parameters (weights). I expect Tesla to continue to push the envelope.
Tesla is not the only self-driving company at the forefront of the technology. In fact, Google was one of the first to develop software for autonomous cars. Waymo is a continuation of work begun in a Stanford laboratory ten years before the first release of the Tesla Autopilot. The Stanford team won the 2005 DARPA Grand Challenge for self-driving cars, and Google later recruited its leaders, eventually spinning the project out as Waymo. Waymo's cars perform much better than any other self-driving system, which is surprising given that they have driven far fewer real-world miles than Tesla's and other self-driving car makers'.*
The DARPA Grand Challenge began in 2004 with a 150-mile course through the desert to spur development of self-driving cars. In the first year, no vehicle finished: the best entrant completed only about seven miles, and every vehicle crashed, failed, or caught fire.* The technology in these first-generation cars was sophisticated, expensive, bulky, and not visually attractive, but over time the cars improved and needed less hardware. While the initial challenge was limited to a single desert course, later editions expanded to city courses.
Stanford's victory in the 2005 challenge set the team that became Waymo on the path to leading the autonomous car sector, and its best-in-class disengagement rate per mile suggests it has the best software. Some argue that the primary reason Waymo outperforms the competition is that it tests its software in a simulated world. Waymo, located in a corner of Alphabet's campus, developed a simulated virtual world called Carcraft—a play on the popular game World of Warcraft.* Originally, this simulated world replayed scenes the cars experienced on public roads, including the moments when they disengaged. Eventually, Carcraft took an even larger role in Waymo's software development, simulating thousands of scenarios to probe the car's capabilities.
Waymo used this virtual reality to test its software before releasing it to the real-world test cars. In the simulation, Waymo created fully modeled versions of cities like Austin, Mountain View, and Phoenix as well as other test track simulations. It tested different scenarios in many simulated cars—around 25,000 of these at any single time. Collectively, the cars drive about 8 million miles per day in this virtual world. In 2016 alone, the virtual autonomous cars logged approximately 2.5 billion virtual miles, much more than the 3 million miles Waymo’s cars drove on the public roads. Its simulated world has logged 1,000 times more miles than its actual cars have.
The power of these simulations is that they train and test the models on interesting and difficult interactions rather than on routine miles. For example, Carcraft simulates multi-lane traffic circles that are hard to navigate, vehicles that cut off the simulated car, and pedestrians who unexpectedly cross the street. These situations rarely happen in the real world, but when they do, they can be fatal. This is why Waymo has a leg up on its competitors: it trains and tests its software in situations other companies cannot reproduce without a simulated world, regardless of how many miles they log. Personally, I believe testing in a simulated world is essential for making a safe system that can perform better than humans.
The simulation also makes the software development cycle much faster, and iteration speed is extremely important for developers. Instead of taking weeks, as in the early days of Waymo's software development, the cycle shrank to minutes after Carcraft was built, meaning engineers can tweak their code and test it quickly instead of waiting long periods for results.
Carcraft helps refine the software, but a simulation cannot test situations involving oil slicks on the road, sinkhole-sized potholes, or other oddities present in the real world but absent from the virtual one. To cover those, Waymo built a physical test track that recreates the diverse scenarios its cars may encounter.
As the software improves, Waymo downloads it to its cars and tests it on the test track before uploading it to the cars in the real world. To put this into perspective, Waymo reduced the disengagement rate per mile by 75% from 2015 to 2016.* Even though Waymo had a head start in creating a simulated world for testing its software, many other automakers now have programs to create their own simulations and testbeds.
Some report that Waymo's strategy is to build the operating system for self-driving cars. Google had the same strategy when building Android, the operating system for smartphones: it built the software stack and let other companies, like Samsung and Motorola, build the hardware. For self-driving cars, Waymo is building the software stack and wants the carmakers to build the hardware. It reportedly tried to sell its software stack to automakers but was unsuccessful because auto companies want to build their own self-driving systems. So, Waymo took matters into its own hands and developed its Early Rider taxi service, ordering about 62,000 minivans.* In December 2018, Waymo One launched a 24-hour service in the Phoenix area, opening the ride-sharing service to a few hundred preselected people and expanding the private taxi service. These vans, however, still have a Waymo employee in the driver's seat. This may be how it runs its self-driving cars in the real world at first, but it is difficult to see that solution scaling up.
One of the other most important players in the self-driving ecosystem is Comma.ai, started by a hacker in his mid-twenties, George Hotz, in 2015.* In 2007, at the age of 17, he became famous for being the first person to hack the iPhone to use on networks other than AT&T. He was also the first person to hack the Sony PlayStation 3 in 2010. Before building a self-driving car, Hotz lived in Silicon Valley and worked for a few companies including Google, Facebook, and an AI startup called Vicarious.
Figure: George Hotz and his first self-driving car, an Acura.
Hotz started hacking self-driving cars by retrofitting a white 2016 Acura ILX with a LIDAR on the roof and a camera mounted near the rearview mirror. He added a large monitor where the dashboard sits and a wooden box with a joystick, where you typically find the gearshift, that enables the self-driving software to take over the car. It took him about a month to retrofit his Acura and develop the software needed for the car to drive itself. Hotz spent most of his time adding sensors, the computer, and electronics. Once the systems were up and running, he drove the car for two and a half hours to let the computer observe him driving. He returned home and downloaded the data so that the algorithm could analyze his driving patterns.
The software learned that Hotz tended to stay in the middle lane and maintained a safe distance from the car in front of it. Two weeks later, he went for a second drive to provide more hours of training and also to test the software. The car drove itself for long stretches while remaining within the lanes. The lines on the dash screen—one showed the car’s actual path and the other where the computer wanted to go—overlapped almost perfectly. Sometimes, the Acura seemed to lock onto the car in front of it or take cues from a nearby car. After automating the car’s steering as well as the gas and brake pedals, Hotz took the car for a third drive, and it stayed in the center of the lane perfectly for miles and miles, and when a car in front of it slowed, so did the Acura.
Figure: George Hotz’s self-driving car.
The technology he built as an entrepreneur represents a fundamental shift from the expensive systems designed by Google to much cheaper systems that depend more on software than on hardware. His work impressed many technology companies, including Tesla. Elon Musk, who joined Tesla after its Series A funding round and is its current CEO, met Hotz at Tesla's Fremont, California, factory to discuss artificial intelligence. The two settled on a deal where Hotz would create software better than Mobileye's, and Musk would compensate him with a contract worth about $1M per year. Unfortunately, Hotz walked away after Musk continually changed the terms of the deal. "Frankly, I think you should just work at Tesla," Musk wrote to Hotz in an email. "I'm happy to work out a multimillion-dollar bonus with a longer time horizon that pays out as soon as we discontinue Mobileye." "I appreciate the offer," Hotz replied, "but like I've said, I'm not looking for a job. I'll ping you when I crush Mobileye." Musk simply answered, "OK."*
Since then, Hotz has been working on what he calls the Android of self-driving cars, comparing Tesla to the iPhone of autonomous vehicles. He launched a smartphone-like device, which sells for $699 with software installed. The dash cam simply plugs into the most popular cars made in the United States after 2012 and provides the equivalent capability of Tesla Autopilot, meaning cars drive themselves on highways from Mountain View to San Francisco with no one touching the wheel.*
But soon after launching the product, the National Highway Traffic Safety Administration (NHTSA) sent an inquiry and threatened penalties if Hotz did not submit the device for regulatory oversight. In response, Hotz pulled the product from sale and pursued another path: he decided to market a hardware-only version of the device.
Then, in 2016, he open-sourced the software so that anyone could install it in the appropriate hardware. And with that, Comma.ai abstained from the responsibility of running its software in cars. But consumers still had access to the technology, allowing their cars to drive themselves. Comma.ai continues to develop its software, and drivers can buy the hardware and install the software in their cars. Some people estimate that around 1,000 of these modified cars run on the streets now.
Recently, Comma.ai announced that it has become profitable.*
The Brain of the Self-Driving Car
Figure: The parts of the autonomous car’s brain.
Three main parts form the brain of an autonomous car: localization, perception, and planning. But even before tackling these three items, the software must integrate the data from different sensors, such as cameras, radar, LIDAR, and GPS. This step, called sensor fusion, merges the readings from the different sensors so that if the data from one sensor is noisy, meaning it contains unwanted or unclear information, the other sensors can compensate with their information.
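To make the fusion idea concrete, here is a minimal sketch of one common approach, inverse-variance weighting, in which less noisy sensors get more say in the combined estimate. The function name and the example readings are hypothetical, not any company's actual code.

```python
def fuse(estimates):
    """Fuse noisy 1-D position estimates by inverse-variance weighting.

    Each estimate is a (value, variance) pair from a different sensor;
    noisier sensors (higher variance) get proportionally less weight.
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    fused = sum(w * value for w, (value, _) in zip(weights, estimates)) / total
    fused_variance = 1.0 / total   # fused estimate is less noisy than either input
    return fused, fused_variance

# A precise LIDAR reading dominates a noisy GPS reading:
pos, var = fuse([(10.0, 0.25), (12.0, 4.0)])  # (lidar, gps)
```

The fused position lands much closer to the low-variance LIDAR reading, which is the behavior you want when one sensor is far more reliable than another.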
Once data has been acquired, the next step for the software is to know where it is. This process includes finding the physical location of the vehicle and which direction the car needs to head, for example, which exits it needs to take to deliver the passenger to their destination. One potential solution is to use LIDAR with background subtraction to match the sensor data to a high-definition map.
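A toy illustration of that matching idea, assuming a simplified occupancy-grid map: the software tries candidate pose corrections and keeps the one under which the cells the sensor sees best line up with the obstacle cells in the map. All names and values here are made up for illustration.

```python
def best_offset(scan, mapped, candidates):
    """Pick the pose offset under which the scan best matches the map."""
    def score(dx, dy):
        # Count scan cells that land on known obstacle cells after shifting.
        return sum((x + dx, y + dy) in mapped for x, y in scan)
    return max(candidates, key=lambda c: score(*c))

mapped = {(5, 5), (5, 6), (6, 5)}       # obstacle cells in the HD map
scan = [(4, 4), (4, 5), (5, 4)]         # obstacle cells seen by the sensor
candidates = [(0, 0), (1, 1), (2, 2)]   # candidate pose corrections
offset = best_offset(scan, mapped, candidates)  # (1, 1) aligns all three cells
```

Real localizers score continuous poses against dense maps, but the principle of searching for the pose that best explains the sensor data is the same.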
Figure: Long tail.
The next part of the software stack is harder. Perception answers the question of what is around the vehicle. A car needs to find traffic signs and traffic lights and determine which color the lights are showing. It needs to see where the lane markings are and where the cars, trucks, and buses are. Perception includes lane detection, traffic light detection, object detection and tracking, and free space detection.
The hardest part of this problem is in the long tail, which describes the diverse scenarios that show up only occasionally. When driving, that means situations like traffic lights with different colors from the standard red, yellow, and green or roundabouts with multiple lanes. These scenarios happen infrequently, but because there are so many different possibilities, it is essential to have a dataset large enough to cover them all.
The last step, path planning, is by far the hardest. Given the car’s location, its surroundings, and its passengers’ destination, how does it get there? The software must calculate the next steps to getting to the desired place, including route planning, prediction, behavior planning, and trajectory planning. The solution ideally includes mimicking human behavior based on actual data from people driving.
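As a rough sketch of the route-search core, here is a breadth-first search over a tiny grid. Real planners work over road networks with rich cost functions and must also predict other agents; every name and value below is a hypothetical illustration.

```python
from collections import deque

def plan(grid, start, goal):
    """Shortest route on a grid via breadth-first search; '#' cells are blocked."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        r, c = path[-1]
        if (r, c) == goal:
            return path
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] != '#' and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append(path + [(nr, nc)])
    return None  # no route exists

grid = [".#.",
        ".#.",
        "..."]
route = plan(grid, (0, 0), (0, 2))  # detours around the '#' wall
```

The planner returns the full cell-by-cell route, analogous to the sequence of maneuvers a car's planner hands to the layers below it.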
These three steps combine to produce the actions the car takes based on the information available. The system decides whether the vehicle needs to turn left, brake, or accelerate, and the instructions are fed to a control system that ensures the car does not do anything unacceptable. Together, these systems make cars drive themselves through the streets and form the "magic" behind cars driven by Tesla, Waymo, Comma.ai, and many others.
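The control system's role can be sketched as a final clamp on the planner's output. The function and its limits below are hypothetical illustrations, not any manufacturer's actual values.

```python
def safe_command(requested_accel, speed,
                 max_accel=2.0, max_brake=8.0, speed_limit=29.0):
    """Clamp a planner's requested acceleration (m/s^2) to safe limits.

    The planner proposes; this last layer rejects anything outside the
    vehicle's comfort and safety envelope.
    """
    accel = max(-max_brake, min(max_accel, requested_accel))
    if speed >= speed_limit and accel > 0:
        accel = 0.0  # never accelerate past the speed limit
    return accel

safe_command(5.0, speed=20.0)   # clamped down to 2.0
safe_command(1.0, speed=30.0)   # returns 0.0: already at the limit
```

Production controllers track full trajectories rather than a single scalar, but the pattern of bounding the planner's commands is the same.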
Ethics and Self-Driving Cars
As stated earlier, traffic fatalities are inevitable, and therefore these companies must address the ethical concerns associated with the technology. Software algorithms determine what actions autonomous vehicles perform. When a collision is unavoidable, which outcome should the software choose?
This dilemma is a version of the thought experiment known as the trolley problem. For example, it is a straightforward decision to have the car run into a fire hydrant instead of hitting a pedestrian. And while some may disagree, it is more humane to hit a dog in a crosswalk rather than a mother pushing a baby in a stroller. But that, I believe, is where the easy decisions end. What about hitting an older adult as opposed to two young adults? Or, in a most extreme case, is it better to choose to run the car off a cliff, killing the driver and all passengers, instead of plowing into a group of kindergarten students?*
Society sometimes focuses too much on technology instead of looking at the complete picture. In my opinion, we must encourage the ethical use of science, and, as such, we need to invest the proper resources into delving into this topic. It is by no means easy to solve, but allocating the appropriate means for discussing this topic only betters our society.
The Future of Self-Driving Cars
But the worries about operatorless elevators were quite similar to the concerns we hear today about driverless cars.Garry Kasparov*
There is a lot of talk about self-driving cars one day replacing truck drivers, and some say the transition will happen all of a sudden. In fact, the change will happen in steps, starting in a few locations and then expanding rapidly. For example, Tesla releases software updates that make its cars more and more autonomous: it first released software that let its cars drive on highways, and with a later update, its cars could merge into traffic and change lanes. Waymo is now testing its self-driving cars in downtown Phoenix, and it would not be surprising if Waymo starts rolling out its service in other areas.
The industry talks about five levels of autonomy to compare different cars’ systems and their capabilities. Level 0 is when the driver is completely in control, and Level 5 is when the car drives itself and does not need driver assistance. The other levels range between these two. I am not going to delve into the details of each level because the boundaries are blurry at best, and I prefer to use other ways to compare them, such as disengagements per mile. However they are measured, as the systems improve, autonomous cars can prevent humans from making mistakes and help avoid accidents caused by other drivers.
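The disengagements-per-mile metric is easy to compute. Using Waymo's approximate, publicly reported California figures (roughly 341 safety-driver takeovers in 424,000 autonomous miles in 2015, and 124 in 636,000 miles in 2016), the improvement works out to about a fourfold drop:

```python
def disengagements_per_1000_miles(disengagements, miles):
    """Safety-driver takeovers per 1,000 autonomous miles; lower is better."""
    return 1000 * disengagements / miles

# Approximate publicly reported Waymo figures for California:
rate_2015 = disengagements_per_1000_miles(341, 424_000)  # ~0.80
rate_2016 = disengagements_per_1000_miles(124, 636_000)  # ~0.19
```

This roughly 75% year-over-year reduction is the improvement the earlier figure referred to, and it illustrates why disengagement rates are a more meaningful comparison than raw miles driven.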
Self-driving cars will reduce and nearly eliminate the number of car accidents, which kill around 1 million people globally every year. Already, the number of annual deaths per billion miles has decreased due to safety features and improvements in the vehicle designs, like the introduction of seatbelts and airbags. Cars are now more likely to incur the damage and absorb the impact from an accident, reducing the injuries to passengers.
Figure: US vehicle miles traveled and proportionate mortality rate. Number of miles driven by cars versus the number of annual deaths per billion miles driven.
Autonomous driving will reduce the total number of accidents and deaths. In the United States alone, around 13 million collisions occur annually, of which 1.7 million cause injuries, and 35,000 people die. Driver error causes approximately 90% of the accidents, a third of which involve alcohol.* Autonomy can help prevent these disasters.
Deaths are not the only problem caused by accidents; they also have a huge economic effect. The US government estimates that crashes cost the economy about $240B per year, including medical expenses, legal services, and property damage. In comparison, US car sales are around $600B per year. According to NHTSA data, the crash rate for Tesla cars dropped by 40% after the introduction of the Autopilot Autosteer feature.* An insurer offered a 5% discount for Tesla drivers with the assist feature turned on.*
Autonomy will have an effect on traffic overall. Cars will not necessarily need to stop at traffic signs because they can coordinate among themselves to determine the best route or to safely drive at 80 miles per hour 2 feet away from each other. So, traffic flow will improve, allowing more cars on the streets. With fewer accidents, there might be less traffic congestion. Estimates say that as much as a third of car accidents happen because of congestion, and these create even more congestion. The impact of autonomy on congestion remains unclear since, to my knowledge, no studies exist yet. Self-driving cars will certainly increase capacity, but as the volume increases, so does demand. If it becomes cheaper or easier for people to use self-driving cars, then the number of people who use them will escalate.
Parking will also transform with autonomy: if the car does not have to wait for you within walking distance, it can do something else when people do not need it.* The current parking model is a source of congestion, with some studies suggesting that a double-digit percentage of traffic in dense urban areas comes from people driving around looking for parking spaces. An autonomous car can wait somewhere else, and an on-demand car can simply drop you off and go pick up other passengers. But this new model might also create congestion, because in both cases the cars need to go pick people up rather than being parked and waiting for people to come to them. With enough density, the on-demand car might be one that is already dropping off someone else close to you, similar to Uber's model.
Parking is not only important for traffic but also for the use of land. Some parking is on the street, so removing it adds capacity for other cars driving or for people walking. For example, parking in incorporated Los Angeles County takes up approximately 14% of the land. Adding parking lots and garages is expensive, driving up construction prices and housing expenses.* A study in Oakland, California, showed that government-mandated parking requirements increased construction costs per apartment by 18%.
Removing the cost of drivers from on-demand services, like Uber and Lyft, reduces the expenditure by around 75%. Factor in the reduced price of insurance because of fewer car accidents, and the cost goes down even further. Transportation as a Service is the new business model.*
Transportation as a Service
Transportation as a service enables consumers to get around without having to buy or own vehicles.
Transportation as a Service (TaaS), also referred to as Mobility as a Service (MaaS) or Mobility on Demand (MoD),* will disrupt not only the transportation industry but also the oil industry with the addition of electric vehicles (EV). TaaS goes hand in hand with EVs: electric cars are much less expensive to maintain because, for one, their induction motors have fewer moving parts than the internal combustion engines (ICE) of gas-powered cars.* For autonomous vehicles in the TaaS sector, low maintenance costs are essential, as car rental companies know pretty well.
The average American family spends $9K on road transportation every year. Estimates are that they will save more than $5.6K per year in transportation costs with TaaS, leaving them to use that money in other areas like entertainment. Truly cheap, on-demand services will have even more consequences. As TaaS with self-driving cars becomes cheaper, we must rethink public transportation. If everyone uses on-demand services, then no one will need public transportation.
Transitioning all the people who currently travel through underground subways or elevated trains to cars on surface streets would increase congestion on the roads. In high-density areas, like New York City, people live stacked in buildings on different floors. If everyone needs to move at the same time, such as during rush hour, and all of them go through only one "floor," meaning the aboveground road system, then congestion will invariably happen. Therefore, self-driving vehicles need ways of moving around that do not depend only on surface streets: they should travel on many levels.
Figure: Kitty Hawk’s first prototype of its self-driving flying car.
One possibility is self-driving drones, something like a Jetsonian future. Kitty Hawk Corporation, a startup developed by Sebastian Thrun, already has a few prototypes of these flying cars.*
Some argue that this solution might not work inside highly dense areas because these drones produce too much noise, and if they fail and crash, they can damage property or injure people.
The most recent prototype, however, is not as noisy as some claim. From a distance of 50 feet, these vehicles sound like a lawn mower, and from 250 feet, like a loud conversation. And their design is such that if the motor or one of the blades fails, the vehicle will not fall to the ground.
Another possibility for adding more levels for on-demand vehicles is to go under the ground, creating tunnels. But digging tunnels is a huge financial and construction investment. Elon Musk's Boring Company focuses on reducing the cost of tunneling by a factor of ten by narrowing the tunnel diameter as well as increasing the speed of its tunnel boring machines (TBMs).* Its stated goal is to make a TBM as fast as a snail, which is still many times faster than today's machines. Musk thinks that going underground is safer than flying vehicles and provides more capacity, since more tunnels can be added on different levels. The Boring Company already operates a loop at the Las Vegas Convention Center.*
TaaS will have a direct impact on the driving industry as well as employment. In the United States alone, self-driving cars will impact around 200,000 taxi and private drivers and 3.5 million truck drivers.* Displacing truck drivers, in particular, will significantly impact the economy since truck driving is one of the largest professions in the United States.
Given that only 10% of cars are in motion during peak driving hours, we can expect TaaS to result in fewer cars, which could affect production numbers. Over 10 million new cars are sold in the US market every year, and with fewer cars needed for the same capacity, the total number introduced to the market might go down. The cost of transportation will also decline by a large factor because fewer resources are needed to make the cars. Using TaaS will be much cheaper than owning a car because of the reduced usage as well as the fuel and maintenance savings of EVs used for autonomous driving.
Once in place, switching to TaaS is easy for consumers and requires no investment or contract, so I believe that the adoption rate will be high.* And as consumers’ comfort levels rise due to increased safety and less hassle, usage will spread. First, the switch will occur in high-density areas with high real estate values, like San Francisco and New York, and then it will spread to rural, less dense areas.
Figure: Cost difference of autonomous EV cars versus ICE cars.
As this shift occurs, fewer people will buy new cars, resulting in a decline in car production.* We already see this trend with young adults who use car-sharing services in cities and do not buy vehicles. According to one study,* young people drove 23% less between 2001 and 2009.* The car types that people drive will change over time as well. If you move to the city, you might not need an F-150 pickup truck but rather a much smaller car. Or, if you commute from one highly dense area to another, it might make sense to have autonomous vehicles that transport more than ten passengers at a time.
Figure: Percentage of drivers for different age groups.
The availability of on-demand, door-to-door transport via TaaS vehicles will improve mobility for those unable to drive or who cannot afford to own a car. And because the cost of transportation will go down, more people will travel by car. Experiments with TaaS already exist in different areas of the US. For example, Voyage, a Silicon Valley startup acquired by Cruise in 2021, deployed cars running its software, overseen by "remote" drivers, in The Villages in Florida, a massive retirement community with 125,000 residents.* Voyage is already experimenting with what will become mainstream in a few years. Residents of the retirement community summon a car with a phone app, and the driverless car picks them up and drops them off anywhere inside the community. The vehicles are monitored from a control center by workers who check for any problems. Transportation will completely change in the next decade, and so will cities. Hopefully, governments will ease the transition.
Samantha: You know what’s interesting? I used to be so worried about not having a body, but now I truly love it. I’m growing in a way I couldn’t if I had a physical form. I mean, I’m not limited—I can be anywhere and everywhere simultaneously. I’m not tethered to time and space in a way that I would be if I was stuck in a body that’s inevitably going to die.Her (2013)
Voice assistants are becoming more and more ubiquitous. Smart speakers became popular after Amazon introduced Echo, a speaker with Alexa as the voice assistant, in November 2014. By 2017, tens of millions of smart speakers were in people’s homes, and every single one of them had voice as their main interface. Voice assistants are not only present in smart speakers but also in every smartphone. The most well-known one, Siri, powers the iPhone.