
Edition e1.0.2

Updated November 2, 2022. You're reading an excerpt of *Making Things Think: How AI and Deep Learning Power the Products We Use*, by Giuliano Giacaglia.

I have always been convinced that the only way to get artificial intelligence to work is to do the computation in a way similar to the human brain. That is the goal I have been pursuing. We are making progress, though we still have lots to learn about how the brain actually works. (Geoffrey Hinton)

Deep learning is a type of machine learning that uses multilayer neural networks, trained with a technique called backpropagation. The field was pioneered by Geoffrey Hinton, the great-great-grandson of George Boole, whose Boolean algebra is a keystone of digital computing.

The evolution of deep learning was a long process, so we must go back in time to understand it. Its core technique, backpropagation, has roots in the field of control theory going back to the 1950s. One of the first applications involved optimizing the thrust of the Apollo spacecraft as they headed to the moon.

The earliest neural networks, called perceptrons, were the first step toward human-like intelligence. However, a 1969 book, *Perceptrons: An Introduction to Computational Geometry* by Marvin Minsky and Seymour Papert, demonstrated the extreme limitations of the technology by showing that a shallow network, with a single layer of adjustable weights, could compute only the most basic functions; it cannot, for example, represent the exclusive-or (XOR) function. At the time, their book was a huge setback to the field of neural networks and AI.
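To make the limitation concrete, here is a minimal sketch of a single neuron trained with the classic perceptron learning rule. The function names, the learning rate, and the number of passes are illustrative assumptions, not something taken from Minsky and Papert's book. The neuron learns OR, whose cases can be separated by a straight line, but no setting of its weights can get all four cases of exclusive-or (XOR) right.

```python
def predict(weights, bias, inputs):
    """Fire (return 1) when the weighted sum exceeds zero, otherwise return 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Train one neuron with the perceptron rule and return (weights, bias)."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # The error is +1, 0, or -1; weights only move when the guess is wrong.
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


def count_correct(samples, weights, bias):
    """Count how many samples the neuron classifies correctly."""
    return sum(predict(weights, bias, inputs) == target for inputs, target in samples)


OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

or_correct = count_correct(OR_DATA, *train_perceptron(OR_DATA))
xor_correct = count_correct(XOR_DATA, *train_perceptron(XOR_DATA))
print("OR cases correct:", or_correct, "of 4")
print("XOR cases correct:", xor_correct, "of 4")
```

Running more passes does not help with XOR: a single threshold neuron draws one straight line through the input space, and no single line separates the XOR cases.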

Getting past the limitations pointed out in Minsky and Papert's book requires multilayer neural networks. To create a multilayer neural network to perform a certain task, researchers first determine how the neural network will look by determining which neurons connect to which others. But to finish creating such a neural network, the researchers need to find the weights between each of the neurons, that is, how much one neuron's output affects the next neuron. The training step in deep learning usually does that. In that step, the neural network is presented with examples of data and the training software figures out the correct weights for each connection in the neural network so that it produces the intended results; for example, if the neural network is trained to classify images, then when presented with images that contain cats, it says there is a cat there.
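The two ingredients described above, an architecture and a set of weights, can be sketched in a few lines. The layer sizes, the weight values, and the choice of ReLU as the hidden activation are all made up for illustration; in practice the weights would come out of training rather than being typed in by hand.

```python
def relu(value):
    """A common activation: pass positive values through, clamp negatives to zero."""
    return max(0.0, value)


def identity(value):
    return value


def layer_forward(weights, biases, inputs, activation):
    """Each row of `weights` holds the incoming connection weights of one neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Architecture (hypothetical): 2 inputs -> 2 hidden neurons -> 1 output neuron.
HIDDEN_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]
HIDDEN_BIASES = [0.0, 0.0]
OUTPUT_WEIGHTS = [[2.0, 4.0]]
OUTPUT_BIASES = [-1.0]


def forward(inputs):
    """Push the inputs through the hidden layer, then through the output layer."""
    hidden = layer_forward(HIDDEN_WEIGHTS, HIDDEN_BIASES, inputs, relu)
    return layer_forward(OUTPUT_WEIGHTS, OUTPUT_BIASES, hidden, identity)


print(forward([1.0, 2.0]))  # prints [5.0]
```

Here the second hidden neuron's output (1.5) reaches the output through a weight of 4.0, so it affects the result twice as strongly as the first hidden neuron would through its weight of 2.0.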

**Backpropagation** is an algorithm that adjusts the weights so that each change moves the neural network's output closer to the right one, and it does so far faster than was previously possible. It works backward from the output: the error is measured at the output layer, so the neurons closest to the output are the first to have their share of that error worked out. The error is then passed back to the prior layer, whose share is worked out in turn, and the process continues until it reaches the first layer of neurons. Every weight is then nudged slightly in the direction that reduces the error.
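As a rough sketch of the idea, consider the smallest possible "deep" network: one input, one hidden neuron, and one output neuron, each with a single weight. The sigmoid activation, the squared-error loss, and all the numbers below are assumptions chosen for illustration. The backward pass starts at the output and reuses that result for the earlier layer, and a numerical estimate of the gradients serves as a sanity check.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w1, w2, x, target):
    """Squared error of the chain x -> hidden -> output."""
    hidden = sigmoid(w1 * x)
    output = sigmoid(w2 * hidden)
    return 0.5 * (output - target) ** 2


def backprop_gradients(w1, w2, x, target):
    # Forward pass: keep the intermediate values, the backward pass needs them.
    hidden = sigmoid(w1 * x)
    output = sigmoid(w2 * hidden)
    # Backward pass, output layer first.
    delta_output = (output - target) * output * (1.0 - output)
    grad_w2 = delta_output * hidden
    # The output error is passed back through w2 to the hidden neuron.
    delta_hidden = delta_output * w2 * hidden * (1.0 - hidden)
    grad_w1 = delta_hidden * x
    return grad_w1, grad_w2


def numerical_gradients(w1, w2, x, target, eps=1e-6):
    """Estimate the same gradients by nudging each weight up and down."""
    grad_w1 = (loss(w1 + eps, w2, x, target) - loss(w1 - eps, w2, x, target)) / (2 * eps)
    grad_w2 = (loss(w1, w2 + eps, x, target) - loss(w1, w2 - eps, x, target)) / (2 * eps)
    return grad_w1, grad_w2


W1, W2, X, TARGET = 0.8, -0.4, 1.5, 1.0
LEARNING_RATE = 0.5

analytic = backprop_gradients(W1, W2, X, TARGET)
numeric = numerical_gradients(W1, W2, X, TARGET)

# One gradient-descent step: move each weight against its gradient.
new_w1 = W1 - LEARNING_RATE * analytic[0]
new_w2 = W2 - LEARNING_RATE * analytic[1]

print("backprop gradients: ", analytic)
print("numerical gradients:", numeric)
print("loss before:", loss(W1, W2, X, TARGET), "after:", loss(new_w1, new_w2, X, TARGET))
```

The numerical estimate needs two extra evaluations of the whole network per weight, while the backward pass gets every gradient from a single sweep. That difference is what makes the approach practical for networks with many weights.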

In 1986, Hinton published the seminal paper on deep neural networks (DNNs), "Learning representations by back-propagating errors," with his colleagues David Rumelhart and Ronald Williams. The article popularized backpropagation as a way to train multilayer networks, a simple mathematical technique that led to huge advances in deep learning.

The backpropagation technique championed by Hinton and his colleagues finds the weights for each neuron in a multilayer neural network far more efficiently. Before this technique, the time needed to find the weights (also known as coefficients) for a multilayer neural network grew explosively with the size of the network, which made it extremely hard to find the correct coefficients for each neuron. Training a neural network to fit its inputs could take months or years; the new technique took significantly less time.

Hinton's breakthrough also showed that backpropagation made it easy to train a neural network with more than two or three layers, breaking through the limitation imposed by shallow neural networks. Backpropagation made it practical to find weights for a multilayer neural network that produce the desired output. This development allowed scientists to train more powerful neural networks, making them much more relevant. For comparison, one of the best-performing neural networks in vision, called Inception, has approximately 22 layers of neurons.

The figure below shows an example of both a simple neural network (SNN) and a deep learning neural network (DLNN). On the left of each network is the input layer, represented by the red dots. These receive the input data. In the SNN, the hidden layer neurons are then used to make the adjustments needed to reach the output (blue dots) on the right side. In contrast, the use of more than one layer characterizes the DLNN, allowing for far more complex behavior that can handle more involved input.

*Figure: A simple neural network and a multilayer neural network.*

The way researchers usually develop a neural network is first by defining its architecture: the number of neurons and how they are arranged. But the parameters of the neurons inside the neural network need determining. To do that, researchers initialize the neural network weights with random numbers. After that, they feed it the input data and determine if the output is similar to the one they want. If it is not, then they update the weights of the neurons until the output is the closest to what the training data shows.

For example, let's say you want to classify some images as containing a hot dog and others as not containing a hot dog. To do that, you feed the neural network images containing hot dogs and others that do not, which is the training data. Following the initial training, the neural network is then fed new images and needs to determine if they contain a hot dog or not.

These input images are composed of a matrix of numbers, representing each pixel. The neural network goes through the image, and each neuron applies matrix multiplication, using the internal weights, to the numbers in the image, generating a new image. The outputs of the neurons are a stack of lower resolution images, which are then multiplied by the neurons on the next layer. On the final layer, a number comes out representing the solution. In this case, if it is positive, it means that the image contains a hot dog, and if it is negative, it means that it does not contain a hot dog.
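The pipeline in this paragraph can be sketched with a toy example. The 4x4 "image", the hand-picked kernel, and the output bias below are hypothetical; a trained network would learn these numbers, and a real one would stack many such layers. The sketch shows weights sliding over a grid of pixels to produce a smaller grid, which is then reduced to a single score whose sign is the answer.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (fully overlapping positions only)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = (len(image) - k_rows) // stride + 1
    out_cols = (len(image[0]) - k_cols) // stride + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 grayscale image: dark on the left, bright on the right.
IMAGE = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
BLANK = [[0, 0, 0, 0] for _ in range(4)]

# Hand-picked weights that respond to a dark-to-bright vertical edge.
EDGE_KERNEL = [[-1, 1], [-1, 1]]
OUTPUT_BIAS = -10.0  # assumed value, stands in for the final layer's learned bias


def score(image):
    """Reduce the feature map to one number; positive means 'contains the pattern'."""
    feature_map = convolve2d(image, EDGE_KERNEL)
    return sum(sum(row) for row in feature_map) + OUTPUT_BIAS


print(convolve2d(IMAGE, EDGE_KERNEL))  # 3x3 map, lower resolution than the 4x4 input
print(score(IMAGE), score(BLANK))      # prints 44.0 -10.0
```

The feature map is largest exactly where the edge sits, so the image with the edge scores positive and the blank image scores negative.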

The problem is that the weights are not defined in the beginning. Finding weights that produce a positive number for images that contain a hot dog and a negative number for those that do not, a process known as training the network, is non-trivial. Because there are many weights in a neural network, it takes a long time to find values for all the neurons that classify all the images correctly. Simply too many possibilities exist. Additionally, depending on the input set, the network can become overtrained (overfit) to the specific dataset, meaning it focuses too narrowly on that dataset and cannot generalize to recognize images outside of it.
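A quick back-of-the-envelope count shows why trying out weight combinations is hopeless. The layer sizes below describe a hypothetical, very small image classifier (a 28x28 grayscale input, one hidden layer of 128 neurons, one output); they are an assumption for illustration, not a network discussed in the text.

```python
import math


def count_parameters(layer_sizes):
    """Weights plus biases in a fully connected network with these layer sizes."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical architecture: 784 input pixels -> 128 hidden neurons -> 1 output.
LAYER_SIZES = [28 * 28, 128, 1]
n_parameters = count_parameters(LAYER_SIZES)

# Even if every parameter could only be 0 or 1, there would be 2**n_parameters
# combinations. Count the decimal digits of that number instead of printing it.
n_digits = int(n_parameters * math.log10(2)) + 1

print("parameters:", n_parameters)  # prints parameters: 100609
print("digits in the number of on/off combinations:", n_digits)
```

The number of combinations runs to tens of thousands of digits even for this tiny network with on/off weights, which is why training relies on following the error signal rather than on search.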

The complete process of training the network relies on passing the input data through the network multiple times. Each pass builds on the adjustments made in the previous ones: its output is compared with the desired result, and the difference provides the feedback that backpropagation uses to improve the weights.
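The repeated passes can be sketched as a loop. This is a minimal illustration under stated assumptions: a tiny network with three hidden neurons, sigmoid activations, a squared-error loss, and the four XOR cases as training data. The names, learning rate, pass count, and random seed are all hypothetical choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


XOR_DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def train_xor(passes=2000, learning_rate=0.5, n_hidden=3, seed=0):
    """Return the total loss recorded at the start of every pass."""
    rng = random.Random(seed)
    # Start from random weights, as described in the text.
    w_hidden = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    b_out = 0.0
    losses = []
    for _ in range(passes):
        grad_w_hidden = [[0.0, 0.0] for _ in range(n_hidden)]
        grad_b_hidden = [0.0] * n_hidden
        grad_w_out = [0.0] * n_hidden
        grad_b_out = 0.0
        total_loss = 0.0
        for inputs, target in XOR_DATA:
            # Forward pass.
            hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
                      for row, b in zip(w_hidden, b_hidden)]
            output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
            total_loss += 0.5 * (output - target) ** 2
            # Backward pass: output error first, then the hidden layer.
            delta_out = (output - target) * output * (1.0 - output)
            grad_b_out += delta_out
            for j in range(n_hidden):
                delta_hidden = delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                grad_w_out[j] += delta_out * hidden[j]
                grad_b_hidden[j] += delta_hidden
                for i in range(2):
                    grad_w_hidden[j][i] += delta_hidden * inputs[i]
        # Feedback: adjust every weight once per pass, against its gradient.
        b_out -= learning_rate * grad_b_out
        for j in range(n_hidden):
            w_out[j] -= learning_rate * grad_w_out[j]
            b_hidden[j] -= learning_rate * grad_b_hidden[j]
            for i in range(2):
                w_hidden[j][i] -= learning_rate * grad_w_hidden[j][i]
        losses.append(total_loss)
    return losses


losses = train_xor()
print("loss on first pass:", losses[0])
print("loss on last pass: ", losses[-1])
```

The loss should shrink over the passes; how close it gets to zero depends on the random starting weights and on how many passes are run.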

One of the reasons why backpropagation took so long to be developed was that it requires computers to perform a great many multiplications, which they were pretty bad at in the 1960s and 1970s. At the end of the 1970s, one of the most powerful processors, the Intel 8086, could execute less than one million instructions per second. For comparison, the processor in the iPhone 12 is more than one million times more powerful than that.

*Figure: Geoffrey Hinton, who founded the field of deep learning.*

Deep learning only really took off in 2012, when Hinton and two of his Toronto students showed that deep neural networks, trained using backpropagation, beat state-of-the-art systems in image recognition by almost halving the previous error rate. Because of his work and dedication to the field, Hinton's name became almost synonymous with the field of deep learning. He now has more citations than the next top three deep learning researchers combined.

After this breakthrough, deep learning started being applied everywhere, with applications including image classification, language translation, and speech recognition of the kind used by Siri. Deep learning models can improve any task that is currently addressed by heuristics, that is, rules of thumb drawn from human experience or thought. Examples include games like Go, chess, and poker, as well as activities like driving cars. Deep learning will be used more and more to improve the performance of computer systems, with tasks like figuring out the order in which processes should run or what data should remain in a cache. These tasks can all be made much more efficient with deep learning models. Storage will be a big application, and in my opinion, the use of deep learning will continue to grow.

It is not a coincidence that deep learning took off and performed better than most of the state-of-the-art algorithms: multilayer neural networks have two very important qualities.

First, they can express the kind of very complicated functions needed to solve the problems we want to address. For example, if you want to understand what is going on in an image, you need a function that takes in the pixels and translates them into text or some other representation in human language. Second, deep learning can learn just from processing data, rather than needing a feedback response. These two qualities make it extremely powerful, since many problems, like image classification, require a lot of data.

The reason why deep neural networks are as good as they are is that they are equivalent to circuits: a neuron can easily implement a Boolean function. For that reason, a deep enough network can simulate a computer for a given number of steps. Each part of a neural network simulates the simplest part of a processor. That means that deep neural networks are, in principle, as powerful as computers and, when trained correctly, can simulate any computer program.
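The claim that a neuron can implement a Boolean function is easy to check with a sketch. The weights and biases below are hand-picked for illustration, and the step activation is an assumption (trained networks usually use smoother activations). One threshold neuron is enough for AND, OR, and NOT, and two layers of them are enough for XOR and for a half adder, one of the simplest building blocks of a processor.

```python
def neuron(weights, bias, inputs):
    """A threshold neuron: outputs 1 when the weighted sum plus bias is positive."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0


def and_gate(a, b):
    return neuron([1, 1], -1.5, [a, b])


def or_gate(a, b):
    return neuron([1, 1], -0.5, [a, b])


def not_gate(a):
    return neuron([-1], 0.5, [a])


def xor_gate(a, b):
    """Two layers: the output neuron fires when OR is on and AND is off."""
    hidden = [or_gate(a, b), and_gate(a, b)]
    return neuron([1, -1], -0.5, hidden)


def half_adder(a, b):
    """Add two bits; returns (carry, sum)."""
    return and_gate(a, b), xor_gate(a, b)


for a in (0, 1):
    for b in (0, 1):
        print(a, "+", b, "=", half_adder(a, b))
```

Chaining such adders gives multi-bit arithmetic, which is the sense in which a deep enough network can mimic a circuit.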

Currently, deep learning is a battleground between Google, Apple, Facebook, and other technology companies that aim to serve individuals' needs in the consumer market. For example, Apple uses deep learning to improve its models for Siri, and Google for its recommendation engine on YouTube. Since 2013, Hinton has worked part-time at Google in Mountain View, California, and Toronto, Canada. And, as of this writing in 2021, he is the lead scientist on the Google Brain team, arguably one of the most important AI research organizations in the world.

There are many types of deep neural networks, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and long short-term memory networks (LSTMs), and each has different properties. For example, recurrent neural networks are deep neural networks in which neurons in higher layers connect back to the neurons in lower layers. Here, we'll focus on convolutional neural networks, which are computationally more efficient and faster than most other architectures. They are extremely relevant as they are used for state-of-the-art text translation, image recognition, and many other tasks.

*Figure: A recurrent neural network, where one of the neurons feeds back into a previous layer.*
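The feedback connection in a recurrent network can be sketched with a single neuron whose output is fed back in as an extra input on the next step. The function name, the weights, and the default `tanh` activation are illustrative assumptions; real recurrent layers use vectors of neurons and learned weight matrices.

```python
import math


def run_recurrent_neuron(inputs, input_weight, recurrent_weight, activation=math.tanh):
    """Process a sequence one value at a time, carrying a state between steps."""
    state = 0.0
    states = []
    for value in inputs:
        # The previous output (state) is combined with the current input.
        state = activation(input_weight * value + recurrent_weight * state)
        states.append(state)
    return states


def identity(value):
    return value


# With an identity activation the arithmetic is easy to follow by hand.
print(run_recurrent_neuron([1.0, 0.0, 0.0], 1.0, 0.5, activation=identity))
# prints [1.0, 0.5, 0.25]
```

The first input keeps echoing through later steps even though the later inputs are zero. That lingering state is what lets recurrent networks handle sequences such as text or speech.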