If after I die, people want to write my biography, there is nothing simpler. They only need two dates: the date of my birth and the date of my death. Between one and another, every day is mine. Fernando Pessoa*
The birth of artificial intelligence was seen with the initial development of neural networks including Frank Rosenblatt’s creation of the perceptron model and the first demonstration of supervised learning. That led to the Georgetown-IBM experiment, an early language translation system. Finally, the end of the beginning was marked by the Dartmouth Conference, at which artificial intelligence was officially launched as a field in computer science, leading to the first government funding of AI.
In 1943, Warren S. McCulloch, a neurophysiologist, and Walter Pitts, a mathematical prodigy, created the concept of artificial neural networks. They patterned their system after the biological model of how neurons (brain cells) work with each other. Neurons interact through their extremities, firing signals via their axons across a synapse to neighboring neurons’ dendrites. Depending on the voltage of this electrical charge, the receiving neuron either fires a new electrical pulse to the next set of neurons or stays silent.
Figure: Artificial neural networks are based on the simple principle of electrical charges and how they are passed in the brain.
The hard part of building the right artificial neural network, that is, one that accomplishes the task you are trying to solve, is figuring out what voltage one neuron should pass to another and what it takes for a neuron to fire.
Both the voltages and the firing criteria become variables that need to be determined for the model. In an artificial neural network, the voltage that is passed from neuron to neuron is called a weight. These weights need to be trained so that the artificial neural network performs the task at hand. One of the earliest ways to do this is called Hebbian learning, which we’ll talk about next.
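To make the idea of weights and firing criteria concrete, here is a minimal sketch in Python of a single threshold neuron. The function name `neuron_fires` and the specific weights and thresholds are illustrative assumptions, not values taken from McCulloch and Pitts’s work.

```python
def neuron_fires(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


# Hand-picked (hypothetical) weights: both inputs count equally.
weights = [1.0, 1.0]
input_pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# With a threshold of 2, the neuron fires only when both inputs are on (logical AND).
and_outputs = [neuron_fires(pair, weights, threshold=2.0) for pair in input_pairs]

# With a threshold of 1, a single active input is enough (logical OR).
or_outputs = [neuron_fires(pair, weights, threshold=1.0) for pair in input_pairs]

print(and_outputs)  # [0, 0, 0, 1]
print(or_outputs)   # [0, 1, 1, 1]
```

Changing only the threshold turns the same unit from an AND into an OR, which is why both the weights and the firing criteria have to be determined for each task.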
In 1947, around the same time that Arthur Samuel was working on the first computer program that would beat a state checkers champion, Donald Hebb, a Canadian psychologist with a PhD from Harvard University, became a Professor of Psychology at McGill University. Hebb would go on to develop an influential theory of how networks of neurons learn.
In 1949, Hebb developed a theory known as Hebbian learning, which proposes an explanation for how our neurons fire and change when we learn something new. It states that when one neuron fires to another, the connection between them develops or enlarges. That means that whenever two neurons are active together, because of some sensory input or other reason, these neurons tend to become associated.
Therefore, the connections among neurons become stronger or grow when the neurons fire together, making the link between the two neurons harder to break. Hebb proposed that this is how humans learn. Hebbian learning, the process of strengthening connections between neurons that fire together, was an early way to train artificial neural networks, but other techniques later became predominant.
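The rule that neurons that fire together strengthen their connection can be sketched in a few lines of Python. This is a simplified illustration, not Hebb’s own formulation: the function name `hebbian_update`, the learning rate of 0.25, and the three-neuron pattern are all assumptions chosen for the example.

```python
def hebbian_update(weights, activity, learning_rate=0.25):
    """Strengthen the connection between every pair of neurons that are active together."""
    n = len(activity)
    updated = [row[:] for row in weights]  # copy so the input is not modified
    for i in range(n):
        for j in range(n):
            if i != j:  # this sketch has no self-connections
                # The product is nonzero only when both neurons are active.
                updated[i][j] += learning_rate * activity[i] * activity[j]
    return updated


# Three neurons, all connections start at zero.
weights = [[0.0] * 3 for _ in range(3)]

# Neurons 0 and 1 fire together; neuron 2 stays silent.
pattern = [1, 1, 0]
for _ in range(4):  # present the same pattern four times
    weights = hebbian_update(weights, pattern)

print(weights[0][1])  # 1.0 (the co-active pair is now strongly linked)
print(weights[0][2])  # 0.0 (no link forms with the silent neuron)
```

Each repetition of the pattern adds to the weight between the two active neurons, while connections to the inactive neuron never change.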
A network of neurons that becomes associated with a memory or pattern, so that all of its neurons fire together, became known as an engram. Gordon Allport defines engrams as follows: “If the inputs to a system cause the same pattern of activity to occur repeatedly, the set of active elements constituting that pattern will become increasingly strongly inter-associated. That is, each element will tend to turn on every other element and (with negative weights) to turn off the elements that do not form part of the pattern. To put it another way, the pattern as a whole will become ‘auto-associated.’ We may call a learned (auto-associated) pattern an engram.”*
With these models in mind, in the summer of 1951, Marvin Minsky, together with two other scientists, developed the Stochastic Neural Analog Reinforcement Calculator (SNARC)—a machine with a randomly connected neural network of approximately 40 artificial neurons.* The SNARC was built to try and find the exit from a maze in which the machine played the part of the rat.
Minsky, with the help of an American psychologist from Harvard, George Miller, developed the neural network out of vacuum tubes and motors. The machine first proceeded randomly; then the correct choices were reinforced by making it easier for the machine to make those choices again, thus increasing their probability compared to other paths. The device worked and made the imaginary rat find a path to the exit. It turned out that, by an electronic accident, they could simulate two or three rats in the maze at the same time, and they all found the exit.
Minsky thought that if he “could build a big enough network, with enough memory loops, it might get lucky and acquire the ability to envision things in its head.”* In 1954, Minsky published his PhD thesis, presenting a mathematical model of neural networks and its application to the brain-model problem.*
This work inspired young students to pursue a similar idea. They sent him letters asking why he did not build a nervous system based on neurons to simulate human intelligence. Minsky figured that this was either a bad idea or would take thousands or millions of neurons to make work.* And at the time, he could not afford to attempt building a machine like that.
In 1956, Frank Rosenblatt implemented an early demonstration of a neural network that could learn how to sort simple images into categories, like triangles and squares.*
Figure: Frank Rosenblatt* and an image with 20x20 pixels.
He built a computer with eight simulated neurons, made from motors and dials, connected to 400 light detectors. Each of the neurons received a set of signals from the light detectors and spat out either a 0 or 1 depending on what those signals added up to.
Rosenblatt used a method called supervised learning, in which each piece of data the software looks at comes with a label identifying what it is. For example, if you want to classify images of apples, the software would be shown photos of apples together with the tag “apple.” This approach is much like how toddlers learn basic images.
Figure: The Mark I Perceptron.
The perceptron is a supervised learning algorithm for binary classifiers. A binary classifier is a function that determines whether an input, which can be a vector of numbers, belongs to a class.
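As a rough illustration of how such a classifier can learn from labeled examples, here is a minimal Python sketch of the classic perceptron learning rule. It is not a reconstruction of Rosenblatt’s hardware: the function names, the tiny four-pixel “images,” and the learning rate are assumptions made for the example.

```python
def predict(weights, bias, pixels):
    """Output 1 if the weighted sum of the pixels plus the bias is positive, else 0."""
    total = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1 if total > 0 else 0


def train_perceptron(examples, n_inputs, learning_rate=1.0, epochs=20):
    """Adjust the weights a little after every misclassified example."""
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for pixels, label in examples:
            error = label - predict(weights, bias, pixels)  # -1, 0, or 1
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * p
                           for w, p in zip(weights, pixels)]
                bias += learning_rate * error
        if mistakes == 0:  # every example is classified correctly
            break
    return weights, bias


# Hypothetical 2x2 images flattened to four pixels.
# Label 1: light in the top row. Label 0: light in the bottom row.
examples = [
    ([1, 1, 0, 0], 1),
    ([0, 0, 1, 1], 0),
    ([1, 0, 0, 0], 1),
    ([0, 0, 0, 1], 0),
]

weights, bias = train_perceptron(examples, n_inputs=4)
predictions = [predict(weights, bias, pixels) for pixels, _ in examples]
print(predictions)  # [1, 0, 1, 0]
```

The labels are what make this supervised: the weights change only when the prediction disagrees with the tag supplied alongside the image.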
The perceptron algorithm was first implemented on the Mark I Perceptron. It was connected to a camera that used a 20x20 grid of cadmium sulfide* photocells* producing a 400-pixel image. Different combinations of input features could be experimented with using a patchboard. The array of potentiometers on the right* implemented the adaptive weights.*
Rosenblatt’s perceptrons classified images into different categories: triangles, squares, or circles. The New York Times featured his work with the headline “Electronic ‘Brain’ Teaches Itself.”* His work established the principles of neural networks. Rosenblatt predicted that perceptrons would soon be capable of feats like greeting people by name. The problem, however, was that his algorithm did not work with multiple layers of neurons due to the exponential nature of the learning algorithm: it required too much time for perceptrons to converge to what engineers wanted them to learn. This was eventually solved, years later, by a new algorithm called backpropagation, which we’ll cover in the section on deep learning.
A multilayer neural network consists of three or more layers of artificial neurons—an input layer, an output layer, and at least one hidden layer—arranged so that the output of one layer becomes the input of the next layer.
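The way one layer’s output becomes the next layer’s input can be sketched in Python as follows. The function names are hypothetical, and the weights are hand-picked rather than learned, chosen so that this small network computes the exclusive-or (XOR) of its two inputs.

```python
def step(total):
    """Fire (1) when the total input is positive, otherwise stay silent (0)."""
    return 1 if total > 0 else 0


def layer_forward(inputs, weights, biases):
    """Compute the output of every neuron in one layer."""
    return [step(bias + sum(w * x for w, x in zip(row, inputs)))
            for row, bias in zip(weights, biases)]


def network_forward(inputs, layers):
    """Feed the output of each layer into the next one."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Hand-picked (not trained) weights, one (weights, biases) pair per layer.
layers = [
    # Hidden layer: the first neuron acts as OR, the second as AND.
    ([[1.0, 1.0], [1.0, 1.0]], [-0.5, -1.5]),
    # Output layer: fires when OR is on but AND is off.
    ([[1.0, -1.0]], [-0.5]),
]

xor_outputs = [network_forward([a, b], layers)[0]
               for a in (0, 1) for b in (0, 1)]
print(xor_outputs)  # [0, 1, 1, 0]
```

The input layer here is simply the list of two input values; the hidden layer and the output layer each apply their own weights to whatever the previous layer produced.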
Figure: A multilayer neural network.
The Georgetown-IBM Experiment
The Georgetown-IBM experiment translated Russian sentences into English. This demonstration of machine translation was held in 1954 to attract not only public interest but also funding.* The system specialized in organic chemistry and was quite limited, with only six grammar rules. An IBM 701 mainframe computer, designed by Nathaniel Rochester and launched in April 1953, ran the experiment.*
A feature article in the New York Times read, “A public demonstration of what is believed to be the first successful use of a machine to translate meaningful texts from one language to another took place here yesterday afternoon. This may be the cumulation of centuries of search by scholars for a mechanical translator.”
Figure: The Georgetown-IBM experiment translated more than 60 sentences from Russian to English using a vocabulary of 250 words.
The demo worked in some cases, but it failed for most of the sentences. A way of verifying if the machine translated a phrase correctly was to translate it from English to Russian and then back into English. If the sentence had the same meaning or was similar to the original, then the translation worked. But in the experiment, many sentences ended up different from the original and with an entirely new meaning. For example, given the original sentence “The spirit is willing, but the flesh is weak,” the result was “The whiskey is strong, but the meat is rotten.”
The system simply could not understand the meaning, or semantics, of the sentence, and it made mistakes in translation as a result. As the errors mounted, the original message was completely lost.
The Dartmouth Conference
AI was defined as a field of research in computer science at a conference held at Dartmouth College in the summer of 1956. Marvin Minsky, John McCarthy, Claude Shannon, and Nathaniel Rochester organized the conference. They would become known as the “founding fathers” of artificial intelligence.
At the conference, these researchers wrote a proposal to the US government for funding. They divided the field into six subfields of interest: computers, natural language processing, neural networks, theory of computation, abstraction, and creativity.
Figure: From left to right: Trenchard More, John McCarthy, Marvin Minsky, Oliver Selfridge, and Ray Solomonoff.
At the conference, many predicted that a machine as intelligent as a human being would exist in no more than a generation, about 25 years. As you know, that was an overestimation of how quickly the development of artificial intelligence would proceed. The workshop lasted six weeks and started a funding boom in AI, which continued for 16 years until what would be called the First AI Winter.
The Defense Advanced Research Projects Agency (DARPA) provided most of the money that went into the field during the period known as the Golden Years of artificial intelligence.
During this “golden” period, the early AI pioneers set out to teach computers to do the same complicated mental tasks that humans do, breaking them into five subfields: reasoning, knowledge representation, planning, natural language processing (NLP), and perception.
These general-sounding terms do have specific technical meanings, still in use today:
Reasoning. When humans are presented with a problem, we can work through a solution using reasoning. This area covers all the tasks involved in that process. Examples include playing chess, solving algebra problems, proving geometry theorems, and diagnosing diseases.
Knowledge representation. In order to solve problems, hold conversations, and understand people, computers must have knowledge about the real world, and that knowledge must be represented in the computer somehow. What are objects, what are people? What is speech? Specific computer languages were invented for the purpose of programming these things into the computer, with Lisp being the most famous. The engineers building Siri had to solve this problem for it to respond to requests.
Planning. Robots must be able to navigate in the world we live in, and that takes planning. Computers must figure out, for example, how to move from point A to point B, how to understand what a door is, and where it is safe to go. This problem is critical for self-driving cars so they can navigate roads safely.
Natural language processing. Speaking and understanding a language, and forming and understanding sentences are skills needed for machines to communicate with humans. The Georgetown-IBM experiment was an early demonstration of work in this area.
Perception. To interact with the world, computers must be able to perceive it, that is, they need to be able to see, hear, and feel things. Sight was one of the first tasks that computer scientists tackled. The Rosenblatt perceptron was the first system to address such a problem.