“I learned very early the difference between knowing the name of something and knowing something.” Richard Feynman*
Machine learning algorithms usually learn by analyzing data and inferring what kind of model or parameters a model should have or by interacting with the environment and getting feedback from it. Humans can annotate this data or not, and the environment can be simulated or the real world.
The three main ways that machine learning algorithms learn are supervised learning, unsupervised learning, and reinforcement learning. Other techniques exist, such as evolution strategies or semi-supervised learning, but they are not as widely used or as successful as these three.
Supervised learning has been widely used in training computers to tag objects in images and to translate speech to text. Let’s say you own a real estate business and one of the most important aspects of being successful is to figure out the price for a house when it enters the market. Determining that price is extremely important for completing a sale, making both the buyer and seller happy. You, as an experienced realtor, can figure out the pricing for a house based on your previous knowledge.
But as your business grows, you need help, so you hire new realtors. To be successful, they also need to determine the price of a house in the market. To help these inexperienced people, you write down the details of the houses that the company has already bought and sold: size, neighborhood, number of bathrooms and bedrooms, and the final sale price.
| Bedrooms | Sq. Feet | Neighborhood | Sale Price |
Table: Sample data for a supervised learning algorithm.
This information is called the training data; that is, it is example data that contains the factors or features that may influence the price of a house in addition to the final sale price. New hires look at all this data to start learning which factors influence the final price of a house. For example, the number of bedrooms might be a great indicator of price, but the size of the house may not necessarily be as important. If inexperienced realtors have to determine the price of a new house that enters the market, they simply check to find a house that is most similar and use that information to determine the price.
| Bedrooms | Sq. Feet | Neighborhood | Sale Price |
Table: Missing information that the algorithm will determine.
That is precisely how algorithms learn from training data with a method called supervised learning. The algorithm knows the price of some of the houses in the market, and it needs to figure out how to predict the new price of a house that is entering the market. In supervised learning, the computer, instead of the realtors, figures out the relationship between the data points. The value the computer needs to predict is called the label. In the training data, the labels are provided. When there is a new data point whose value, the label, is not defined, the computer estimates the missing value by comparing it to the ones it has already seen.
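The realtor's habit of finding the most similar house can be sketched in a few lines of Python. This is a minimal illustration, not a production pricing model: the `House` class, the `distance` function with its scaling factors, and every listing in `TRAINING_DATA` are made up for the example. It estimates the missing label by averaging the sale prices of the `k` most similar labeled houses, an approach usually called nearest-neighbor prediction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class House:
    bedrooms: int
    sq_feet: float
    neighborhood: str
    sale_price: Optional[float] = None  # the label; None when it is unknown


def distance(a: House, b: House) -> float:
    """Hypothetical similarity measure: a smaller value means more alike."""
    bedroom_gap = abs(a.bedrooms - b.bedrooms)
    # Assumed scale: 500 square feet counts as much as one bedroom.
    size_gap = abs(a.sq_feet - b.sq_feet) / 500
    neighborhood_gap = 0 if a.neighborhood == b.neighborhood else 1
    return bedroom_gap + size_gap + neighborhood_gap


def predict_price(new_house: House, training_data: List[House], k: int = 1) -> float:
    """Average the sale prices of the k most similar labeled houses."""
    nearest = sorted(training_data, key=lambda h: distance(new_house, h))[:k]
    return sum(h.sale_price for h in nearest) / len(nearest)


# Made-up training data: features plus the label (final sale price).
TRAINING_DATA = [
    House(2, 1100, "Downtown", 250_000),
    House(3, 1600, "Downtown", 340_000),
    House(2, 1200, "Suburb", 210_000),
    House(4, 2400, "Suburb", 420_000),
]

# A new house enters the market; its sale price (label) is missing.
new_listing = House(2, 1150, "Downtown")
estimate = predict_price(new_listing, TRAINING_DATA, k=1)
```

With `k=1` the estimate is simply the price of the single closest house. A larger `k` averages over more neighbors, which tends to smooth out unusual sales.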
Unsupervised learning is a machine learning technique that learns patterns with unlabeled data.
In our example, unsupervised learning is similar to supervised learning, but the price of each house is not part of the information included in the training data. The data is unlabeled.
Table: Sample training data for an unsupervised learning algorithm.
Even without the price of the houses, you can discover patterns from the data. For example, the data can tell that there is an abundance of houses with two bedrooms and that the average size of a house in the market is around 1,200 square feet. Other information that might be extracted is that very few houses in the market in a certain neighborhood have four bedrooms, or that five major styles of houses exist. And with that information, if a new house enters the market, you can figure out the most similar houses by looking at the features or identifying that the house is an outlier. This is what unsupervised learning algorithms do.
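The kind of pattern discovery described above can be sketched without any labels at all. The listings below are invented for illustration, and the outlier rule (a size more than two standard deviations from the average) is one common convention assumed here, not the only possible choice.

```python
from collections import Counter
from statistics import mean, pstdev

# Made-up, unlabeled listings: (bedrooms, square feet). There is no sale price.
LISTINGS = [(2, 1100), (2, 1200), (2, 1250), (3, 1300), (2, 1150), (4, 3200)]


def summarize(listings):
    """Describe the market using only the features."""
    bedroom_counts = Counter(bedrooms for bedrooms, _ in listings)
    sizes = [sq_feet for _, sq_feet in listings]
    return {
        "most_common_bedrooms": bedroom_counts.most_common(1)[0][0],
        "average_sq_feet": mean(sizes),
    }


def find_outliers(listings, threshold=2.0):
    """Flag houses whose size is far from the average (assumed rule: > 2 std devs)."""
    sizes = [sq_feet for _, sq_feet in listings]
    center, spread = mean(sizes), pstdev(sizes)
    if spread == 0:
        return []  # every house has the same size, so nothing stands out
    return [house for house in listings if abs(house[1] - center) / spread > threshold]


summary = summarize(LISTINGS)
outliers = find_outliers(LISTINGS)
```

In this made-up data, two-bedroom houses dominate and the 3,200-square-foot house is the only one flagged as unusual. Grouping houses into styles would call for a clustering algorithm, which this sketch leaves out.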
The previous two ways of learning are based solely on data given to the algorithm. The process of reinforcement learning is different: the algorithm learns by interacting with the environment. The environment provides feedback by rewarding good behavior or punishing bad behavior. Let’s look at an example of reinforcement learning.
Say you have a dog, Spot, whom you want to train to sit on command. Where do you start? One way is to show Spot what “sit” means by putting her bottom on the floor. The other way is to reward Spot with a treat whenever she puts her bottom on the floor. Over time, Spot learns that whenever she sits on command she receives a treat and that this is a rewarded behavior.
Reinforcement learning works in the same way. It is a framework built on the insight that you can teach intelligent agents, such as a dog or a deep neural network, to achieve a certain task by rewarding them when they perform the task correctly. Whenever the agent achieves the desired outcome, the reward increases its chance of repeating that action. An agent is whatever processes the input and decides which action to take; in software, the agent is an algorithm. Spot is the agent in the example.
Reinforcement learning is a machine learning training method based on rewarding desired behaviors and/or punishing undesired ones.
Reinforcement learning as a learning framework is interesting, but the associated algorithms are the most important aspect. They work by defining the reward the agent receives once it achieves a state, like sitting. Reinforcement algorithms are formulated to find a policy, that is, a mapping from states to the actions to be taken, that maximizes the expected reward, so that the agent learns the behavior that earns the most reward (the treat).
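To make the idea of a policy concrete, the sketch below represents a policy as a plain dictionary from states to actions and searches every possible policy for the one with the highest expected reward. The states, their probabilities, and the reward values are all hypothetical. Enumerating every policy only works for toy problems this small; real reinforcement algorithms have to find a good policy without trying them all.

```python
from itertools import product

STATES = ["command", "no command"]  # does Spot hear "sit" or not?
ACTIONS = ["sit", "stand"]

# Hypothetical chance of being in each state.
STATE_PROB = {"command": 0.5, "no command": 0.5}

# Hypothetical rewards: a treat only for sitting on command, nothing otherwise.
REWARD = {("command", "sit"): 1.0}


def expected_reward(policy):
    """Average reward of a policy, weighted by how often each state occurs."""
    return sum(STATE_PROB[s] * REWARD.get((s, policy[s]), 0.0) for s in STATES)


def best_policy():
    """Try every mapping from states to actions and keep the best one."""
    candidates = [
        dict(zip(STATES, choice))
        for choice in product(ACTIONS, repeat=len(STATES))
    ]
    return max(candidates, key=expected_reward)


policy = best_policy()
```

Any policy that maps `"command"` to `"sit"` reaches the maximum here, because the assumed rewards do not care what Spot does when there is no command.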
In the reinforcement learning formulation, the environment gives the reward: the agent does not figure out the reward itself but only receives it by interacting with the environment and hitting on the expected behavior. One problem with this is that the agent sometimes takes a long time to receive a reward. For example, if Spot never sits, then she never receives a treat and does not learn to sit. Or, let’s say you want an agent to learn how to navigate a maze and the reward is only given when the agent exits the maze. If the agent takes too long before leaving, then it is hard to say which actions the agent took that helped it get out of the maze. Another problem is that the agent only learns from its own successes and failures. That is not necessarily the case with humans in the real world. No one needs to drive off a cliff thousands of times to learn how to drive. People can figure out rewards from observation.
The following two steps define reinforcement algorithms:
1. Add randomness to the agent’s actions so it tries something different, and
2. If the result was better than expected, do more of the same in the future.
Adding randomness to the actions ensures that the agent searches for the correct actions to take. And if the result is the one expected, then the agent tries to do more of the same in the future. The agent does not necessarily repeat the exact same actions, however, because it still tries to improve by exploring potentially better actions. Even though reinforcement algorithms can be explained easily, they do not necessarily work for all problems. For reinforcement learning to work, the situation must have a reward, and it is not always easy to define what should or should not be rewarded.
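The two steps can be sketched with a small preference-update rule, in the style of what is often called a gradient bandit. This is one illustrative choice among many reinforcement algorithms, and the actions, the reward function, the step size, and the number of steps are all assumptions made for the example. The agent samples an action at random according to its current preferences, compares the reward with the average reward it has seen so far, and shifts its preferences toward actions that did better than expected.

```python
import math
import random

ACTIONS = ["sit", "stand", "bark"]


def reward_from_environment(action):
    """Hypothetical environment: only sitting earns a treat."""
    return 1.0 if action == "sit" else 0.0


def softmax(preferences):
    """Turn preferences into probabilities that sum to 1."""
    exps = [math.exp(p) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=500, step_size=0.1, seed=0):
    rng = random.Random(seed)
    preferences = [0.0] * len(ACTIONS)  # start with no favorite action
    expected = 0.0  # running average of the rewards seen so far
    for t in range(1, steps + 1):
        probs = softmax(preferences)
        # Step 1: add randomness by sampling the action instead of
        # always taking the current favorite.
        chosen = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        reward = reward_from_environment(ACTIONS[chosen])
        surprise = reward - expected  # positive when better than expected
        # Step 2: if the result beat expectations, make the chosen action
        # more likely and the others less likely (and the reverse if not).
        for i in range(len(ACTIONS)):
            if i == chosen:
                preferences[i] += step_size * surprise * (1 - probs[i])
            else:
                preferences[i] -= step_size * surprise * probs[i]
        expected += (reward - expected) / t
    return dict(zip(ACTIONS, softmax(preferences)))


action_probabilities = train()
```

Because actions are always sampled rather than fixed, the agent keeps a small chance of trying the other actions even after sitting becomes its favorite, which is the ongoing exploration described above.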
Reinforcement algorithms can also backfire. Let’s say that an agent is rewarded by the number of paper clips it makes. If the agent learns to transform anything into paper clips, it might turn everything into paper clips.* If the reward does not punish the agent when it creates too many paper clips, the agent can misbehave. Reinforcement learning algorithms are also often inefficient because they spend a lot of time searching for the correct solution, trying randomized actions to find the right behavior. Even with these limitations, they can accomplish a wide variety of tasks, such as playing Go at a superhuman level and making robotic arms grasp objects.
Another way of learning that is particularly useful with games is having multiple agents play against each other. Two classic examples are chess and Go, where two agents compete with each other. Agents learn what actions to take by being rewarded when they win the game. This technique is called self-play, and it can be used not only with a reinforcement learning algorithm, but also to generate data. In Go, for example, it can be used to figure out which plays are more likely to make a player win. Self-play generates data from computing power, that is, from the computer playing itself.
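Self-play as a data generator can be sketched with a game far simpler than chess or Go. The game here is a made-up toy: players alternate taking one or two stones from a pile of four, and whoever takes the last stone wins. Two copies of the same random player play each other many times, and the results are used to estimate which opening move is more likely to win. The game rules, the random player, and the number of games are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def play_one_game(rng, pile=4):
    """Two copies of a random player alternate turns.

    Returns (opening_move, first_player_won).
    """
    player = 0
    opening_move = None
    while True:
        # A player may take one or two stones, but never more than remain.
        move = rng.choice([m for m in (1, 2) if m <= pile])
        if opening_move is None:
            opening_move = move
        pile -= move
        if pile == 0:
            return opening_move, player == 0  # taking the last stone wins
        player = 1 - player


def generate_self_play_data(n_games=2000, seed=0):
    """Estimate the first player's win rate for each opening move."""
    rng = random.Random(seed)
    wins = defaultdict(int)
    games = defaultdict(int)
    for _ in range(n_games):
        opening, first_player_won = play_one_game(rng)
        games[opening] += 1
        wins[opening] += first_player_won
    return {move: wins[move] / games[move] for move in games}


win_rates = generate_self_play_data()
```

No human labeled any of these games: the data comes entirely from the computer playing itself. In this toy game, opening with one stone wins more often against a random opponent than opening with two, and the generated data reflects that.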
The three learning categories are each useful in different situations. Use supervised learning when there is a lot of available data that is labeled by people, such as when others tag people on Facebook. Unsupervised learning is used primarily when there is not much information about the data points that the system needs to figure out, such as in detecting cyber attacks: one can infer that a system is being attacked by looking at the data and seeing odd behaviors that were not there before the attack. The last, reinforcement learning, is mainly used when there is not much data about the task that the agent needs to achieve, but there are clear goals, like winning a chess game. Machine learning algorithms, more specifically deep learning algorithms, are trained with these three modes of learning.