“Computer languages of the future will be more concerned with goals and less with procedures specified by the programmer.”
Marvin Minsky*
The Software 2.0 paradigm took off with the development of deep learning languages such as TensorFlow.
Based on the goal, the programmer writes the skeleton of the program by defining the neural network architecture(s). Then, the programmer feeds the network data and uses computer hardware to search for the specific weights that best achieve the specified goal; this search is what training means. With traditional software, Software 1.0, most programs are stored as programmer-written code that can span from thousands to billions of lines. For example, Google’s entire codebase has around two billion lines of code.* In the new paradigm, by contrast, the program is stored in memory as the weights of the neural network, with relatively few lines of code written by programmers. This new approach has a disadvantage: software developers sometimes have to choose between software that they understand but that works only 90% of the time and a program that performs well in 99% of cases but is not as well understood.
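To make the contrast concrete, here is a minimal sketch in plain Python rather than TensorFlow, assuming a toy task: deciding whether a temperature in degrees Celsius is above freezing. In the Software 1.0 version, the programmer writes the rule. In the Software 2.0 version, the programmer writes only the skeleton (a single logistic neuron), the goal (minimizing log loss), and the labeled data, and gradient descent searches for the weights. The task, the data, and all function names are hypothetical.

```python
import math

# Software 1.0: the programmer writes the rule explicitly.
def above_freezing_v1(celsius: float) -> bool:
    return celsius > 0.0

# Software 2.0: the programmer writes only a skeleton (here, a single
# logistic neuron with weight w and bias b) ...
def predict(w: float, b: float, x: float) -> float:
    """Probability, under weights (w, b), that x is above freezing."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

# ... plus a goal (minimize average log loss) and labeled data.
# Gradient descent then searches for the weights that best meet the goal.
def train(data, epochs=1000, lr=0.3):
    w, b = 0.0, 0.0  # the "program" starts out blank
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = predict(w, b, x) - y  # derivative of log loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b  # the learned weights are the program

# Hypothetical labeled examples: (temperature in °C, 1 if above freezing else 0).
DATA = [(t, 1 if t > 0 else 0) for t in [-5, -4, -3, -2, -1, 1, 2, 3, 4, 5]]
W, B = train(DATA)

def above_freezing_v2(celsius: float) -> bool:
    return predict(W, B, celsius) > 0.5
```

After training, `above_freezing_v2` agrees with the handwritten rule on the training data and on unseen values such as 10 and -10, yet nowhere does the code say "greater than zero". That knowledge lives only in the learned numbers `W` and `B`, which is one reason such programs are harder to understand.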
Some languages were created only for writing Software 2.0, that is, programming languages that help build, train, and run these neural networks. The best known and most widely used is TensorFlow. Developed by Google and released as open source in 2015, it powers Google products such as Smart Reply and Google Photos, and it is also available for external developers to use. By some metrics, it is now more popular than the Linux operating system. Developers, startups, and other large companies use it for all kinds of machine learning tasks, including translating English into Chinese and reading handwritten text. TensorFlow is used to create, train, and deploy neural networks for different tasks. But to train the resulting network, the developer must feed it data and define the goal the network optimizes for, and this is as important as defining the network itself.
The Importance of Good Training Data
Because a large part of the program is the data fed to it, there is growing concern about whether the datasets used to train these networks represent all the scenarios the program may run into. The data has become essential for the software to work as expected. One problem is that the data might not represent all the use cases a programmer wants to cover when developing the neural network, or it might not represent the most important scenarios. So the size and variety of the dataset have become increasingly important for building neural networks that perform as expected.
For example, let’s say that you want a neural network that draws a bounding box around cars on the road. The data needs to cover all cases. If a car is reflected on the side of a bus, the reflection should not be labeled as a car on the road, and for the neural network to learn that, the programmer needs enough data representing this case. Or, let’s say that five cars are on a car carrier. Should the software create a bounding box for each of the automobiles or just for the car carrier? Either way, the programmer needs enough examples of these cases in the dataset.
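One practical way to act on this is to audit how many labeled examples each edge case actually has before training. The sketch below is hypothetical: it assumes each image carries a set of scenario tags chosen by the labeling team, and the tag names and the threshold of 100 examples are illustrative assumptions, not a standard.

```python
from collections import Counter

# Hypothetical scenario tags mirroring the edge cases above.
REQUIRED_SCENARIOS = {"car_on_road", "car_reflection_on_bus", "cars_on_carrier"}

def coverage_gaps(image_tags, required=REQUIRED_SCENARIOS, min_examples=100):
    """Return {scenario: count} for every required scenario that has fewer
    than min_examples labeled images. image_tags is a list of tag sets,
    one set per image."""
    counts = Counter(tag for tags in image_tags for tag in tags)
    # Counter returns 0 for tags that never appear, so missing scenarios show up too.
    return {s: counts[s] for s in sorted(required) if counts[s] < min_examples}

# Hypothetical dataset: 500 plain road images, 30 that also contain a reflection.
LABELS = [{"car_on_road"}] * 500 + [{"car_on_road", "car_reflection_on_bus"}] * 30
GAPS = coverage_gaps(LABELS)
```

For this dataset, the audit flags reflections on buses (30 examples) and car carriers (0 examples) as under-represented, while plain road scenes pass.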
Another problem arises when the training data was mostly gathered in certain lighting conditions or with a specific type of vehicle. If the resulting algorithm then encounters a vehicle with a different shape or in different lighting, it may behave unpredictably. This happened to Tesla when its driver-assistance software (Autopilot) was engaged and failed to notice a tractor trailer crossing in front of the car: the white side of the trailer against a brightly lit sky was hard to detect. The crash killed the driver.*
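Failures like this can hide behind a good overall accuracy score. A simple safeguard is to report accuracy separately for each slice of the evaluation data, such as each lighting condition. Here is a minimal sketch, assuming each evaluation result is tagged with a hypothetical slice name; the data is invented for illustration.

```python
from collections import defaultdict

def accuracy_by_slice(results):
    """results: list of (slice_name, predicted_label, true_label) tuples.
    Returns {slice_name: accuracy} so weak slices are not hidden by the average."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for slice_name, predicted, actual in results:
        total[slice_name] += 1
        correct[slice_name] += int(predicted == actual)
    return {name: correct[name] / total[name] for name in total}

def overall_accuracy(results):
    """Single accuracy number over all results, ignoring slices."""
    return sum(predicted == actual for _, predicted, actual in results) / len(results)

# Hypothetical evaluation: plenty of daylight images, few bright-sky ones.
RESULTS = (
    [("daylight", "truck", "truck")] * 95
    + [("daylight", "none", "truck")] * 5
    + [("bright_sky", "truck", "truck")] * 3
    + [("bright_sky", "none", "truck")] * 7
)
PER_SLICE = accuracy_by_slice(RESULTS)
```

In this made-up evaluation, overall accuracy is about 89%, which looks reassuring, while the bright-sky slice is correct only 30% of the time.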
Labeling, that is, creating the dataset and annotating it with the correct information, is an important iterative process that takes time and experience to get right. The data needs to be captured and cleaned. Labeling is not done once and then finished; rather, it keeps evolving.
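Because labeling is iterative, one common cleaning step is to find images that received inconsistent labels across annotators or labeling rounds and send them back for review. A minimal sketch, assuming annotations arrive as (image_id, label) pairs; the function name and data format are hypothetical.

```python
from collections import defaultdict

def find_label_conflicts(annotations):
    """annotations: iterable of (image_id, label) pairs, possibly from several
    annotators or labeling rounds. Returns {image_id: sorted labels} for every
    image that received more than one distinct label."""
    labels = defaultdict(set)
    for image_id, label in annotations:
        labels[image_id].add(label)
    return {img: sorted(found) for img, found in labels.items() if len(found) > 1}

# Hypothetical annotations: img2 was labeled both as a car and as a reflection.
ANNOTATIONS = [
    ("img1", "car"),
    ("img1", "car"),
    ("img2", "car"),
    ("img2", "reflection"),
    ("img3", "car_carrier"),
]
CONFLICTS = find_label_conflicts(ANNOTATIONS)
```

Only `img2` is flagged; agreeing duplicate labels, such as the two for `img1`, are not treated as conflicts.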
The Japanese Cucumber Farmer
TensorFlow has been used not only by developers, startups, and large corporations, but also by individuals. One surprising story is that of a Japanese cucumber farmer. Makoto Koike, an automotive engineer, helped his parents sort cucumbers by size, shape, color, and other attributes on their small family farm in the city of Kosai. For years, they sorted their cucumbers manually. In Japan, cucumbers sell at different prices depending on their characteristics. For example, more colorful cucumbers and ones with many prickles are more expensive than others. Farmers set aside the more expensive cucumbers so that they are paid fairly for their crop.
The problem is that it is hard to find workers to sort cucumbers during harvest season, and the sorting machines on the market are either too expensive for small farmers or lack the capabilities small farms need. Makoto’s parents sorted the cucumbers by hand, a job as hard as growing them that takes months. Makoto’s mother used to spend eight hours per day sorting. So, in 2015, after seeing how AlphaGo defeated the best Go players, Makoto had an idea: he would use the same programming language, TensorFlow, to develop a cucumber-sorting machine.
To do that, he took 7,000 pictures of cucumbers harvested on his family’s farm. Then, he tagged the pictures with each cucumber’s properties, adding information about its color, shape, size, and whether it had prickles. He used a popular neural network architecture and trained it on the pictures he had taken. At the time, Makoto did not train the neural network on the cloud servers Google offered, because they charged by the time used. Instead, he trained the network on his low-power desktop computer. To train his tool in a reasonable amount of time, he shrank the pictures to 80×80 pixels. Smaller images make training faster because the network has fewer pixel values to process for every image. But even at this low resolution, it took him three days to train his neural network.
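The trade-off Makoto faced is mostly arithmetic: the number of values the network processes per image is width × height × color channels. The sketch below assumes color images with three channels and uses a hypothetical 640×480 original size (the source does not say what resolution his camera produced). The `downscale` helper is also hypothetical and uses simple block averaging, which is only one of several ways to resize an image.

```python
def input_values(width: int, height: int, channels: int = 3) -> int:
    """Number of values a network receives per image (pixels times color channels)."""
    return width * height * channels

def downscale(image, factor):
    """Shrink a grayscale image, given as a list of rows of pixel values, by
    averaging each factor-by-factor block into a single pixel. Leftover rows or
    columns that do not fill a whole block are dropped."""
    height, width = len(image), len(image[0])
    return [
        [
            sum(image[r + dr][c + dc] for dr in range(factor) for dc in range(factor))
            / (factor * factor)
            for c in range(0, width - width % factor, factor)
        ]
        for r in range(0, height - height % factor, factor)
    ]

ORIGINAL = input_values(640, 480)  # hypothetical camera resolution
SMALL = input_values(80, 80)       # the size Makoto used
```

Under these assumptions, a 640×480 color photo has 921,600 values, while an 80×80 one has 19,200, about 48 times fewer values to push through the network for every image in every training pass.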
After all the work, “when I did a validation with the test images, the recognition accuracy exceeded 95%. But if you apply the system with real-use cases, the accuracy drops to about 70%. I suspect the neural network model has the issue of ‘overfitting,’” Makoto stated.*
Overfitting, also called overtraining, occurs when a machine learning model fits its training data so closely, including its noise and quirks, that it performs well on that data but poorly on new data.
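The extreme case of overfitting is a model that simply memorizes its training set. The toy sketch below, using hypothetical cucumber measurements (length in centimeters, curvature score) and grades, shows the telltale symptom Makoto described: near-perfect accuracy on data the model has seen and much lower accuracy on new data. It is only an illustration; it is not how his neural network worked, and real networks overfit in subtler ways.

```python
from collections import Counter

class Memorizer:
    """A deliberately overfit "model": it memorizes every training example
    exactly and, for anything it has not seen, guesses the most common label."""

    def fit(self, examples):
        self.table = dict(examples)
        self.fallback = Counter(label for _, label in examples).most_common(1)[0][0]
        return self

    def predict(self, features):
        return self.table.get(features, self.fallback)

def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(model.predict(x) == y for x, y in examples) / len(examples)

# Hypothetical data: ((length_cm, curvature), grade).
TRAIN = [((20, 0.1), "A"), ((21, 0.1), "A"), ((19, 0.6), "B"), ((22, 0.7), "B"), ((20, 0.2), "A")]
TEST = [((20.5, 0.1), "A"), ((21.5, 0.65), "B"), ((19.5, 0.7), "B"), ((22, 0.15), "A")]

MODEL = Memorizer().fit(TRAIN)
```

Here the memorizer scores 100% on its five training cucumbers but only 50% on four new ones, because for anything unseen it can only fall back on the most common grade.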
Makoto built a machine that helped his parents sort cucumbers by shape, color, length, and level of distortion. Because of the low-resolution training images, it could not tell whether a cucumber had many prickles. Still, the machine proved very helpful for his family and cut the time they spent sorting their produce by hand.
The same technology that one of the largest companies in the world built to power many of its products was also used by a small farmer on the other side of the globe. TensorFlow democratized access to deep learning, so many people can now develop their own models. It would not be surprising to find many more “Makotos” out there.
In Software 1.0, bugs happen mostly because a person wrote logic that does not account for edge cases or handle all possible scenarios. In the Software 2.0 stack, bugs are quite different: they often come from the data, which may teach the neural network the wrong behavior.
One example of such a bug occurred when iOS autocorrect started replacing the word “I” in messages with a strange string: the letter “A” followed by a question mark in a box. The operating system mistakenly “corrected” the spelling of I because, at some point, the data it received taught it to do so; according to that data, the model learned that this string was the correct spelling. From then on, whenever someone typed “I,” the model treated it as a mistake and replaced it everywhere it could. The bug spread like a virus, reaching millions of iPhones. Because such bugs can spread so quickly and do so much damage, it is extremely important to test the data as well as the programs, making sure that edge cases like these do not make programs fail.
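One way to test the data, not just the code, is a regression check that runs before a newly trained model ships. This minimal sketch assumes, for illustration only, that the learned behavior can be inspected as a word-replacement table; a real autocorrect model is not a simple lookup table, and every name here is hypothetical.

```python
# Hypothetical correction table "learned" from user data: typed word -> replacement.
# One bad entry ("i") has slipped in; "A?" is a stand-in for the garbled output.
LEARNED_CORRECTIONS = {"teh": "the", "recieve": "receive", "i": "A?"}

# Hypothetical list of very common words that autocorrect must never change.
PROTECTED_WORDS = {"i", "a", "the", "and", "to"}

def autocorrect(word: str, corrections: dict) -> str:
    """Replace a word if the correction table has an entry for it."""
    return corrections.get(word.lower(), word)

def find_unsafe_corrections(corrections: dict, protected: set) -> list:
    """Return, sorted, the protected words that the table would change."""
    return sorted(w for w in protected if autocorrect(w, corrections) != w)

# Release gate: refuse to ship a model whose data taught it unsafe corrections.
UNSAFE = find_unsafe_corrections(LEARNED_CORRECTIONS, PROTECTED_WORDS)
if UNSAFE:
    print("Blocked release; unsafe corrections for:", UNSAFE)
```

The check treats the learned table as something to test, like code: ordinary fixes such as "teh" to "the" still pass, but any entry that would rewrite a protected word like "i" blocks the release.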