Interpretability of Neural Networks



Updated November 2, 2022

You’re reading an excerpt of Making Things Think: How AI and Deep Learning Power the Products We Use, by Giuliano Giacaglia.

”By the help of microscopes, there is nothing so small, as to escape our inquiry; hence there is a new visible world discovered to the understanding.”Robert Hooke*

Mary spent the whole morning on TikTok watching videos about how lamps work. Her feed is mostly that and cute videos of dogs. Like many people who use TikTok or other social media apps, she never noticed that most of her feed is determined by algorithms that tell her what to watch next.

This isn’t a problem when she is watching videos of dogs, but one day she started watching depressing videos, and the algorithm simply reinforced that.

A neural network is behind the videos that she watches, recommending 70% of them.* The algorithm is mostly a black box: the humans who wrote the neural network don’t know its exact inner workings. Most of what they know is that using these algorithms increases engagement. But is that enough?

If a lot of our lives is determined by what neural networks decide, from housing prices to driving our cars, it might be worth understanding how and why these neural networks are making their decisions.

That’s where interpretability of neural networks comes in. Understanding how these “black boxes” work might be important for understanding why different decisions are made and whether they are correct.

Neural Network Microscope

Many scientific discoveries have been made when scientists were able to “zoom in.” For example, microscopes let scientists see cells, and X-ray crystallography lets them see DNA. In the same way, AI scientists led by a young researcher, Chris Olah, have been studying and “zooming in” on neural networks that are used for image classification.*

To study those neural networks, the team at OpenAI analyzed each neuron in different neural networks and the features it responds to, as well as the connections between neurons. To observe what different neurons represent in each neural network, the team analyzed how the neurons fire and activate when different images are run through the neural network. What they found was really interesting.*
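
The procedure described above, running images through a network and recording how strongly each neuron fires, can be sketched in a few lines of Python. This is a minimal illustration and not the team's actual tooling: the tiny 2x2 "images", the hand-written weights, and the names `layer_activations` and `top_activating` are all hypothetical.

```python
def relu(x):
    """Rectified linear unit: negative inputs are clamped to zero."""
    return max(0.0, x)


def layer_activations(image, weights, biases):
    """Return one activation per neuron for a flattened image.

    Each neuron computes relu(dot(neuron_weights, image) + bias).
    """
    return [
        relu(sum(w * p for w, p in zip(neuron_w, image)) + b)
        for neuron_w, b in zip(weights, biases)
    ]


def top_activating(images, weights, biases, neuron):
    """Name of the image that makes the given neuron fire most strongly."""
    return max(
        images,
        key=lambda name: layer_activations(images[name], weights, biases)[neuron],
    )


# Hypothetical 2x2 grayscale images, flattened row by row.
IMAGES = {
    "left_edge": [1.0, 0.0, 1.0, 0.0],
    "right_edge": [0.0, 1.0, 0.0, 1.0],
    "top_edge": [1.0, 1.0, 0.0, 0.0],
    "blank": [0.0, 0.0, 0.0, 0.0],
}

# Hypothetical hand-written weights: neuron 0 prefers a bright left column,
# neuron 1 prefers a bright top row.
WEIGHTS = [
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
]
BIASES = [0.0, 0.0]

for name, image in IMAGES.items():
    print(name, layer_activations(image, WEIGHTS, BIASES))
```

Asking which image drives a neuron hardest, as `top_activating` does, is one simple way to guess what that neuron represents. A real study would use thousands of natural images and a trained network instead of these toy values.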


The team created the equivalent of a microscope but for “visual” neural networks, that is, neural networks that are used to detect objects in images. With Microscope, researchers can systematically visualize every neuron in common neural networks including InceptionV1. In contrast to the typical picture of neural networks as a black box, the researchers were surprised by how approachable the network is on this scale.

The neurons became understandable. Some represent abstract concepts like edges or curves, and others represent features like dog eyes or snouts. The team was also able to explain the connections between neurons. The connections represent meaningful algorithms. For example, a connection may correspond to joining two different detectors together, one representing dogs in one orientation and the other representing dogs in another orientation. These connections, or “circuits,” can even represent simple logic, such as AND, OR, or XOR, over high-level visual features.
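
To make the idea of a circuit concrete, the sketch below wires up downstream neurons by hand so that they behave like OR (a union), AND, and XOR over two upstream feature activations. The weights, biases, and function names are hypothetical and chosen for illustration. They are not taken from InceptionV1, and inputs are assumed to lie between 0 and 1.

```python
def relu(x):
    """Rectified linear unit: negative inputs are clamped to zero."""
    return max(0.0, x)


def union_neuron(left_facing, right_facing):
    """OR-like unit: fires (output > 0) if either upstream detector fires."""
    return relu(left_facing + right_facing)


def and_neuron(a, b):
    """AND-like unit: the bias of -1 means both inputs must be active."""
    return relu(a + b - 1.0)


def xor_circuit(a, b):
    """XOR needs a second layer: OR minus twice AND."""
    return relu(union_neuron(a, b) - 2.0 * and_neuron(a, b))


# Truth table over the four on/off combinations of the two inputs.
for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, union_neuron(a, b), and_neuron(a, b), xor_circuit(a, b))
```

In this toy version, `union_neuron` plays the role of a pose-invariant detector that responds to a dog facing either way. XOR cannot be computed by a single unit of this kind, which is why `xor_circuit` stacks two layers.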

The researchers at OpenAI laid a foundation for showing that these neurons probably map to these features. They didn’t prove that this was the case, but by testing the activation of such neurons with many different examples, they showed a causal link between the firing of these neurons and the images that they purportedly represent. They also showed that the neurons do not fire on images that are close to, but not the same as, those that the neurons identify.

Figure: InceptionV1 neural network representations and the union of the bottom two neural networks.

The OpenAI team showed that neurons can be understood and that they represent real features.

That was not the only surprise found by these researchers. They also found that the same features were detected across different neural networks. For example, curve detectors were found in the following neural networks: AlexNet, InceptionV1, VGG19, and ResNetV2-50.

The scientists found that when different neural networks were trained on the same dataset, the same kinds of neurons appeared in those networks. From this they hypothesized a universality of features across networks: if neural networks with different architectures are trained on the same dataset, some neurons are likely to be present in all of them.
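
One simple way to check whether two networks contain "the same" neuron is to show both networks the same images and measure how closely the two neurons' activations track each other. The sketch below uses Pearson correlation for this. That choice, the function names, and the made-up activation values are assumptions for illustration, not the researchers' published method.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant neuron carries no signal to compare
    return cov / math.sqrt(var_x * var_y)


def best_match(neuron, candidates):
    """Return (index, correlation) of the candidate most correlated with neuron."""
    scores = [pearson(neuron, candidate) for candidate in candidates]
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, scores[index]


# Made-up activations of one neuron in network A over the same five images.
curve_neuron_net_a = [0.9, 0.1, 0.8, 0.0, 0.7]

# Made-up activations of three neurons in network B over those images.
net_b_neurons = [
    [0.0, 0.9, 0.1, 0.8, 0.2],  # fires on the opposite images
    [1.8, 0.2, 1.6, 0.0, 1.4],  # same pattern at a different scale
    [0.5, 0.5, 0.5, 0.5, 0.5],  # constant output
]

index, score = best_match(curve_neuron_net_a, net_b_neurons)
print("best match in network B:", index, "correlation:", round(score, 3))
```

Correlation ignores scale, so a neuron that fires twice as strongly on the same images still counts as a match. A high correlation on a handful of images is only weak evidence; a convincing comparison would need many images and a look at what each neuron actually responds to.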

Not only that, but they found complex Gabor detectors, which are usually found in biological neurons. They are similar to some classic “complex cells” of neuroscience. Could it be that our brains have the same kinds of neurons that are present in artificial neural networks?
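
For readers who have not met Gabor filters before, the sketch below builds one and contrasts a "simple" detector with a "complex" one. It uses the standard Gabor formula and one textbook model of a complex cell, the energy model, which combines two filters a quarter cycle apart. Treat this as an assumption-laden illustration: the parameter values and function names are hypothetical, and nothing here claims to reproduce what InceptionV1 actually learns.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, phase=0.0):
    """Build a size x size Gabor kernel (size should be odd).

    value = exp(-(xr^2 + yr^2) / (2 * sigma^2)) * cos(2*pi*xr / wavelength + phase)
    where (xr, yr) are the pixel coordinates rotated by theta.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
            carrier = math.cos(2.0 * math.pi * xr / wavelength + phase)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def grating(size, wavelength, phase):
    """A vertical-stripe test image: brightness varies as a cosine along x."""
    half = size // 2
    row = [math.cos(2.0 * math.pi * x / wavelength + phase)
           for x in range(-half, half + 1)]
    return [list(row) for _ in range(size)]


def response(kernel, patch):
    """Dot product between a kernel and an image patch of the same size."""
    return sum(k * p
               for k_row, p_row in zip(kernel, patch)
               for k, p in zip(k_row, p_row))


def simple_cell(patch, size=9, wavelength=4.0, theta=0.0, sigma=2.0):
    """A single Gabor filter: sensitive to where exactly the stripes fall."""
    return response(gabor_kernel(size, wavelength, theta, sigma, phase=0.0), patch)


def complex_cell(patch, size=9, wavelength=4.0, theta=0.0, sigma=2.0):
    """Energy model: combine two Gabor filters a quarter cycle apart."""
    even = response(gabor_kernel(size, wavelength, theta, sigma, phase=0.0), patch)
    odd = response(gabor_kernel(size, wavelength, theta, sigma, phase=-math.pi / 2), patch)
    return math.hypot(even, odd)


aligned = grating(9, 4.0, phase=0.0)
shifted = grating(9, 4.0, phase=math.pi / 2)  # stripes moved a quarter cycle
for name, patch in (("aligned", aligned), ("shifted", shifted)):
    print(name, round(simple_cell(patch), 3), round(complex_cell(patch), 3))
```

With these made-up settings, the simple detector goes nearly silent when the stripes shift by a quarter cycle, while the complex detector keeps responding at a similar strength. That tolerance to the exact position of the pattern is the property usually associated with complex cells.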

Language Interpretability Tool

For now, Microscope has only been used to analyze neural networks that classify images, but it is easy to imagine the same technique being applied to other areas, including natural language processing.

Other tools have been developed for neural networks used in natural language processing. One recently developed by a group at Google is called the Language Interpretability Tool* and is used to understand models for NLP tasks. The open-source tool allows for rich visualizations of model predictions and includes aggregate analysis of metrics and slicing of the dataset.*

The tool uses a technique called UMAP (Uniform Manifold Approximation and Projection), a method for dimension reduction. If a dataset has many features, so that each data point lives in a high-dimensional space, UMAP maps the points to a lower-dimensional representation, for example three dimensions, so that you can see them in a 3D graph. Viewing the model's classifications on this projection helps you identify unexpected results in the data. The Language Interpretability Tool includes several other capabilities, but it is not as developed as OpenAI Microscope.*

All these tools to understand and interpret neural networks are in their infancy. Microscope and the Language Interpretability Tool are just two examples of tools that are starting to be developed to understand the internals of neural networks.


It is clear that we are still in the early days of creating tools for interpreting and understanding neural networks in different applications. Neural networks might still be complex to understand, but there are ways to investigate what each neuron in a network might be doing independently.

Just as we take the microscope for granted as an important scientific instrument, a neural network microscope might be an important step toward understanding these networks, and it may even help fix bugs that they introduce.

Economic Impact of AI

We wanted flying cars, instead we got 140 characters. Peter Thiel*

Jennifer woke up early on Monday morning. Before going to work, she received a personalized message distilling all the information that she needed to know for the day. She walked out of her house and hailed an autonomous car that was waiting for her. As her car rode from her home to her office, Jennifer’s AI assistant briefed her about her day and helped her make some decisions. She arrived at her office in just under ten minutes, going through an underground tunnel.

That’s a future that seems far off, but it might be closer than we think. Deep learning might make most of these predictions reality. It is starting to change the economy and might have a significant economic impact. ARK Invest, an investment firm based in New York, predicts that in 20 years, deep learning will create a $17 trillion market opportunity.* That is bigger than the economic impact that the internet had.
