You’re reading an excerpt of Making Things Think: How AI and Deep Learning Power the Products We Use, by Giuliano Giacaglia. Purchase the book to support the author and the ad-free Holloway reading experience. You get instant digital access, plus future updates.

Most of the time, researchers can use the data given to them directly, but there are many instances where the data is not clean. That means it cannot be used directly to train the neural network because it contains data that is not representative of what the algorithm wants to classify. Perhaps it contains bad data, like black-and-white images when you want to create a neural network to locate cats in colored images. Another problem is when the data is not appropriate. For example, when you want to classify images of people as male or female. There might be pictures without the needed tag or pictures that have the information corrupted with misspelled words like β€œale” instead of β€œmale.” Even though these might seem like crazy scenarios, they happen all the time. Handling these problems and cleaning up the data is known as data wrangling.

Your comments and feedback help improve this resource. Comments are reviewed by editors and may be published for all readers or incorporated into future updates.
Share this discussion