What is Deep Learning?
Like me, you may have noticed a growing number of mentions of terms like "Deep Learning", "Machine Learning" and "Artificial Intelligence" over the past few years. The last year in particular has witnessed a profound acceleration of new releases and breakthroughs in the space.
These releases are often visually impressive, but what is actually happening behind the scenes, and what is powering most of these discoveries? There are several things at play here, but one of the central themes driving them is "Deep Learning". The goal of this article is to discuss three main questions:
- What is deep learning, and how is it different from machine learning?
- How does deep learning work?
- What is driving the massive increase in Deep Learning usage in recent years?

What is deep learning, and how is it different from machine learning?
The key concept to understand is that Deep Learning, Machine Learning and the other labels you may hear all sit under a larger umbrella called Artificial Intelligence, with Deep Learning being a specific subfield of Machine Learning. Machine Learning itself has a long history, but one of the first uses of the term was in a paper by Arthur L. Samuel called "Some Studies in Machine Learning Using the Game of Checkers", which introduced the term as follows:
Two machine-learning procedures have been investigated in some detail using the game of checkers. Enough work has been done to verify the fact that a computer can be programmed so that it will learn to play a better game of checkers than can be played by the person who wrote the program.
The key takeaway here is that Machine Learning is about automating intellectual tasks that are normally performed by humans. Historically, there have been two main ways this has been accomplished:
- Symbolic AI - Programmers write explicit rules
- Non-Symbolic AI - Programmers structure a program to learn on its own how to estimate an output given a set of inputs and parameters.
Hence, Deep Learning is a subset of Machine Learning, which itself is a subset of Artificial Intelligence. Like every other machine learning model, a Deep Learning model has three main components (sketched in code just after the list):
- Inputs -> A set of data points that describe the thing whose output an operator wishes to predict
- Outputs -> A set of validated outcomes, related to the inputs, whose relationship the operator wishes to learn
- Measure of Success -> A method of comparing candidate input <> output mappings and ranking how well each one performs
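To make those three components concrete, here is a loose sketch in Python (the data and names are hypothetical, chosen purely for illustration):

```python
import numpy as np

# Inputs: points describing the thing we want to predict (hypothetical data)
inputs = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])

# Outputs: validated outcomes paired with those inputs
outputs = np.array([0, 0, 1, 1])

# Measure of success: a way to rank candidate input -> output mappings,
# here simple accuracy against the known outcomes
def accuracy(predictions, targets):
    return np.mean(predictions == targets)

predictions = np.array([0, 1, 1, 1])   # from some candidate model
print(accuracy(predictions, outputs))  # 0.75
```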
In this context, the term "deep" refers to how Deep Learning models build successive layers of representation. The unique aspect is that Deep Learning learns all of these layers of representation at once, whereas other machine learning models do not.
How Does Deep Learning Work?
First, every deep learning model requires three things:
- Inputs
- Outputs
- Measure of success
Like all machine learning models, Deep Learning uses these three things to build rules for a data processing task. The central problem, however, is how to meaningfully transform data into useful representations (a representation is a different way of looking at the data, one that better matches the data processing task at hand). There are many possible transformations; some of the most common include the following (a small sketch of one appears after the list):
- Coordinate changes
- Histogram of pixels
- Linear Projections
- Translations
- Non-Linear operations
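To see why a good representation matters, here is a hedged sketch (synthetic data and a hypothetical threshold, not from the article) of a coordinate change: points that are hard to separate in Cartesian coordinates become trivially separable once re-represented by their distance from the origin:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2D points: class 0 lies near the origin, class 1 further out
angles = rng.uniform(0, 2 * np.pi, size=200)
radii = np.concatenate([rng.uniform(0.0, 1.0, 100),    # class 0
                        rng.uniform(2.0, 3.0, 100)])   # class 1
labels = np.concatenate([np.zeros(100), np.ones(100)])
points = np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)

# Coordinate change: represent each point by its distance from the origin
radius_feature = np.linalg.norm(points, axis=1)

# In this new representation a single threshold separates the classes
predictions = (radius_feature > 1.5).astype(float)
print(np.mean(predictions == labels))  # 1.0 -- the transformation did the work
```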
Deep Learning works by creating successive layers of useful representations (including some of the ones listed above!). These models can have hundreds or even thousands of layers, hence the name "Deep" Learning. Much like a stack of air filters, the successive layers take disordered data and progressively refine it through each representation.
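As a hedged sketch of what "successive layers" means in code (plain NumPy, random untrained weights, purely illustrative): each layer is just a parameterized transformation whose output feeds the next one:

```python
import numpy as np

rng = np.random.default_rng(1)

def dense_layer(x, weights, bias):
    # One layer: a linear projection followed by a non-linear operation (ReLU)
    return np.maximum(0.0, x @ weights + bias)

# Three stacked layers: 4 -> 8 -> 8 -> 2 (weights are random, i.e. untrained)
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
w3, b3 = rng.normal(size=(8, 2)), np.zeros(2)

x = rng.normal(size=(1, 4))                           # one input example
representation = dense_layer(x, w1, b1)               # first representation
representation = dense_layer(representation, w2, b2)  # deeper representation
output = representation @ w3 + b3                     # final output
print(output.shape)                                   # (1, 2)
```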
Each layer has weights, a set of numbers that parameterize the layer (e.g., how strongly to project the data along a particular linear direction). Finding good values for these parameters by hand would take far too long, so the model automates the search using two things:
- A loss function - a measure of how far a prediction is from the correct result
- An optimizer - a function that uses the loss function above to adjust the layer weights
These two components are the key trick behind machine learning. The loss function is the "score" at the end of every model approximation, telling us how good or bad that approximation was. The optimizer then uses this score to gradually and incrementally adjust the layer weights, via an algorithm called the Backpropagation Algorithm. A model usually starts with random weight values; after successive rounds of this loop, the weights become increasingly well ordered. The process is repeated until it reaches the point of diminishing returns, at which point the model is considered "trained".
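Here is a minimal sketch of that loop, under simplifying assumptions: a single-layer (linear) model, a mean-squared-error loss, and plain gradient descent as the optimizer. The gradient is written out by hand, which is exactly the bookkeeping that backpropagation automates across many layers:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: outputs follow y = 3x + 1 plus a little noise
x = rng.uniform(-1, 1, size=(100, 1))
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=(100, 1))

w, b = 0.0, 0.0          # weights start as arbitrary values
learning_rate = 0.1

for step in range(500):
    predictions = w * x + b
    loss = np.mean((predictions - y) ** 2)     # loss function: how wrong are we?
    # Optimizer step: use the loss gradient to nudge the weights
    grad_w = np.mean(2 * (predictions - y) * x)
    grad_b = np.mean(2 * (predictions - y))
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 2), round(b, 2), round(loss, 4))  # roughly 3.0, 1.0, and a small loss
```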
What is driving the massive increase in Deep Learning usage in recent years?
There have been two main reasons why deep learning has taken off considerably in recent years:
- The rise of more advanced hardware complemented Deep Learning's need to scale - these models require far more computing power, and better hardware made applying them at scale practical
- It automated feature engineering
Automating feature engineering was a significant improvement over earlier approaches. Feature engineering consists of manually transforming input data into a form more amenable to processing, and it often took considerable time; automating this step is a big reason why Deep Learning has grown in popularity.
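For a sense of what that manual step looked like, here is a hedged sketch (hypothetical image data, invented feature choices) of the kind of hand-crafted features an engineer might have designed before deep learning absorbed this work into the model itself:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical raw inputs: a batch of 8x8 grayscale images
images = rng.uniform(0, 1, size=(32, 8, 8))

# Hand-crafted features designed by a human engineer:
mean_intensity = images.mean(axis=(1, 2))                       # overall brightness
histograms = np.stack([np.histogram(img, bins=4, range=(0, 1))[0]
                       for img in images])                      # histogram of pixels
engineered = np.column_stack([mean_intensity, histograms])      # shape (32, 5)

# A deep learning model skips this step and consumes the raw pixels directly
raw = images.reshape(32, -1)                                    # shape (32, 64)
print(engineered.shape, raw.shape)
```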
In addition, Deep Learning does not have to learn its layers "greedily" (one layer at a time); instead, the model learns all layers of representation jointly. This is known as joint feature learning: whenever the model adjusts one of its internal features, all of the features that depend on it adapt automatically in response.
This, along with the performance improvements that come with it, is a big reason why deep learning has seen such considerable growth in recent years.