This is an introductory article on Machine Learning and its various concepts, written so that even a high-school student should be able to understand it. We are going to play with blocks and, through that game, understand how Machine Learning works.
The big news nowadays is Machine Learning (ML). But what is it? The machine part is obvious: it’s the computer that does the learning. The learning part may appear tricky, even though at its foundation it’s a very simple thing. Let’s gain an intuitive understanding of the learning in ML.
Let’s play a game
I don’t know if you have ever played a computer game where you have a ball at a starting position and you have to direct it to a target with the help of blocks, springs, rotating gears, etc. The original game was named The Incredible Machine. Such games usually rely on the laws of physics, and you need to re-arrange the helper components to make the ball move from, let’s say, the top left corner of the screen to the bottom right, where some sort of target is located. You play by experimenting with the arrangement of the blocks until they guide the ball to its target.
Here is a possible initial game board setup:
The purple ball needs to hit the purple hexagon for you to win.
Normally, with the help of gravity, the ball just falls straight down. But since we have the helper blocks, we will use them to guide the ball so that once it starts falling it’ll have a trajectory to hit the target.
Now, I hope that it will hit the target.
But no, I slightly misplaced the “plank”-shaped block and the ball will get stuck in the gap.
OK, let’s move the bar a bit to the left:
Hooray, this time, the ball should hit the target.
Next, the initial ball position is moved a bit to the left.
And our helper block arrangement is no longer helping to guide the ball to the target. The ball again falls straight down.
OK, let’s rearrange some blocks:
And success! Now this starting position of the ball is covered too.
But the previous one (the ball starting from a different position) will no longer work. So, we have more trial and error re-arrangements to do.
After some lengthy experimentation we find the following arrangement:
And, voila, almost any starting position of the ball will now lead it to the target correctly.
Let’s revisit the stages of the game we have just played.
- The first time I tried to arrange the blocks, I under-shot and missed the target. (You can’t tell from the picture how heavy the ball is or how much friction the helper blocks generate, so while it looks obvious in my drawing, in the actual game it could be quite challenging to find the right setup.)
- Then I started experimenting by moving the helper blocks a bit and re-running the ball, continually making small corrections until the target was reached.
- Then a new situation was given, and through trial and error I made it work.
- Then I tried to arrange the blocks so that both situations would work.
- Finally, I tried to generalize so that numerous possible situations would work.
If you understood how the game was played, you already understand the learning part of Machine Learning. This is how an ML model learns. It usually starts with a random arrangement of the helper blocks (the model’s weights) and tests whether the ball hits the target (e.g. a specific category). If it doesn’t, it makes small adjustments while measuring whether the ball is getting closer to the target or not (often using the method of gradient descent). Over time, the adjustments get better and better, until the ball actually hits the target. It is possible for the model to overshoot the target, in which case it turns around and takes a few small steps in the opposite direction until the target is hit. This whole process is called training.
The state where the ball doesn’t yet reach the target for most starting positions is called underfitting, since the model can clearly still improve.
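To make the adjust-and-measure loop a bit more concrete, here is a minimal sketch of gradient descent in Python. Everything in it is made up for illustration: `landing_point` is a hypothetical stand-in for the game’s physics, and the target, learning rate and number of steps are arbitrary values I picked so that the numbers stay small.

```python
def landing_point(plank_x):
    """Hypothetical stand-in for the game's physics: where the ball lands."""
    return 2.0 * plank_x + 1.0


def train(target=7.0, plank_x=0.0, learning_rate=0.15, steps=50):
    """Nudge the plank with gradient descent until the ball lands on the target."""
    for _ in range(steps):
        miss = landing_point(plank_x) - target  # signed distance from the target
        # The loss is miss**2. Its slope with respect to plank_x is
        # 2 * miss * 2.0, where the trailing 2.0 is the slope of landing_point.
        gradient = 2.0 * miss * 2.0
        plank_x -= learning_rate * gradient  # small step that reduces the loss
    return plank_x


best_x = train()
print(best_x, landing_point(best_x))  # the plank settles where the ball lands on the target
```

With these made-up numbers the very first step overshoots the target, and the following steps turn around and close in on it from alternating sides, which is the behaviour described above. A smaller learning rate would approach the target from one side only, just more slowly.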
The more different positions we train the model with, the better it will generalize for positions it hasn’t yet seen. Assuming that we have an infinite number of helper blocks of different shapes we can probably find a solution for almost every situation.
If we try to fit perfectly all the situations we have seen so far, we may end up with a perfect outcome for every seen situation, but if a new situation is given it may not work. This is called overfitting.
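Here is a toy comparison of memorizing versus generalizing, as a rough sketch only. The numbers are invented: each pair is a ball starting position and the plank position that happens to work for it. The "memorizer" and the straight-line fit are my own illustrative stand-ins, not how a real ML model is built.

```python
# Hypothetical data: (ball start position, plank position that works).
seen = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.0)]
unseen = [(0.5, 2.0), (1.5, 4.0), (2.5, 6.0)]


def make_memorizer(examples):
    """Overfit on purpose: store every seen answer, guess the average otherwise."""
    table = dict(examples)
    fallback = sum(table.values()) / len(table)
    return lambda start: table.get(start, fallback)


def make_line_fit(examples):
    """Fit one straight line through the seen examples (ordinary least squares)."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in examples)
             / sum((x - mean_x) ** 2 for x, _ in examples))
    intercept = mean_y - slope * mean_x
    return lambda start: slope * start + intercept


def mean_abs_error(model, examples):
    """Average distance between the model's answer and the right answer."""
    return sum(abs(model(x) - y) for x, y in examples) / len(examples)


memorizer = make_memorizer(seen)
line = make_line_fit(seen)
for name, model in [("memorizer", memorizer), ("line fit", line)]:
    print(name, mean_abs_error(model, seen), mean_abs_error(model, unseen))
```

The memorizer is perfect on the situations it has seen and much worse on the new ones, while the simple line is slightly imperfect on the seen situations but holds up on the unseen ones.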
And the final stage of the game was to make our setup generalize, so that it could fit almost any unseen situation. In practice, generalization is an ongoing process rather than a final stage. For example, by removing a random selection of helper blocks each time we practice on a situation, we can force our block arrangement to be more robust to unseen situations. This is called dropout in ML lingo.
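As a small illustration, here is a sketch of the common "inverted dropout" variant in plain Python. The function name, the list of "block strengths" and the drop probability are all assumptions I made up for the example; real libraries apply the same idea to large arrays of numbers.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Randomly silence values while training; pass them through otherwise."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    # Survivors are scaled up by 1 / keep_prob so the expected total
    # stays the same as without dropout.
    return [0.0 if rng.random() < drop_prob else value / keep_prob
            for value in activations]


rng = random.Random(0)  # fixed seed so the run is repeatable
blocks = [1.0, 2.0, 3.0, 4.0]  # hypothetical "helper block" strengths
print(dropout(blocks, 0.5, rng))                  # some blocks removed
print(dropout(blocks, 0.5, rng, training=False))  # all blocks kept
```

Note that the blocks are only removed while practicing. Once the arrangement is used for real, every block is kept in place.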
Usually, we don’t have an infinite number of helper blocks and so we have to make do with what we have. So the number of block arrangement possibilities is limited. And, say, after 10 different ball position starting points, our model knows how to send the ball to its destination 90% of the time, which is already quite excellent.
Now you no longer need to spend hours moving the blocks around, since the computer will do it for you. This is no longer fun, since it’s the computer that’s now playing the game for you. But for the sake of better understanding of how Machine Learning works we will allow it to do it for us this time.
Next comes the concept of Deep Learning, where you give the ML model a lot more helper blocks, allowing it to build much more elaborate setups, whose inner workings we usually don’t even understand. The process is then repeated with hundreds or thousands of different starting combinations, and the model often learns to find close to 100% correct solutions.
The deep part just indicates that instead of a simple setup with a few blocks, we now have a much more complex one, with a much higher capacity for arranging things flexibly and generalizing. For example, in our mechanical universe, imagine that we re-arrange our helper blocks so that they form a concave surface like a bowl or a wine glass, with the target at the bottom of it. If this were possible, no matter where the ball was released from, it would always end up at the bottom of that concave surface, where our target is situated.
If you’re ready to imagine even more complex situations, consider multiple dimensions: if it looks impossible to make the ball move from, say, the bottom of the board to the top, we use a fourth dimension to sneak it in. In reality, deep learning uses hundreds or thousands of dimensions to solve very tricky setups. We can hardly visualize the 4th dimension, so we just have to trust that it works, relying primarily on math.
Further, to make things even more efficient, instead of trying to figure out one ball situation at a time, the model works on batches of dozens or hundreds of such situations at once. This is possible thanks to the availability of specialized hardware (called GPU or TPU), which was designed to process huge amounts of data in parallel at an incredible speed. Not only does that hardware make things run much faster, but working in batches also tends to find a good generalized block arrangement in fewer steps, as compared to doing it one situation at a time.
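Here is a rough sketch of what working in batches looks like, again in plain Python with invented numbers. The model has a single adjustable weight, the data follows a made-up rule (the right answer is three times the input), and the batch size, learning rate and number of passes are arbitrary assumptions. On real hardware the per-batch sum would be computed in parallel rather than in a loop.

```python
def make_batches(examples, batch_size):
    """Split the examples into consecutive groups of batch_size."""
    return [examples[i:i + batch_size]
            for i in range(0, len(examples), batch_size)]


def train_in_batches(examples, batch_size=4, learning_rate=0.5, epochs=30):
    """Learn one weight so that weight * x is close to y, one batch at a time."""
    weight = 0.0
    for _ in range(epochs):
        # Real training usually shuffles the examples each pass;
        # that is skipped here to keep the run repeatable.
        for batch in make_batches(examples, batch_size):
            # Average the slope of the squared error over the whole batch,
            # then make a single adjustment for all of its situations.
            gradient = sum(2.0 * (weight * x - y) * x for x, y in batch) / len(batch)
            weight -= learning_rate * gradient
    return weight


# Hypothetical situations following the rule y = 3 * x.
examples = [(i / 8, 3.0 * i / 8) for i in range(1, 9)]
weight = train_in_batches(examples)
print(weight)  # ends up very close to 3
```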
While the foundations of ML are very simple, for it to work successfully and in a timely manner we need:
- Either a huge amount of data to train it on (think real estate prices for the last 5 years over a huge territory), which is often needed for supervised learning, where the data helps the model to improve. Or the model can be trained using reinforcement learning, like in our ball-to-target game example, where through trial and error it learns how to do better over time.
- Very powerful hardware, usually specially designed for heavy matrix processing, to support deep learning. Imagine how much more complicated it would be for you to play this game if you had to accomplish the same not with a few helper blocks, but with millions of them.
If these two requirements are satisfied, then Deep Learning is possible. The remaining difficult part is usually to build or choose an architecture that solves the problem at hand.
Now, if you were to start playing the arrange-the-blocks game from scratch you’d probably have to invest as much work as I did to make it deal with multiple starting situations. What if I were to save my work and share it with you? Then you won’t have to start from scratch and will be able to continue working on more complex situations, saving yourself a huge amount of time and computer resources.
This brings us to Transfer Learning, where an individual or an organization invests a huge amount of their time and computer resources (i.e. money) to build an ML model, which can then be shared with others.
This shared model is then fine-tuned to a specific type of data. For example, imagine you want to replace the falling ball with some kind of polygon that can roll like a ball, just not as well. You will still benefit from the pre-trained model: you only need to make small adjustments to the block arrangement by retraining the model that was shared with you for your specific needs.
Often, transfer learning can provide huge savings of time and money.
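A tiny sketch of why starting from a shared model saves work, under heavily simplified assumptions: the whole "model" is one plank position, the pre-trained value and the ideal value for the polygon are numbers I invented, and training is plain gradient descent on the squared distance to the ideal position.

```python
def steps_to_fit(start_x, ideal_x, learning_rate=0.1, tolerance=0.01, max_steps=1000):
    """Count gradient descent steps until the plank is close enough to ideal_x."""
    plank_x = start_x
    for step in range(max_steps):
        if abs(plank_x - ideal_x) < tolerance:
            return step
        gradient = 2.0 * (plank_x - ideal_x)  # slope of (plank_x - ideal_x) ** 2
        plank_x -= learning_rate * gradient
    return max_steps


pretrained_x = 3.0     # hypothetical arrangement someone shared, tuned for the ball
polygon_ideal_x = 3.2  # hypothetical arrangement the polygon actually needs

from_scratch = steps_to_fit(0.0, polygon_ideal_x)
fine_tuned = steps_to_fit(pretrained_x, polygon_ideal_x)
print(from_scratch, fine_tuned)  # fine-tuning needs fewer steps
```

The shared arrangement is not perfect for the polygon, but it starts much closer to the right answer, so far fewer adjustments are needed.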
Deep Learning models often figure out very subtle correlations between input signals that we humans are unlikely to notice, or lack the capacity to notice. But since ML models are forced to generalize in order to give correct answers, at times those answers are given for totally wrong or unexplainable reasons.
There is an urban legend that once upon a time a certain military force tried to build an ML model to detect camouflaged tanks in satellite imagery. The model was trained with a handful of photos that included tanks (positive examples) and about the same number with no tanks in them (negative examples). After training on these photos, the model was able to classify new photos correctly, except that instead of learning to spot hidden tanks, it had learned to tell whether it was a cloudy day or not, since it so happened that all the camouflaged tank photos were taken on cloudy days and all the non-tank photos on non-cloudy days.
I found an article that attempts to find out whether there is any truth to this legend. I will let you discover it for yourself, but the article concludes with:
So I think it’s very likely, though not certain, that this didn’t actually happen.
Regardless, there are plenty of research papers out there that do report real findings of this type, and this is a big problem: if we don’t know how a model makes its decisions, we will be unable to use it reliably. If some kitten photos get miscategorized, it’s probably not a big deal, but if a self-driving car miscategorizes an obstacle and we can’t figure out why, or a person gets put in jail due to an ML model’s mistake, that would be a big problem.
I hope this little playful introduction helped you gain an insight into how the very complicated field of Machine Learning is based on rather simple things. The complexity is in the details of finding the right solution for the right situation, knowing how to process data, how to debug problems in the code, how to optimize the code to work faster, etc. The devil is in the details. But I trust you now have an intuitive understanding of how the gears of Machine Learning work, and in particular the Learning part of it.