Breaking down the GBM algorithm with a simple explanation
Motivation
GBM is a supervised ensembling algorithm. I know many of us are initially confused about how GBM works and have spent hours on the internet trying to decipher its working principle.
Here I have made a small attempt to break it down.
Please note that I will not be covering the theoretical aspects of GBM (for now), since there are already plenty of blogs available on that.
Let’s get started.
Brief
Ensembling is a technique that combines several weak learners into a strong learner that performs better than any single model.
An audience poll is a good analogy for this: many individually weak opinions, combined, often give the right answer.
- Bagging is a simple ensembling technique in which we build many independent predictors/models/learners and combine them using a model averaging technique (e.g. weighted average, majority vote, or simple average).
- Boosting is an ensembling technique in which the predictors are not built independently but sequentially, with each model reducing the residual errors left by the previous ones (see the sketch below).
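To see this difference in code, here is a minimal sketch using scikit-learn; the toy dataset and hyperparameters below are made up purely for illustration, and any regression data would do:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor

# Toy regression data (illustrative only)
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.2, size=200)

# Bagging: many trees trained independently on bootstrap samples,
# then their predictions are averaged
bagging = BaggingRegressor(n_estimators=50, random_state=0).fit(X, y)

# Boosting: trees trained sequentially, each one fitting the errors
# left behind by the trees before it
boosting = GradientBoostingRegressor(n_estimators=50, learning_rate=0.1,
                                     random_state=0).fit(X, y)

print("Bagging  R^2 on training data:", round(bagging.score(X, y), 3))
print("Boosting R^2 on training data:", round(boosting.score(X, y), 3))
```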
There are already many theoretical explanations on the web, so my focus here is on a practical understanding of the working principle behind GBM.
Gradient boosting algorithm
We build a model from the training data, then create a second model that attempts to correct the errors of the first. Models keep being added until the training set is predicted perfectly or a maximum number of models has been added.
Below, we will walk through a regression problem:
GBM uses the equation below to build the model (don't be scared 😱):
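A standard way to write this additive model for squared-error regression (the exact notation may vary, but the idea is the same) is:

$$F_0(x) = \bar{y}, \qquad r_{i,m} = y_i - F_{m-1}(x_i), \qquad F_m(x) = F_{m-1}(x) + \nu \, h_m(x)$$

Here $F_{m-1}$ is the ensemble built so far, $r_{i,m}$ are its residuals on the training data, $h_m$ is a small tree fit to those residuals, and $\nu$ is the learning rate (shrinkage).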
It is not as complex as it looks. Let's break it down:
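Here is a minimal sketch of that update loop in Python, using scikit-learn's DecisionTreeRegressor as the weak learner; the data, tree depth, and learning rate are made-up choices purely for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (illustrative only)
rng = np.random.RandomState(42)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=100)

n_trees = 3          # mirror the 3-tree walkthrough
learning_rate = 0.1  # shrinkage (nu) applied to each tree's contribution

# F_0: start from a constant prediction, the mean of y
prediction = np.full(y.shape, y.mean())

for m in range(1, n_trees + 1):
    residuals = y - prediction                      # errors left by the current ensemble
    tree = DecisionTreeRegressor(max_depth=2)       # weak learner h_m
    tree.fit(X, residuals)                          # fit the tree to the residuals
    prediction += learning_rate * tree.predict(X)   # F_m = F_{m-1} + nu * h_m
    print(f"Tree {m}: mean squared residual = {np.mean((y - prediction) ** 2):.4f}")
```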
For simplicity, I have made an Excel computation for each tree; please go through how the model flows from Tree 1 to Tree 3.
This gives you a clear understanding of the working principle; I have limited the explanation to only 3 trees.
You can notice that the residual errors decrease from one tree to the next.
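You can also verify the same behaviour with scikit-learn's built-in implementation: `staged_predict` yields the ensemble's prediction after each added tree, so you can watch the training error shrink tree by tree. Again, the dataset below is just an illustrative stand-in:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy regression data (illustrative only)
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.2, size=200)

model = GradientBoostingRegressor(n_estimators=5, learning_rate=0.1,
                                  max_depth=2, random_state=0).fit(X, y)

# Prediction of the ensemble after tree 1, tree 2, ... tree 5
for m, y_pred in enumerate(model.staged_predict(X), start=1):
    print(f"After tree {m}: mean squared residual = {np.mean((y - y_pred) ** 2):.4f}")
```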