In this article, we walk through a complete example of Bayesian hyperparameter tuning of a gradient boosting machine using the Hyperopt library.
Bayesian Optimization Methods
Bayesian methods differ from random or grid search in that they use past evaluation results to choose the next values to evaluate.
The aim is to limit expensive evaluations of the objective function by choosing the next input values based on those that have done well in the past.
Four Parts of an Optimization Problem
- Objective Function: what we want to minimize, in this case the validation error of a machine learning model with respect to the hyperparameters
- Domain Space: the hyperparameter values to search over
- Optimization algorithm: method for constructing the surrogate model and choosing the next hyperparameter values to evaluate
- Result history: stored outcomes from evaluations of the objective function consisting of the hyperparameters and validation loss
Rather than separating the training data into a distinct validation set, we use K-fold cross validation.
```python
import lightgbm as lgb
```
We use early_stopping_rounds to stop training when the validation score has not improved for 100 estimators.
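As a minimal sketch (assuming `train_features` and `train_labels` already exist, and a pre-4.0 LightGBM where `lgb.cv` still accepts `early_stopping_rounds`), the objective function might look like:

```python
import lightgbm as lgb
from hyperopt import STATUS_OK

N_FOLDS = 10

# Training data assumed to already exist as train_features / train_labels
train_set = lgb.Dataset(train_features, train_labels)

def objective(params, n_folds=N_FOLDS):
    """Objective function for GBM hyperparameter tuning: returns the
    cross-validation loss to minimize for one set of hyperparameters."""
    # hp.quniform samples floats, but LightGBM expects an integer here
    params['num_leaves'] = int(params['num_leaves'])
    # K-fold cross validation with early stopping, evaluated on ROC AUC
    cv_results = lgb.cv(params, train_set, num_boost_round=10000,
                        nfold=n_folds, early_stopping_rounds=100,
                        metrics='auc', seed=50)
    # Best mean AUC across boosting rounds; Hyperopt minimizes, so invert
    best_score = max(cv_results['auc-mean'])
    loss = 1 - best_score
    # Hyperopt expects a dictionary with at least 'loss' and 'status'
    return {'loss': loss, 'params': params, 'status': STATUS_OK}
```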
Domain Space
In Bayesian optimization the idea is the same, except the domain space contains probability distributions for each hyperparameter rather than discrete values.
Hyperopt provides several distributions for defining the search space:
- choice: categorical variables
- quniform: discrete uniform (integers spaced evenly)
- uniform: continuous uniform (floats spaced evenly)
- loguniform: continuous log uniform (floats spaced evenly on a log scale)
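Putting these distributions together, here is a minimal sketch of a search space for the GBM; the parameter names are LightGBM's, and the ranges are illustrative choices, not tuned recommendations:

```python
import numpy as np
from hyperopt import hp

# Define the search space (ranges are illustrative)
space = {
    'objective': 'binary',  # plain constants pass through unchanged
    'boosting_type': hp.choice('boosting_type', ['gbdt', 'dart']),
    'num_leaves': hp.quniform('num_leaves', 30, 150, 1),
    'learning_rate': hp.loguniform('learning_rate',
                                   np.log(0.01), np.log(0.2)),
    'subsample': hp.uniform('subsample', 0.5, 1.0),
}
# Note: quniform returns floats, so integer parameters must be cast
# inside the objective, e.g. params['num_leaves'] = int(params['num_leaves'])
```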
Optimization Algorithm
For the optimization algorithm, we use the Tree-structured Parzen Estimator (TPE), available in Hyperopt as tpe.suggest.
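Tying the four parts together, a minimal sketch of running the search with Hyperopt's fmin; max_evals is an arbitrary evaluation budget:

```python
from hyperopt import fmin, tpe, Trials

# The Trials object stores the result history from every evaluation
trials = Trials()

# Minimize the objective over the space using the TPE algorithm
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=100, trials=trials)

print(best)  # best hyperparameter values found
```

Note that for hp.choice parameters, fmin returns the index of the chosen option rather than the value itself; the trials object retains the full record of every evaluation.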