
Automated ML Hyperparameter Tuning


Original article link

This post works through a complete example of Bayesian hyperparameter tuning of a gradient boosting machine using the Hyperopt library.

Bayesian Optimization Methods

Bayesian methods differ from random or grid search in that they use past evaluation results to choose the next values to evaluate.

The goal is to limit expensive evaluations of the objective function by choosing the next input values based on those that have done well in the past.

Four Parts of the Optimization Problem

  1. Objective Function: what we want to minimize, in this case the validation error of a machine learning model with respect to the hyperparameters
  2. Domain Space: the hyperparameter values to search over
  3. Optimization algorithm: method for constructing the surrogate model and choosing the next hyperparameter values to evaluate
  4. Result history: stored outcomes from evaluations of the objective function consisting of the hyperparameters and validation loss
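
To make these four parts concrete before building the full example, here is a minimal sketch on a toy problem (the quadratic objective, its bounds, and the toy_* names are illustrative, not part of the original example):

from hyperopt import fmin, tpe, hp, Trials

# 1. Objective function: the quantity Hyperopt tries to minimize
def toy_objective(x):
    return (x - 2) ** 2

# 2. Domain space: a probability distribution over the input
toy_space = hp.uniform('x', -5, 5)

# 3. Optimization algorithm: the Tree-structured Parzen Estimator
# 4. Result history: recorded in the Trials object
toy_trials = Trials()
best = fmin(fn = toy_objective, space = toy_space, algo = tpe.suggest,
            max_evals = 100, trials = toy_trials)

print(best)  # a value close to {'x': 2.0}

The rest of the post builds each of these four parts for the gradient boosting machine.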

Objective Function

For the GBM, rather than separating the training data into a distinct validation set, we use K-fold cross validation:

import lightgbm as lgb
from hyperopt import STATUS_OK

N_FOLDS = 10

# Create the dataset
train_set = lgb.Dataset(train_features, train_labels)

def objective(params, n_folds = N_FOLDS):
    """Objective function for Gradient Boosting Machine Hyperparameter Tuning"""
    
    # Perform n_fold cross validation with hyperparameters
    # Use early stopping and evaluate based on ROC AUC
    cv_results = lgb.cv(params, train_set, nfold = n_folds, num_boost_round = 10000, 
                        early_stopping_rounds = 100, metrics = 'auc', seed = 50)
  
    # Extract the best score
    best_score = max(cv_results['auc-mean'])
    
    # Loss must be minimized
    loss = 1 - best_score
    
    # Dictionary with information for evaluation
    return {'loss': loss, 'params': params, 'status': STATUS_OK}

We use early_stopping_rounds to stop training when the validation score has not improved for 100 estimators. Note that recent LightGBM releases pass early stopping as a callback, lgb.early_stopping(100), rather than the early_stopping_rounds argument shown above.

Domain Space

In Bayesian optimization the idea is the same as in grid search, except the domain space contains a probability distribution for each hyperparameter rather than discrete values.

from hyperopt import hp
import numpy as np

# Define the search space
space = {
    'class_weight': hp.choice('class_weight', [None, 'balanced']),
    'boosting_type': hp.choice('boosting_type', 
                               [{'boosting_type': 'gbdt', 
                                     'subsample': hp.uniform('gbdt_subsample', 0.5, 1)}, 
                                 {'boosting_type': 'dart', 
                                     'subsample': hp.uniform('dart_subsample', 0.5, 1)},
                                 {'boosting_type': 'goss'}]),
    'num_leaves': hp.quniform('num_leaves', 30, 150, 1),
    'learning_rate': hp.loguniform('learning_rate', np.log(0.01), np.log(0.2)),
    'subsample_for_bin': hp.quniform('subsample_for_bin', 20000, 300000, 20000),
    'min_child_samples': hp.quniform('min_child_samples', 20, 500, 5),
    'reg_alpha': hp.uniform('reg_alpha', 0.0, 1.0),
    'reg_lambda': hp.uniform('reg_lambda', 0.0, 1.0),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.6, 1.0)
}
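
To sanity-check these distributions, we can draw one random set of hyperparameters from the space using Hyperopt's sampling utility (the commented output is just one possible draw):

from hyperopt.pyll.stochastic import sample

# Draw a single random set of hyperparameters from the space
example = sample(space)
print(example)
# e.g. {'boosting_type': {'boosting_type': 'gbdt', 'subsample': 0.74},
#       'class_weight': 'balanced', 'num_leaves': 88.0, ...}

Note that the conditional boosting_type entry comes back as a nested dictionary, and quniform values are returned as floats, so a complete objective function would flatten the dictionary and cast num_leaves, subsample_for_bin, and min_child_samples to integers before handing the parameters to LightGBM.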

Optimization Algorithm

We use the Tree-structured Parzen Estimator (TPE), provided in Hyperopt as tpe.suggest, to construct the surrogate model and choose the next hyperparameter values.
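
In Hyperopt, the algorithm is simply a function passed to fmin. A minimal sketch of running the optimization with the objective and space defined above; MAX_EVALS is an assumed evaluation budget, and bayes_trials collects the result history discussed next:

from hyperopt import fmin, tpe, Trials

MAX_EVALS = 500  # assumed evaluation budget

# The Trials object stores every result returned by the objective
bayes_trials = Trials()

# Run the optimization using the Tree-structured Parzen Estimator
best = fmin(fn = objective, space = space, algo = tpe.suggest,
            max_evals = MAX_EVALS, trials = bayes_trials)

fmin returns the hyperparameter values that produced the lowest loss.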

Result History
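
Hyperopt stores the result history for us: every dictionary returned by the objective function (loss, hyperparameters, and status) is appended to the Trials object. A minimal sketch of inspecting it, assuming the bayes_trials object from the previous step:

# Sort the trial results from lowest to highest loss
bayes_trials_results = sorted(bayes_trials.results, key = lambda x: x['loss'])

# The first entry holds the best hyperparameters found
print(bayes_trials_results[0])
# e.g. {'loss': 0.27, 'params': {...}, 'status': 'ok'}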

