
Automated ML Hyperparameter Tuning


Original article link

This post works through a complete example of Bayesian hyperparameter tuning of a gradient boosting machine using the Hyperopt library.

Bayesian Optimization Methods

Bayesian methods differ from random or grid search in that they use past evaluation results to choose the next values to evaluate.

The goal is to limit expensive evaluations of the objective function by choosing the next input values based on those that have done well in the past.

Four Parts of the Optimization Problem

  1. Objective Function: what we want to minimize, in this case the validation error of a machine learning model with respect to the hyperparameters
  2. Domain Space: the hyperparameter values to search over
  3. Optimization algorithm: method for constructing the surrogate model and choosing the next hyperparameter values to evaluate
  4. Result history: stored outcomes from evaluations of the objective function consisting of the hyperparameters and validation loss
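
To make these four parts concrete before building the full example, here is a minimal sketch on a toy problem (the quadratic objective, its bounds, and the toy_* names are illustrative, not part of the original example):

from hyperopt import fmin, tpe, hp, Trials

# 1. Objective function: the quantity Hyperopt tries to minimize
def toy_objective(x):
    return (x - 2) ** 2

# 2. Domain space: a probability distribution over the input
toy_space = hp.uniform('x', -5, 5)

# 3. Optimization algorithm: the Tree-structured Parzen Estimator
# 4. Result history: recorded in the Trials object
toy_trials = Trials()
best = fmin(fn = toy_objective, space = toy_space, algo = tpe.suggest,
            max_evals = 100, trials = toy_trials)

print(best)  # a value close to {'x': 2.0}

The rest of the post builds each of these four parts for the gradient boosting machine.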

Objective Function

For the GBM, rather than separating the training data into a distinct validation set, we use K-fold cross validation:

import lightgbm as lgb
from hyperopt import STATUS_OK

N_FOLDS = 10

# Create the dataset
train_set = lgb.Dataset(train_features, train_labels)

def objective(params, n_folds = N_FOLDS):
    """Objective function for Gradient Boosting Machine Hyperparameter Tuning"""
    
    # Perform n_fold cross validation with hyperparameters
    # Use early stopping and evaluate based on ROC AUC
    cv_results = lgb.cv(params, train_set, nfold = n_folds, num_boost_round = 10000, 
                        early_stopping_rounds = 100, metrics = 'auc', seed = 50)
  
    # Extract the best score
    best_score = max(cv_results['auc-mean'])
    
    # Loss must be minimized
    loss = 1 - best_score
    
    # Dictionary with information for evaluation
    return {'loss': loss, 'params': params, 'status': STATUS_OK}

We use early_stopping_rounds to stop training when the validation score has not improved for 100 estimators. Note that recent LightGBM releases pass early stopping as a callback, lgb.early_stopping(100), rather than the early_stopping_rounds argument shown above.

Domain Space

In Bayesian optimization the idea is the same as in grid search, except the domain space contains a probability distribution for each hyperparameter rather than discrete values.

from hyperopt import hp
import numpy as np

# Define the search space
space = {
    'class_weight': hp.choice('class_weight', [None, 'balanced']),
    'boosting_type': hp.choice('boosting_type', 
                               [{'boosting_type': 'gbdt', 
                                     'subsample': hp.uniform('gbdt_subsample', 0.5, 1)}, 
                                 {'boosting_type': 'dart', 
                                     'subsample': hp.uniform('dart_subsample', 0.5, 1)},
                                 {'boosting_type': 'goss'}]),
    'num_leaves': hp.quniform('num_leaves', 30, 150, 1),
    'learning_rate': hp.loguniform('learning_rate', np.log(0.01), np.log(0.2)),
    'subsample_for_bin': hp.quniform('subsample_for_bin', 20000, 300000, 20000),
    'min_child_samples': hp.quniform('min_child_samples', 20, 500, 5),
    'reg_alpha': hp.uniform('reg_alpha', 0.0, 1.0),
    'reg_lambda': hp.uniform('reg_lambda', 0.0, 1.0),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.6, 1.0)
}
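
To sanity-check these distributions, we can draw one random set of hyperparameters from the space using Hyperopt's sampling utility (the commented output is just one possible draw):

from hyperopt.pyll.stochastic import sample

# Draw a single random set of hyperparameters from the space
example = sample(space)
print(example)
# e.g. {'boosting_type': {'boosting_type': 'gbdt', 'subsample': 0.74},
#       'class_weight': 'balanced', 'num_leaves': 88.0, ...}

Note that the conditional boosting_type entry comes back as a nested dictionary, and quniform values are returned as floats, so a complete objective function would flatten the dictionary and cast num_leaves, subsample_for_bin, and min_child_samples to integers before handing the parameters to LightGBM.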

Optimization Algorithm

We use the Tree-structured Parzen Estimator (TPE), provided in Hyperopt as tpe.suggest, to construct the surrogate model and choose the next hyperparameter values.
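
In Hyperopt, the algorithm is simply a function passed to fmin. A minimal sketch of running the optimization with the objective and space defined above; MAX_EVALS is an assumed evaluation budget, and bayes_trials collects the result history discussed next:

from hyperopt import fmin, tpe, Trials

MAX_EVALS = 500  # assumed evaluation budget

# The Trials object stores every result returned by the objective
bayes_trials = Trials()

# Run the optimization using the Tree-structured Parzen Estimator
best = fmin(fn = objective, space = space, algo = tpe.suggest,
            max_evals = MAX_EVALS, trials = bayes_trials)

fmin returns the hyperparameter values that produced the lowest loss.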

Result History
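
Hyperopt stores the result history for us: every dictionary returned by the objective function (loss, hyperparameters, and status) is appended to the Trials object. A minimal sketch of inspecting it, assuming the bayes_trials object from the previous step:

# Sort the trial results from lowest to highest loss
bayes_trials_results = sorted(bayes_trials.results, key = lambda x: x['loss'])

# The first entry holds the best hyperparameters found
print(bayes_trials_results[0])
# e.g. {'loss': 0.27, 'params': {...}, 'status': 'ok'}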

