
Automated ML Hyperparameter Tuning

Original article link

A complete example of Bayesian hyperparameter tuning of a gradient boosting machine using the Hyperopt library.

Bayesian Optimization Methods

Bayesian methods differ from random or grid search in that they use past evaluation results to choose the next values to evaluate.

The goal is to limit expensive evaluations of the objective function by choosing the next input values based on those that have done well in the past.
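As a rough, self-contained sketch of this loop (a toy example, not from the original article), Hyperopt can minimize a simple one-dimensional function, with each new value of x proposed from the results recorded so far:

from hyperopt import fmin, tpe, hp, Trials

def toy_objective(x):
    # Toy objective with its minimum at x = 2
    return (x - 2) ** 2

trials = Trials()
best = fmin(fn=toy_objective,
            space=hp.uniform('x', -5, 5),  # search x over [-5, 5]
            algo=tpe.suggest,              # propose the next x from past results
            max_evals=50,
            trials=trials)                 # stores every (x, loss) pair evaluated
print(best)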

Four Parts of Optimization Problem

  1. Objective Function: what we want to minimize, in this case the validation error of a machine learning model with respect to the hyperparameters
  2. Domain Space: hyperparameter values to search over
  3. Optimization algorithm: method for constructing the surrogate model and choosing the next hyperparameter values to evaluate
  4. Result history: stored outcomes from evaluations of the objective function consisting of the hyperparameters and validation loss

Rather than separating the training data into a distinct validation set, we use K-Fold cross validation.

import lightgbm as lgb
from hyperopt import STATUS_OK

N_FOLDS = 10

# Create the dataset
train_set = lgb.Dataset(train_features, train_labels)

def objective(params, n_folds = N_FOLDS):
    """Objective function for Gradient Boosting Machine Hyperparameter Tuning"""

    # Perform n_fold cross validation with hyperparameters
    # Use early stopping and evaluate based on ROC AUC
    cv_results = lgb.cv(params, train_set, nfold = n_folds, num_boost_round = 10000,
                        early_stopping_rounds = 100, metrics = 'auc', seed = 50)

    # Extract the best score
    best_score = max(cv_results['auc-mean'])

    # Loss must be minimized
    loss = 1 - best_score

    # Dictionary with information for evaluation
    return {'loss': loss, 'params': params, 'status': STATUS_OK}

We use early_stopping_rounds to stop the training when the validation score has not improved for 100 estimators.

Domain Space

In Bayesian optimization the idea is the same as in grid or random search, except that the domain space holds a probability distribution for each hyperparameter rather than a set of discrete values.

from hyperopt import hp
import numpy as np

# Define the search space
space = {
    'class_weight': hp.choice('class_weight', [None, 'balanced']),
    'boosting_type': hp.choice('boosting_type',
                               [{'boosting_type': 'gbdt',
                                 'subsample': hp.uniform('gdbt_subsample', 0.5, 1)},
                                {'boosting_type': 'dart',
                                 'subsample': hp.uniform('dart_subsample', 0.5, 1)},
                                {'boosting_type': 'goss'}]),
    'num_leaves': hp.quniform('num_leaves', 30, 150, 1),
    'learning_rate': hp.loguniform('learning_rate', np.log(0.01), np.log(0.2)),
    'subsample_for_bin': hp.quniform('subsample_for_bin', 20000, 300000, 20000),
    'min_child_samples': hp.quniform('min_child_samples', 20, 500, 5),
    'reg_alpha': hp.uniform('reg_alpha', 0.0, 1.0),
    'reg_lambda': hp.uniform('reg_lambda', 0.0, 1.0),
    'colsample_bytree': hp.uniform('colsample_by_tree', 0.6, 1.0)
}
  • choice: categorical variables
  • quniform: discrete uniform (integers spaced evenly)
  • uniform: continuous uniform (floats spaced evenly)
  • loguniform: continuous log uniform (floats spaced evenly on a log scale)
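To see what these distributions produce, it can help to draw a sample configuration from the space; a minimal sketch (not from the original article) using Hyperopt's sampling utility, where the exact values will differ on each run:

from hyperopt.pyll.stochastic import sample

# Draw one configuration from the search space defined above
example = sample(space)
print(example)

# Note: quniform returns floats, so integer parameters such as num_leaves
# should be cast before being passed to LightGBM
example['num_leaves'] = int(example['num_leaves'])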

Optimization Algorithm

We use the Tree-structured Parzen Estimator (TPE).
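In Hyperopt, selecting the optimization algorithm amounts to passing the TPE suggestion function to the minimizer; a minimal sketch:

from hyperopt import tpe

# Tree-structured Parzen Estimator: builds a surrogate model from past trials
# and suggests the next hyperparameter values to evaluate
tpe_algorithm = tpe.suggest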

Result History
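Hyperopt keeps the outcome of every call to the objective function in a Trials object. A hedged sketch of tying it together with the objective, space, and algorithm defined above (MAX_EVALS is an assumed evaluation budget for illustration, not a value from the article):

from hyperopt import fmin, tpe, Trials

# Record the results (hyperparameters and loss) of every evaluation
bayes_trials = Trials()

MAX_EVALS = 500  # assumed evaluation budget for illustration

# Run the optimization: minimize the objective over the space using TPE
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=MAX_EVALS, trials=bayes_trials)

# The trials object holds the full result history for later inspection
bayes_trials_results = sorted(bayes_trials.results, key=lambda x: x['loss'])
print(bayes_trials_results[:2])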