A Quick Introduction to Optuna
This Jupyter notebook goes through the basic usage of Optuna.
- Install Optuna
- Write a training algorithm that involves hyperparameters
  - Read train/valid data
  - Define and train model
  - Evaluate model
- Use Optuna to tune the hyperparameters (hyperparameter optimization, HPO)
- Visualize HPO
Install Optuna
Optuna can be installed via pip or conda.
!pip install --quiet optuna
import optuna
Optimize Hyperparameters
Define a simple scikit-learn model
We start with a simple random forest model to classify flowers in the Iris dataset. We define a function called objective that encapsulates the whole training process and returns the model's mean accuracy over 3-fold cross-validation.
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection

def objective():
    iris = sklearn.datasets.load_iris()  # Prepare the data.
    clf = sklearn.ensemble.RandomForestClassifier(
        n_estimators=5, max_depth=3)  # Define the model.
    return sklearn.model_selection.cross_val_score(
        clf, iris.data, iris.target, n_jobs=-1, cv=3).mean()  # Train and evaluate the model.

print('Accuracy: {}'.format(objective()))
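cross_val_score with cv=3 trains the model three times. Each time it holds out a different third of the data for scoring, and .mean() averages the three accuracies. The sketch below reproduces only that splitting and averaging in plain Python. It is an illustration, not scikit-learn's implementation. It assumes contiguous, unshuffled folds, whereas for a classifier and an integer cv scikit-learn uses stratified folds. fit_and_score is a hypothetical callback standing in for training and evaluating the model.

```python
from statistics import mean
from typing import Callable, List


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k contiguous, unshuffled folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # the first `extra` folds get one more sample
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_mean(
    n_samples: int,
    k: int,
    fit_and_score: Callable[[List[int], List[int]], float],
) -> float:
    """Average a hypothetical fit_and_score(train_idx, test_idx) over k folds."""
    scores = []
    for test_idx in k_fold_indices(n_samples, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        scores.append(fit_and_score(train_idx, test_idx))  # train on k-1 folds, score on 1
    return mean(scores)


# Iris has 150 samples, so cv=3 gives three folds of 50.
print([len(fold) for fold in k_fold_indices(150, 3)])  # [50, 50, 50]
```

Stratification matters here. load_iris returns its 150 samples sorted by class, so contiguous folds like these would each hold out exactly one class, and that class would be missing from the training data.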
Optimize hyperparameters of the model
The hyperparameters of the above algorithm are n_estimators and max_depth. We can try different values for them to see whether the model's accuracy improves. The objective function is modified to accept a trial object, whose suggest_* methods (here suggest_int and suggest_float) sample a value for each hyperparameter. We then create a study with direction='maximize', since higher accuracy is better. We run the optimization for 100 trials and finally read the best hyperparameters from study.best_trial.
import optuna

def objective(trial):
    iris = sklearn.datasets.load_iris()
    # Sample the hyperparameters for this trial.
    n_estimators = trial.suggest_int('n_estimators', 2, 20)
    max_depth = int(trial.suggest_float('max_depth', 1, 32, log=True))
    clf = sklearn.ensemble.RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth)
    return sklearn.model_selection.cross_val_score(
        clf, iris.data, iris.target, n_jobs=-1, cv=3).mean()

# Higher accuracy is better, so the study maximizes the objective.
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

trial = study.best_trial
print('Accuracy: {}'.format(trial.value))
print("Best hyperparameters: {}".format(trial.params))
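With log=True, suggest_float samples max_depth in the log domain. The ranges 1 to 2, 2 to 4, 4 to 8, 8 to 16 and 16 to 32 are therefore equally likely, and int() then truncates the result. The stdlib sketch below shows that distribution, assuming a plain log-uniform draw. The helper names are hypothetical. Optuna's default TPE sampler adapts its proposals to earlier trials, so this shows the shape of the search space rather than the values Optuna will actually try.

```python
import math
import random
from collections import Counter


def log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Draw x so that log(x) is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def octave_counts(n: int, low: float = 1, high: float = 32, seed: int = 0) -> Counter:
    """Truncate n log-uniform draws to int and count them per doubling interval."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n):
        depth = int(log_uniform(rng, low, high))  # same cast as int(trial.suggest_float(...))
        counts[depth.bit_length() - 1] += 1  # 1 -> 0, 2-3 -> 1, 4-7 -> 2, 8-15 -> 3, 16-31 -> 4
    return counts


counts = octave_counts(10_000)
print(sorted(counts.items()))  # roughly 2,000 draws in each of octaves 0 to 4
```

By contrast, a plain uniform draw on [1, 32] would place about half of all trials at a depth of 16 or more.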
It is possible to condition hyperparameters using Python if statements. For instance, we can add a second classifier, a support vector machine (SVC), to the search. Each model then gets its own hyperparameters: n_estimators and max_depth for the random forest, and the regularization parameter C for the SVC.
import sklearn.svm

def objective(trial):
    iris = sklearn.datasets.load_iris()
    # The choice of classifier is itself a hyperparameter.
    classifier = trial.suggest_categorical('classifier', ['RandomForest', 'SVC'])
    if classifier == 'RandomForest':
        # Random-forest-specific hyperparameters.
        n_estimators = trial.suggest_int('n_estimators', 2, 20)
        max_depth = int(trial.suggest_float('max_depth', 1, 32, log=True))
        clf = sklearn.ensemble.RandomForestClassifier(
            n_estimators=n_estimators, max_depth=max_depth)
    else:
        # SVC-specific hyperparameter: the regularization parameter C.
        c = trial.suggest_float('svc_c', 1e-10, 1e10, log=True)
        clf = sklearn.svm.SVC(C=c, gamma='auto')
    return sklearn.model_selection.cross_val_score(
        clf, iris.data, iris.target, n_jobs=-1, cv=3).mean()

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

trial = study.best_trial
print('Accuracy: {}'.format(trial.value))
print("Best hyperparameters: {}".format(trial.params))
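The search space is built while the objective runs, which is Optuna's define-by-run style. As a result, each trial records only the hyperparameters its branch sampled: a random forest trial has no svc_c, and an SVC trial has neither n_estimators nor max_depth. The sketch below imitates that loop in plain Python. ToyTrial, toy_score, toy_objective and random_search are all hypothetical. The sampler is uniform random search rather than Optuna's default TPE. toy_score is an invented stand-in for cross-validated accuracy, so its numbers mean nothing.

```python
import math
import random
from typing import Any, Dict, List, Tuple


class ToyTrial:
    """Illustrative stand-in for a trial: samples at random and records every value asked for."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng
        self.params: Dict[str, Any] = {}

    def suggest_categorical(self, name: str, choices: List[str]) -> str:
        self.params[name] = self.rng.choice(choices)
        return self.params[name]

    def suggest_int(self, name: str, low: int, high: int) -> int:
        self.params[name] = self.rng.randint(low, high)  # both bounds inclusive
        return self.params[name]

    def suggest_log_float(self, name: str, low: float, high: float) -> float:
        # Log-uniform draw, mirroring what log=True asks of suggest_float.
        self.params[name] = math.exp(self.rng.uniform(math.log(low), math.log(high)))
        return self.params[name]


def toy_score(params: Dict[str, Any]) -> float:
    """Hypothetical stand-in for cross-validated accuracy; not a real model."""
    if params["classifier"] == "RandomForest":
        return min(1.0, 0.8 + 0.01 * params["n_estimators"])
    return 0.9 if 1e-2 <= params["svc_c"] <= 1e2 else 0.3


def toy_objective(trial: ToyTrial) -> float:
    classifier = trial.suggest_categorical("classifier", ["RandomForest", "SVC"])
    if classifier == "RandomForest":
        trial.suggest_int("n_estimators", 2, 20)
        trial.suggest_log_float("max_depth", 1, 32)
    else:
        trial.suggest_log_float("svc_c", 1e-10, 1e10)
    return toy_score(trial.params)


def random_search(n_trials: int, seed: int = 0) -> List[Tuple[float, Dict[str, Any]]]:
    """Run toy_objective n_trials times and keep (value, params) for each trial."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        trial = ToyTrial(rng)
        history.append((toy_objective(trial), trial.params))
    return history


history = random_search(100)
best_value, best_params = max(history, key=lambda record: record[0])  # maximize
print(best_value, best_params)
```

Picking the record with the highest value plays the role of study.best_trial for a study created with direction='maximize'.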
Plotting the study
Plotting the optimization history of the study.
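In Optuna this plot is produced by optuna.visualization.plot_optimization_history(study), which requires plotly. Alongside each trial's objective value, it draws the best value found so far. The stdlib sketch below computes that running best. best_so_far is a hypothetical helper, and the input values are made up.

```python
from itertools import accumulate
from typing import List, Sequence


def best_so_far(values: Sequence[float], maximize: bool = True) -> List[float]:
    """Running best objective value after each trial."""
    return list(accumulate(values, max if maximize else min))


print(best_so_far([0.90, 0.95, 0.93, 0.97]))  # [0.9, 0.95, 0.95, 0.97]
```

For the study above, the input could be [t.value for t in study.trials], assuming every trial completed.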
Plotting the accuracies for each hyperparameter for each trial.
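In Optuna this plot is produced by optuna.visualization.plot_slice(study). It draws one panel per hyperparameter, plotting each trial's objective value against the value that trial sampled for the hyperparameter. The stdlib sketch below groups those points. slice_points is hypothetical and the records are invented. In this sketch, a trial that never sampled a hyperparameter, such as an SVC trial for n_estimators, contributes no point to that panel.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

Record = Tuple[float, Dict[str, Any]]  # (objective value, sampled params)


def slice_points(records: Sequence[Record]) -> Dict[str, List[Tuple[Any, float]]]:
    """Group (param value, objective value) pairs by parameter name."""
    panels: Dict[str, List[Tuple[Any, float]]] = defaultdict(list)
    for value, params in records:
        for name, sampled in params.items():
            panels[name].append((sampled, value))
    return dict(panels)


records = [
    (0.95, {"classifier": "RandomForest", "n_estimators": 10, "max_depth": 4.2}),
    (0.33, {"classifier": "SVC", "svc_c": 1e-8}),
    (0.97, {"classifier": "RandomForest", "n_estimators": 17, "max_depth": 9.5}),
]
print(slice_points(records)["n_estimators"])  # [(10, 0.95), (17, 0.97)]
```

With a real study, the records could be built as [(t.value, t.params) for t in study.trials].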
Plotting the accuracy surface for the hyperparameters involved in the random forest model.
optuna.visualization.plot_contour(study, params=['n_estimators', 'max_depth'])