!pip install --quiet optuna
A Quick Introduction to Optuna
This Jupyter notebook goes through the basic usage of Optuna.
- Install Optuna
- Write a training algorithm that involves hyperparameters
  - Read train/valid data
  - Define and train the model
  - Evaluate the model
- Use Optuna to tune the hyperparameters (hyperparameter optimization, HPO)
- Visualize the HPO results
Install Optuna
Optuna can be installed via pip or conda.
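The pip variant is shown in the cell at the top of this notebook. For conda, the package is available from the conda-forge channel (channel name assumed from the usual packaging setup; check your environment):

conda install -c conda-forge optuna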
import optuna
optuna.__version__
Optimize Hyperparameters
Define a simple scikit-learn model
We start with a simple random forest model to classify flowers in the Iris dataset. We define a function called objective that encapsulates the whole training process and outputs the accuracy of the model.
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection
def objective():
    iris = sklearn.datasets.load_iris()  # Prepare the data.

    clf = sklearn.ensemble.RandomForestClassifier(
        n_estimators=5, max_depth=3)  # Define the model.

    return sklearn.model_selection.cross_val_score(
        clf, iris.data, iris.target, n_jobs=-1, cv=3).mean()  # Train and evaluate the model.
print('Accuracy: {}'.format(objective()))
Optimize hyperparameters of the model
The hyperparameters of the above algorithm are n_estimators and max_depth, for which we can try different values to see if the model accuracy can be improved. The objective function is modified to accept a trial object, which provides several methods for sampling hyperparameters (such as suggest_int and suggest_float below). We create a study to run the hyperparameter optimization and finally read the best hyperparameters.
import optuna
def objective(trial):
    iris = sklearn.datasets.load_iris()

    # Sample the hyperparameters for this trial; max_depth is drawn
    # on a log scale and cast back to an integer.
    n_estimators = trial.suggest_int('n_estimators', 2, 20)
    max_depth = int(trial.suggest_float('max_depth', 1, 32, log=True))

    clf = sklearn.ensemble.RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth)

    return sklearn.model_selection.cross_val_score(
        clf, iris.data, iris.target, n_jobs=-1, cv=3).mean()
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

trial = study.best_trial

print('Accuracy: {}'.format(trial.value))
print("Best hyperparameters: {}".format(trial.params))
It is possible to condition hyperparameters using Python if statements. We can, for instance, include another classifier, a support vector machine, in our HPO and define hyperparameters specific to the random forest model and to the support vector machine.
import sklearn.svm
def objective(trial):
    iris = sklearn.datasets.load_iris()

    # Choose a classifier type first, then sample its specific hyperparameters.
    classifier = trial.suggest_categorical('classifier', ['RandomForest', 'SVC'])

    if classifier == 'RandomForest':
        n_estimators = trial.suggest_int('n_estimators', 2, 20)
        max_depth = int(trial.suggest_float('max_depth', 1, 32, log=True))

        clf = sklearn.ensemble.RandomForestClassifier(
            n_estimators=n_estimators, max_depth=max_depth)
    else:
        c = trial.suggest_float('svc_c', 1e-10, 1e10, log=True)

        clf = sklearn.svm.SVC(C=c, gamma='auto')

    return sklearn.model_selection.cross_val_score(
        clf, iris.data, iris.target, n_jobs=-1, cv=3).mean()
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

trial = study.best_trial

print('Accuracy: {}'.format(trial.value))
print("Best hyperparameters: {}".format(trial.params))
Plotting the study
Plotting the optimization history of the study.
optuna.visualization.plot_optimization_history(study)
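These visualization functions return a Plotly figure, so outside a notebook you can render or save the result yourself. A minimal sketch (the history.html filename is just an example):

fig = optuna.visualization.plot_optimization_history(study)
fig.show()                      # Render interactively.
fig.write_html('history.html')  # Or save the interactive plot to a file.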
Plotting the accuracy of each trial against each individual hyperparameter.
optuna.visualization.plot_slice(study)
Plotting the accuracy surface for the hyperparameters involved in the random forest model.
optuna.visualization.plot_contour(study, params=['n_estimators', 'max_depth'])
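Optuna's visualization module offers further plots on the same study; for instance, plot_param_importances estimates which hyperparameters influenced the accuracy most (shown here as an extra, beyond the plots above):

optuna.visualization.plot_param_importances(study)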