Osprey: Hyperparameter Optimization for Machine Learning

Abstract

Publication
Journal of Open Source Software

Osprey is a tool for hyperparameter optimization of machine learning algorithms in Python. Hyperparameter optimization can often be an onerous process for researchers, due to time-consuming experimental replicates, non-convex objective functions, and constant tension between exploration of the global parameter space and local optimization (Jones, Schonlau, and Welch 1998). We’ve designed Osprey to provide scientists with a practical, easy-to-use way of finding optimal model parameters. The software works seamlessly with scikit-learn estimators (Pedregosa et al. 2011) and supports several search strategies for choosing the next set of parameters with which to evaluate a given model, including Gaussian processes (GPy 2012), tree-structured Parzen estimators (Yamins, Tax, and Bergstra 2013), random search, and grid search. Because hyperparameter optimization is an embarrassingly parallel problem, Osprey can easily scale to hundreds of concurrent processes by executing a simple command-line program multiple times. This makes it easy to exploit the large resources available in high-performance computing environments.
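
To make the two simplest search strategies and the parallelism argument concrete, the following is a minimal, standard-library-only sketch. It does not use Osprey's API: the names objective, grid_search, random_search, best, and SPACE are hypothetical, and the toy objective is an assumption that merely stands in for a cross-validated model score. Two random-search runs with different seeds play the role of independent workers whose trial lists are merged afterwards.

```python
import itertools
import math
import random

# Hypothetical two-parameter search space: name -> (low, high).
SPACE = {"x": (-2.0, 2.0), "y": (-2.0, 2.0)}


def objective(params):
    """Toy non-convex score to maximize (assumed stand-in for a CV score)."""
    x, y = params["x"], params["y"]
    return math.sin(3 * x) * math.cos(2 * y) - 0.1 * (x ** 2 + y ** 2)


def grid_search(space, points_per_axis):
    """Evaluate every combination on an evenly spaced grid (needs >= 2 points)."""
    axes = {}
    for name, (low, high) in space.items():
        step = (high - low) / (points_per_axis - 1)
        axes[name] = [low + i * step for i in range(points_per_axis)]
    names = list(axes)
    trials = []
    for combo in itertools.product(*(axes[n] for n in names)):
        params = dict(zip(names, combo))
        trials.append((objective(params), params))
    return trials


def random_search(space, n_trials, seed):
    """Evaluate n_trials points drawn uniformly at random from the space."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = {n: rng.uniform(low, high) for n, (low, high) in space.items()}
        trials.append((objective(params), params))
    return trials


def best(trials):
    """Return the (score, params) pair with the highest score."""
    return max(trials, key=lambda trial: trial[0])


# Each "worker" runs independently; no trial depends on another trial.
worker_a = random_search(SPACE, n_trials=20, seed=1)
worker_b = random_search(SPACE, n_trials=20, seed=2)
merged = worker_a + worker_b

print("grid best:  ", best(grid_search(SPACE, points_per_axis=5)))
print("random best:", best(merged))
```

Because no trial in this sketch depends on any other, the two workers could just as well run as separate processes or on separate machines and only share their results at the end. Model-based strategies such as Gaussian processes or tree-structured Parzen estimators additionally use the history of past trials to propose the next point, which this sketch does not attempt.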

Osprey is actively maintained by researchers at Stanford University and other institutions around the world. While originally developed to analyze computational protein dynamics (McGibbon, Harrigan, et al. 2016), it is applicable to any scikit-learn-compatible pipeline. The source code for Osprey is hosted on GitHub and has been archived to Zenodo (McGibbon, Hernández, et al. 2016). Full documentation can be found at http://msmbuilder.org/osprey.