Optimization Sub-API¶

Limbo uses optimizers in several situations, most notably to optimize hyper-parameters of Gaussian processes and to optimize acquisition functions. Nevertheless, these optimizers might be useful in other contexts. This tutorial briefly explains how to use it.

Optimizers in Limbo are wrappers around: - NLOpt (which provides many local, global, gradient-based, gradient-free algorithms) - libcmaes (which provides the Covariance Matrix Adaptation Evolutionary Strategy, that is, CMA-ES) - a few other algorithms that are implemented in Limbo (in particular, RPROP, which is gradient-based optimization algorithm)

We first need to define a function to be optimized. Here we chose \(-(x_1-0.5)^2 - (x_2-0.5)^2\), whose maximum is [0.5, 0.5]:

// the maximum is [0.5, 0.5] (f([0.5, 0.5] = 0))
opt::eval_t my_function(const Eigen::VectorXd& params, bool eval_grad = false)
{
    double v = -(params.array() - 0.5).square().sum();
    if (!eval_grad)
        return opt::no_grad(v);
    Eigen::VectorXd grad = (-2 * params).array() + 1.0;
    return {v, grad};
}

Warning

Limbo optimizers always MAXIMIZE f(x), whereas many libraries MINIMIZE f(x)

The first thing to note is that the functions needs to return an object of type eval_t, which is actually a pair made of a double (\(f(x)\)) and a vector (the gradient). We need to do so because (1) many fast algorithm use the gradient, (2) the gradient and the function often share some computations, therefore it is often faster to compute both the function value and the gradient at the same time (this is, for instance, the case with the log-likelihood that we optimize to find the hyper-parameters of Gaussian processes).

Thanks to c++11, we can simply return {v, grad} and an object of type eval_t will be created. When we do not know how to compute the gradient, we return opt::no_grad(v), which creates a special object without the gradient information (using boost::optional).

The boolean eval_grad is true when we need to evaluate the gradient for x, and false otherwise. This is useful because some algorithms do not need the gradient: there is no need to compute this value.

As usual, each algorithm has some parameters (typically the number of iterations to perform). They are defined like the other parameters in Limbo (see Parameters):

#ifdef USE_NLOPT
struct ParamsGrad {
    struct opt_nloptgrad {
        BO_PARAM(int, iterations, 80);
    };
};

struct ParamsNoGrad {
    struct opt_nloptnograd {
        BO_PARAM(int, iterations, 80);
    };
};
#endif

#ifdef USE_LIBCMAES
struct ParamsCMAES {
    struct opt_cmaes : public defaults::opt_cmaes {
    };
};
#endif

Now we can instantiate our optimizer and call it:

    // the type of the optimizer (here NLOpt with the LN_LBGFGS algorithm)
    opt::NLOptGrad<ParamsGrad, nlopt::LD_LBFGS> lbfgs;
    // we start from a random point (in 2D), and the search is not bounded
    Eigen::VectorXd res_lbfgs = lbfgs(my_function, tools::random_vector(2), false);
    std::cout << "Result with LBFGS:\t" << res_lbfgs.transpose()

We can do the same with a gradient-free optimizer from NLOpt:

    // we can also use a gradient-free algorith, like DIRECT
    opt::NLOptNoGrad<ParamsNoGrad, nlopt::GN_DIRECT> direct;
    // we start from a random point (in 2D), and the search is bounded in [0,1]
    // be careful that DIRECT does not support unbounded search
    Eigen::VectorXd res_direct = direct(my_function, tools::random_vector(2), true);
    std::cout << "Result with DIRECT:\t" << res_direct.transpose()

Or with CMA-ES:

    // or Cmaes
    opt::Cmaes<ParamsCMAES> cmaes;
    Eigen::VectorXd res_cmaes = cmaes(my_function, tools::random_vector(2), false);
    std::cout << "Result with CMA-ES:\t" << res_cmaes.transpose()

See the API documentation for more details.

Here is the full file.

#include <limbo/opt.hpp>
#include <limbo/tools.hpp>

// this short tutorial shows how to use the optimization api of limbo (opt::)
using namespace limbo;

#ifdef USE_NLOPT
struct ParamsGrad {
    struct opt_nloptgrad {
        BO_PARAM(int, iterations, 80);
    };
};

struct ParamsNoGrad {
    struct opt_nloptnograd {
        BO_PARAM(int, iterations, 80);
    };
};
#endif

#ifdef USE_LIBCMAES
struct ParamsCMAES {
    struct opt_cmaes : public defaults::opt_cmaes {
    };
};
#endif

// we maximize -(x_1-0.5)^2 - (x_2-0.5)^2
// the maximum is [0.5, 0.5] (f([0.5, 0.5] = 0))
opt::eval_t my_function(const Eigen::VectorXd& params, bool eval_grad = false)
{
    double v = -(params.array() - 0.5).square().sum();
    if (!eval_grad)
        return opt::no_grad(v);
    Eigen::VectorXd grad = (-2 * params).array() + 1.0;
    return {v, grad};
}

int main(int argc, char** argv)
{
#ifdef USE_NLOPT
    // the type of the optimizer (here NLOpt with the LN_LBGFGS algorithm)
    opt::NLOptGrad<ParamsGrad, nlopt::LD_LBFGS> lbfgs;
    // we start from a random point (in 2D), and the search is not bounded
    Eigen::VectorXd res_lbfgs = lbfgs(my_function, tools::random_vector(2), false);
    std::cout << "Result with LBFGS:\t" << res_lbfgs.transpose()
              << " -> " << my_function(res_lbfgs).first << std::endl;

    // we can also use a gradient-free algorith, like DIRECT
    opt::NLOptNoGrad<ParamsNoGrad, nlopt::GN_DIRECT> direct;
    // we start from a random point (in 2D), and the search is bounded in [0,1]
    // be careful that DIRECT does not support unbounded search
    Eigen::VectorXd res_direct = direct(my_function, tools::random_vector(2), true);
    std::cout << "Result with DIRECT:\t" << res_direct.transpose()
              << " -> " << my_function(res_direct).first << std::endl;

#endif

#ifdef USE_LIBCMAES
    // or Cmaes
    opt::Cmaes<ParamsCMAES> cmaes;
    Eigen::VectorXd res_cmaes = cmaes(my_function, tools::random_vector(2), false);
    std::cout << "Result with CMA-ES:\t" << res_cmaes.transpose()
              << " -> " << my_function(res_cmaes).first << std::endl;

#endif

    return 0;
}