Advanced Example¶

Problem Statement¶

Let’s say we have a planar 6-DOF arm manipulator and we want its end-effector to reach a target position (i.e. catch an object). Now, imagine that a joint is broken (or not working properly). We want to use Bayesian Optimization to learn a compensatory behavior that will allow our arm to reach the target in spite of damage. Moreover, once we have learned how to reach one target, we want to reach another (different) one. We will use state-based bayesian optimization to transfer knowledge between the two tasks. We will use the following:

The forward kinematic model as prior knowledge (mean of Gaussian Process).
The Squared exponential covariance function with automatic relevance detection as the kernel function of the GP.
Likelihood optimization for the hyperparameters of the GP kernel.
Use Expected Improvement as the acquisition function.
Use different optimizer for the acquisition optimization.
Initialize the GP with random samples.
Custom stopping criterion.

Also, we assume that all of our arm’s links are 1cm long.

Basic Layout¶

The file structure should look like this:

limbo
|-- exp
     |-- arm_example
          +-- wscript
          +-- main.cpp
|-- src
...

The basic layout of your main.cpp file should look like this:

#include <iostream>
#include <limbo/bayes_opt/boptimizer.hpp>
// Here we have to include other needed limbo headers

using namespace limbo;

struct Params {
  // Here go the parameters
};

template <typename Params>
struct eval_func {
  static constexpr int dim_in = sample_dimensions;
  static constexpr int dim_out = output_dimensions;
  // Here we define the evaluation function
};

int main(int argc, char** argv)
{
  // Defines, etc.
  bayes_opt::BOptimizer<Params, ...> opt;
  opt.optimize(eval_func<Params, ...>());
  auto val = opt.best_observation();
}

The wscript will have the following form:

from waflib.Configure import conf

def options(opt):
    pass

@conf
def configure(blah):
    pass

def build(bld):
    bld(features='cxx cxxprogram',
        source='main.cpp',
        includes='. ../../src',
        target='arm_example',
        uselib='BOOST EIGEN TBB LIBCMAES NLOPT',
        use='limbo')

Adding the forward kinematic model as prior¶

To compute the forward kinematics of our simple planar arm we use the following code:

Eigen::Vector2d forward_kinematics(const Eigen::VectorXd& x)
{
    Eigen::VectorXd rads = x * 2 * M_PI;

    Eigen::MatrixXd dh_mat(6, 4);

    dh_mat << rads(0), 0, 1, 0,
        rads(1), 0, 1, 0,
        rads(2), 0, 1, 0,
        rads(3), 0, 1, 0,
        rads(4), 0, 1, 0,
        rads(5), 0, 1, 0;

    Eigen::Matrix4d mat = Eigen::Matrix4d::Identity(4, 4);

    for (int i = 0; i < dh_mat.rows(); i++) {
        Eigen::VectorXd dh = dh_mat.row(i);

        Eigen::Matrix4d submat;
        submat << cos(dh(0)), -cos(dh(3)) * sin(dh(0)), sin(dh(3)) * sin(dh(0)), dh(2) * cos(dh(0)),
            sin(dh(0)), cos(dh(3)) * cos(dh(0)), -sin(dh(3)) * cos(dh(0)), dh(2) * sin(dh(0)),
            0, sin(dh(3)), cos(dh(3)), dh(1),
            0, 0, 0, 1;
        mat = mat * submat;
    }

    return (mat * Eigen::Vector4d(0, 0, 0, 1)).head(2);
}

To make this forward kinematic model useful to our GP, we need to create a mean function:

template <typename Params>
struct MeanFWModel {
    MeanFWModel(size_t dim_out = 1) {}

    template <typename GP>
    Eigen::VectorXd operator()(const Eigen::VectorXd& x, const GP&) const
    {
        Eigen::VectorXd pos = forward_kinematics(x);
        return pos;
    }
};

Using State-based bayesian optimization¶

See the explanation of the meaning of State-based optimization.

Creating an Aggregator:

template <typename Params>
struct DistanceToTarget {
    using result_type = double;
    DistanceToTarget(const Eigen::Vector2d& target) : _target(target) {}

    double operator()(const Eigen::VectorXd& x) const
    {
        return -(x - _target).norm();
    }

protected:
    Eigen::Vector2d _target;
};

Here, we are using a very simple aggregator that simply computes the distance between the end-effector and the target position.

Adding custom stop criterion¶

When our bayesian optimizer finds a solution that the end-effector of the arm is reasonably close to the target, we want it to stop. We can easily do this by creating our own stopping criterion:

template <typename Params>
struct MinTolerance {
    MinTolerance() {}

    template <typename BO, typename AggregatorFunction>
    bool operator()(const BO& bo, const AggregatorFunction& afun)
    {
        return afun(bo.best_observation(afun)) > Params::stop_mintolerance::tolerance();
    }
};

Creating the evaluation function¶

template <typename Params>
struct eval_func {
    static constexpr int dim_in = 6;
    static constexpr int dim_out = 2;

    eval_func() {}

    Eigen::VectorXd operator()(const Eigen::VectorXd& x) const
    {
        Eigen::VectorXd xx = x;
        // blocked joint
        xx(1) = 0;
        Eigen::VectorXd grip_pos = forward_kinematics(xx);
        return grip_pos;
    }
};

Creating the experiment¶

Creating the GP model¶

Kernel alias:

using kernel_t = kernel::SquaredExpARD<Params>;

Mean alias:

using mean_t = MeanFWModel<Params>;

Likelihood optimization alias:

using gp_opt_t = model::gp::KernelLFOpt<Params>;

GP alias:

using gp_t = model::GP<Params, kernel_t, mean_t, gp_opt_t>;

Acquisition, Initialization and other aliases¶

Acquisition aliases:

using acqui_t = acqui::EI<Params, gp_t>;
using acqui_opt_t = opt::Cmaes<Params>;

Initialization alias:

using init_t = init::RandomSampling<Params>;

Stopping criteria alias:

using stop_t = boost::fusion::vector<stop::MaxIterations<Params>, MinTolerance<Params>>;

Statistics alias:

using stat_t = boost::fusion::vector<stat::ConsoleSummary<Params>,
  stat::Samples<Params>, stat::Observations<Params>,
  stat::AggregatedObservations<Params>, stat::GPAcquisitions<Params>,
  stat::BestAggregatedObservations<Params>, stat::GPKernelHParams<Params>>;

Setting the parameter structure¶

struct Params {
    struct bayes_opt_boptimizer : public defaults::bayes_opt_boptimizer {
        BO_PARAM(double, noise, 1e-10);
        BO_PARAM(int, hp_period, 10);
    };
    struct bayes_opt_bobase : public defaults::bayes_opt_bobase {
        BO_PARAM(int, stats_enabled, true);
    };
    struct stop_maxiterations {
        BO_PARAM(int, iterations, 100);
    };
    struct stop_mintolerance {
        BO_PARAM(double, tolerance, -0.1);
    };
    struct acqui_ei {
        BO_PARAM(double, jitter, 0.0);
    };
    struct init_randomsampling {
        BO_PARAM(int, samples, 10);
    };
    struct kernel_squared_exp_ard : public defaults::kernel_squared_exp_ard {
    };
    struct opt_rprop : public defaults::opt_rprop {
    };
    struct opt_parallelrepeater : public defaults::opt_parallelrepeater {
    };
    struct opt_cmaes {
        BO_PARAM(int, restarts, 1);
        BO_PARAM(int, max_fun_evals, -1);
    };
};

Creating and running the Bayesian Optimizer¶

In your main function, you need to have something like the following:

int main()
{
    using kernel_t = kernel::SquaredExpARD<Params>;

    using mean_t = MeanFWModel<Params>;

    using gp_opt_t = model::gp::KernelLFOpt<Params>;

    using gp_t = model::GP<Params, kernel_t, mean_t, gp_opt_t>;

    using acqui_t = acqui::EI<Params, gp_t>;
    using acqui_opt_t = opt::Cmaes<Params>;

    using init_t = init::RandomSampling<Params>;

    using stop_t = boost::fusion::vector<stop::MaxIterations<Params>, MinTolerance<Params>>;

    using stat_t = boost::fusion::vector<stat::ConsoleSummary<Params>, stat::Samples<Params>, stat::Observations<Params>, stat::AggregatedObservations<Params>, stat::GPAcquisitions<Params>, stat::BestAggregatedObservations<Params>, stat::GPKernelHParams<Params>>;

    bayes_opt::BOptimizer<Params, modelfun<gp_t>, acquifun<acqui_t>, acquiopt<acqui_opt_t>, initfun<init_t>, statsfun<stat_t>, stopcrit<stop_t>> boptimizer;
    // Instantiate aggregator
    DistanceToTarget<Params> aggregator({1, 1});
    boptimizer.optimize(eval_func<Params>(), aggregator);
    std::cout << "New target!" << std::endl;
    aggregator = DistanceToTarget<Params>({1.5, 1});
    // Do not forget to pass `false` as the last parameter in `optimize`,
    // so you do not reset the data in boptimizer
    // i.e. keep all the previous data points in the Gaussian Process
    boptimizer.optimize(eval_func<Params>(), aggregator, false);
    return 1;
}

Running the experiment¶

Finally, from the root of limbo, run a build command, with the additional switch --exp arm_example:

./waf configure --exp arm_example
./waf build --exp arm_example

Then, an executable named arm_example should be produced under the folder build/exp/arm_example. When running the experiment, you should expect something like the following:

new point:  0.99374 0.999401 0.999716 0.532902 0.999337 0.648682 value: -0.73579 best:-0.73579
new point: 0.00924137    0.52356   0.992816   0.591639   0.900581 0.00022477 value: -1.59172 best:-0.73579
new point:  0.999114  0.303668 0.0132791  0.792124  0.391522  0.999149 value: -3.00304 best:-0.73579
new point: 0.990481 0.981046 0.379883 0.999432 0.282599 0.291695 value: -0.824245 best:-0.73579
new point: 0.00771956   0.949676   0.956241  0.0293142   0.244216  0.0891216 value: -2.68335 best:-0.73579
new point:   0.972554 0.00021536   0.998559      0.808   0.346161 0.00134114 value: -3.7039 best:-0.73579
new point: 0.00275069   0.495195 0.00167023   0.994612   0.631628   0.707545 value: -1.97244 best:-0.73579
new point:  0.124159  0.262741  0.303586  0.999707  0.335987 0.0192833 value: -1.28697 best:-0.73579
new point:  0.026011  0.307255  0.101375 0.0195426  0.562741 0.0400001 value: -2.31361 best:-0.73579
new point:    0.153818  0.00117556 9.92801e-05    0.376417     0.18015  0.00215051 value: -1.00521 best:-0.73579
new point:  0.107636  0.710152   0.41314 0.0703153  0.646439  0.606494 value: -0.685661 best:-0.685661
new point: 0.282632 0.794559 0.940368 0.530688 0.113832 0.439228 value: -1.44821 best:-0.685661
new point: 0.0110291  0.266178  0.576008  0.425873  0.120849  0.444479 value: -1.38147 best:-0.685661
new point:  0.109448 0.0548453  0.458707  0.487198  0.739701  0.588758 value: -0.147514 best:-0.147514
new point: 0.0877909 0.0481023  0.837642  0.438223  0.387531  0.649942 value: -0.150097 best:-0.147514
new point: 0.111047 0.633206 0.509962 0.443725 0.359951 0.243446 value: -0.780535 best:-0.147514
new point:  0.0827364   0.186448   0.333666   0.839036   0.536232 0.00476438 value: -0.575265 best:-0.147514
new point:   0.123347 0.00377535   0.554967   0.103699   0.371233   0.233517 value: -0.00982979 best:-0.00982979
New target!
new point:    0.998604    0.984174    0.999353 0.000200245    0.356422    0.999792 value: -1.38941 best:-0.370924
new point: 0.0189988  0.999921  0.942007  0.665937  0.999451  0.986427 value: -3.16331 best:-0.370924
new point: 0.0688255  0.230665 0.0273747  0.270297  0.980095  0.990872 value: -2.89225 best:-0.370924
new point: 0.997416 0.857708 0.998912 0.984197 0.391337 0.114332 value: -0.883913 best:-0.370924
new point:    0.999982 0.000168928    0.999398    0.336817    0.258304    0.929625 value: -1.19557 best:-0.370924
new point: 0.0700344  0.281842  0.919103 0.0183289  0.074567  0.970264 value: -4.17297 best:-0.370924
new point: 7.78708e-05 2.18076e-06    0.983852     0.99996    0.825274    0.612332 value: -2.57724 best:-0.370924
new point: 0.000101799    0.997473    0.797134    0.994634    0.377403 3.70205e-05 value: -2.33882 best:-0.370924
new point: 0.0467502  0.327915  0.235275  0.966877 0.0363554  0.909477 value: -3.49153 best:-0.370924
new point:    0.999615 0.000340133    0.637717    0.994796  0.00143888    0.464556 value: -2.75311 best:-0.370924
new point:   0.998958 0.00509838   0.474495   0.667517   0.532318   0.520064 value: -0.038073 best:-0.038073

Using state-based bayesian optimization, we can transfer what we learned during one task to achieve faster new tasks.