Advanced Example

Problem Statement

6-DOF planar arm

Let’s say we have a planar 6-DOF arm manipulator and we want its end-effector to reach a target position (i.e. catch an object). Now, imagine that a joint is broken (or not working properly). We want to use Bayesian Optimization to learn a compensatory behavior that will allow our arm to reach the target in spite of damage. Moreover, once we have learned how to reach one target, we want to reach another (different) one. We will use state-based bayesian optimization to transfer knowledge between the two tasks. We will use the following:

  • The forward kinematic model as prior knowledge (mean of Gaussian Process).
  • The Squared exponential covariance function with automatic relevance detection as the kernel function of the GP.
  • Likelihood optimization for the hyperparameters of the GP kernel.
  • Use Expected Improvement as the acquisition function.
  • Use different optimizer for the acquisition optimization.
  • Initialize the GP with random samples.
  • Custom stopping criterion.

Also, we assume that all of our arm’s links are 1cm long.

Basic Layout

The file structure should look like this:

limbo
|-- exp
     |-- arm_example
          +-- wscript
          +-- main.cpp
|-- src
...

The basic layout of your main.cpp file should look like this:

#include <iostream>
#include <limbo/bayes_opt/boptimizer.hpp>
// Here we have to include other needed limbo headers

using namespace limbo;

struct Params {
  // Here go the parameters
};

template <typename Params>
struct eval_func {
  static constexpr int dim_in = sample_dimensions;
  static constexpr int dim_out = output_dimensions;
  // Here we define the evaluation function
};

int main(int argc, char** argv)
{
  // Defines, etc.
  bayes_opt::BOptimizer<Params, ...> opt;
  opt.optimize(eval_func<Params, ...>());
  auto val = opt.best_observation();
}

The wscript will have the following form:

from waflib.Configure import conf

def options(opt):
    pass

@conf
def configure(blah):
    pass

def build(bld):
    bld(features='cxx cxxprogram',
        source='main.cpp',
        includes='. ../../src',
        target='arm_example',
        uselib='BOOST EIGEN TBB LIBCMAES NLOPT',
        use='limbo')

Adding the forward kinematic model as prior

To compute the forward kinematics of our simple planar arm we use the following code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
Eigen::Vector2d forward_kinematics(const Eigen::VectorXd& x)
{
    Eigen::VectorXd rads = x * 2 * M_PI;

    Eigen::MatrixXd dh_mat(6, 4);

    dh_mat << rads(0), 0, 1, 0,
        rads(1), 0, 1, 0,
        rads(2), 0, 1, 0,
        rads(3), 0, 1, 0,
        rads(4), 0, 1, 0,
        rads(5), 0, 1, 0;

    Eigen::Matrix4d mat = Eigen::Matrix4d::Identity(4, 4);

    for (int i = 0; i < dh_mat.rows(); i++) {
        Eigen::VectorXd dh = dh_mat.row(i);

        Eigen::Matrix4d submat;
        submat << cos(dh(0)), -cos(dh(3)) * sin(dh(0)), sin(dh(3)) * sin(dh(0)), dh(2) * cos(dh(0)),
            sin(dh(0)), cos(dh(3)) * cos(dh(0)), -sin(dh(3)) * cos(dh(0)), dh(2) * sin(dh(0)),
            0, sin(dh(3)), cos(dh(3)), dh(1),
            0, 0, 0, 1;
        mat = mat * submat;
    }

    return (mat * Eigen::Vector4d(0, 0, 0, 1)).head(2);
}

To make this forward kinematic model useful to our GP, we need to create a mean function:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
template <typename Params>
struct MeanFWModel {
    MeanFWModel(size_t dim_out = 1) {}

    template <typename GP>
    Eigen::VectorXd operator()(const Eigen::VectorXd& x, const GP&) const
    {
        Eigen::VectorXd pos = forward_kinematics(x);
        return pos;
    }
};

Using State-based bayesian optimization

See the explanation of the meaning of State-based optimization.

Creating an Aggregator:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
template <typename Params>
struct DistanceToTarget {
    using result_type = double;
    DistanceToTarget(const Eigen::Vector2d& target) : _target(target) {}

    double operator()(const Eigen::VectorXd& x) const
    {
        return -(x - _target).norm();
    }

protected:
    Eigen::Vector2d _target;
};

Here, we are using a very simple aggregator that simply computes the distance between the end-effector and the target position.

Adding custom stop criterion

When our bayesian optimizer finds a solution that the end-effector of the arm is reasonably close to the target, we want it to stop. We can easily do this by creating our own stopping criterion:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
template <typename Params>
struct MinTolerance {
    MinTolerance() {}

    template <typename BO, typename AggregatorFunction>
    bool operator()(const BO& bo, const AggregatorFunction& afun)
    {
        return afun(bo.best_observation(afun)) > Params::stop_mintolerance::tolerance();
    }
};

Creating the evaluation function

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
template <typename Params>
struct eval_func {
    static constexpr int dim_in = 6;
    static constexpr int dim_out = 2;

    eval_func() {}

    Eigen::VectorXd operator()(const Eigen::VectorXd& x) const
    {
        Eigen::VectorXd xx = x;
        // blocked joint
        xx(1) = 0;
        Eigen::VectorXd grip_pos = forward_kinematics(xx);
        return grip_pos;
    }
};

Creating the experiment

Creating the GP model

Kernel alias:

using kernel_t = kernel::SquaredExpARD<Params>;

Mean alias:

using mean_t = MeanFWModel<Params>;

Likelihood optimization alias:

using gp_opt_t = model::gp::KernelLFOpt<Params>;

GP alias:

using gp_t = model::GP<Params, kernel_t, mean_t, gp_opt_t>;

Acquisition, Initialization and other aliases

Acquisition aliases:

using acqui_t = acqui::EI<Params, gp_t>;
using acqui_opt_t = opt::Cmaes<Params>;

Initialization alias:

using init_t = init::RandomSampling<Params>;

Stopping criteria alias:

using stop_t = boost::fusion::vector<stop::MaxIterations<Params>, MinTolerance<Params>>;

Statistics alias:

using stat_t = boost::fusion::vector<stat::ConsoleSummary<Params>,
  stat::Samples<Params>, stat::Observations<Params>,
  stat::AggregatedObservations<Params>, stat::GPAcquisitions<Params>,
  stat::BestAggregatedObservations<Params>, stat::GPKernelHParams<Params>>;

Setting the parameter structure

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
struct Params {
    struct bayes_opt_boptimizer : public defaults::bayes_opt_boptimizer {
        BO_PARAM(double, noise, 1e-10);
        BO_PARAM(int, hp_period, 10);
    };
    struct bayes_opt_bobase : public defaults::bayes_opt_bobase {
        BO_PARAM(int, stats_enabled, true);
    };
    struct stop_maxiterations {
        BO_PARAM(int, iterations, 100);
    };
    struct stop_mintolerance {
        BO_PARAM(double, tolerance, -0.1);
    };
    struct acqui_ei {
        BO_PARAM(double, jitter, 0.0);
    };
    struct init_randomsampling {
        BO_PARAM(int, samples, 10);
    };
    struct kernel_squared_exp_ard : public defaults::kernel_squared_exp_ard {
    };
    struct opt_rprop : public defaults::opt_rprop {
    };
    struct opt_parallelrepeater : public defaults::opt_parallelrepeater {
    };
    struct opt_cmaes {
        BO_PARAM(int, restarts, 1);
        BO_PARAM(int, max_fun_evals, -1);
    };
};

Creating and running the Bayesian Optimizer

In your main function, you need to have something like the following:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
int main()
{
    using kernel_t = kernel::SquaredExpARD<Params>;

    using mean_t = MeanFWModel<Params>;

    using gp_opt_t = model::gp::KernelLFOpt<Params>;

    using gp_t = model::GP<Params, kernel_t, mean_t, gp_opt_t>;

    using acqui_t = acqui::EI<Params, gp_t>;
    using acqui_opt_t = opt::Cmaes<Params>;

    using init_t = init::RandomSampling<Params>;

    using stop_t = boost::fusion::vector<stop::MaxIterations<Params>, MinTolerance<Params>>;

    using stat_t = boost::fusion::vector<stat::ConsoleSummary<Params>, stat::Samples<Params>, stat::Observations<Params>, stat::AggregatedObservations<Params>, stat::GPAcquisitions<Params>, stat::BestAggregatedObservations<Params>, stat::GPKernelHParams<Params>>;

    bayes_opt::BOptimizer<Params, modelfun<gp_t>, acquifun<acqui_t>, acquiopt<acqui_opt_t>, initfun<init_t>, statsfun<stat_t>, stopcrit<stop_t>> boptimizer;
    // Instantiate aggregator
    DistanceToTarget<Params> aggregator({1, 1});
    boptimizer.optimize(eval_func<Params>(), aggregator);
    std::cout << "New target!" << std::endl;
    aggregator = DistanceToTarget<Params>({1.5, 1});
    // Do not forget to pass `false` as the last parameter in `optimize`,
    // so you do not reset the data in boptimizer
    // i.e. keep all the previous data points in the Gaussian Process
    boptimizer.optimize(eval_func<Params>(), aggregator, false);
    return 1;
}

Running the experiment

Finally, from the root of limbo, run a build command, with the additional switch --exp arm_example:

./waf configure --exp arm_example
./waf build --exp arm_example

Then, an executable named arm_example should be produced under the folder build/exp/arm_example. When running the experiment, you should expect something like the following:

0 new point:  0.99374 0.999401 0.999716 0.532902 0.999337 0.648682 value: -0.73579 best:-0.73579
1 new point: 0.00924137    0.52356   0.992816   0.591639   0.900581 0.00022477 value: -1.59172 best:-0.73579
2 new point:  0.999114  0.303668 0.0132791  0.792124  0.391522  0.999149 value: -3.00304 best:-0.73579
3 new point: 0.990481 0.981046 0.379883 0.999432 0.282599 0.291695 value: -0.824245 best:-0.73579
4 new point: 0.00771956   0.949676   0.956241  0.0293142   0.244216  0.0891216 value: -2.68335 best:-0.73579
5 new point:   0.972554 0.00021536   0.998559      0.808   0.346161 0.00134114 value: -3.7039 best:-0.73579
6 new point: 0.00275069   0.495195 0.00167023   0.994612   0.631628   0.707545 value: -1.97244 best:-0.73579
7 new point:  0.124159  0.262741  0.303586  0.999707  0.335987 0.0192833 value: -1.28697 best:-0.73579
8 new point:  0.026011  0.307255  0.101375 0.0195426  0.562741 0.0400001 value: -2.31361 best:-0.73579
9 new point:    0.153818  0.00117556 9.92801e-05    0.376417     0.18015  0.00215051 value: -1.00521 best:-0.73579
10 new point:  0.107636  0.710152   0.41314 0.0703153  0.646439  0.606494 value: -0.685661 best:-0.685661
11 new point: 0.282632 0.794559 0.940368 0.530688 0.113832 0.439228 value: -1.44821 best:-0.685661
12 new point: 0.0110291  0.266178  0.576008  0.425873  0.120849  0.444479 value: -1.38147 best:-0.685661
13 new point:  0.109448 0.0548453  0.458707  0.487198  0.739701  0.588758 value: -0.147514 best:-0.147514
14 new point: 0.0877909 0.0481023  0.837642  0.438223  0.387531  0.649942 value: -0.150097 best:-0.147514
15 new point: 0.111047 0.633206 0.509962 0.443725 0.359951 0.243446 value: -0.780535 best:-0.147514
16 new point:  0.0827364   0.186448   0.333666   0.839036   0.536232 0.00476438 value: -0.575265 best:-0.147514
17 new point:   0.123347 0.00377535   0.554967   0.103699   0.371233   0.233517 value: -0.00982979 best:-0.00982979
New target!
18 new point:    0.998604    0.984174    0.999353 0.000200245    0.356422    0.999792 value: -1.38941 best:-0.370924
19 new point: 0.0189988  0.999921  0.942007  0.665937  0.999451  0.986427 value: -3.16331 best:-0.370924
20 new point: 0.0688255  0.230665 0.0273747  0.270297  0.980095  0.990872 value: -2.89225 best:-0.370924
21 new point: 0.997416 0.857708 0.998912 0.984197 0.391337 0.114332 value: -0.883913 best:-0.370924
22 new point:    0.999982 0.000168928    0.999398    0.336817    0.258304    0.929625 value: -1.19557 best:-0.370924
23 new point: 0.0700344  0.281842  0.919103 0.0183289  0.074567  0.970264 value: -4.17297 best:-0.370924
24 new point: 7.78708e-05 2.18076e-06    0.983852     0.99996    0.825274    0.612332 value: -2.57724 best:-0.370924
25 new point: 0.000101799    0.997473    0.797134    0.994634    0.377403 3.70205e-05 value: -2.33882 best:-0.370924
26 new point: 0.0467502  0.327915  0.235275  0.966877 0.0363554  0.909477 value: -3.49153 best:-0.370924
27 new point:    0.999615 0.000340133    0.637717    0.994796  0.00143888    0.464556 value: -2.75311 best:-0.370924
28 new point:   0.998958 0.00509838   0.474495   0.667517   0.532318   0.520064 value: -0.038073 best:-0.038073

Using state-based bayesian optimization, we can transfer what we learned during one task to achieve faster new tasks.