Limbo-specific concepts
Limbo extends the traditional Bayesian optimization algorithm with a few ideas developed in our group.
Mean function
In classic Bayesian optimization, the Gaussian process is initialized with a constant mean because it is assumed that all the points of the search space are equally likely to be good. The model is then progressively refined after each observation. This constant mean is one of the main priors used to build the Gaussian process: setting its value is often critical for the performance of the algorithm (see [2]).
Nevertheless, it can be useful to use more complex priors. This is, for instance, the case when we can use a low-fidelity simulator as a prior for physical experiments with a robot [1]. To incorporate this idea into Bayesian optimization, limbo models the difference between the prediction of the behavior-performance map and the actual performance on the real robot, instead of directly modeling the objective function. This idea is incorporated into the Gaussian process by modifying the update equation for the mean function (\(\mu_t(\mathbf{x})\)):

\[\mu_{t}(\mathbf{x}) = \mathcal{P}(\mathbf{x}) + \mathbf{k}^\intercal\mathbf{K}^{-1}(\mathbf{P}_{1:t} - \mathcal{P}(\mathbf{\chi}_{1:t}))\]
where \(\mathcal{P}(\mathbf{x})\) is the performance of \(\mathbf{x}\) according to the mean function (the prior) and \(\mathcal{P}(\mathbf{\chi}_{1:t})\) is the performance of all the previous observations, also according to the mean function (prior).
Replacing \(\mathbf{P}_{1:t}\) by \(\mathbf{P}_{1:t}-\mathcal{P}(\mathbf{\chi}_{1:t})\) means that the Gaussian process models the difference between the actual performance \(\mathbf{P}_{1:t}\) and the performance from the behavior-performance map \(\mathcal{P}(\mathbf{\chi}_{1:t})\). The term \(\mathcal{P}(\mathbf{x})\) is the prediction given by the mean function (the behavior-performance map in [1]).
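To make this update concrete, here is a minimal, self-contained sketch of the modified mean update on a toy one-dimensional problem. It is written with Eigen and is not the limbo API: the squared-exponential kernel, the prior_mean function standing in for the behavior-performance map, the noise term, and all hyper-parameters are illustrative assumptions.

    // Illustrative sketch (not the limbo API) of a GP posterior mean that
    // models the difference between the observations and a prior mean.
    #include <Eigen/Dense>
    #include <cmath>
    #include <iostream>
    #include <vector>

    // Squared-exponential kernel (illustrative length-scale).
    double kernel(const Eigen::VectorXd& a, const Eigen::VectorXd& b, double l = 0.2)
    {
        return std::exp(-(a - b).squaredNorm() / (2 * l * l));
    }

    // Prior mean P(x): here a toy low-fidelity model of the objective
    // (in [1], the behavior-performance map).
    double prior_mean(const Eigen::VectorXd& x)
    {
        return 1.0 - x.squaredNorm();
    }

    // mu_t(x) = P(x) + k^T K^{-1} (P_{1:t} - P(chi_{1:t}))
    double posterior_mean(const Eigen::VectorXd& x,
                          const std::vector<Eigen::VectorXd>& samples,
                          const Eigen::VectorXd& observations)
    {
        const int t = static_cast<int>(samples.size());
        Eigen::MatrixXd K(t, t);
        Eigen::VectorXd k(t), prior_at_samples(t);
        for (int i = 0; i < t; ++i) {
            k(i) = kernel(x, samples[i]);
            prior_at_samples(i) = prior_mean(samples[i]);
            for (int j = 0; j < t; ++j)
                K(i, j) = kernel(samples[i], samples[j]);
        }
        K.diagonal().array() += 1e-6; // small noise term for numerical stability
        // The GP models the residual: actual performance minus the prior mean.
        Eigen::VectorXd residual = observations - prior_at_samples;
        Eigen::VectorXd alpha = K.ldlt().solve(residual);
        return prior_mean(x) + k.dot(alpha);
    }

    int main()
    {
        std::vector<Eigen::VectorXd> samples = {Eigen::VectorXd::Constant(1, 0.1),
                                                Eigen::VectorXd::Constant(1, 0.5)};
        Eigen::VectorXd obs(2);
        obs << 0.8, 0.6; // actual performance observed on the system
        Eigen::VectorXd query = Eigen::VectorXd::Constant(1, 0.3);
        std::cout << "mu_t(0.3) = " << posterior_mean(query, samples, obs) << std::endl;
    }

With a constant prior_mean, this reduces to the classic update; with an informative prior, the Gaussian process only has to correct the prior's errors where data disagree with it.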
See the Limbo implementation guide for the available mean functions.
State-based optimization
In many applications, the task can be expressed in terms of the robot's state. For example, reaching a target with a robotic arm means placing the robot's end effector at a particular location, and walking forward can be expressed as moving the robot's center of mass. For robotic manipulation, the state of the robot can be extended with the state of the manipulated object. In the same way, all the observations can be expressed as a part of the robot's state (the observable part).
Instead of modeling the performance function directly, it is sometimes more effective to use n Gaussian processes to model the state, and then combine their predictions into a single value for the acquisition function, using an aggregator.
Limbo implements this concept.
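As an illustration of this concept, the sketch below shows the structure under simplified assumptions and does not use the limbo API: a few stand-in models predict components of the robot's state (in limbo these would be Gaussian processes), and a hypothetical DistanceToTarget aggregator reduces the predicted state to the single scalar value that the acquisition function would optimize.

    // Illustrative sketch (not the limbo API) of state-based optimization:
    // several models predict components of the robot's state, and an
    // aggregator turns the predicted state into a single value for the
    // acquisition function.
    #include <Eigen/Dense>
    #include <functional>
    #include <iostream>
    #include <vector>

    // One scalar model per state dimension (placeholder functors here;
    // in practice each would be a Gaussian process).
    using StateModel = std::function<double(const Eigen::VectorXd&)>;

    // Aggregator: distance of the predicted end-effector position to a target.
    // Higher is better, so the negative distance is returned.
    struct DistanceToTarget {
        Eigen::VectorXd target;
        double operator()(const Eigen::VectorXd& predicted_state) const
        {
            return -(predicted_state - target).norm();
        }
    };

    int main()
    {
        // Hypothetical models for a 2-D end-effector position.
        std::vector<StateModel> models = {
            [](const Eigen::VectorXd& x) { return x(0); },        // predicted x-coordinate
            [](const Eigen::VectorXd& x) { return 0.5 * x(1); }}; // predicted y-coordinate

        DistanceToTarget aggregator;
        aggregator.target = Eigen::Vector2d(0.3, 0.2);

        // The acquisition function would search over candidate parameters;
        // here a single candidate is evaluated for illustration.
        Eigen::VectorXd candidate = Eigen::Vector2d(0.25, 0.5);
        Eigen::VectorXd predicted(static_cast<int>(models.size()));
        for (int i = 0; i < static_cast<int>(models.size()); ++i)
            predicted(i) = models[i](candidate);

        std::cout << "aggregated value = " << aggregator(predicted) << std::endl;
    }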
[1] Antoine Cully, Jeff Clune, Danesh Tarapore, and Jean-Baptiste Mouret. Robots that can adapt like animals. Nature, 521(7553):503–507, May 2015. URL: http://www.nature.com/doifinder/10.1038/nature14422, doi:10.1038/nature14422.
[2] Daniel J. Lizotte, Tao Wang, Michael H. Bowling, and Dale Schuurmans. Automatic gait optimization with Gaussian process regression. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), volume 7, 944–949. 2007.