Statistics / Writing results

Statistics are functors that are called at the end of each iteration of the Bayesian optimizer. Their job is to:

  • write the results to files;
  • write the current state of the optimization;
  • write the data that are useful for your own analyses.

All the statistics are written in a directory called hostname_date_pid-number. For instance: wallepro-perso.loria.fr_2016-05-13_16_16_09_72226

Limbo provides a few classes for common uses (see Statistics (stats) for details):

  • ConsoleSummary: writes a summary to std::cout at each iteration of the algorithm
  • AggregatedObservations: records the value of each evaluation of the function (after aggregation) [filename aggregated_observations.dat]
  • BestAggregatedObservations: records the best value observed so far after each iteration [filename aggregated_observations.dat]
  • Observations: records the value of each evaluation of the function (before aggregation) [filename: observations.dat]
  • Samples: records the position in the search space of the evaluated points [filename: samples.dat]
  • BestObservations: records the best observation after each iteration? [filename best_observations.dat]
  • BestSamples: records the position in the search space of the best observation after each iteration [filename: best_samples.dat]

These statistics are for “advanced users”:

  • GPAcquisitions
  • GPKernelHParams
  • GPLikelihood
  • GPMeanHParams

The default statistics list is:

boost::fusion::vector<stat::Samples<Params>, stat::AggregatedObservations<Params>,
  stat::ConsoleSummary<Params>>

Writing your own statistics class

Limbo only provides generic statistics classes. However, it is often useful to add user-defined statistics classes that are specific to a particular experiment.

All the statistics functors follow the same template:

template <typename Params>
struct Samples : public limbo::stat::StatBase<Params> {
    template <typename BO, typename AggregatorFunction>
    void operator()(const BO& bo, const AggregatorFunction&)
    {
      // code
    }
};

In a few words, they take a BO object (instance of the Bayesian optimizer) and do what they want.

For instance, we could add a statistics class that writes the worst observation at each iteration. Here is how to write this functor:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
template <typename Params>
struct WorstObservation : public stat::StatBase<Params> {
    template <typename BO, typename AggregatorFunction>
    void operator()(const BO& bo, const AggregatorFunction& afun)
    {
        // [optional] if statistics have been disabled or if there are no observations, we do not do anything
        if (!bo.stats_enabled() || bo.observations().empty())
            return;

        // [optional] we create a file to write / you can use your own file but remember that this method is called at each iteration (you need to create it in the constructor)
        this->_create_log_file(bo, "worst_observations.dat");

        // [optional] we add a header to the file to make it easier to read later
        if (bo.total_iterations() == 0)
            (*this->_log_file) << "#iteration worst_observation sample" << std::endl;

        // ----- search for the worst observation ----
        // 1. get the aggregated observations
        auto rewards = std::vector<double>(bo.observations().size());
        std::transform(bo.observations().begin(), bo.observations().end(), rewards.begin(), afun);
        // 2. search for the worst element
        auto min_e = std::min_element(rewards.begin(), rewards.end());
        auto min_obs = bo.observations()[std::distance(rewards.begin(), min_e)];
        auto min_sample = bo.samples()[std::distance(rewards.begin(), min_e)];

        // ----- write what we have found ------
        // the file is (*this->_log_file)
        (*this->_log_file) << bo.total_iterations() << " " << min_obs.transpose() << " " << min_sample.transpose() << std::endl;
    }
};

In order to configure the Bayesian optimizer to use our new statistics class, we first need to define a new statistics list which includes our new WorstObservation:

1
2
3
4
5
    // define a special list of statistics which include our new statistics class
    using stat_t = boost::fusion::vector<stat::ConsoleSummary<Params>,
        stat::Samples<Params>,
        stat::Observations<Params>,
        WorstObservation<Params>>;

Then, we use it to define the optimizer:

1
    bayes_opt::BOptimizer<Params, statsfun<stat_t>> boptimizer;

The full source code is available in src/tutorials/statistics.cpp and reproduced here:

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
#include <iostream>
#include <limbo/limbo.hpp>

using namespace limbo;

struct Params {
    struct bayes_opt_boptimizer : public defaults::bayes_opt_boptimizer {
    };

// depending on which internal optimizer we use, we need to import different parameters
#ifdef USE_NLOPT
    struct opt_nloptnograd : public defaults::opt_nloptnograd {
    };
#elif defined(USE_LIBCMAES)
    struct opt_cmaes : public defaults::opt_cmaes {
    };
#else
    struct opt_gridsearch : public defaults::opt_gridsearch {
    };
#endif

    // enable / disable the writing of the result files
    struct bayes_opt_bobase : public defaults::bayes_opt_bobase {
        BO_PARAM(int, stats_enabled, true);
    };

    // no noise
    struct kernel : public defaults::kernel {
        BO_PARAM(double, noise, 1e-10);
    };

    struct kernel_maternfivehalves : public defaults::kernel_maternfivehalves {
    };

    // we use 10 random samples to initialize the algorithm
    struct init_randomsampling {
        BO_PARAM(int, samples, 10);
    };

    // we stop after 40 iterations
    struct stop_maxiterations {
        BO_PARAM(int, iterations, 40);
    };

    // we use the default parameters for acqui_ucb
    struct acqui_ucb : public defaults::acqui_ucb {
    };
};

struct Eval {
    // number of input dimension (x.size())
    BO_PARAM(size_t, dim_in, 1);
    // number of dimenions of the result (res.size())
    BO_PARAM(size_t, dim_out, 1);

    // the function to be optimized
    Eigen::VectorXd operator()(const Eigen::VectorXd& x) const
    {
        double y = -((5 * x(0) - 2.5) * (5 * x(0) - 2.5)) + 5;
        // we return a 1-dimensional vector
        return tools::make_vector(y);
    }
};

template <typename Params>
struct WorstObservation : public stat::StatBase<Params> {
    template <typename BO, typename AggregatorFunction>
    void operator()(const BO& bo, const AggregatorFunction& afun)
    {
        // [optional] if statistics have been disabled or if there are no observations, we do not do anything
        if (!bo.stats_enabled() || bo.observations().empty())
            return;

        // [optional] we create a file to write / you can use your own file but remember that this method is called at each iteration (you need to create it in the constructor)
        this->_create_log_file(bo, "worst_observations.dat");

        // [optional] we add a header to the file to make it easier to read later
        if (bo.total_iterations() == 0)
            (*this->_log_file) << "#iteration worst_observation sample" << std::endl;

        // ----- search for the worst observation ----
        // 1. get the aggregated observations
        auto rewards = std::vector<double>(bo.observations().size());
        std::transform(bo.observations().begin(), bo.observations().end(), rewards.begin(), afun);
        // 2. search for the worst element
        auto min_e = std::min_element(rewards.begin(), rewards.end());
        auto min_obs = bo.observations()[std::distance(rewards.begin(), min_e)];
        auto min_sample = bo.samples()[std::distance(rewards.begin(), min_e)];

        // ----- write what we have found ------
        // the file is (*this->_log_file)
        (*this->_log_file) << bo.total_iterations() << " " << min_obs.transpose() << " " << min_sample.transpose() << std::endl;
    }
};

int main()
{
    // we use the default acquisition function / model / stat / etc.

    // define a special list of statistics which include our new statistics class
    using stat_t = boost::fusion::vector<stat::ConsoleSummary<Params>,
        stat::Samples<Params>,
        stat::Observations<Params>,
        WorstObservation<Params>>;

    /// remmeber to use the new statistics vector via statsfun<>!
    bayes_opt::BOptimizer<Params, statsfun<stat_t>> boptimizer;

    // run the evaluation
    boptimizer.optimize(Eval());

    // the best sample found
    std::cout << "Best sample: " << boptimizer.best_sample()(0) << " - Best observation: " << boptimizer.best_observation()(0) << std::endl;
    return 0;
}