
Bagging (bootstrap aggregating) is a meta-learning technique that builds an ensemble of models, each trained on a random training set created from the original training set by sampling with replacement.

The final model is a simple average of the predictions of the individual models within the ensemble.

In other words, bagging involves:

  • replicating multiple training sets from the original training set by sampling with replacement; each replicated set has the same size as the original set
  • defining the respective validation sets as the samples not included in each training set. By chance, about 37% of the samples will not be included in a given training set, since the probability that a sample is never drawn is (1 − 1/n)^n ≈ e⁻¹ ≈ 0.37 for large n
  • training multiple models using a particular machine learning method (the same method and parameters for each model)
  • obtaining a validated prediction for a sample by averaging the predictions of those models that had this sample in their validation sets (see the sketch after this list)
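The procedure can be illustrated with a short sketch. The code below is only a minimal illustration of the steps above, not the page's own implementation: it assumes numpy and scikit-learn are available, and DecisionTreeRegressor is an arbitrary stand-in for "a particular machine learning method"; the function name bagging_oob and its parameters are hypothetical.

```python
# Minimal bagging sketch with out-of-bag (validated) predictions.
# Assumptions: numpy and scikit-learn are installed; DecisionTreeRegressor
# is an arbitrary stand-in for "a particular machine learning method".
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagging_oob(X, y, n_models=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    # Per-sample predictions from models that did NOT see the sample in training.
    oob_preds = [[] for _ in range(n)]
    for _ in range(n_models):
        # Replicate a training set of the original size by sampling with replacement.
        idx = rng.integers(0, n, size=n)
        oob = np.setdiff1d(np.arange(n), idx)  # ~37% of samples, on average
        model = DecisionTreeRegressor().fit(X[idx], y[idx])
        if len(oob):
            for i, p in zip(oob, model.predict(X[oob])):
                oob_preds[i].append(p)
    # Validated prediction: average over the models that held the sample out;
    # the spread of those predictions is the per-sample uncertainty estimate.
    mean = np.array([np.mean(p) if p else np.nan for p in oob_preds])
    std = np.array([np.std(p) if p else np.nan for p in oob_preds])
    return mean, std
```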

Bagging achieves two important goals, validation and assessment of prediction uncertainty:

  • Obtaining correctly validated prediction statistics (similar to cross-validation)
  • Obtaining a standard deviation for each prediction, which is possible because an ensemble of models is used rather than a single model.
    This standard deviation (referred to as BAGGING-STD) can be used to quantify prediction uncertainty and to define the applicability domain (see the sketch after this list).
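Continuing the sketch above, the per-sample standard deviation can be turned into a simple uncertainty filter. The synthetic data and the 95th-percentile cutoff below are illustrative assumptions, not a prescribed applicability-domain rule.

```python
# Usage sketch on synthetic data (assumes bagging_oob from the sketch above).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X[:, 0] + 0.1 * rng.normal(size=200)

mean, std = bagging_oob(X, y)        # validated predictions and BAGGING-STD
cutoff = np.nanpercentile(std, 95)   # example threshold (an assumption)
in_domain = std <= cutoff            # True where uncertainty is acceptable
```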