The initial biodegradability dataset was collected from three main sources: the internal CADASTER dataset, comprising measurements extracted from CHRIP (Chemical Risk Information Platform); measurements assembled by [Cheng et al., In silico assessment of chemical biodegradability. J. Chem. Inf. Model. 2012, 52, 655-669]; and a dataset of measurements for fragrances collected by Prof. Gramatica's group. These data were already classified as "readily biodegradable" or "not readily biodegradable" and comprised 1884 compounds, 37% of which were readily biodegradable.
¹ This description was updated according to Vorberg and Tetko, Modeling the biodegradability of chemical compounds using the Online CHEmical Modeling environment (OCHEM), Mol. Inform. 2014, 33(1), 73–85 (Open Access), doi: 10.1002/minf.201300030.

Data preprocessing

All chemical structures were processed using the OCHEM cleaning and standardization protocols.

Descriptors

The consensus model was calculated as a simple average of the predictions of seven ASNN (associative neural network) models, each developed with a different descriptor set: ALOGPS + E-state, GSFrag, ISIDA fragments, Dragon, Adriana, CDK and ChemAxon. 3D conformations of molecules were generated using the CORINA software, distributed by Molecular Networks GmbH.
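The consensus step amounts to an unweighted mean of the individual model outputs. The sketch below is illustrative only: it assumes each ASNN model returns a score between 0 and 1 for the "readily biodegradable" class and that a 0.5 cut-off separates the two classes. The scores, the `classify` helper and the "RB"/"NRB" labels are hypothetical, not outputs or names from OCHEM.

```python
from statistics import mean

# Hypothetical scores for one compound from the seven descriptor-specific models.
# These numbers are made up for illustration; they are not published predictions.
model_outputs = {
    "ALOGPS + E-state": 0.81,
    "GSFrag": 0.74,
    "ISIDA fragments": 0.69,
    "Dragon": 0.77,
    "Adriana": 0.62,
    "CDK": 0.71,
    "ChemAxon": 0.80,
}


def consensus(predictions):
    """Consensus score as the unweighted (simple) mean of the individual model outputs."""
    return mean(predictions)


def classify(score, threshold=0.5):
    # Assumed decision rule: scores at or above the threshold are called readily biodegradable.
    return "RB" if score >= threshold else "NRB"


score = consensus(list(model_outputs.values()))
print(round(score, 3), classify(score))
```

A simple average gives every descriptor set equal weight, so no single descriptor type dominates the consensus.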

Validation

The model was developed and validated using 5-fold cross-validation. In addition, two external test sets of 63 and 38 compounds, compiled by [Boethling and Costanza, Domain of EPI Suite biotransformation models, SAR QSAR Environ. Res. 2010, 21, 415-443] and [Steger-Hartmann et al., Incorporation of in silico biodegradability screening in early drug development—a feasible approach? Environ. Sci. Pollut. Res. Int. 2011, 18, 610-619, doi: 10.1007/s11356-010-0403-2], respectively, were used.
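In 5-fold cross-validation each compound is predicted exactly once, by a model trained on the other four folds. The sketch below shows generic k-fold logic under that assumption; it does not reproduce OCHEM's internal fold assignment, and `fit` and `predict` are hypothetical caller-supplied functions, not OCHEM APIs.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k disjoint folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, fit, predict, k=5, seed=0):
    """Return one out-of-fold prediction per sample.

    fit(train_idx) builds a model from the training indices and
    predict(model, idx) predicts one sample; both are hypothetical placeholders.
    """
    predictions = [None] * n_samples
    folds = k_fold_indices(n_samples, k, seed)
    for i, test_idx in enumerate(folds):
        # Train on all folds except the held-out fold i.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit(train_idx)
        for j in test_idx:
            predictions[j] = predict(model, j)
    return predictions


# With the 1884 training compounds, the folds hold 377, 377, 377, 377 and 376 compounds.
print([len(fold) for fold in k_fold_indices(1884)])
```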

Statistical parameters

Prediction accuracy

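The basic prediction accuracy parameters obtained in the 5-fold cross-validation and in the prediction of the test sets are:

Set                               # samples   Accuracy (%)   BA (%)   MCC    AUC
Training set (5-fold CV)               1884             88       88   0.74   0.95
Boethling and Costanza test set          63             86       71   0.4    0.86

BA, balanced accuracy; MCC, Matthews correlation coefficient; AUC, area under the receiver operating characteristic (ROC) curve.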
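The four statistics in the table follow standard definitions for binary classification. The sketch below computes them from true labels, predicted labels and scores, assuming class 1 = readily biodegradable and a 0.5 cut-off; the toy data are invented for illustration and are not taken from the training or test sets. Accuracy and BA are returned as fractions; the table reports them multiplied by 100.

```python
from math import sqrt


def confusion(y_true, y_pred):
    """Count TP, TN, FP, FN with 1 = readily biodegradable (positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(tp, tn, fp, fn):
    # Mean of sensitivity (recall of class 1) and specificity (recall of class 0).
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def mcc(tp, tn, fp, fn):
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally reported as 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """ROC AUC: probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative toy labels and scores (not biodegradability data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off
counts = confusion(y_true, y_pred)
print(accuracy(*counts), balanced_accuracy(*counts), mcc(*counts), auc(y_true, scores))
```

BA and MCC are less sensitive than plain accuracy to class imbalance, which matters here because only 37% of the training compounds are readily biodegradable.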

Applicability domain

The prediction accuracy is estimated using the PROB-STD distance to model, as described in [Sushko et al., Applicability domains for classification problems: Benchmarking of distance to models for Ames mutagenicity set. J. Chem. Inf. Model. 2010, 50(12), 2094-2111, doi: 10.1021/ci100253r].
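Ensemble-based distances to model treat disagreement among ensemble members as a sign that a compound lies far from the model. The sketch below illustrates that general idea with a simplified measure: the population standard deviation of the class-1 probabilities predicted by the ensemble members for one compound. It is an assumption-based illustration, not a reproduction of the exact PROB-STD formula, for which Sushko et al. should be consulted. The compound IDs and probabilities are hypothetical.

```python
from statistics import pstdev


def ensemble_std(member_probabilities):
    """Spread of the class-1 probabilities predicted by the ensemble members for one compound.

    A larger spread means the members disagree more, read here as a larger
    distance to model and hence a less reliable prediction.
    """
    return pstdev(member_probabilities)


def rank_by_reliability(predictions):
    """Sort compound IDs from smallest to largest ensemble spread (most to least reliable)."""
    return sorted(predictions, key=lambda cid: ensemble_std(predictions[cid]))


# Hypothetical ensemble outputs for two compounds.
predictions = {
    "cpd_B": [0.2, 0.9, 0.5, 0.7],     # members disagree: large spread
    "cpd_A": [0.9, 0.88, 0.92, 0.9],   # members agree: small spread
}
print(rank_by_reliability(predictions))
```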