...
Data preprocessing
All chemical structures were processed using OCHEM cleaning and standardization protocols.
Descriptors
The consensus model was calculated as a simple average of seven ASNN models, each developed with a different descriptor set, namely ALOGPS + Estate, GSFRag, ISIDA fragments, Dragon, Adriana, CDK and ChemAxon. 3D conformations of molecules were generated using CORINA software, which is distributed by Molecular Networks GmbH.
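The simple averaging step can be illustrated with a minimal sketch. It assumes that each of the seven models returns a score between 0 and 1 for a compound and that 0.5 is the class threshold; the function names, the threshold and the example scores are hypothetical and are not taken from OCHEM.

```python
from statistics import fmean


def consensus_prediction(model_outputs):
    """Return the unweighted mean of the individual model scores."""
    if not model_outputs:
        raise ValueError("need at least one model output")
    return fmean(model_outputs)


def consensus_class(model_outputs, threshold=0.5):
    """Assign class 1 if the averaged score reaches the threshold (assumed 0.5)."""
    return 1 if consensus_prediction(model_outputs) >= threshold else 0


# Hypothetical scores from seven individual models for one compound.
scores = [0.9, 0.8, 0.7, 0.6, 0.95, 0.85, 0.4]
print(consensus_prediction(scores), consensus_class(scores))
```

In this sketch a single dissenting model (0.4) does not change the consensus class, which is the usual motivation for averaging several models.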
Validation
The model was built using 5-fold cross-validation. Two datasets of 63 and 38 compounds, compiled by [Boethling and Costanza, Domain of EPI suite biotransformation models, SAR QSAR Environ. Res. 2010, 21, 415-443] and [Steger-Hartmann et al., Incorporation of in silico biodegradability screening in early drug development—a feasible approach? Environ. Sci. Pollut. Res. Int. 2011, 18, 610-619, doi: 10.1007/s11356-010-0403-2], respectively, were used as test sets.
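A 5-fold split can be sketched as follows. The page does not say how compounds were assigned to folds, so the random shuffling, the seed and the function name `k_fold_indices` are assumptions made for illustration only.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: random, unstratified folds
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Each compound is used for validation exactly once across the five folds.
for training, validation in k_fold_indices(1884):
    print(len(training), len(validation))
```

Each model is trained on four folds and evaluated on the remaining one, so the cross-validated statistics are computed only on compounds the model did not see during training.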
Statistical parameters
Prediction accuracy
The basic prediction accuracy parameters obtained from the 5-fold cross-validation procedure and from prediction of the test sets are:
Set | # samples | Accuracy (%) | BA (%) | MCC | AUC |
---|---|---|---|---|---|
Training set | 1884 | 88 | 88 | 0.74 | 0.95 |
Boethling and Costanza | 63 | 86 | 71 | 0.4 | 0.86 |
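For reference, accuracy, balanced accuracy (BA) and the Matthews correlation coefficient (MCC) can all be derived from a confusion matrix, as in the sketch below. The counts in the example are hypothetical and do not correspond to either dataset above; the function name is also an assumption.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, balanced accuracy and MCC from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # fraction of positives recovered
    specificity = tn / (tn + fp)  # fraction of negatives recovered
    balanced_accuracy = (sensitivity + specificity) / 2
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"accuracy": accuracy, "ba": balanced_accuracy, "mcc": mcc}


# Hypothetical counts, for illustration only.
print(classification_metrics(tp=50, fp=10, tn=30, fn=10))
```

Because BA averages the per-class recall, it can fall well below the plain accuracy when one class dominates a dataset and the minority class is predicted less reliably.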
Applicability domain
The prediction accuracy is estimated using the PROB-STD distance to model, as described in [Sushko et al., Applicability domains for classification problems: Benchmarking of distance to models for Ames mutagenicity set. J. Chem. Inf. Model. 2010, 50(12):2094-2111, doi: 10.1021/ci100253r].
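The sketch below shows one way such a distance to model could be computed. It is an assumption about the definition and is not taken from this page: the ensemble prediction is treated as normally distributed around the mean score, with the standard deviation of the ensemble, and the distance is the probability mass on the side of the class threshold opposite to the predicted class. The threshold of 0.5 and all names are hypothetical; the cited paper should be consulted for the authoritative definition.

```python
from math import erf, sqrt


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + erf((x - mean) / (std * sqrt(2.0))))


def prob_std(mean_prediction, ensemble_std, threshold=0.5):
    """Probability mass on the side of the threshold opposite to the prediction.

    Assumed reading of PROB-STD: small values mean a confident prediction,
    values near 0.5 mean the ensemble cannot separate the two classes.
    """
    if ensemble_std <= 0:
        # Degenerate case: all ensemble members agree exactly.
        return 0.5 if mean_prediction == threshold else 0.0
    below = normal_cdf(threshold, mean_prediction, ensemble_std)
    return min(below, 1.0 - below)


# Hypothetical ensemble outputs: the same spread, different distance to 0.5.
print(prob_std(0.9, 0.1), prob_std(0.6, 0.1))
```

Under this reading, predictions far from the threshold with little disagreement among ensemble members get a small value and are expected to be more accurate.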