Ready Biodegradability (consensus model)

Dataset profile

Biodegradability¹ describes the capacity of substances to be mineralized by free-living bacteria. It is a crucial property in estimating a compound's long-term impact on the environment. Chemicals that do not degrade quickly can exert their toxic effects over a long period; they can therefore pose a greater risk than chemicals that have higher acute toxicity but are not stable. To better characterize readily biodegradable chemicals, the Organisation for Economic Co-operation and Development (OECD) developed standardized test methods. In 1992, test guideline 301 was published, describing six methods of screening chemicals for ready biodegradability under aerobic conditions. The ability to reliably predict biodegradability reduces the need for laborious experimental testing.

The various test methods share a number of features: the test substance is incubated in a mineral medium (potassium, sodium phosphate, etc.) with an inoculum (activated sludge, surface soils, etc.) under aerobic conditions in the dark or in diffuse light. A reference compound (aniline, sodium acetate, or sodium benzoate) is run in parallel as a control. Degradation is then determined by measuring parameters such as DOC (dissolved organic carbon), CO₂ production, and O₂ uptake. The test should run for 28 days.

The pass levels for ready biodegradability must be reached within a 10-day window inside the 28-day test period. Depending on the test method employed, these are:

70% DOC: percentage of dissolved organic carbon removed
60% ThOD: percentage of the theoretical oxygen demand
60% ThCO₂: percentage of the theoretical carbon dioxide yield
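
The pass criterion described above can be sketched as a small check on a measured degradation curve. This is an illustrative sketch only: the function name, the data layout, and the sample curves are hypothetical, and the rule that the 10-day window opens on the first sampling day at which degradation reaches 10% is taken from OECD guideline 301 rather than from the text above. No interpolation between sampling days is attempted.

```python
# Pass levels quoted in the list above, keyed by the measured quantity.
PASS_LEVELS = {"DOC": 70.0, "ThOD": 60.0, "ThCO2": 60.0}


def passes_ready_biodegradability(curve, measure, window_days=10,
                                  test_days=28, window_start_pct=10.0):
    """Return True if the pass level is reached inside the 10-day window.

    curve   -- list of (day, percent_degradation) tuples, sorted by day
    measure -- one of the keys of PASS_LEVELS

    Assumption: the window opens on the first sampling day at which
    degradation reaches window_start_pct, and it cannot extend past
    the end of the test.
    """
    pass_level = PASS_LEVELS[measure]
    # First sampling day on which the window-opening level is reached.
    start = next((day for day, pct in curve if pct >= window_start_pct), None)
    if start is None:
        return False  # degradation never started
    deadline = min(start + window_days, test_days)
    return any(start <= day <= deadline and pct >= pass_level
               for day, pct in curve)


# Hypothetical curves, for illustration only.
fast_curve = [(0, 0), (3, 12), (7, 45), (12, 72), (28, 90)]
slow_curve = [(0, 0), (5, 11), (14, 40), (20, 65), (28, 80)]
fast_result = passes_ready_biodegradability(fast_curve, "DOC")
slow_result = passes_ready_biodegradability(slow_curve, "ThOD")
```

The second curve eventually exceeds 60% ThOD, but only after its window has closed, so it would not count as readily biodegradable under this reading of the criterion.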

The initial biodegradability dataset was collected from three main sources: an internal CADASTER dataset comprising measurements extracted from CHRIP (Chemical Risk Information Platform); measurements assembled by [Cheng et al., In silico assessment of chemical biodegradability, J. Chem. Inf. Model. 2012, 52, 655-669]; and a dataset of measurements for fragrances collected by Prof. Gramatica's group. These data were already classified as "readily biodegradable" or "not readily biodegradable" and comprised 1884 compounds, 37% of which were readily biodegradable.

¹The description follows Vorberg and Tetko, Modeling the biodegradability of chemical compounds using the Online CHEmical Modeling environment (OCHEM), Mol. Inform. 2014, 33(1), 73–85 (Open Access), doi: 10.1002/minf.201300030.

Data preprocessing

All chemical structures were processed using the OCHEM cleaning and standardization protocols.

Descriptors

The consensus model was calculated as a simple average of seven ASNN (associative neural network) models, each developed with a different descriptor set, namely ALOGPS + Estate, GSFrag, ISIDA fragments, Dragon, Adriana, CDK and ChemAxon. 3D conformations of molecules were generated using the CORINA software, which is distributed by Molecular Networks GmbH.
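
The averaging step can be illustrated with a short sketch. It assumes that each individual model returns a score between 0 and 1 for the "readily biodegradable" class and that the class is assigned with a 0.5 cut-off; both points, the function name, and the scores themselves are assumptions made for illustration, not values taken from OCHEM.

```python
from statistics import mean


def consensus_prediction(model_scores, threshold=0.5):
    """Average the individual model scores and assign a class.

    model_scores -- dict mapping descriptor-set name to that model's score
                    for the 'readily biodegradable' class (assumed in [0, 1])
    Returns (consensus_score, is_readily_biodegradable).
    """
    score = mean(model_scores.values())
    return score, score >= threshold


# Hypothetical scores for a single compound, one per descriptor set.
scores = {
    "ALOGPS + Estate": 0.90,
    "GSFrag": 0.80,
    "ISIDA fragments": 0.70,
    "Dragon": 0.60,
    "Adriana": 0.85,
    "CDK": 0.75,
    "ChemAxon": 0.65,
}
consensus_score, predicted_ready = consensus_prediction(scores)
```

A simple average gives every descriptor set the same weight, so no single model dominates the final call.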

Validation

The model was built using 5-fold cross-validation. Two external test sets of 63 and 38 compounds, compiled by [Boethling and Costanza, Domain of EPI suite biotransformation models, SAR QSAR Environ. Res. 2010, 21, 415-443] and [Steger-Hartmann et al., Incorporation of in silico biodegradability screening in early drug development - a feasible approach? Environ. Sci. Pollut. Res. Int. 2011, 18, 610-619, doi: 10.1007/s11356-010-0403-2], respectively, were used for external validation.

Statistical parameters

Prediction accuracy

The basic prediction accuracy parameters obtained from the 5-fold cross-validation procedure and from prediction of the test sets are:

Dataset                  # samples   Accuracy (%)   BA (%)   MCC    AUC
Training set             1884        88             88       0.74   0.95
Boethling and Costanza   63          86             71       0.4    0.86

BA: balanced accuracy; MCC: Matthews correlation coefficient; AUC: area under the ROC curve.
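
For readers unfamiliar with these statistics, the sketch below shows how accuracy, balanced accuracy (BA) and the Matthews correlation coefficient (MCC) follow from a confusion matrix. The function name and the counts are hypothetical and do not reproduce the values reported above; AUC is omitted because it requires the ranked scores rather than the class counts.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, balanced accuracy and MCC from confusion-matrix counts.

    tp/fn -- readily biodegradable compounds predicted correctly/incorrectly
    tn/fp -- not readily biodegradable compounds predicted correctly/incorrectly
    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    balanced_accuracy = (sensitivity + specificity) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, balanced_accuracy, mcc


# Hypothetical counts, for illustration only.
acc, ba, mcc = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

Accuracy and BA coincide when both classes are predicted equally well; a gap between them, as in the second row of the table, typically points to class imbalance or uneven per-class performance.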

Applicability domain

The prediction accuracy is estimated using the PROB-STD distance to model, as described in [Sushko et al., Applicability domains for classification problems: Benchmarking of distance to models for Ames mutagenicity set, J. Chem. Inf. Model. 2010, 50(12), 2094-2111, doi: 10.1021/ci100253r].
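
As a rough illustration of the idea behind such a measure, the sketch below treats the outputs of an ensemble as a Gaussian distribution and returns the probability mass lying on the opposite side of the decision threshold from the mean. This is one reading of PROB-STD, not the OCHEM implementation; the function name, the choice of ensemble, the 0.5 threshold, and the sample values are all assumptions.

```python
import math
from statistics import mean, pstdev


def prob_std(ensemble_outputs, threshold=0.5):
    """Approximate a PROB-STD-like distance to model.

    The ensemble outputs are summarised by a Gaussian with their mean and
    standard deviation; the result is the smaller of the two probability
    masses on either side of the threshold. Values near 0 indicate a
    confident prediction, values near 0.5 an uncertain one.
    """
    mu = mean(ensemble_outputs)
    sigma = pstdev(ensemble_outputs)
    if sigma == 0:
        # Degenerate case: all ensemble members agree exactly.
        return 0.5 if mu == threshold else 0.0
    # Gaussian CDF evaluated at the threshold.
    below = 0.5 * (1 + math.erf((threshold - mu) / (sigma * math.sqrt(2))))
    return min(below, 1 - below)


# Hypothetical ensemble outputs for two compounds.
confident = prob_std([0.90, 0.92, 0.88, 0.91, 0.89, 0.90, 0.90])
uncertain = prob_std([0.40, 0.60, 0.50, 0.50, 0.45, 0.55, 0.50])
```

Compounds with a low value lie well inside the applicability domain under this reading, and the expected accuracy of their predictions is correspondingly higher.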
