The model predicts the octanol/water partition coefficient (logP) and the solubility in water (logS). Both parameters are important for drug discovery. The model is a further development of the ALOGPS 2.1 program [Tetko, I. V.; Tanchuk, V. Y. Application of associative neural networks for prediction of lipophilicity in ALOGPS 2.1 program. *J. Chem. Inf. Comput. Sci.* 2002, 42, 1136-45; and Tetko et al. Estimation of aqueous solubility of chemical compounds using E-state indices. *J. Chem. Inf. Comput. Sci.* 2001, 41, 1488-93], which is available at the [Virtual Computational Laboratory (VCCLAB)](http://www.vcclab.org) site. This program was assessed in several benchmarking studies and was top-ranked for prediction of the in-house Pfizer and Nycomed data sets [Mannhold et al. Calculation of molecular lipophilicity: State-of-the-art and comparison of log P methods on more than 96,000 compounds. *J. Pharm. Sci.* 2009 Mar;98(3):861-93. doi: 10.1002/jps.21494].
The logP and logS data were taken from these two previous publications and merged with data collected on the OCHEM website. The training sets included 16647 and 6778 unique compounds for logP and logS, respectively. Outliers were removed using the automatic p-value-based filtering feature of OCHEM (article in preparation). Because the two properties are highly inter-dependent, they were modeled simultaneously using the multi-learning feature of OCHEM [Varnek et al. Inductive transfer of knowledge: application of multi-task learning and feature net approaches to model tissue-air partition coefficients. *J. Chem. Inf. Model.* 2009 Jan;49(1):133-44. doi: 10.1021/ci8002914] to increase the applicability domain of the models.
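The idea behind modeling both properties simultaneously can be illustrated with a small sketch. It assumes a compound-by-property table in which a compound measured for only one property has a missing value (`None`) for the other. It also assumes a loss that skips missing entries, so one shared model learns from all compounds in both sets. The function names and compound IDs are hypothetical. This is not OCHEM's multi-learning implementation, whose internals are not described here.

```python
from typing import Dict, List, Optional, Sequence


def merge_tasks(logp: Dict[str, float],
                logs: Dict[str, float]) -> Dict[str, List[Optional[float]]]:
    """Combine two single-property datasets, keyed by a (hypothetical) compound ID,
    into one multi-task table with columns [logP, logS]; None marks a missing value."""
    compounds = sorted(set(logp) | set(logs))
    return {c: [logp.get(c), logs.get(c)] for c in compounds}


def masked_mse(predictions: Sequence[Sequence[float]],
               targets: Sequence[Sequence[Optional[float]]]) -> float:
    """Mean squared error over observed targets only; missing values are skipped."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(predictions, targets):
        for pred, target in zip(pred_row, target_row):
            if target is None:
                continue  # this compound has no measurement for this property
            total += (pred - target) ** 2
            count += 1
    if count == 0:
        raise ValueError("no observed targets")
    return total / count


# Usage with made-up compound IDs and values (rows ordered A, B, C):
table = merge_tasks({"A": 1.0, "B": 2.0}, {"B": -1.0, "C": -3.0})
loss = masked_mse([[1.5, 0.0], [2.0, -2.0], [0.0, -3.0]], list(table.values()))
# loss == 0.3125 (four observed values, squared errors 0.25 + 0 + 1 + 0)
```

Because the missing entries contribute no error, compounds with only a logP or only a logS value still help train the shared model.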
All chemical structures were processed using the OCHEM cleaning and standardization protocols.
This model was built using EState descriptors (electrotopological state indices) calculated with a program developed by Dr. Tanchuk, which was also used to develop the ALOGPS 2.1 model.
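For orientation, the sketch below implements the standard Kier–Hall electrotopological state of each heavy atom. The intrinsic state is I = ((2/N)^2 δv + 1)/δ. The E-state is S_i = I_i + Σ_j (I_i − I_j)/r_ij^2, where r_ij is the topological distance plus one. This is the textbook definition only. How Dr. Tanchuk's program assigns atom types and aggregates these values into descriptors is not described here. The input format (hydrogen-suppressed atoms given as `(delta_v, delta, N)` tuples plus an adjacency list) is an assumption made for this example.

```python
from collections import deque
from typing import List, Sequence, Tuple


def intrinsic_state(delta_v: float, delta: int, n: int) -> float:
    """Kier-Hall intrinsic state I = ((2/N)^2 * delta_v + 1) / delta.
    Assumes delta >= 1, i.e. every atom has at least one heavy-atom neighbour."""
    return ((2.0 / n) ** 2 * delta_v + 1.0) / delta


def graph_distances(adjacency: Sequence[Sequence[int]], start: int) -> List[int]:
    """Topological (bond-count) distances from `start`, via breadth-first search."""
    dist = [-1] * len(adjacency)
    dist[start] = 0
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if dist[neighbour] < 0:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def estate_indices(atoms: Sequence[Tuple[float, int, int]],
                   adjacency: Sequence[Sequence[int]]) -> List[float]:
    """E-state S_i = I_i + sum_j (I_i - I_j) / (d_ij + 1)^2 over connected heavy atoms."""
    intrinsic = [intrinsic_state(dv, d, n) for dv, d, n in atoms]
    indices = []
    for i, i_state in enumerate(intrinsic):
        dist = graph_distances(adjacency, i)
        perturbation = sum(
            (i_state - intrinsic[j]) / (dist[j] + 1) ** 2
            for j in range(len(atoms))
            if j != i and dist[j] >= 0
        )
        indices.append(i_state + perturbation)
    return indices


# Ethanol, CH3-CH2-OH: (delta_v, delta, principal quantum number) per heavy atom.
ethanol = estate_indices([(1, 1, 2), (2, 2, 2), (5, 1, 2)], [[1], [0, 2], [1]])
# ethanol ≈ [1.681, 0.250, 7.569]
```

The perturbation terms cancel pairwise, so the E-state values of a molecule sum to the sum of its intrinsic states (9.5 for ethanol).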
The model was built using 5-fold cross-validation.
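A minimal sketch of a 5-fold split is shown below. The samples are shuffled once and cut into five disjoint validation folds. Each fold is predicted by a model trained on the other four, and the statistics are then typically computed over these pooled out-of-fold predictions. The function name and the fixed random seed are illustrative assumptions. OCHEM's own fold assignment may differ.

```python
import random
from typing import List, Tuple


def k_fold_splits(n_samples: int, k: int = 5,
                  seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (training, validation) index lists with disjoint validation folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # one reproducible shuffle
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        held_out = set(validation)
        training = [i for i in indices if i not in held_out]
        splits.append((training, validation))
        start += size
    return splits


splits = k_fold_splits(12, k=5)
# validation fold sizes: [3, 3, 2, 2, 2]; every index is held out exactly once
```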
The basic prediction accuracy parameters according to the 5-fold cross-validation procedure are:
| Property | # records | RMSE | MAE | R2 | r2 (coefficient of determination) |
|---|---|---|---|---|---|
| logP | 16912 | 0.42 | 0.30 | 0.95 | 0.95 |
| logS | 8102 | 0.70 | 0.52 | 0.90 | 0.90 |
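The statistics in the table can be computed from paired observed and cross-validated predicted values, as in the sketch below. It uses the usual definitions. RMSE and MAE are the root-mean-square and mean absolute errors, in log units. The squared Pearson correlation is assumed here to correspond to the "R2" column. The coefficient of determination, 1 − SS_res/SS_tot, corresponds to the column labeled as such. The mapping of the first of these to the page's "R2" label is an assumption, not something the page states.

```python
import math
from typing import Dict, Sequence


def regression_stats(observed: Sequence[float],
                     predicted: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE, squared Pearson correlation and coefficient of determination."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    ss_pred = sum((p - mean_p) ** 2 for p in predicted)
    if ss_tot == 0 or ss_pred == 0:
        raise ValueError("constant values: correlation is undefined")
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
        "squared_correlation": cov * cov / (ss_tot * ss_pred),  # assumed "R2"
        "determination": 1.0 - ss_res / ss_tot,  # "r2 (coefficient of determination)"
    }


stats = regression_stats([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.5])
# stats["MAE"] == 0.375, stats["determination"] == 0.85
```

The two R² variants coincide only when predictions are unbiased and well scaled. A systematic offset lowers the coefficient of determination but leaves the squared correlation unchanged.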
The accuracy of each prediction is estimated using ASNN-STD, the standard deviation of the predictions of the ASNN ensemble. This distance to model was shown to provide the best assessment of prediction accuracy, as described in [Tetko et al. Critical assessment of QSAR models of environmental toxicity against *Tetrahymena pyriformis*: focusing on applicability domain and overfitting by variable selection. *J. Chem. Inf. Model.* 2008 Sep;48(9):1733-46. doi: 10.1021/ci800151m].
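The core of an ensemble-spread distance to model can be sketched as follows. Each compound is predicted by every ensemble member, and the standard deviation of those predictions serves as a reliability score: a larger spread suggests a less reliable prediction. This sketch omits the associative (nearest-neighbour) correction that ASNN applies. It also makes two assumptions: it uses the sample standard deviation, and the function names and data are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def ensemble_mean_and_std(member_predictions: Sequence[float]) -> Tuple[float, float]:
    """Consensus prediction and its spread across ensemble members for one compound."""
    if len(member_predictions) < 2:
        raise ValueError("need at least two ensemble members")
    mean = statistics.fmean(member_predictions)
    spread = statistics.stdev(member_predictions)  # sample standard deviation (assumption)
    return mean, spread


def rank_by_reliability(ensembles: Dict[str, Sequence[float]]) -> List[str]:
    """Order (hypothetical) compound IDs from smallest to largest ensemble spread."""
    return sorted(ensembles, key=lambda cid: ensemble_mean_and_std(ensembles[cid])[1])


# Made-up member predictions for two compounds:
order = rank_by_reliability({"B": [0.0, 2.0, 4.0], "A": [1.0, 1.1, 0.9]})
# order == ["A", "B"]: the members agree closely on A, so A is ranked as more reliable
```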