Dataset profile

The two models available at OCHEM predict the melting point (MP) of organic chemical compounds. MP is an important physicochemical property that is frequently used in drug discovery to estimate the aqueous solubility of chemical compounds. Predicting MP is complicated by factors such as the purity of compounds, the existence of polymorphic forms, and the degradation of compounds before melting. All of these factors affect the quality of MP models. The MP data were collected in the OCHEM database and were also provided by Dr. Luc Patiny from the ChemExper database. The majority of the data are organic compounds.

Data preprocessing

All chemical structures were processed using the OCHEM cleaning and standardization protocols. Particular care was taken to eliminate salts, mixtures, and inorganic compounds, which could dramatically change the MP of molecules. Outliers were detected and eliminated based on their p-values (article in preparation).
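The OCHEM cleaning protocol itself is not described on this page. As a rough illustration of the filtering idea only, the sketch below assumes that structures are available as SMILES strings. It flags multi-component records (salts and mixtures, where SMILES separates components with ".") and records without any carbon atom. The function names are hypothetical, and the carbon check is a crude string heuristic; a real workflow would rely on a cheminformatics toolkit.

```python
import re

# Element symbol of a bracket atom, after an optional isotope number,
# e.g. "[13CH4]" -> "C", "[Na+]" -> "Na", "[cH]" -> "c".
BRACKET_SYMBOL = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})[^\]]*\]")


def is_multicomponent(smiles: str) -> bool:
    """True if the SMILES holds several disconnected components (salt or mixture)."""
    return "." in smiles


def contains_carbon(smiles: str) -> bool:
    """Crude heuristic: True if at least one carbon atom appears in the SMILES."""
    # Carbon written as a bracket atom (aliphatic "C" or aromatic "c").
    if any(symbol in ("C", "c") for symbol in BRACKET_SYMBOL.findall(smiles)):
        return True
    # Outside brackets, the only two-letter symbol starting with "C" is "Cl".
    outside = re.sub(r"\[[^\]]*\]", "", smiles).replace("Cl", "")
    return "C" in outside or "c" in outside


def keep_for_modelling(smiles: str) -> bool:
    """Keep single-component structures that contain carbon (assumed filter)."""
    return not is_multicomponent(smiles) and contains_carbon(smiles)


records = ["CCO", "CC(=O)[O-].[Na+]", "O", "c1ccccc1"]
kept = [s for s in records if keep_for_modelling(s)]
```

In this example the sodium acetate record is dropped as a salt and water is dropped as inorganic, while ethanol and benzene are kept.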

Descriptors

The first model (2D) was built using a combination of ALOGPS 2.1 model predictions and EState descriptors (electrotopological state indices). The EState indices were calculated using a program developed by Dr. Tanchuk. The same descriptors were also used to develop the ALOGPS 2.1 model.

The second model (3D) was built using DRAGON descriptors, which were provided by Prof. Todeschini and Talete Srl. For this model, 3D conformations of the molecules were generated using the CORINA software, which is distributed by Molecular Networks GmbH.

Validation

The models were built using 5-fold cross-validation. The dataset of 277 compounds compiled by [Bergstrom et al. Molecular descriptors influencing melting point and their role in classification of solid drugs. J. Chem. Inf. Comput. Sci. 2003; 43 (4) 1177-85] was used as a test set.
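As an illustration of the 5-fold procedure, the sketch below splits compound indices into five disjoint folds, so that each compound is predicted exactly once by a model that did not see it during training. The random shuffling, the seed, and the function name are assumptions; how OCHEM actually assigns compounds to folds is not stated here.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: random fold assignment
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Each compound lands in exactly one test fold.
for fold_no, (train, test) in enumerate(k_fold_indices(10, k=5), start=1):
    print(fold_no, sorted(test))
```

The cross-validated statistics are then computed from the pooled test-fold predictions of all five folds.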

Statistical parameters

Prediction accuracy

The basic prediction accuracy parameters according to the 5-fold cross-validation procedure (N=25547) are:

Property    RMSE    MAE     R2      r2 (Coefficient of determination)
2D model    42.6    31.7    0.77    0.76
3D model    40      30.1    0.80    0.79

The basic prediction accuracy parameters for the Bergstrom test set (N=277):

Property    RMSE    MAE     R2      r2 (Coefficient of determination)
2D model    43      32      0.43    0.38
3D model    41      32      0.43    0.44
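The statistics in the tables above can be computed from paired observed and predicted MP values. The sketch below uses the standard definitions of RMSE and MAE. This page does not define how its two R-squared columns differ, so the sketch returns both common variants (the squared Pearson correlation and 1 - SS_res/SS_tot) under descriptive names; mapping them to the table columns is an assumption. The function name and the example values are illustrative only.

```python
import math


def regression_stats(observed, predicted):
    """Return RMSE, MAE and two R-squared variants for paired values."""
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    ss_pred = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        # Squared Pearson correlation between observed and predicted values.
        "pearson_r_squared": cov * cov / (ss_tot * ss_pred),
        # Coefficient of determination: 1 - SS_res / SS_tot.
        "coefficient_of_determination": 1.0 - ss_res / ss_tot,
    }


# Made-up values for illustration, not data from the MP set.
stats = regression_stats([100.0, 150.0, 200.0, 250.0], [110.0, 140.0, 210.0, 240.0])
```

RMSE and MAE carry the units of the modelled property, while both R-squared variants are dimensionless.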

Applicability domain

The prediction accuracy is estimated using ASNN-STD. This distance-to-model measure was shown to provide the best assessment of prediction accuracy, as described in [Tetko et al, Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection, J Chem Inf Model. 2008 Sep;48(9):1733-46. doi: 10.1021/ci800151m].
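This page does not define ASNN-STD. Assuming, following the cited paper, that an STD-type distance to model is the standard deviation of the predictions made by the individual members of a model ensemble, a minimal sketch looks as follows. The function name, the input layout, and the use of the sample standard deviation are assumptions.

```python
import statistics


def ensemble_std(member_predictions):
    """Per-compound standard deviation across ensemble members.

    member_predictions[m][i] is the prediction of member m for compound i.
    """
    per_compound = zip(*member_predictions)  # regroup values by compound
    return [statistics.stdev(values) for values in per_compound]


# Three hypothetical ensemble members predicting MP for two compounds.
predictions = [
    [100.0, 200.0],
    [110.0, 200.0],
    [120.0, 200.0],
]
distances = ensemble_std(predictions)
```

Under this assumption, a larger value means the ensemble members disagree more for that compound, which would indicate a less reliable prediction.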
