QNPR descriptors
These descriptors are used for QNPR (Quantitative Name Property Relationship) modeling, from which they take their name. They are derived directly from a compound's name or SMILES string.
For each molecule, either the canonical SMILES or the IUPAC name is split into fragments (substrings) whose lengths are set in the configuration. Every digit 0-9 is replaced with the § symbol.
For example, with fragment lengths 1-3 we get for
- CCC: C, CC and CCC as descriptors
- c1ccccc1: c, §, c§, §c, cc, c§c, §cc, ccc and cc§ as descriptors
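The fragmentation described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual implementation: the function name `qnpr_fragments` and its parameters are hypothetical, and it assumes that fragments are contiguous substrings and that each descriptor value is the number of times the fragment occurs in the string.

```python
import re
from collections import Counter


def qnpr_fragments(name: str, min_len: int = 1, max_len: int = 3) -> Counter:
    """Count all substrings of `name` with lengths min_len..max_len.

    Hypothetical helper for illustration; assumes contiguous substrings
    and occurrence counts as descriptor values.
    """
    # Replace every digit 0-9 with the § symbol before fragmenting.
    masked = re.sub(r"[0-9]", "§", name)
    counts = Counter()
    for n in range(min_len, max_len + 1):
        # Slide a window of width n across the masked string.
        for i in range(len(masked) - n + 1):
            counts[masked[i:i + n]] += 1
    return counts


if __name__ == "__main__":
    print(sorted(qnpr_fragments("CCC")))
    print(sorted(qnpr_fragments("c1ccccc1")))
```

For CCC this yields the fragments C, CC and CCC; for c1ccccc1 the string is first masked to c§ccccc§ and then fragmented.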
Parameters
Parameter | Effect
Fragments from to | Create string fragments with lengths from x to y
Minimum fragment count threshold | Filter out a descriptor if its pattern occurs fewer than # times in the whole dataset
Type of fragments | Naming scheme (SMILES, IUPAC, ...)
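The effect of the minimum fragment count threshold can be illustrated with a short Python sketch. The function name `filter_rare_fragments` is hypothetical, and the sketch assumes that "occurrences" means the total count of a fragment summed over all molecules; counting the number of molecules that contain the fragment would be an equally plausible reading.

```python
from collections import Counter


def filter_rare_fragments(per_molecule, min_count):
    """Drop fragments whose total count over the dataset is below min_count.

    `per_molecule` is a list of fragment-count mappings, one per molecule.
    Hypothetical helper; assumes the threshold applies to summed counts.
    """
    # Sum the occurrences of every fragment over the whole dataset.
    totals = Counter()
    for counts in per_molecule:
        totals.update(counts)
    kept = {frag for frag, total in totals.items() if total >= min_count}
    # Keep only the surviving fragments in each molecule's descriptor set.
    return [
        {frag: n for frag, n in counts.items() if frag in kept}
        for counts in per_molecule
    ]


if __name__ == "__main__":
    dataset = [
        Counter({"C": 3, "CC": 2, "CCC": 1}),
        Counter({"C": 2, "CC": 1, "O": 1}),
    ]
    print(filter_rare_fragments(dataset, min_count=3))
```

With a threshold of 3, the fragments CCC and O (one occurrence each in this toy dataset) are removed, while C and CC are kept.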
Literature
(1) Thormann M, Vidal D, Almstetter M, Pons M; Nomen Est Omen: Quantitative Prediction of Molecular Properties Directly from IUPAC Names; The Open Applied Informatics Journal; 1 (1), 28-32, 2007.