MolPrint

MolPrint (also known as MolPrint 2D) descriptors[1,2] are a type of circular fingerprint that employs Sybyl MOL2 atom types. More specifically, they are based on counts of MOL2 atom types around each heavy atom of the molecule. In contrast to structural keys (such as MACCS keys), they do not draw features from a limited set of structural fragments. Rather, they enumerate all atom environments present in a molecule. MolPrint 2D descriptors are similar to SciTegic's (Pipeline Pilot) extended-connectivity fingerprints (ECFP), but MolPrint 2D features are not hashed.[5] The implementation of MolPrint 2D used in OCHEM uses the atom types literally as they appear in a MOL2 file: for example, an aromatic carbon is encoded as "C.ar", an sp2-hybridized oxygen atom as "O.2", etc.

For each heavy atom, all neighboring atoms at a given number of bonds away are tallied and encoded as a string. Such a string always starts with the heavy atom C at the center of the feature, followed by triples of the form D-T-N, where D is the distance in bonds from the central atom (D in {1, 2, …}), T is the atom type (a valid Sybyl MOL2 atom type), and N is the number of atoms of type T found at distance D from the central atom C. The central atom and all triples are separated by semicolons. Overall, this results in feature strings of the form: C;D-T-N;D-T-N;D-T-N;… In practice, it was found that values of D up to 3 should be considered for descriptor generation, with D=2 the most commonly employed. The higher the value of D, the more specific the features become by nature of their construction.
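To make the string layout concrete, here is a minimal Python sketch that splits a feature string back into its central atom and its D-T-N triples. The function name `parse_feature` and the input string are made up for illustration and are not part of OCHEM or of any MolPrint implementation. The sketch assumes that atom types contain no semicolons and that a trailing semicolon may or may not be present.

```python
def parse_feature(feature):
    """Split 'C;D-T-N;D-T-N;...' into (central_atom, [(D, T, N), ...])."""
    # Drop empty fields so that an optional trailing ';' is tolerated.
    fields = [f for f in feature.split(";") if f]
    center, triples = fields[0], []
    for field in fields[1:]:
        # D is everything before the first '-', N everything after the last '-'.
        dist, rest = field.split("-", 1)
        atom_type, count = rest.rsplit("-", 1)
        triples.append((int(dist), atom_type, int(count)))
    return center, triples


# Illustrative, made-up input in the format described above.
center, triples = parse_feature("C.3;1-C.3-1;1-O.3-1;2-C.3-1;")
print(center)   # central atom type
print(triples)  # list of (distance, atom type, count) tuples
```

Splitting D from the left and N from the right keeps the atom type intact even if a type name were ever to contain a hyphen.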

A feature generated for the atom marked in bold in the figure (the central atom for this feature) would be described as follows:

Central atom: C.ar
Distance of one bond from C: two times C.ar => 1-C.ar-2; one time C.co2 => 1-C.co2-1;
Distance of two bonds from C: two times O.co2 => 2-O.co2-2; two times C.ar => 2-C.ar-2;

The final feature for the above example is the concatenation of the central atom and all the triples: C.ar;1-C.ar-2;1-C.co2-1;2-C.ar-2;2-O.co2-2;

For each distance D, the triples are ordered alphabetically by atom type, so 1-C.ar-2 would come before 1-F-2 but after 1-Br-1. In the example above, 2-C.ar-2 comes before 2-O.co2-2.

This procedure is repeated for every heavy atom in the molecule.
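The procedure described above can be sketched in Python as follows. This is a minimal illustration, not the OCHEM implementation: the function names, the input representation (a list of MOL2 atom types plus a list of bonded index pairs), and the toy molecule are all assumptions. The sketch further assumes that the input lists heavy atoms only, that "distance" means the shortest path in bonds, that Python's default string ordering is an acceptable stand-in for "alphabetical", and that each feature ends with a semicolon as in the worked example.

```python
from collections import Counter, deque


def molprint_feature(atom_types, bonds, center, max_depth=2):
    """Build one feature string 'C;D-T-N;...;' for the atom at index `center`."""
    # Adjacency list of the (heavy-atom) molecular graph.
    neighbors = {i: set() for i in range(len(atom_types))}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Breadth-first search gives the shortest bond distance to each atom.
    dist = {center: 0}
    queue = deque([center])
    while queue:
        atom = queue.popleft()
        if dist[atom] == max_depth:
            continue  # do not expand beyond the requested depth
        for nb in neighbors[atom]:
            if nb not in dist:
                dist[nb] = dist[atom] + 1
                queue.append(nb)

    # Tally atom types per distance, skipping the central atom itself.
    counts = Counter((d, atom_types[i]) for i, d in dist.items() if d > 0)

    # Sorting the (D, T) keys orders triples by distance, then by atom type.
    parts = [atom_types[center]]
    parts += [f"{d}-{t}-{n}" for (d, t), n in sorted(counts.items())]
    return ";".join(parts) + ";"


def molprint_features(atom_types, bonds, max_depth=2):
    """Repeat the procedure for every atom in the input."""
    return [molprint_feature(atom_types, bonds, i, max_depth)
            for i in range(len(atom_types))]


# Toy graph (assumed): aromatic ring atoms 0-5, carboxylate carbon 6 on atom 0,
# carboxylate oxygens 7 and 8 on atom 6.
atom_types = ["C.ar"] * 6 + ["C.co2", "O.co2", "O.co2"]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 7), (6, 8)]

print(molprint_feature(atom_types, bonds, center=0))
# prints: C.ar;1-C.ar-2;1-C.co2-1;2-C.ar-2;2-O.co2-2;
```

For atom 0 of this toy graph the sketch reproduces the feature string from the worked example. Raising `max_depth` to 3 adds a third layer of triples to each feature.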

Because the descriptors are binary (a feature is either present or absent), MolPrint descriptors are more amenable to certain types of modeling methods (such as Bayes or k-NN methods) than to, for example, neural network models. The resulting models are relatively easy to interpret, since every feature corresponds roughly to a functional group (though without explicit information about the bond order between atoms).
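As a rough illustration of what "binary" means in practice, the sketch below turns the feature strings of a molecule into a presence/absence vector over a fixed feature vocabulary and compares two molecules by the overlap of their feature sets. The helper names and the example strings are hypothetical. The Tanimoto coefficient is used here only as one common similarity measure for k-NN-style methods; the text above does not prescribe it.

```python
def to_binary_vector(features, vocabulary):
    """Return 1/0 flags marking which vocabulary features occur in `features`."""
    present = set(features)
    return [1 if f in present else 0 for f in vocabulary]


def tanimoto(features_a, features_b):
    """Shared features divided by all distinct features of the two molecules."""
    a, b = set(features_a), set(features_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up feature lists for two hypothetical molecules.
mol_a = ["C.3;1-C.3-1;1-O.3-1;", "O.3;1-C.3-1;"]
mol_b = ["O.3;1-C.3-1;", "C.3;1-C.3-2;"]
vocabulary = sorted(set(mol_a) | set(mol_b))

print(to_binary_vector(mol_a, vocabulary))
print(tanimoto(mol_a, mol_b))
```

Because the features are not hashed, each position in the vector maps back to one readable atom environment, which is what keeps the resulting models interpretable.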

MolPrint descriptors have been used successfully in virtual screening[3] and ligand-target prediction[4], where they have been shown to capture a large amount of the information relating molecular structure to bioactivity against a protein target.

References

[1] A. Bender, H.Y. Mussa, R.C. Glen and S. Reiling. Molecular similarity searching using atom environments, information-based feature selection, and a naive Bayesian classifier. Journal of Chemical Information and Computer Sciences, 2004, 44, 170-178. - http://dx.doi.org/10.1021/ci034207y

[2] A. Bender, H.Y. Mussa, R.C. Glen and S. Reiling. Similarity searching of chemical databases using atom environment descriptors: evaluation of performance. Journal of Chemical Information and Computer Sciences, 2004, 44, 1708-1718. - http://dx.doi.org/10.1021/ci0498719

[3] R.C. Glen, A. Bender, C.H. Arnby, L. Carlsson, S. Boyer and J. Smith. Circular fingerprints: Flexible molecular descriptors with applications from physical chemistry to ADME. IDrugs, 2006, 9, 199-204. - http://www.biomedcentral.com/content/pdf/cd-653859.pdf

[4] Nidhi, M. Glick, J. W. Davies and J. L. Jenkins. Prediction of Biological Targets for Compounds Using Multiple-Category Bayesian Models Trained on Chemogenomics Databases. J. Chem. Inf. Model., 2006, 46, 1124–1133. - http://pubs.acs.org/doi/abs/10.1021/ci060003g

[5] Rogers and Hahn. Extended-Connectivity Fingerprints. J Chem Inf Model. 2010 May 24;50(5):742-54.
