Prediction of Biological Targets of Actives


Request for Help

One of the interesting features of the GSK set of antimalarial compounds that are acting as the starting point for this project is that they are whole-cell actives, meaning that though they are extremely promising hits, we don't know how they work - i.e. what the targets are. To some extent this doesn't matter - praziquantel has been used for over 30 years and nobody knows how it works. However, a combination of factors (ease of regulatory approval, the possibility of some rational drug design, sheer curiosity) means it would be nice to know what these antimalarials are actually doing. How to figure that out?

There are ways. One is predictive cheminformatics: build a model from all known drug/target pairs, then extrapolate that model to a molecule of interest. This exact part of the open malaria project was in our original grant proposal as an area in which the core team had no expertise, and so one where we were going to have to appeal for help. One of the super nice extra features of such an approach is that it can predict off-target effects, which can help make a drug more effective (for example in this tremendous paper).

Last week such help arrived. I was talking with John Overington and Iain Wallace from ChEMBL about uploading the data from our project to their database (about which more shortly). It was an extremely interesting conversation. Iain has an interest in target prediction. He had already taken the most active compound from our last round of biological evaluation and run it through his system to predict its likely biological targets. The raw data are here. The outcome of this search was the following list of possible targets:
1. Carboxy-terminal domain RNA polymerase II polypeptide A small phosphatase 1
2. Dihydroorotate dehydrogenase (DHODH) - MMV/GSK have run these assays, e.g. here.
3. SUMO-activating enzyme subunit 2
4. SUMO-activating enzyme subunit 1
5. Cyclin-dependent kinase 1

What can we do with this information? We can try to find someone willing to screen this compound against those targets directly, to see if they are really targets. Anyone running these assays?

Iain's method is described in the online lab book, but he says it's this in essence: "Basically, a naive Bayes model is built to distinguish compounds that are known to bind a particular target in ChEMBL from all others. We repeat this procedure for ~1,300 targets, creating a model for each, and score a compound with each model. I then generate the reports for only malaria proteins."
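To make the one-model-per-target idea concrete, here is a minimal sketch. It assumes each compound is already encoded as a set of fingerprint feature IDs, and it uses a Laplacian-corrected naive Bayes weighting, a variant common in cheminformatics. Iain's actual fingerprints, smoothing and score cut-offs are not described here, so the function names and details below are illustrative assumptions, not his implementation.

```python
import math
from collections import Counter


def train_target_model(actives, inactives):
    """Train one Laplacian-corrected naive Bayes model (binders of a target vs. all others).

    actives, inactives: lists of sets of fingerprint feature IDs.
    Returns a dict mapping each seen feature to a log-weight.
    """
    n_total = len(actives) + len(inactives)
    base_rate = len(actives) / n_total  # prior fraction of binders
    active_counts = Counter(f for fp in actives for f in fp)
    total_counts = Counter(f for fp in actives + inactives for f in fp)
    weights = {}
    for feature, total in total_counts.items():
        # Laplacian correction shrinks rarely seen features toward the base rate,
        # so a feature seen in a single binder cannot dominate the score.
        weights[feature] = math.log((active_counts[feature] + 1) / (total * base_rate + 1))
    return weights


def score(model, fingerprint):
    """Sum the log-weights of the query's features; features the model never saw add 0."""
    return sum(model.get(f, 0.0) for f in fingerprint)


def rank_targets(models, fingerprint):
    """Score one compound with every per-target model; return (target, score) pairs, best first."""
    scores = {target: score(model, fingerprint) for target, model in models.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy usage with invented feature sets (not ChEMBL data).
models = {
    "target_A": train_target_model([{1, 2, 3}, {1, 2, 4}], [{5, 6}, {5, 7}, {3, 6}]),
    "target_B": train_target_model([{5, 6}, {5, 7}], [{1, 2, 3}, {1, 2, 4}, {3, 6}]),
}
print(rank_targets(models, {1, 2}))  # target_A ranks first
```

Restricting the report to malaria proteins, as Iain does, would then just be a filter on the ranked list.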

Iain has also repeated this analysis for the entire malaria box. This is significantly awesome work. Does this, I wonder, change the perception of which series are of particular interest?

It's important to bear in mind that these are preliminary results, as with everything in an open source project, and should be taken as work in progress. Iain understands this and wants to make sure everyone else does. Iain also points out that similar approaches have been used successfully to identify novel targets of FDA-approved drugs (see here and here), and the Shoichet lab have a nice webserver that can be used interactively.

The other way of doing target prediction is experimental. Iain mentioned a couple of people who might be perfect for this: Corey Nislow, who runs a yeast-based assay for target ID, and Andrew Emili, who is developing a proteomics-based assay. They're both at the University of Toronto, along with Gary Bader, whom Iain also suggested we contact. I'll reach out to see if they're interested. Any advice on the best approach gratefully received - chances of success here? Favoured method for target ID?

Bioisosteric transformation maps


Request for Help

It is common in drug discovery to have a highly potent hit that has to be optimised to remove undesirable characteristics such as poor oral bioavailability, metabolic instability or toxicity. In our case, we have a number of highly potent compounds that have quite a high LogP, which is considered a warning sign both for promiscuity (i.e. binding to many targets in vitro) and for poor oral bioavailability (a LogP above 5 breaks Lipinski's Rule of 5).
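For reference, the Rule of 5 check is simple enough to state as code. This is a minimal sketch that assumes the descriptor values have already been calculated by some cheminformatics tool; the example compound is invented, not one of ours. Lipinski's original alert needs two or more violations, whereas many practical filters treat any single violation (such as high LogP) as a warning, which is the sense used above.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (values come from your cheminformatics tool)."""
    mol_weight: float   # molecular weight in Da
    logp: float         # calculated LogP, e.g. ALogP or CLogP
    h_donors: int       # hydrogen-bond donors (sum of OH and NH)
    h_acceptors: int    # hydrogen-bond acceptors (sum of N and O)


def lipinski_violations(d):
    """Return the Rule of 5 criteria this compound breaks."""
    violations = []
    if d.mol_weight > 500:
        violations.append("MW > 500")
    if d.logp > 5:
        violations.append("LogP > 5")
    if d.h_donors > 5:
        violations.append("HBD > 5")
    if d.h_acceptors > 10:
        violations.append("HBA > 10")
    return violations


def flags_poor_absorption(d):
    """Lipinski's original alert: poor absorption is more likely with two or more violations."""
    return len(lipinski_violations(d)) >= 2


# Hypothetical potent-but-greasy hit that breaks only the LogP criterion.
hit = Descriptors(mol_weight=420.0, logp=5.8, h_donors=1, h_acceptors=5)
print(lipinski_violations(hit))    # ['LogP > 5']
print(flags_poor_absorption(hit))  # False
```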
One approach to solving this issue is the concept of bioisosterism. From Wikipedia, "bioisosteres are substituents or groups with similar physical or chemical properties which produce broadly similar biological properties to a chemical compound. In drug design, the purpose of exchanging one bioisostere for another is to enhance the desired biological or physical properties of a compound without making significant changes in chemical structure."
One such example would be replacing a hydrogen with a fluorine at a site of metabolic oxidation. "Because the fluorine atom is similar in size to the hydrogen atom the overall topology of the molecule is not significantly affected, leaving the desired biological activity unaffected. However, with a blocked pathway for metabolism, the drug candidate may have a longer half-life."
To that end I have generated bioisosteric transformations of the compounds synthesized as part of this project, using the Pipeline Pilot software. Two different approaches are implemented to generate the transformations:
1) Classic Bioisosteres involve transforming the original molecule based on a set of ~200 commonly used transformations, such as replacing a hydroxyl with a sulfonamide.
2) Database Bioisosteres involve transformations based on an algorithm described in this paper:
M. Wagener, J.P.M. Lommerse, “The Quest for Bioisosteric Replacements”, J. Chem. Inf. Model., 2006, 46(2), 677-685.
Focusing on just ZYH-3-1, I generated two reports (one for each method) showing about 20 compounds resulting from such transformations, all with ALogP < 5. It would be interesting to know how easy they would be to synthesize, and which would make sense to make based on what we know about the SAR of these compounds.
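The classic approach amounts to enumerate-then-filter: apply every matching replacement from a table, then keep the analogues whose predicted ALogP is below 5. The sketch below shows only that logic, under a deliberately simplified and hypothetical representation in which a molecule is a mapping from attachment sites to substituent labels. It is not how Pipeline Pilot represents structures, and the replacement table and LogP predictor are placeholders you would supply from a real structure toolkit.

```python
def enumerate_single_replacements(molecule, replacement_table):
    """Make one analogue per (site, replacement) pair, changing a single site at a time.

    molecule: hypothetical dict of attachment site -> substituent label.
    replacement_table: dict of substituent label -> list of bioisosteric replacements.
    """
    analogues = []
    for site, group in molecule.items():
        for replacement in replacement_table.get(group, []):
            analogue = dict(molecule)  # copy so the parent is left untouched
            analogue[site] = replacement
            analogues.append(analogue)
    return analogues


def keep_below_logp(analogues, predict_logp, max_logp=5.0):
    """Keep analogues whose predicted LogP is strictly below max_logp (ALogP < 5 in the reports)."""
    return [a for a in analogues if predict_logp(a) < max_logp]


# Toy usage: labels and LogP contributions are invented for illustration only.
parent = {"R1": "OH", "R2": "H"}
table = {"OH": ["NHSO2Me"], "H": ["F"]}
toy_contrib = {"OH": 0.0, "H": 0.0, "F": 1.5, "NHSO2Me": -0.5}


def toy_logp(mol):
    """Hypothetical additive predictor: a fixed core value plus per-substituent contributions."""
    return 4.0 + sum(toy_contrib[g] for g in mol.values())


candidates = enumerate_single_replacements(parent, table)
print(len(candidates))                        # 2
print(keep_below_logp(candidates, toy_logp))  # [{'R1': 'NHSO2Me', 'R2': 'H'}]
```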
All the reports and data are available

Compound Similarity Networks from Iain



More fantastic work from Iain Wallace of ChEMBL, below. These are maps of active antimalarials and predicted targets, expressed as similarity maps, i.e. with an extra level of analysis added on top. This provides a very intuitive way of walking through related compounds to compare structures. How best do we use this kind of analysis - as a target guide, or as a "prediction of what to make next" guide?

Iain says:
I have now predicted targets for all the antimalarial active compounds in ChEMBL-NTD (~20,000). I have a full report for all these compounds, but it is quite large (~90 MB and 1,200 pages), so I have displayed the results as a compound similarity network (posted here). In this network, compounds are represented as nodes and very similar compounds are connected by edges. Nodes are coloured by their top-scoring predicted target. The names of the compounds can be viewed by opening the file in Illustrator and zooming in very far.
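The network construction described above can be sketched in a few lines: compute pairwise Tanimoto similarity between fingerprints, join pairs above a cut-off, and label each node with its top predicted target for colouring. Assumptions: fingerprints are sets of feature IDs, and the 0.7 cut-off is illustrative, since the threshold used for these maps isn't stated.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprint feature sets."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def similarity_network(fingerprints, top_target, threshold=0.7):
    """Nodes carry their top predicted target (used for colouring); edges join very similar pairs.

    fingerprints: dict of compound ID -> set of feature IDs.
    top_target: dict of compound ID -> top-scoring predicted target.
    """
    nodes = {cid: top_target[cid] for cid in fingerprints}
    edges = []
    for (id_a, fp_a), (id_b, fp_b) in combinations(fingerprints.items(), 2):
        sim = tanimoto(fp_a, fp_b)
        if sim >= threshold:
            edges.append((id_a, id_b, round(sim, 3)))
    return nodes, edges


# Toy usage with invented fingerprints and target labels.
fps = {"cpd1": {1, 2, 3, 4}, "cpd2": {1, 2, 3, 5}, "cpd3": {7, 8}}
targets = {"cpd1": "target_A", "cpd2": "target_A", "cpd3": "target_B"}
nodes, edges = similarity_network(fps, targets, threshold=0.5)
print(edges)  # [('cpd1', 'cpd2', 0.6)]
```

The edge list can be written out as a tab-separated table and imported into Cytoscape. Note that the all-pairs loop is quadratic (roughly 2 x 10^8 comparisons for 20,000 compounds), so a dedicated tool is more practical at that scale than pure Python.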
A similar map for compounds similar to zyh-72 [and one of the starting "Near Neighbour" set TCMDC 123563] in this dataset is posted here.
Also [on the two pages linked above] are the two original networks, which were generated using Cytoscape. If you install the Cytoscape plugin "ChemViz", you can right-click a node and view the compound structure. You can also view the target of the compound.
I think networks are a useful way of visualizing and integrating different types of information, and I would be interested to hear any thoughts on how to make this type of visualization more useful. For example, the similarity measure I am using may not be finding molecules that you would expect to find.

Purchasable compound similarity maps


Request for Help

There are over 5 million compounds available to purchase, according to the aggregator service E-molecules.
It is worth exploring these in the context of the OSDD project, as doing so will identify compound series that are very easy to explore by purchasing analogues (i.e. SAR by catalogue), as well as compounds that are potentially more synthetically accessible than others (i.e. if there are many close neighbours, these compounds might be easier to make than others).
To this end I have generated three chemical similarity maps, showing purchasable compounds that are very similar to known antimalarials.

Maps generated

1) Centered on ~40 compounds synthesized by OSDD:
This is particularly interesting, as there are many compounds similar to ZYH 3-1 that could be purchased exploring both ends of the molecule.
2) Centered on ~550 compounds from GSK prioritized into ~40 compound series
This is interesting as it clearly identifies high-priority series that are easier to follow up than others, on the premise that if there are many close analogs available, it should be cheaper and easier to follow up on these clusters than on clusters with no close analogs available. For example, the paper highlights five clusters for follow-on work. #31 has the most similar available compounds (~100), while #18 has the fewest (just 1), which would suggest to me that it is much more efficient to follow up on the compounds in cluster #31 than on any of the other high-priority clusters. There are other very interesting clusters too, which perhaps have more interesting chemistry.
3) Centered on ~400 compounds in the Malaria open box
Likewise, certain compounds in the Malaria open box have many more close neighbours than others, which could help prioritize compounds for the community.
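The prioritization argument in the second map reduces to counting, for each cluster, the catalogue compounds within a similarity cut-off of any cluster member. A minimal sketch, again assuming set-based fingerprints and an illustrative 0.7 Tanimoto cut-off; the cluster and catalogue entries are invented.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprint feature sets."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def rank_clusters_by_catalogue(clusters, catalogue, threshold=0.7):
    """Count distinct purchasable compounds close to each cluster; best-served clusters first.

    clusters: dict of cluster ID -> list of member fingerprints.
    catalogue: dict of vendor compound ID -> fingerprint.
    """
    counts = {}
    for cluster_id, members in clusters.items():
        neighbours = {
            cat_id
            for cat_id, cat_fp in catalogue.items()
            if any(tanimoto(cat_fp, fp) >= threshold for fp in members)
        }
        counts[cluster_id] = len(neighbours)  # a set, so each vendor compound counts once per cluster
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Toy usage with invented fingerprints.
clusters = {"cluster_A": [{1, 2, 3}], "cluster_B": [{7, 8, 9}]}
catalogue = {"vendor_1": {1, 2, 3}, "vendor_2": {1, 2, 3, 4}, "vendor_3": {7, 8, 9, 10, 11, 12}}
print(rank_clusters_by_catalogue(clusters, catalogue))  # [('cluster_A', 2), ('cluster_B', 0)]
```

As noted above, the count is only a proxy for ease of follow-up; it says nothing about whether a cluster's chemistry is the most interesting.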

It would be great to get feedback on this approach, namely:

1) Does this type of visualization work for chemists? Is it straightforward to download Cytoscape, install ChemViz and load the network?
2) What do we know about the SAR of these compounds? This would help to prioritize and focus our search of chemical space. While I used ECFP_4 fingerprints, other similarity measures can weight features differently (e.g. if we know a particular substructure is key, then all compounds should contain it).
3) While I think the network view is great for a global overview of the compounds available (and can be overlaid with any other types of data we can think of, such as predicted targets), perhaps there is a better visualization for a smaller number of compounds?