Prediction of Biological Targets of Actives

Published by MatTodd on 19 March 2012 - 10:57am


Request for Help

One of the interesting features of the GSK set of antimalarial compounds that are acting as the starting point for this project is that they are whole-cell actives, meaning that though they are extremely promising hits, we don't know how they work - i.e. what the targets are. To some extent this doesn't matter - praziquantel has been used for over 30 years and nobody knows how it works. However, a combination of factors (ease of regulatory approval, the possibility of some rational drug design, sheer curiosity) means it would be nice to know what these antimalarials are actually doing. How to figure that out?

There are ways. One is to use predictive cheminformatics - to build a model from all the known drug vs. drug target matches, and to extrapolate that model to a molecule of interest. This exact part of the open malaria project was in our original grant proposal as something in which the core team had no expertise, and so was an area where we were going to have to appeal for help. One of the super nice extra features of such an approach is that it can predict off-target effects, which can help make a drug more effective (for example in this tremendous paper).

Last week such help arrived. I was talking with John Overington and Iain Wallace from ChEMBL about uploading the data from our project to their database (about which more shortly). It was an extremely interesting conversation. Iain has an interest in target prediction. He'd already taken the most active compound from our last round of biological evaluation and run it through his system to predict the likely biological targets of the drug. The raw data are here. The search returned these possible targets:
1. Carboxy-terminal domain RNA polymerase II polypeptide A small phosphatase 1
2. Dihydroorotate dehydrogenase (DHODH) - MMV/GSK have run these assays, e.g. here.
3. SUMO-activating enzyme subunit 2
4. SUMO-activating enzyme subunit 1
5. Cyclin-dependent kinase 1

What can we do with this information? We can try to find someone willing to screen this compound against those targets directly, to see if they are really targets. Anyone running these assays?

Iain's method is described in the online lab book, but he says it's this in essence: "Basically, a naive bayes model is built to distinguish compounds that are known to bind a particular target in ChEMBL from all others. We repeat this procedure for ~1,300 targets creating a model for each and score a compound with each model. I then generate the reports for only malaria proteins."
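
To make the idea in Iain's summary concrete, here is a minimal sketch of a one-model-per-target naive Bayes ranking. It is not Iain's code: it assumes compounds are represented as sets of "on" fingerprint bit indices, uses a Laplace-smoothed Bernoulli model, leaves out the class prior, and all function names and toy target names are hypothetical.

```python
import math


def bit_probabilities(fingerprints, n_bits, alpha=1.0):
    """Laplace-smoothed probability that each bit is set within a group of compounds."""
    counts = [0] * n_bits
    for fp in fingerprints:
        for bit in fp:
            counts[bit] += 1
    total = len(fingerprints)
    return [(c + alpha) / (total + 2 * alpha) for c in counts]


def train_target_model(actives, others, n_bits):
    """Build one model for one target: per-bit log-odds of 'binds this target' vs 'all others'."""
    p_act = bit_probabilities(actives, n_bits)
    p_oth = bit_probabilities(others, n_bits)
    on = [math.log(a / o) for a, o in zip(p_act, p_oth)]               # bit is set
    off = [math.log((1 - a) / (1 - o)) for a, o in zip(p_act, p_oth)]  # bit is unset
    return on, off


def score_compound(model, fingerprint):
    """Sum the log-odds contributions of every bit (class prior omitted for simplicity)."""
    on, off = model
    return sum(on[b] if b in fingerprint else off[b] for b in range(len(on)))


def rank_targets(actives_by_target, query, n_bits):
    """Train a model per target, score the query with each, and sort best-first."""
    scores = {}
    for target, actives in actives_by_target.items():
        # "All others" here means the known binders of every other target.
        others = [fp for t, fps in actives_by_target.items() if t != target for fp in fps]
        model = train_target_model(actives, others, n_bits)
        scores[target] = score_compound(model, query)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data only: two made-up targets, four-bit fingerprints.
    toy = {"target_A": [{0, 1}, {0, 1, 2}], "target_B": [{2, 3}, {3}]}
    for target, score in rank_targets(toy, {0, 1}, n_bits=4):
        print(target, round(score, 3))
```

A positive score means the query looks more like the known binders of that target than like everything else. In a real setting the fingerprints would come from a cheminformatics toolkit and the training sets from ChEMBL activity data.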

Iain has also repeated this analysis for the entire malaria box. This is significantly awesome work. Does this, I wonder, change the perception of which series are of particular interest?

It's important to bear in mind that these are preliminary results, as with everything in an open source project, and should be taken as work in progress. Iain understands this and wants to make sure everyone else does. Iain also points out that similar approaches have been used successfully to identify novel targets of FDA compounds (see here and here), and the Shoichet lab have a nice webserver that can be used interactively.

The other way of doing target prediction is experimental. Iain mentioned a couple of guys that might be perfect for this - Corey Nislow who runs a yeast-based assay for target ID, and Andrew Emili who is developing a proteomic-based assay. They're both at the University of Toronto, along with Gary Bader, whom Iain also suggested we contact. I'll reach out to see if they're interested. Any advice on the best approach gratefully received - chances of success here? Favoured method for target ID?


Great write up, explained it much better than I could do. The other way this approach could be used would be to predict possible targets in humans, which would then be possible off-target effects that one might want to minimize during lead optimisation. As a matter of interest, have you done (or is anybody planning to do) synergy screening with this set of compounds? Eric Brown has some really interesting assays that can identify compound mechanism based on the pattern of synergy observed with compounds of known mode of action.

MatTodd's picture

GSK Tres Cantos told us by email today that they'll try the DHODH inhibition assay on these compounds - so we'll test out your prediction Iain! Will be great to have these data.

That is great! I am working on making predictions for all of the active compounds from the 6 datasets on the ChEMBL-NTD site. It might be useful to try and make a resource of who might have an assay set up for a particular target, so that other predictions could get tested as well if a particular lab with expertise was keen.

cdsouthan's picture

As you may know, projects such as the Connectivity Map can, on a good day, infer and relate similarities in mechanism of action based on the mRNA expression signature specific to a drug, compound or other perturbagen. Would it be possible to do this with Plasmodium? I found some references to plasmodial microarray work, but RNA-seq on NGS machines could also be a way to go. Such an experiment is unlikely to directly finger the target off the bat but could be a useful classifier. Also, inhibitors with different known MoAs would be good controls.

cdsouthan's picture

I’d like to surface a relationship that I literally stumbled across while ferreting through the similarity matches for the current most potent lead in PubChem (I intend to re-visit these searches more systematically). At a Tanimoto level of about 0.7, OSM-S-39 has similarity to CID 1359423, which, via a ChemSpider-to-SureChem call-out, was extracted from US20040180889 claiming inhibitors of human PIN1 (Q13526). However, there are two big caveats to any inference that a Plasmodium orthologue of PIN1 might be one of the targets for OSM-S-39. The first is that BLAST doesn’t reveal any similarity scores indicative of homology, even though enzymes of the same class have been explored as antimalarial targets (Q8I4V8 and PMID 20572013). The second is that the patent is devoid of SAR data (i.e. no links between structures and PIN1 IC50s). Thus, on balance I think this connectivity could be a red herring and is a weaker case than some patent matches already reported. However, I thought it worth recording here because you never know which observations may have future relevance as this project explores many avenues to join things up.
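
For readers unfamiliar with the Tanimoto figure quoted above, here is a minimal sketch of how the coefficient is computed and used as a cut-off. It assumes fingerprints are given as sets of "on" bit indices; the function names and the toy library are hypothetical, and PubChem's own fingerprints and search are not reproduced here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by the total distinct on-bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen for this sketch when both fingerprints are empty
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similar_compounds(query, library, threshold=0.7):
    """Return (name, similarity) pairs at or above the threshold, most similar first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    kept = [hit for hit in hits if hit[1] >= threshold]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Toy fingerprints only; real ones would have hundreds of bits.
    library = {"cmpd_a": {1, 2, 3, 4}, "cmpd_b": {1, 2, 3}, "cmpd_c": {9}}
    print(similar_compounds({1, 2, 3, 4}, library))
```

A value of 1.0 means identical fingerprints and 0.0 means no shared bits, so a match at about 0.7 indicates substantial but far from complete structural overlap.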

MatTodd's picture

That's a nice spot, Chris. Definitely worth mentioning, particularly given your follow-up analysis re BLAST. Need to keep this on a side burner while we look at in vivo data and try to get some bio data on possible mech of action.