According to the meta-search service eMolecules (http://www.emolecules.com), over 5 million compounds are available for purchase.
It is worth exploring these in the context of the OSDD project for two reasons. First, it will identify compound series that are very easy to explore by purchasing analogues (i.e. SAR by catalogue). Second, it will flag compounds that are potentially more synthetically accessible than others (i.e. if a compound has many close purchasable neighbours, it is likely to be easier to make).
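A minimal sketch of the neighbour-counting idea, assuming each compound has already been reduced to a set of "on" fingerprint bits (for example ECFP_4-style bits from a cheminformatics toolkit). The compound IDs, bit sets and the 0.7 Tanimoto cut-off are hypothetical, chosen only for illustration; a real analysis would compute fingerprints for the query compounds and the whole purchasable catalogue.

```python
from typing import Dict, FrozenSet, List

# Hypothetical representation: a compound is the set of fingerprint bits that are "on".
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient on bit sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def count_close_neighbours(
    queries: Dict[str, Fingerprint],
    catalogue: Dict[str, Fingerprint],
    threshold: float = 0.7,  # assumed cut-off for a "close" analogue
) -> Dict[str, List[str]]:
    """For each query compound, list catalogue compounds at or above the threshold."""
    return {
        qid: sorted(
            cid for cid, cfp in catalogue.items() if tanimoto(qfp, cfp) >= threshold
        )
        for qid, qfp in queries.items()
    }


if __name__ == "__main__":
    queries = {
        "query_A": frozenset({1, 2, 3, 4, 5}),
        "query_B": frozenset({10, 11, 12}),
    }
    catalogue = {
        "cat_1": frozenset({1, 2, 3, 4}),        # 4/5 = 0.80 vs query_A
        "cat_2": frozenset({1, 2, 3, 4, 5, 6}),  # 5/6 ~ 0.83 vs query_A
        "cat_3": frozenset({10, 20, 30}),        # 1/5 = 0.20 vs query_B
    }
    for qid, hits in count_close_neighbours(queries, catalogue).items():
        print(qid, len(hits), hits)
```

Here query_A has two close purchasable neighbours and query_B has none, which is the kind of contrast the maps are meant to expose. The brute-force loop is fine for a few hundred queries; screening millions of catalogue compounds would call for a toolkit's bulk similarity routines.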
To this end I have generated three chemical similarity maps showing purchasable compounds that are very similar to known antimalarials.
1) Centered on ~40 compounds synthesized by OSDD:
This is particularly interesting, as many purchasable compounds are similar to ZYH 3-1, and together they explore both ends of the molecule.
2) Centered on ~550 compounds from GSK, prioritized into ~40 compound series
This is interesting because it clearly identifies high-priority series that are easier to follow up than others, on the premise that a cluster with many close analogues available should be cheaper and easier to follow up than a cluster with none. For example, the paper highlights five clusters for follow-on work. Cluster #31 has the most similar purchasable compounds (~100 compounds), while #18 has the fewest (just one), which suggests to me that it would be much more efficient to follow up on cluster #31 than on any of the other high-priority clusters. There are other very interesting clusters too, which perhaps have more interesting chemistry.
3) Centered on ~400 compounds in the Malaria open box
Likewise, certain compounds in the Malaria open box have many more close neighbours than others, which could help the community prioritize compounds.
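The cluster-level prioritization described above can be sketched as counting distinct purchasable analogues per cluster. This assumes a precomputed list of similarity edges (query compound, purchasable compound, similarity) and a mapping from query compounds to cluster labels; every identifier and value below is hypothetical. The same edges are also written as SIF (tab-separated source, interaction, target), a simple network format that Cytoscape can import.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical similarity edge: (query compound, purchasable compound, Tanimoto).
Edge = Tuple[str, str, float]


def rank_clusters(edges: List[Edge], cluster_of: Dict[str, str]) -> List[Tuple[str, int]]:
    """Rank clusters by the number of distinct purchasable analogues they reach."""
    analogues: Dict[str, Set[str]] = defaultdict(set)
    for query, purchasable, _sim in edges:
        # A compound close to several members of one cluster is counted once.
        analogues[cluster_of[query]].add(purchasable)
    # Include clusters with no analogues so they appear with a count of 0.
    counts = {c: len(analogues.get(c, set())) for c in set(cluster_of.values())}
    # Most analogues first; ties broken by cluster label for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def to_sif(edges: List[Edge], interaction: str = "similar") -> str:
    """Write edges as tab-separated SIF lines; the similarity value is dropped."""
    return "\n".join(f"{q}\t{interaction}\t{p}" for q, p, _sim in edges)


if __name__ == "__main__":
    cluster_of = {"q1": "cluster_X", "q2": "cluster_X", "q3": "cluster_Y"}
    edges = [("q1", "cat_a", 0.82), ("q2", "cat_a", 0.75), ("q2", "cat_b", 0.71)]
    print(rank_clusters(edges, cluster_of))
    print(to_sif(edges))
```

Counting distinct analogues, rather than edges, avoids inflating a cluster whose members all point at the same few catalogue compounds. SIF carries no edge weights, so keeping the similarity score as an edge attribute would require a richer export format.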
1) Does this type of visualization work for chemists? Is it straightforward to download Cytoscape, install ChemViz and load the network?
2) What do we know about the SAR of these compounds? This would help to prioritize and focus our search of chemical space. While I used ECFP_4 fingerprints, other similarity measures can weight features differently (e.g. if we know a particular substructure is key, we could require that all compounds contain it).
3) While I think the network view is great for a global overview of the compounds available (and can be overlaid with any other types of data we can think of, such as predicted targets), perhaps there is a better visualization for a smaller number of compounds?
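To illustrate how another similarity measure can weight features differently, here is a minimal pure-Python sketch of the Tversky index on fingerprint bit sets, compared with Tanimoto. With alpha = beta = 1 it reduces to Tanimoto; with alpha = 1 and beta = 0 it only asks how much of the query's feature set a candidate retains, so larger analogues that keep the whole query are not penalized for their extra features. The bit sets are hypothetical, and fingerprint bits are only a rough proxy for a hard substructure requirement, which would need a proper substructure search in a toolkit such as RDKit.

```python
from typing import FrozenSet

# Hypothetical representation: a compound is the set of fingerprint bits that are "on".
Fingerprint = FrozenSet[int]


def tversky(query: Fingerprint, candidate: Fingerprint, alpha: float, beta: float) -> float:
    """Tversky index: |Q & C| / (|Q & C| + alpha*|Q - C| + beta*|C - Q|)."""
    common = len(query & candidate)
    denom = common + alpha * len(query - candidate) + beta * len(candidate - query)
    return common / denom if denom else 0.0


def tanimoto(query: Fingerprint, candidate: Fingerprint) -> float:
    """Tanimoto is the special case alpha = beta = 1."""
    return tversky(query, candidate, 1.0, 1.0)


if __name__ == "__main__":
    query = frozenset({1, 2, 3, 4})               # hypothetical key features
    bigger = frozenset({1, 2, 3, 4, 5, 6, 7, 8})  # keeps every query bit, adds four
    partial = frozenset({1, 2, 5})                # loses half of the query bits

    for name, cand in [("bigger", bigger), ("partial", partial)]:
        print(name,
              round(tanimoto(query, cand), 2),
              round(tversky(query, cand, alpha=1.0, beta=0.0), 2))
```

Under Tanimoto the two candidates score similarly (0.5 vs 0.4), whereas the query-weighted Tversky setting separates them clearly (1.0 vs 0.5), favouring the analogue that preserves all of the query's features.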