Current Projects for Malaria

An open source drug discovery project is currently active: see the GSK Arylpyrrole link below, this general introduction page (written at the start of the project, it lays out the Six Laws), or read about the current project status.


All malaria projects currently in progress should be created as child pages of this page. See the "add child" link at the bottom if you want to initiate a new open research project for malaria.

Note added June 30th 2011 - some projects listed below are currently inactive and are being gutted (still the case in November 2011 - could use a spring clean).

Functional Reconstitution of the Plasmodium GPI:protein transamidase Complex

Attachment: GPI Saj.pdf (168.3 KB)

Proteins that carry a GPI glycolipid modification are acknowledged to be key in the lifecycle of Plasmodium; an example is the involvement of the GPI-anchored MSP1 on merozoites in erythrocyte invasion. The enzyme that transfers the preformed GPI to proteins such as MSP1 is named GPI:protein transamidase. However, studying this enzyme biochemically has been arduous, owing to the following hurdles. Firstly, the GPI:protein transamidase functions as a subunit in a multidomain complex, some components of which may be membrane-associated, and there are no reports of functional recombinant reconstitution of this complex. Secondly, there are no convenient and sensitive transamidase assays amenable to medium/high-throughput studies, even though large libraries of small-molecule cysteine peptidase inhibitors exist and could be used to study this enzyme.

Any input/thoughts on how to overcome the aforementioned obstacles in studying the Plasmodium GPI:protein transamidase would be most welcome. Please peruse the attached file.

Gene Wiki Overview

Attachment: genewiki intro.doc (504.5 KB)



See the attached document for an overall concept of our Gene Wiki idea.

Please log in and add comments with your thoughts on this project. The type of feedback we're looking for:

  1. Would this feature be useful to you for in-silico drug design? For example, if you're a malaria scientist, is this useful for malaria drug target identification, or should we focus our efforts on another genome?
  2. Are you interested in reaching out to worldwide resources, such as those in India, to help with this work?
  3. Would you use this feature openly if it were deployed on The Synaptic Leap? If you think it should be deployed somewhere else, please explain why.
  4. What is your specialty and how would you use this feature? For example, what contributions would you make to community-driven drug development?
  5. Which feature do you think is most useful?
  6. Are there other important features that you want added in the first or second version of the tool?

The more comments the better - even if they're negative. I want some online brainstorming.


Community-based Gene Annotation

This project is not yet active.

Gene annotation is essential for the advance of research in Malaria. Thus, we believe that allowing users of the Malaria community in TSL to input (in a structured manner) their collective knowledge of a protein/gene can only benefit basic research towards drug discovery against Malaria. The aim of this project is to provide such tools in a flexible manner so that they can be extended to other genomes in the future. When the project opens (sometime in early 2006), all TSL registered users will be able to annotate genes/proteins from the malaria genome.

To contribute to the project you need to register with TSL.

Gene Cards for the Malaria genome

This project is not yet active.

Gene annotation is essential for the advance of research in Malaria. However, gene annotations are spread across several databases, and associating the entries of different databases is, more often than desired, not trivial. The aim of this project is to provide a tool that collects information from different databases for all genes in a genome and presents a summarized version of the collected information, including all relevant links to the original sources. Eventually, as the project advances, we plan to maintain a wiki version of the Gene Cards so that users can modify their content.

The Sali group at UCSF has collected data from several annotated sources of the Malaria genome. As of September 2005, the Malaria genome had 5,270 ORFs.

The sources used were:

  • NCBI at
  • BioMart at
  • ModBase at

You will be able to search the database from this page after the project is released in early 2006.
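The core of a Gene Card is a merge step: records from each source are mapped to one canonical gene identifier and collected, with a link back to each original entry. The Python sketch below illustrates that idea under stated assumptions: the record schema (`id`, `summary`, `url`), the cross-reference table `id_map`, the class `GeneCard` and the function `build_gene_cards` are all hypothetical, and the identifiers and URLs are placeholders. Records that cannot be mapped are set aside for manual curation, since cross-database mapping is the non-trivial part noted above.

```python
from dataclasses import dataclass, field


@dataclass
class GeneCard:
    """Summarised view of one gene, with links back to each original source."""
    gene_id: str
    summaries: dict = field(default_factory=dict)  # source name -> short summary
    links: dict = field(default_factory=dict)      # source name -> URL of source entry


def build_gene_cards(sources, id_map):
    """Merge per-source records into one card per canonical gene ID.

    sources: {source_name: [{"id": ..., "summary": ..., "url": ...}, ...]}
    id_map:  {(source_name, source_id): canonical_gene_id}  (hypothetical
             cross-reference table; building it is the hard part)
    """
    cards = {}
    unmapped = []
    for source, records in sources.items():
        for rec in records:
            gene_id = id_map.get((source, rec["id"]))
            if gene_id is None:
                unmapped.append((source, rec["id"]))  # left for manual curation
                continue
            card = cards.setdefault(gene_id, GeneCard(gene_id))
            card.summaries[source] = rec["summary"]
            card.links[source] = rec["url"]
    return cards, unmapped


# Usage with made-up identifiers and placeholder URLs.
sources = {
    "NCBI": [{"id": "ncbi:1", "summary": "merozoite surface protein",
              "url": "https://example.org/ncbi/1"}],
    "ModBase": [{"id": "mb:42", "summary": "comparative model available",
                 "url": "https://example.org/modbase/42"},
                {"id": "mb:99", "summary": "no cross-reference",
                 "url": "https://example.org/modbase/99"}],
}
id_map = {("NCBI", "ncbi:1"): "GENE_A", ("ModBase", "mb:42"): "GENE_A"}
cards, unmapped = build_gene_cards(sources, id_map)
print(sorted(cards["GENE_A"].links))  # ['ModBase', 'NCBI']
print(unmapped)                       # [('ModBase', 'mb:99')]
```

Keeping the unmapped records explicit, rather than silently dropping them, is one way a later wiki version could let users fix the mappings by hand.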

Gene Wiki Programmers Needed


Request for Help

TDI and TSL are in need of volunteers for tools development within the Malaria community. You can read about the projects themselves here. So, what do we need? Basically, we seek programmers interested in being part of a new community for biomedical research. You would be applying your skills to develop tools that should help advance drug discovery research for neglected diseases. This is something that you will feel good about! Gives good Karma!! The skills we are looking for in volunteers on these projects are:

  • JavaScript (in particular bookmarklets)
  • PHP programming
  • MySQL database experience
  • Familiarity developing Wiki-based tools a strong plus
  • Familiarity with Drupal a strong plus
  • Familiarity with Trac a strong plus
  • Familiarity with Subversion a strong plus

If you would like to join, please do so by posting a comment to this entry! Thanks! marc

Malaria Gene Basket

This project is in discussion.

Gene annotation is essential for the advance of research in Malaria. Thus, we believe that allowing users of the Malaria community in TSL to automatically "annotate" each gene/protein in the malaria genome can only benefit basic research towards drug discovery against Malaria. The aim of this project is to provide such tools in a flexible manner so that they can be extended to other genomes in the future.

As seen in other social communities for photos or bookmarks, TSL registered users will be able to save gene cards in their baskets and associate pieces of information or tags with entries in the basket. This will allow a semi-automatic association of data with particular genes in their baskets.

The mechanism will take advantage of bookmarklets, so that users can add information to their baskets with just one click in their browser.

For example, a user may be browsing the literature at PubMed and find an interesting article; with just one click, the system should be able to propose an association between the article and any of the genes in his/her basket.
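The "propose an association" step could work roughly as in this minimal Python sketch of the server side, assuming the bookmarklet sends the text of the page being viewed (e.g. a PubMed abstract) and that each basket gene has a list of known names or aliases. The class `GeneBasket`, its methods, the gene IDs and the tag string are hypothetical; plain whole-word name matching stands in for whatever association logic the project would actually use.

```python
import re
from collections import defaultdict


class GeneBasket:
    """Hypothetical per-user basket of genes with aliases and attached tags."""

    def __init__(self):
        self.aliases = {}             # gene_id -> set of names to search for
        self.tags = defaultdict(set)  # gene_id -> tags / saved items (e.g. article IDs)

    def add_gene(self, gene_id, aliases=()):
        # The gene's own ID is always searched for, alongside any aliases.
        self.aliases[gene_id] = {gene_id, *aliases}

    def tag(self, gene_id, item):
        self.tags[gene_id].add(item)

    def propose_associations(self, page_text):
        """Return basket genes whose ID or an alias occurs as a whole word in page_text."""
        proposals = []
        for gene_id, names in self.aliases.items():
            if any(re.search(r"\b" + re.escape(name) + r"\b", page_text, re.IGNORECASE)
                   for name in names):
                proposals.append(gene_id)
        return sorted(proposals)


# Usage with made-up gene IDs; the user would confirm before anything is tagged.
basket = GeneBasket()
basket.add_gene("GENE_A", aliases=["MSP1", "merozoite surface protein 1"])
basket.add_gene("GENE_B", aliases=["AMA1"])
abstract = "We show that MSP1 processing is required for erythrocyte invasion."
proposals = basket.propose_associations(abstract)
print(proposals)  # ['GENE_A']
for gene_id in proposals:
    basket.tag(gene_id, "article:example-1")
```

Gene names are often ambiguous, so in this sketch the match only proposes an association; the one-click confirmation by the user is what actually attaches the article to the gene.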

Target Selection for Structural Genomics of Malaria

This project is not yet active.

Structural genomics aims to structurally characterize most protein sequences by an efficient combination of experiment and modeling. Central to the success of these efforts is effective target selection. There are a variety of target selection schemes, ranging from focusing only on novel folds to selecting all proteins in a model genome. Many of the target selection strategies of the structural genomics consortia are biologically based, providing a set of protein targets that are key actors in an interesting biological process. This project aims to provide a flexible tool for general target selection (in this case for structural genomics) based on collective knowledge. TDI registered users can vote for genes/proteins that may be promising candidates for structure determination. The aim of the project is to generate a list of target proteins whose structures may help advance drug discovery for malaria. Dr Raymond Hui, from the Structural Genomics Consortium in Toronto, and Dr Marc A. Marti-Renom, a computational biologist from UCSF, will analyze the genes the community voted as having the highest potential. Results from that analysis will be posted here as well as in open-access databases such as PlasmoDB. To contribute to the project you need to register with TSL. We intend to release this project in the early months of 2006.
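The page does not say how votes would be counted, so the Python sketch below is only one simple possibility: it assumes each vote is a (user, gene) pair, that each user counts at most once per gene, and that ties are broken alphabetically by gene ID. The function name `rank_targets` and the example data are hypothetical.

```python
from collections import Counter


def rank_targets(votes, top_n=10):
    """Rank genes by number of distinct voters.

    votes: iterable of (user, gene_id) pairs. Repeat votes by the same user
    for the same gene are counted once. Ties are broken by gene ID.
    """
    distinct = set(votes)
    counts = Counter(gene_id for _user, gene_id in distinct)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


votes = [
    ("alice", "GENE_A"), ("bob", "GENE_A"), ("alice", "GENE_A"),  # alice votes twice
    ("carol", "GENE_B"),
    ("bob", "GENE_C"), ("carol", "GENE_C"),
]
print(rank_targets(votes, top_n=2))  # [('GENE_A', 2), ('GENE_C', 2)]
```

Counting distinct voters rather than raw votes is a design choice that keeps a single enthusiastic user from dominating the shortlist handed to the experts for analysis.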

Open Source Drug Discovery for Malaria Meeting - February 2012 - Why?



In August 2011 my lab started an open source drug discovery for malaria project. We’ve been a) very excited about this idea, and b) very busy getting things started. It’s an unusual project, exciting not only because we might be able to change the world, but because it’s not clear how we might do it. To me, that’s the definition of good research – you not only don’t know whether it’s going to work, but you don’t even know how to go about doing it.
We had previously found the experience of open science exhilarating, in that the free sharing of all our data and ideas had accelerated a different research project. With new funding for open science we were able to start on drug discovery. There is also an Indian OSDD effort that has so far focussed primarily on TB bioinformatics and is now moving towards drug discovery (in TB and malaria, about which more later). There are the newly announced Transparency Life Sciences and Cinderella Therapeutics, which are making moves to open up the clinical development of compounds. There’s the broad biology-data perspective being championed so effectively by Sage. These are all wonderful, wonderful developments – I hope we are the start, together, of a very significant shift not only in drug development but generally in how we do things as human beings. But I’d felt there was still no basic project in open source drug discovery – something that put medchem and open source together to see what happened, and which started very much from the ground up – making molecules and getting people involved in something specific as a kernel. Tim Wells at the Medicines for Malaria Venture agreed, and we got going in August. We then received funding from MMV and the Australian Government to 2015. To read what’s going on, stay tuned to The Synaptic Leap, G+, OWW and Twitter, since things are happening fast and we seriously need people to add momentum.
Over the course of 2011, when describing this project to people, no matter who it was, I would get very high levels of engagement. People wanted to know what osdd was, how it could possibly work, who would pay for it, what happens to intellectual property – a long list of questions. I had answers for some, not for others. I decided it would be a good idea to have a meeting, and one that had a loose agenda, since I have completely converted to unconferences after realising how stale regular conferences can be. I asked Sydney Uni for some money, which they gave me (thank you DVC International for the IPDF scheme), and I started setting up a one-day meeting on open source drug discovery for malaria – to ask in the most general sense whether it can be done. I asked a broad range of people to come and talk about what they thought was interesting. This really is the key to a meeting – you organise coffee, identify good people and say to them “Tell me what’s interesting.”
We met in Sydney on February 24th 2012 and had a fascinating day, then sunset beers in Glebe and dinner. The day was streamed live online. All the talks were recorded. We’re now posting them on YouTube and will annotate them to highlight key points - the links will appear below. If you’ve comments or suggestions please don’t hesitate to write on the relevant posts or on the original YouTube pages. I hope the talks are a way of focussing discussion on the most significant ideas.
The arguments about whether osdd is possible are surprising and astonishing. Not because they are radical, but because osdd is so simple, at heart. Of course as things currently stand osdd is extraordinarily difficult, which is why we’re doing what we’re doing and need people to join. Perhaps the most astonishing thing to realise is that patentless drug discovery has already been done, and is to some extent how things were done before the modern (post-war) conception of the big drug company. I had heard of the polio vaccine and penicillin stories a couple of days before at the AAAS meeting in Vancouver at a talk by Robert Cook-Deegan. Luigi Palombi told the story at the Sydney meeting. If you don’t know about this, or you think that patents are needed for drug discovery, do listen. It’s important to have a Healthy Disregard for the Impossible, but it’s a lot easier when you realise that the “impossible” was actually done a long time ago.
(A shout out to Paul Willis at MMV for coming to the meeting and being our pillar of support. Thanks too to all the amazing help setting up this meeting from my Axis of Open students Paul Ylioja, Kat Badiola, Murray Robertson and Jimmy Cronshaw)

Open Source Drug Discovery for Malaria Meeting - Opening Comments



Opening remarks at the Open Source Drug Discovery for Malaria meeting at The University of Sydney, February 24th 2012. Speaker is Dr Mat Todd, The University of Sydney.

a) The question to address is whether we can do open drug discovery collaboratively, without patents.
b) This is a very different way of doing things, and therefore interesting in itself.
c) The shift of a lab book on a desk to one on the web is a basic one, but powerful because the internet allows unrestricted collaboration.
d) Open projects have no ownership, unlike the way we do science currently
e) Open projects have fluid composition and leadership
f) Sponsorship for this meeting came from the International Program Development Fund, The University of Sydney and the Sydney osdd project funding from the Medicines for Malaria Venture
g) Diverse participants in the meeting include Saman Habib from the OSDDm project based in India

Open Source Drug Discovery for Malaria Meeting - Session 1, Part 1, Mary O'Kane



Welcome address at the Open Source Drug Discovery for Malaria meeting at The University of Sydney, February 24th 2012. Speaker is Professor Mary O'Kane, Chief Scientist and Engineer, New South Wales, Australia.

Summary: O’Kane is passionate about openness in science. It is important that we try to discover new ways of doing research. There have been important moves in recent years in open innovation (e.g. prizes for the solution of big problems). Open science is different since everything is shared. Australian productivity might benefit from these new ideas, and Australia is already investing in key infrastructure projects to make it happen.


1) Professor O’Kane has been passionate about openness in science for a while. Inspired by delegate Richard Jefferson’s work (e.g. Cambia), and the impact that open access to data can have on government. Mentions GIPA (Government Information (Public Access) Act 2009), which allows the public broad access to government data by default


2) Open science for malaria is important for its social and economic impact.


3) Professor O’Kane is on the Board (and is currently Chair) of Development Gateway (Washington, Brussels), which pushes for open source software for transparency in the developing world and tries to deal with the issue of how we get through the barrier of software patents. One product is Zunia, which is about knowledge exchange via an open platform. Is there a potential overlap with open source drug discovery?


4) Discussions at this meeting are part of a wider question about how we do research. Open innovation and open science are different, but related. DARPA in the US uses competition and teams to solve big problems. Produced the microwave oven and the internet (really?). Google Translate was apparently a three-time DARPA project; when the eventual winner was declared, Google employed the whole team! Most recently ARPA-E was established in the field of energy. It works with teams to solve problems.


5) Kaggle in Australia works in this way, with data science. The NSW government used it for Sydney's M4 freeway: it wanted an algorithm that predicts travel times. It supplied 18 months of data and offered $10K as a prize. The competition produced 364 proposals over 6 weeks, and the teams could see what the other teams were doing (very interesting if that’s the case). It was won by someone in the US, who generated an algorithm that would have cost the government $1M to build itself. Science Exchange works by allowing people to subcontract research. There are commercial ventures such as Innocentive and Nine-Sigma.


6) Open science is different, since one shares everything. Science Commons, Creative Commons and Neurocommons are related to this. All are part of exciting new ways of doing things. Innovation like this is particularly important for Australia, which has a good economy but has had no productivity growth in recent years. Open science might boost productivity and improve how Australia works with other countries.


7) Open science needs significant new infrastructure for the management of big data. A mixture of human and automated platforms is needed. NICTA in Australia is key to this, and there should be interactions with this group. Intersect and ANDS are also important components. We must seek ways of building eResearch platforms to support open science.


8) The University of Sydney has another resource – Michael Spence, the VC, who was very much in the open commons area in his former life.


Open Source Drug Discovery for Malaria Meeting - Session 1, Part 2, Mary Moran



Part of the first session at the Open Source Drug Discovery for Malaria meeting at The University of Sydney, February 24th 2012. Speaker is Dr Mary Moran, Policy Cures.




1) Patents have big benefits but also big drawbacks. The monopoly a patent brings is useful for generating money for R&D investment. Because companies find this attractive, governments don’t have to pay for R&D, and so companies (rather than governments) take on the risk. The downside of the commercial/patent route is that the research agenda is set by the people paying for the research (companies and their shareholders). This means there is little incentive for collaboration.


2) (02:50) However, this patent situation doesn’t apply in NTD research, since there is no market because there’s no profit margin; i.e. we don’t have the benefits or the drawbacks. IP in this area is a “ghost”: not a source of money or an incentive. Companies are happier giving away IP that is commercially worthless, and we are seeing this happen more and more in NTD research, e.g. WIPO has a database of available IP for NTDs. Patent pools are becoming more common. Screening in collaboration with companies is getting better/easier in the sense of being more collaborative and hence faster.


3) (05:50) Excluding NIH funding, about 40% of R&D in NTDs is collaborative via Product Development Partnerships (becoming the norm). Not open source but still collaborative on same IP, i.e. small teams, working internationally. e.g. GSK – Tres Cantos as the example (very relevant to this discussion).


4) (07:00) The bad thing about NTD R&D IP being worthless: it’s difficult to fund the work. So the question is can we capitalise on the fact that the patents are worthless? $3 Bn a year now put into making NTD products, all of which is raised outside the market. The money comes from governments ($2 Bn), philanthropists ($0.5 Bn, mainly Gates) and companies ($0.5 Bn). Half a billion of the total goes to malaria product development, and of that 200-300 million a year is invested in malaria drugs. That’s for development of new products (R&D funding) up to the point of registration. (Detailed information is available via Policy Cures G-Finder reports.)
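As a quick consistency check of the figures as noted (simple ratios of the quoted amounts, not additional data from the talk), the funding numbers combine as follows:

```latex
\begin{align*}
\underbrace{\$2\,\text{Bn}}_{\text{governments}}
 + \underbrace{\$0.5\,\text{Bn}}_{\text{philanthropists}}
 + \underbrace{\$0.5\,\text{Bn}}_{\text{companies}} &= \$3\,\text{Bn per year} \\
\frac{\$0.5\,\text{Bn (malaria product development)}}{\$3\,\text{Bn (all NTD products)}} &\approx 17\% \\
\frac{\$0.2\text{--}0.3\,\text{Bn (malaria drugs)}}{\$0.5\,\text{Bn (malaria product development)}} &= 40\text{--}60\%
\end{align*}
```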

Question from Stuart Ralph at 10 min concerns whether the money discussed is for discovery or late-stage development: [audio unclear]

There have been 300 new NTD products in the development pipeline since 2000, 20 registered, 23 in final stages of trials. Of those, 7 new malaria drugs were registered, 5 of which are “useful”.


5) (11:30) Generally there has been an important shift in mindset about social responsibility: NTDs are a public problem arising from a market failure. Companies can contribute, but NTDs are not at heart their problem. Australia puts less money into product development for NTDs than any other “investor” country. Stuart Ralph question (audio slightly unclear) (12:50): What’s the distinction between basic R&D and product development in the investment numbers? Moran: no distinction; they are grouped together in this analysis (discovery through to registration). There has recently been a review of Australian funding of NTD R&D, with the recommendation that e.g. AusAID needs to start funding this area. Typically, money has come from overseas to fund NTD R&D in Australia.


6) (14:00) Not having patents gives some freedom: funders can control the whole process. The research is usually done in a not-for-profit way, but the funder takes on the risk. The great benefit: increased collaboration. But the cost savings are quite minor. Discovery is cheap, but has a significant failure rate. The open source impact would be to i) reduce failure rates by ii) providing better ideas up front, and faster, through sharing data. Though this is good, it is not a big saving because the R&D is inherently quite cheap. Preclinical research is more expensive, but the open source impact here is smaller, since at this stage one needs to share work (rather than data), which is harder; there is less to be gained from parallelization, and the work needs to be done in any case. Clinical trials for NTDs (costing e.g. $15-20 M) have high failure rates (3 in 4 early on, then 1 in 4 later). Governments are particularly uneasy about failure rates in drug discovery. Can open source make this more efficient? So open source is easiest to apply in the NTD area, because IP is not an issue and the work needs to be done. But it also has the least additional benefit, because people are already collaborating. It is, though, a pilot for the big question for the future: the application of open source to commercial areas, which is more difficult but potentially more impactful. Companies know they can’t keep doing things as they currently are. Keeping the process open as long as possible would reduce cost and risk. NTD R&D is the testing ground for this.


[Questions 21-22 minutes have audio issues. One from Hazel Moir about percentage of funding going to Phase III. One from Stuart Ralph is unclear.]


Not publishing data is a problem in the field. Policy Cures releases its G-Finder info freely, and receives a lot of feedback on it. This is efficient and avoids duplication of effort.


7) (23:50) Question from Mat Todd: [problem with audio] The basic impact of open source in for-profit areas: with open release of data, and with many more eyeballs on the problem, people can spot problems with data earlier on, which helps to reduce time wasted.

Moran agrees that if you have more people looking at things, you will improve the work, particularly because there will be critical eyes. As she says, “When you’re looking at your own baby it’s always beautiful”. Drug discovery is still fundamentally a risky field, even when the molecules look great late in the process. Early failure saves money.


8) (26:28) Question from Hazel Moir (remote) about the US Orphan Drug Act. [Audio problem] Moran: Orphan drugs are not commercially interesting, so market monopolies are of no use. There is a possibility of reciprocal orphan drug arrangements, e.g. for malaria and TB. But the problem with NTDs is that nobody in the developed world really has them.


9) (28:50) Question from Paul Willis [Audio Problem]. Moran: Infectious disease is easier to make drugs for because there’s an endpoint. Not so for other problems like high blood pressure. Prior to 2000 there was very little research into drugs for NTDs. So we don’t really know attrition rates yet for the most recent work in the area.


10) (31:40) Richard Jefferson Q: Why are we so obsessed with drugs? The US solved many of these diseases with public health interventions. Why aren’t we focussed on those? Moran: Not sure. Drugs are simple. People reach for tablets/vaccines rather than intervention. Jefferson: Are we doing this because we’re too comfortable with the process? Whereas we need to be honest about whether the research is going to give the desired outcome. Moran: one still feels the strong need to do something proactive when you’re doing the public health R&D. Drugs can do that. Jefferson: What’s the timeline of the two approaches? Say we have a hugely promising lead in the lab. What’s the timeline for that to become a viable treatment vs. an intervention timeline e.g. bednets. Moran: It doesn’t have to be an either/or debate. Todd: public health challenges are daunting, e.g. for schisto, where there is no equivalent of bednets because the parasite is in fresh water, so the availability of a drug allows us to do a lot in the meantime.

Open Source Drug Discovery for Malaria Meeting - Session 1, Part 3, Luigi Palombi



Part of the first session at the Open Source Drug Discovery for Malaria meeting at The University of Sydney, February 24th 2012. Speaker is Dr Luigi Palombi, ANU.




  1. The patent system. There is much mythology about the patent system, mainly that it’s needed for a) drug development and b) innovation. In reality, this is not necessarily the case.


  2. (01:29) What happened before the patent system? 1900 was the approximate date of birth of the patent system (1883 for the UK). German national patent law was passed around 1877. WIPO didn’t exist before 1900. The needs of the people (rather than patents) motivated people to develop medicines. Switzerland didn’t have a patent law until 1907. Novartis and Roche began before the patent system. In 1969 Germany changed its law to allow patents on pharmaceutical substances. In most of the EU it wasn’t possible to patent a drug until 1978, when the European Patent Convention came into effect. In England, patents on pharmaceutical substances were prohibited from 1919 to 1949. Yet we have modern medicines.


  3. (4:00) Penicillin was developed without a patent. 1927: Fleming notes a bacterium-killing substance. Florey later (1938) noted this observation and asked how to turn it into a medicine. From 1938 to 1941 the team worked in Oxford, without much funding, to develop the drug for humans. This was a huge breakthrough for the war effort. The British government found it hard to convince industry to manufacture the drug, so the government built a taxpayer-funded factory for this purpose. There was eventually a patent on fermentation production, but generally this was patentless drug development. Perhaps the patent system is applicable in certain areas, but modern pharma has become reliant on the patent model. Is it the way to solve the world’s most important diseases?


  4. (8:00) What is open source? An excuse to get around the patent and copyright systems? It grew from software. Can this work in drug development? Should it really be needed? Ought we to have IP over everything in the drug discovery process? There are patents on drugs as well as on research tools. There are patents on naturally occurring biological materials. Ought there to even be patents on these things? The scope of patentable subject matter has grown. It has become more difficult for scientists to share information openly. This has influenced universities.


  5. (10:10) 1980: Bayh-Dole in the US. There were great examples where universities were successful in exploiting patents. The Stanford University and Cohen/Boyer breakthrough was a successful act of university patenting, prior to B-D. Generally the B-D Act has failed. Data suggest Unis are not good at promoting their IP. But the perception has grown that Unis need to create IP like a commercial entity. Now Unis are draconian in the ways they implement their IP policies. Once the paragons of sharing and learning, they have joined the mythology of the patent system.


  6. (11:56) For 3 years politicians, Cancer Council Australia and others have been trying to scale back the level of patenting, particularly of natural biological materials. Draft legislation states that natural substances are not patentable because they are not inventions. Many universities opposed it, as did WEHI/Garvan, which are taxpayer-funded institutes. Many individuals also opposed it. Much philanthropic research funding comes with strings. We are creating mandates and barriers to research and are therefore trying to invent open source to get round them. Do we need to go back in time?


  7. (15:27) Polio vaccine. Not patented – Salk said “Could you patent the sun?” Scientists have become part of the problem because they have succumbed to the myth that patents are required for research funding.


  8. (17:00) Question from Nico Adams (CSIRO). CSIRO is government-funded, but also funded by outcomes from IP protection. How does one go back to management and say that patents are not necessary? Palombi: the level of funding of CSIRO has decreased. Governments have become open to the idea that taxpayers’ money shouldn’t be spent on research. CSIRO is reliant on public money. Solution: eliminate PPPs and fund research fully. That is a policy/political decision. Military/auto spending is significant. Since 1980, has it actually produced what was hoped? The B-D Act has not been a success. Universities are not good at converting IP to income. There should, by now, be data on this, rather than assumptions about effectiveness.


  9. [Audio problem] (22:18) Moran: Yes, we can do R&D without patents. But can you raise money without patents? It is hard to persuade governments to fund R&D. Governments will always pay for everything in the end. Governments prefer risk to be taken by others initially. So given how difficult it is, how else can we raise money to fund research properly?


  10. [Audio quiet] (24:35) Question from Paul Willis about counterfeit drugs. Drugs of poor quality are a problem, and a difficult issue. The solution is better regulation. Comment from Saman Habib: poor drugs will occur with or without the patent system. Palombi: There is an international anti-counterfeiting trade agreement, but confusion in terminology has arisen between counterfeiting and patent violations. Genentech issued a warning about counterfeit Avastin. This is not a patent infringement, but criminal behaviour which can be policed outside the patent system.


  11. [28:10] Online question: How will the Myriad patent case affect the possibility of patenting genetic material? Are isolated (BRCA1) genetic mutations patentable subject matter? The court decision will be interesting, but possibly not very relevant to Australia, because there may soon be a redefinition of patentable subject matter following a recommendation from the Australian Law Reform Commission. If that doesn’t happen, the Myriad case might have an impact on whether natural biological materials can be patented. Australia may pass the Raising the Bar bill, which will introduce a research exemption to the Patents Act but will not influence the scope of what may be patented. It’s unclear what will, generally, happen after that.

Open Source Drug Discovery for Malaria Meeting - Session 2, Part 1, Richard Jefferson



Part of the second session at the Open Source Drug Discovery for Malaria meeting at The University of Sydney, February 24th 2012. Speaker is Richard Jefferson, Cambia and the Institute of Open Innovation.




  1. Cambia was set up about 20 years ago with the aim of shifting the demographic of problem solving. The tools built were initially scientific, i.e. enabling technologies. This was an academic success but a practical failure. In an innovation system the rate-limiting step is conversion of an idea to a solution.
  2. (2:12) The most important science is the science that fails but fails gracefully, since it provokes the most change. Cambia led to a biological open source movement to try to achieve more collaborative approaches. The latest moves are towards destroying information asymmetry, i.e. to de-risk innovation.
  3. (4:39) Innovation cartography. The power of an evidence base for self-interest. The analogy of trade: for trade one needs a map to reduce risk. The story of the Portuguese/Spanish dominance of trade, owing to a very substantial investment in cartography. Maps identify risks, and the quality of a map depends on the quality of surveying. The monopoly on maps gave rise to low levels of competition between powers. In 1596 a Dutchman working in Goa found the entire stock of Portuguese maps, copied them, went back to the Netherlands and published them, open access. This gave rise to an explosion of activity, e.g. the founding of both the Dutch and English East India Companies; i.e. the publication of de-risking tools gave rise to huge commercial activity and innovation in ship building, and in associated areas such as insurance. Knowledge space, rather than moving commodities, is the key to business in the modern age, and the same metaphors apply. To bring the analogy back: those with money employ patent attorneys and business professionals who are gatekeepers to information. They have risks and expenditures, and they recoup those expenses by targeting big innovations, not small ones. We need a social revolution to democratize innovation.
  4. (12:25) Jefferson therefore built the Patent Lens, a transparent and inclusive innovation system. Patents are not the problem – they are part of the solution, as a great resource of our species’ technical knowledge. Patents are challenging to read, but still valuable. They are information but not knowledge, and knowledge is more the aim of the Lens: how do we improve how we use patents? Solving a problem is like doing a jigsaw. The solution must be: A) Visualizable – the most important part of a jigsaw is the box, which shows the basic idea. B) Comprehensive – a jigsaw contains all the pieces and can be completed. C) Bounded – it has corners and edges. D) Standardizable – the pieces only come in certain shapes. These make innovation work. In many cases we currently don’t provide these four requirements by default.
  5. (18:00) Patents are a right to sue, not a right to do. 80-100 million patents exist. They are public documents and a huge resource. To understand them we need survey points. In journalism this is the Who What When Where Why – these apply also to patents and innovation. The Lens added a Which. The front page of a patent typically has all this information. The Lens is a prototype – beta at the moment – but currently has 80 million patents in it.
  6. (21:13) Demonstration of the Patent Lens. Patents can be filtered easily by e.g. jurisdiction. Much of patent language is human-impenetrable. Need to allow automatic understanding of the text, allowing links between patents. Also need to be able to make links between patents and the academic literature.
  7. (26:30) How the lens facilitates collaboration. Can integrate patents with people. Can annotate patents and share them; can generate collections for projects. Can embed analyses in any other pages. Allows using patents to provide data to support assertions (or not).
  8. (29:28) Used this for Gates and malaria. Gates insists on a global access plan: how will it be clear that your work will find its way to the people who need it? Need to show the impediments to delivery, and that partnerships are appropriate. Recent appointment at Gates of a new head of global development. The Lens was developed using agile development, here a pharma patent attorney working with software engineers, and this team was asked to develop a patent landscape for malaria vaccines. The landscape currently has all candidates, with relevant patents and human-enriched information. It is using an old content management system, and will have more tools soon. Cartography analogy again: people focus on their local area of interest, and the broader map is built in aggregate. Comment made that most patents that could be enforced are not; in aggregate they are a valuable resource.
  9. (35:44) Lens not yet live, so there will be bugs. Moving to Chinese, Korean and Japanese patents. Working generally with NCBI and Crossref, then want to move to the business literature, i.e. description of legal entities. The overall goal of the Lens is the removal of barriers to other people’s creativity.
  10. (38:10) Two questions from Mat Todd. 1) Will the Lens include chemical structure searching? Jefferson: yes, but it is technically challenging. Todd: it would be great if the innovation landscape around molecules could be visualized. 2) If we work on an antimalarial and find that it’s in a class of compounds covered by a patent, what do we do? Are we forbidden to research it? Palombi: there is a research exemption (low audio), but the clause is too narrow because of the vagueness of the definition of “research”; there is little research not linked to anything commercial. The overall answer is that generally yes, you’re allowed to research something that occupies the same area as an existing patent. The onus would be on the company to find us and ask us to stop, and there is little incentive for them to do that. Patent infringement of this kind would apply to nearly every university. It could actually help the patent holders by making their methodology clearly more robust. The Lens is an open source project. It can be licensed and internalised in companies, but in return the community gets a search tool; hence it is worth funding.
  11. (43:41) (Audio low) Observation from Palombi that it’s possible in future that infringing patents may become criminalizable, i.e. to ratchet up enforcement.
  12. (44:18) Jefferson demonstrates the Lens’ biological sequence tool. Sequences are listed with their associated species and with the patents in which they appear. A valuable genetic resource for building an evidence base for policy.
  13. (47:00) Nico Adams (CSIRO): chemical search tools have been developed by Peter Murray-Rust’s group. Chemical patents generally are not well-written. A particular problem is the intentional vagueness of chemical structures: even if a patent can be read, it then needs to be understood. Jefferson mentioned Surechem’s patent search tool. Many companies don’t need proprietary tools (they don’t have the resources to make them very good); they need proprietary outcomes (better information). Most companies are frustrated by poor patents and the poor tools available. “Companies are either doers or selective deniers”. NCBI can’t (at the moment) do things that are too disruptive in this area.
  14. (50:37) Question from Stuart Ralph. Patents generally awful but they do have a structured and limited vocabulary. Hence need natural language processing? Jefferson: Yes. Patent claims need to be translated into human language.
  15. (53:52) Questions from Moran and others (audio unclear until 55:10). Nico Adams: document summarization is important/relevant, e.g. Stephen Wam (CSIRO) is interested in summarizing claims in science papers.
  16. (56:15) Final question from Luigi Palombi. Negotiations are currently under way for the Trans-Pacific Partnership agreement for the Pacific Rim, including on IP. The Lens makes it easier for people to access information and is therefore relevant. People can’t currently get this information through government-funded patent sites. Should get in touch with e.g. DFAT negotiators. Jefferson: yes, we can become involved in these things when the Lens is ready (beyond beta), i.e. only once it is comprehensive can it be wedded to policy making. For example, it needs to reach the point where it can understand patent claims.

Open Source Drug Discovery for Malaria Project Meetings



Project meetings for the Open Source Drug Discovery for Malaria project will be held in public. Meetings will be linked below, with recordings uploaded to YouTube. Open to all interested.

Open Source Drug Discovery for Malaria Project Meeting February 26th 2013




Minutes of the OSDDmalaria Online Project Meeting held 26th February 2013
Meeting recording is here.
1. Status of the Arylpyrrole Series
The ether analog had been an obvious analog to make, since it lacks the problematic ester of the hit compound. The analog caused continuing synthetic difficulties (see e.g. here). Various people had provided useful inputs to synthetic design, in particular Frederick DeRoose at Asclepia. During the meeting it was decided to abandon this compound. Anyone reading this who feels they can make it - please give this your best shot! There may be a fundamental issue with this compound's stability, or at least the alcohol precursor of the compound. Either way, for the moment it's dead.
The only other remaining compound of interest was the sulfonamide OSM-S-XX (does this have a number?). The N-methyl analog of this compound has, since the meeting, been made and will be sent for biological evaluation. The method derives from a collaboration between Edinburgh student Patrick Thomson and Sydney student Matin Dean.
2. Status of the Near Neighbour Series
Of the near neighbor compounds so far evaluated, there was a fairly strong correlation between potency and logP. Of the new compounds made, the two most potent (OSM-S-109 and OSM-S-111) had logP values that were not significantly lower than those of the compounds previously made. The proposed compounds with even lower logP values had proven to be synthetically challenging.
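To put a number on a potency/logP trend like this, one option is to convert IC50 values to pIC50 and compute a Pearson correlation against calculated logP. A minimal sketch, assuming potencies are IC50 values in nM; the function and variable names are hypothetical and no project data are included.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def pearson_r(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a series with no variation has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Usage (placeholder names): one IC50 (nM) and one calculated logP per compound.
# r = pearson_r([pic50_from_nm(v) for v in ic50s_nm], clogps)
```

With only a handful of compounds, r should be read as indicative rather than conclusive.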
A representative compound (OSM-S-35) had been evaluated by Sue Charman in a glutathione trapping experiment, and there was divided opinion on the results. While the peak areas for Glu-trapped compounds were low, peak areas can sometimes be misleading in such experiments. Nevertheless it was thought the results did not present a red flag for the series.
The two active compounds of remaining interest were:
1. OSM-S-109. This contains an aromatic nitro group, a motif usually associated with genotoxicity issues. The intention was to reduce this nitro group to the amine, if this is trivial, and then to evaluate the potency of the amine. It was thought the amine would be as good as any other compound to evaluate, rather than, for example, acylating the compound first.
2. OSM-S-111, the methoxy-substituted compound. This looks to be a promising compound, and it would be useful to evaluate it in vivo. Paul Willis was going to check if there might be an available slot at the Swiss TPH for this, for which about 30 mg would be needed. The experiment would be a Peters Test, in other words dosing at 50 mg per kg for 4 days, and the taking of blood samples. In the meantime, it was thought a good idea to obtain metabolic data on this compound, i.e. intrinsic clearance in microsomes, and solubility. There was no need for another glutathione trapping experiment. Sue Charman would be approached for her availability to do these experiments.
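The amount of compound a 4-day in vivo test consumes is simple arithmetic: dose × body weight × number of animals × number of days. A minimal sketch; the 20 g mouse weight, five mice per group and once-daily dosing are assumptions for illustration, not figures from the meeting.

```python
def compound_needed_mg(dose_mg_per_kg: float, body_weight_g: float,
                       n_animals: int, n_days: int,
                       doses_per_day: int = 1) -> float:
    """Total compound (mg) consumed by a multi-day in vivo dosing regimen."""
    per_dose_mg = dose_mg_per_kg * body_weight_g / 1000.0  # convert g to kg
    return per_dose_mg * doses_per_day * n_days * n_animals


# Assumptions for illustration only: 20 g mice, five per group, once-daily dosing.
needed = compound_needed_mg(50, 20, n_animals=5, n_days=4)
print(f"{needed:.1f} mg before formulation and handling losses")
```

Under those assumptions the dosing itself uses about 20 mg, so an estimate of about 30 mg leaves some margin for formulation and handling losses.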
In general, these experiments would represent the remaining experiments for the arylpyrrole/near neighbor paper, at which point these series should be published. A clear description of the reasons for the "kill" decision should be stated. These notes help there, but another post making clear the reasons would be needed. The current draft of the paper was to be resurrected (is here) and Alice Williamson was to be tasked with working on the experimental section.
3. New Thienopyrimidine Series
OSM-S-106 had shown excellent potency in the most recent round of evaluation, indicating the series was an attractive new option. 
At this early stage it makes sense to assess solubility, PK and liver-gametocyte assays - Sue and Vicky for this? This hit compound probably needs resynthesis.
The hit compound is a primary sulfonamide. Such motifs can possess undesirable biological activities, such as carbonic anhydrase inhibition. It was felt desirable to also make the secondary and tertiary (N-methyl and N,N-dimethyl) sulfonamides, and this should be incorporated into any short-term synthetic plan.
A shortlist should be drawn up of the compounds in this series seen as either desirable and/or do-able with local synthetic resources. (This has since been done). Paul, Chris Southan and others should be alerted to this list, both to assess suitability and to check for related, available compounds, possibly via passing of the list to the GSK team. At this stage we are looking for maximal potency with minimal logP. Carrying out this consultation in public may help with identifying people able to help with synthesis, as for the last arylpyrrole "Top 10" consultation.
Synthetic Chemistry of the Thienopyrimidine series:
1. Suzuki. The coupling of various bromothienopyrimidines with boronic acids and esters is proving to be much more difficult than anticipated. There is little precedent for this coupling, and it is not clear why the reactions so rarely generate the desired products or what can be done about it. A summary of successes and failures will be constructed (has now been started here) and an appeal will be sent to the community for help with this step.
2. CRO input to starting material synthesis. It would be a time saving if the central starting material for this synthesis, chlorothienopyrimidine, could be provided in bulk to the project from the CRO sector. The synthesis is well-behaved and low risk. Representative procedures will be posted (now here and here) and advice from the CRO community will be sought.
Project Tools
1. Knime: There had been some discussion of automating workflows using Knime. There was little expertise in the group with this, but advice from others was being sought as a way of making the compound data in the project more usable.
2. The lack of notifications on ELN entries was still seen as a major problem, since this prevented the ELN being used as a discussion area, or as a place for summaries. It was felt by some that there was an increasing number of pages in the project generally, which was requiring an increasing amount of time for curation. The landing page was also seen as a major need. Mat was in touch with Intersect NSW about a quote for such a page and would pursue this. 
3. Continuing problems were seen with The Synaptic Leap platform, mainly with regard to broken notifications and difficulties in making posts. While the name recognition of the site helped with knowledge of the project, the barrier to entry represented by the Drupal platform behind the Synaptic Leap webpage was seen as a continuing problem. (In a subsequent meeting at the Open Data Institute on March 4th, it was suggested to Mat that we should look into Github itself as a project coordination site.)
4. Having one SD file as the master list of all compounds was seen as being an attractive way forward. The location of the file was not thought to be important, but was currently on Github.
1. The last meeting was attended by someone from Roche, and this person should be recontacted as a possible source of project input.
2. The next meeting should be at the end of April, in order to allow for a couple of rounds of synthesis and evaluation on the thienopyrimidine set.
3. The osddmalaria data have been posted to ChEMBL here. It will be important to try to integrate the SD file with a continual import of the data to ChEMBL, but this is not currently simple. Discussion with George and John at ChEMBL to continue about how to do this.
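Keeping one SD file as the master compound list also means the same file can be read by simple scripts. A minimal sketch, using only the Python standard library, of pulling the tagged data fields out of each record; the tag names OSM_ID and SMILES are hypothetical, not necessarily those in the project's file.

```python
import re

# A data-item header line looks like "> <FIELD_NAME>" (optionally with extra text).
FIELD_HEADER = re.compile(r">.*?<([^>]+)>")


def parse_sd_fields(sd_text: str) -> list:
    """Return one dict of tagged data fields per record in an SD file.

    Molfile (structure) lines are skipped because they do not match the
    data-item header pattern; only "> <NAME>" items and their values are kept.
    """
    records = []
    for block in sd_text.split("$$$$"):
        if not block.strip():
            continue  # whitespace after the final record delimiter
        fields = {}
        lines = block.splitlines()
        i = 0
        while i < len(lines):
            match = FIELD_HEADER.match(lines[i])
            i += 1
            if match is None:
                continue
            values = []
            # A data item's value runs until the next blank line.
            while i < len(lines) and lines[i].strip():
                values.append(lines[i])
                i += 1
            fields[match.group(1)] = "\n".join(values)
        records.append(fields)
    return records


# Hypothetical one-record file; the structure block is left empty for brevity.
example = """OSM-S-EXAMPLE
  hypothetical record

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <OSM_ID>
OSM-S-EXAMPLE

> <SMILES>
c1ccccc1

$$$$
"""

print(parse_sd_fields(example))
```

For the structures themselves, an established toolkit such as RDKit or Open Babel would be the more robust choice; this sketch only treats the master file as a table of compound data.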
Action Items
1. Have the aryl pyrrole sulfonamide biologically evaluated (Patrick Thomson, but Mat and Paul Willis to organize UK lab). Resynthesise this compound to check method (Matin).
2. Reduce OSM-S-109 to the amine and evaluate potency (Murray).
3. For the methoxy-substituted near neighbor compound (OSM-S-111), check whether Swiss TPH has an available slot for this to be evaluated in vivo (Paul). Check available stocks of this compound (thought to be 30 mg) (Murray). Sue Charman to be approached for metabolic data (Mat).
4. Summary of the reasons for the "stops" on the arylpyrrole and the near neighbor set needs to be written (Mat). Work to be continued on the experimental section of the paper (Alice).
5. Check on need to resynthesise OSM-S-106 (Murray). Sue Charman to be asked about possibility of basic metabolic stability/solubility (Mat). Vicky to be asked about possibility of late stage gametocyte assays (Mat).
6. Post shortlist of thienopyrimidine derivatives with SMILES and request input from community (Murray and Mat).
7. Post summary of Suzuki reactions trialled to date in the thienopyrimidine series (Murray and Althea). Request input from community (Mat).
8. Post procedures for synthesis of chlorothienopyrimidine (Murray and Althea), and begin to explore CRO sector involvement (Mat and Paul).
9. Explore use of Knime with project data (Murray with online help).
10. Obtain quote for development of project landing page (Mat).
11. Explore whether github could be used in a project coordination role (Mat to request input from developer community).
12. Re-contact previous attendee from Roche (Alice).
13. Misc: Post meeting recording (Mat). Done.
14. Misc: Explore involvement of undergraduate communities in the synthesis of compounds (Mat).
15. Misc: Re-contact Mike Pollastri for possible involvement (Mat).
16. Misc: Consider new sources of possible synthetic chemistry input on thienopyrimidine series (all).

Open Source Drug Discovery for Malaria Project Meeting Oct 19th 2012



This meeting was the first of what will become regular monthly meetings for the OSDD malaria project. Open to all, so please feel free to come along to them.

Wiki pages:

Software for the meetings is Adobe Connect run from the University of Sydney.

Webcam feeds are from Mat Todd, Paul Willis (MMV) and Sanjay Batra (CDRI). Other participants present and contributing by audio or chat.




The compounds discussed in this meeting are shown here:

Recent biological data

Compounds currently being made

Meeting - October 19th 2012

Items discussed:

Agenda Item 1:
New preliminary biological data from Vicky Avery:


The data indicate a very high sensitivity to changing the ester of the first hit TCMDC 123812 (OSM-S-5). The lack of activity of the others is interesting and might support the prodrug hypothesis, but the obvious hydrolysis product (the carboxylic acid) was inactive.

Compounds of remaining synthetic interest.

Question arose about whether controls were included in Vicky's data. Checked later: yes, controls were included.

Paul Willis suggested a compound that was more truncated:


This has now been made and sent for evaluation.

Sanjay Batra suggested a urea replacement in place of the amide.

We will wait for final data from Avery before planning other compounds.

Agenda Item 2
Synthesis of Remaining Compounds

Triazolourea series: initial compounds have been made and evaluated too (data link above) showing that the hit compound and a precursor analog are active. There is a concern that these are acylating agents. Have chemists seen any enhanced lability of these compounds? Comments welcome on these structures.

Initial GSK screen for TCMDC 134395 (OSM-S-56): 767 nM vs 3D7. These new data have slightly better numbers (e.g. 333 nM).

Commercial compounds: On their way from Molport (have since arrived and been sent on). Thienopyrimidines to be coupled with compounds from arylpyrrole series and others. See the relevant ELN page.

Arylpyrrole compounds still to make (main page)

sA - now found to be inactive
sB - in progress by Paul Y - see e.g. PMY 60-4
sC - amino acid derivs - parked for the moment until we have data on the other compounds, to make sure it's worth it.
sD - Ether linked compound. Awkward to make. Not clear why.

Relevant attempts

This compound could be of interest to CROs. Frederik Deroose from Asclepia was present in the meeting and expressed interest. Depends entirely on timing and cost. Paul Willis was skeptical about the need for the compound if it was going to take a lot of input. This was followed up subsequently between Frederik and Mat, and we're now deciding whether to contract out the synthesis of this compound. We could send 500 mg of the aldehyde or grams of the precursor to the aldehyde.

sE - sulfonamide. Matin still pursuing this. Will be following (by NMR) the subsequent functional group manipulations.
sF - oxadiazole - ugly EDC coupling involving a hydrazine. Isomer hopefully to be made by student working with John Wallis.
sG - Matt Tarnowski's compound - nearly there. Murray will most likely be stepping in to finish this off. Matt successfully made a precursor.

sH - Couple of these compounds now made and included in new compounds sent to Vicky.
sI - Fused ring compounds - now generating ideas for the central ring. Whether this compound is made depends on what's quick and available.
sJ - done and sent to Vicky Avery October 27th.

One of Sanjay's students making one of the pyrazoles, but the group is waiting for delivery of starting materials. Timeframe needs to be 2-3 weeks for synthesis.

Arylpyrrole completion:
Want final data from Vicky by next meeting (i.e. late November)

Agenda Item 3
CRO inputs. Working on this. Background

To date we have approached 7-8 CROs, but with little interest. Continuing.

There has been useful input from Kevin Lustig at Assay Depot, who has commented on the above post. Possibly a major push on CROs if we start on a new series.

Agenda Item 4
The thienopyrimidine series has not been completed. Jimmy Cronshaw is soon finishing his Honours project and will not have time to complete this molecule, so someone else will probably be brought in to finish the synthesis. This will depend on the activity of the commercial compounds that were sent to Avery on October 27th.

Agenda Item 5
Current weaknesses of the ELN and Synaptic Leap websites. Both require work, some paid, some volunteer. This work needs to be done ASAP (probably over the long break), e.g. alerts for the ELN and a landing page for TSL. Possible interest from people working with Ginger Taylor, TBC.
Willis: the landing page needs a simple project description and a way of getting newcomers up to speed. How does one do that? It should catch people arriving and convey the project status. Mike suggested progress bars for sub-goals. A need for a separate landing page that's highly visual.

This aspect of the project needs significant input from non-scientists.

Agenda item 6


- Need to recontact Frey's team at Labtrove, and migrate data over to Nectar.
- Paul Willis to look into possible other series, in case we park the arylpyrroles. Want series that other people are not working on. Anyone else can suggest series. Paul and Mike did some filtering - which we should resurrect.
- To do: recontact Nislow lab for biodata
- To do: recontact GSK for DHODH assay

Not discussed:

First malaria paper being written, and is now posted on Github, which we'll be using more, and which will require a little bit of coordination/instruction. It's possible that there could be a split of this paper into two, since the new emphasis on the arylpyrroles means the near neighbor set seems slightly out of place.

Related to this: the near neighbor set looked promising, in terms of potency but also in the late stage gametocyte assay. They are not being pursued by us at present, but perhaps we could revisit them later?


1. Arylpyrrole set to be completed, with biological evaluation, by next meeting

2. We are hoping for synthetic input from the Batra and Wallis labs in that timeframe

3. Thienopyrimidine and arylpyrrole sets of commercial compounds have been sent for evaluation

4. Software people need to be engaged for improvements in website design

5. To do: chase Nislow and GSK for bio data, and Paul W to consider which series might act as back-ups if we were to park the arylpyrroles owing to their narrow SAR.

Open Source Drug Discovery: The GSK Arylpyrrole Series

Several labs are leading an open source drug discovery project for malaria, including the Todd lab at the University of Sydney, the Medicines for Malaria Venture and GSK Tres Cantos, but the project requires many other partners. The aim is to prosecute a hit-to-lead campaign starting from known actives in the GSK data set. Some background is here. The current project status is kept up to date on this wiki page.

Discussion: this site (daughter pages below)

Data: The electronic lab notebook is here.

Project status.

Updates: via a Twitter feed, and a Google+ page.


Project is open source - if you're reading this and would like to participate, you can.




A list of what's needed in the OSDD Malaria project

Main: Comment/Analysis on Initial Bioactivity Data

We have some excellent first results. What to do next?


Resynthesis of TCMDC-123812 and TCMDC-123794 (ELN)

Need: advice on oxidation of pyrrole-3-carbaldehydes - here. Through a work-around, these syntheses are complete, but we could still use advice on the step above. Done


General Analog Synthesis Planning

Need: Advice from med chemists on what to alter first - here.


Biological Evaluation of Initial Leads

Need: Advice on what kinds of biological evaluation are most desirable to validate the initial GSK leads - here.


Where Else Can we Access This Series?

Need: People with stocks of analogous compounds (i.e. members of the arylpyrrole series) to submit those compounds for screening. First possibility seen here.

We're compiling a list on OpenWetWare of compounds we would like to source.

Desired Compounds Consultation


Request for Help

We're looking to identify a new set of compounds for the next round of optimisation. This is happening in addition to sourcing of commercially available analogues that will fill a bit more of the SAR space but aren't necessarily exactly what we want.

We've posted a list of compounds on OpenWetWare that contains a range of the things we're after, along with a SMARTS filter summary. We've had feedback and ideas submitted before here on TSL, and some of those ideas have been included in this list. Below I've posted the 10 "priority" compounds from the list. It's very much open to debate, so get involved. Once we've had a bit of feedback, we'll settle on a definitive list and go after them by any means necessary. The plan was to mostly concentrate on the side-chain, leaving the aryl pyrrole unit mostly untouched for the moment.


[Edit 0905 AEST 15 June 2012: Added letter identifiers for compounds to aid discussion.]

Desired Compounds Consultation Phase 2


Request for Help

The evaluation of the arylpyrroles has gone well, in that we've identified promising new antimalarial compounds. Besides their high potency, they exhibit high levels of activity in a late-stage gametocyte assay, which is very exciting. (As an open source project, anyone may take these results and work on them, made easy by all our data being available.) It's because of this potency that we're going to explore one more iteration of the series, despite three of the compounds showing no oral activity in mice. It's thought the problem could be low solubility. This round will only include compounds with low (<5) logP, and we'd like to play around with the structure a little more.

If this round does not throw out any improved compounds we'll probably park the series. Hence it's important that we choose a good set of compounds to evaluate. We decided to list the top 10 "most wanted" compounds that we could access commercially, as well as a similar list of compounds we could not buy and wanted to make. We'd then attempt to source those commercial compounds, and ask the synthesis community to volunteer to make the other necessary compounds.

We're now assembling the lists. We'd compiled a first-pass list of attractive compounds. We've now modified that list to give two new lists of commercial vs. synthetic compounds - below. We now need to consult the community again on these new lists. Before embarking on synthesis or purchase we will have the compounds checked by the original authors of the GSK TCAMS set to see whether any of the compounds have been evaluated and found to be inactive - we'll send the SMILES of all these compounds to GSK and see what they say. We know that's a big ask.

First the compounds we'd like to get our hands on which are commercially-available:

If any of these are known by GSK, we'll fill up the spaces with compounds from these backups, or any others people might like to see tested:

The compounds we'd like to evaluate which are not commercially-available are:

(Note that primary, secondary and tertiary terminal amides are all of interest here and ought to be made concurrently.)

And again, these are the backups in case these compounds have already been evaluated:

In the synthesis set:

1) We've included a couple of pyrazoles. Fused pyrazoles quite different to the GSK hit compounds are commercially-available, and we've not included those because of the substantial differences - those options are shown here, and we could include some if needed.
2) We've taken the curveballs out as being a little speculative, but if anyone knows how to make these, or wants to have a go, please say.
3) We've de-prioritised the thiazolidinones as being too insoluble, even with some obvious tweaks. This means we have three slots available for the synthesis "top 10 most-wanted." Our final phase of consultation will be to fill those slots.

The final consultation will hopefully be a public hangout on the web for a final discussion, technology permitting. Date to be advised. This will finish the "Most Wanted" lists and begin the next phase of compound evaluation. So this is where we stand - would anyone do this differently?

On a side note (one that will probably need a post of its own), the logPs above are approximate. We've been using available tools to calculate these, e.g. Chemdraw, but there's a lot of variability depending on the tool used. We have no access here to one that performs well (ACDLabs). While it's likely the above figures are inaccurate (vs. truth), it's unlikely they are so far out as to invalidate a target.
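Since calculated logPs vary from tool to tool, one cheap safeguard is to check each compound against the <5 cutoff with every predictor available and flag the compounds where the tools disagree. A minimal sketch; the tool names and values in the usage line are placeholders, not project data.

```python
def logp_verdict(predictions: dict, cutoff: float = 5.0) -> str:
    """Compare several calculated logPs for one compound with a cutoff.

    Returns "pass" if every tool puts the compound below the cutoff,
    "fail" if every tool puts it at or above the cutoff, and
    "ambiguous" if the tools disagree.
    """
    values = list(predictions.values())
    if not values:
        raise ValueError("no logP predictions supplied")
    if max(values) < cutoff:
        return "pass"
    if min(values) >= cutoff:
        return "fail"
    return "ambiguous"


# Placeholder tool names and values, not project data.
print(logp_verdict({"tool_a": 4.2, "tool_b": 5.3}))
```

Only the "ambiguous" compounds would then need a closer look, which fits the view that the figures are unlikely to be far enough out to invalidate most targets.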

Consultation Outcome



Firstly, thanks for coming to the online meeting. I found it worked quite well (minor glitches aside) but it would be great to hear your thoughts. We'll post the recording up in the near-ish future. The OpenWetWare wiki will soon be updated to reflect the outcomes of the meeting, along with SMILES. Of course if anyone is keen and beats me to it, then even better.

The discussion focused really on the selection of synthetic compounds. The list of commercial compounds (below) remained the same. The project is now looking for willing donors (ca. 5 mg) for these compounds. 


The list of synthetic compounds saw some changes, partly because some of the original list had already been made. Replacements for these were discussed and agreed, and the blank spaces were filled. These compounds aim to mitigate the problems observed in the previous rounds of testing. Please let us know if these structures aren't what you were expecting to see here. The two compounds highlighted in blue are ones that are currently receiving attention here at Sydney.

Final top 10 synthetic targets


The project now needs synthetic teams to investigate the other targets. Contributions from any groups, whether academic collaborators or industrial partners, would be gratefully received. It could potentially be a good way for a CRO to showcase their expertise in turning out compounds. We would be willing to provide starting material if necessary.


Analog Synthesis - Variation of the Aryl Ring

Open source is most powerful when people participate by creating. Open science is no different, and in the case of lab-based sciences, that means actually doing experiments. For the open source drug discovery for malaria project we need people to make molecules. In fact a lot of people need to make molecules. We have our first offer (November 2011).
Sanjay Batra at the Medicinal and Process Chemistry Division of the Central Drug Research Institute in Lucknow, India, has offered to ask a student to make some molecules as part of the current push to validate the GSK aryl pyrroles (thanks to Saman Habib for putting us in touch by email - Saman is going to be leading the Indian OSDD Malaria project that is starting in 2012). This opening post describes where we are, and what I think needs to be made next (though the post may change over time as the project changes).

Below are the compounds sent last week (Nov 24 2011) for biological evaluation. Included are the original TCAMS compounds, some “near neighbour” compounds, and a range of pro-drug possibilities (i.e., if the TCAMS compounds are actually prodrugs, given that that ester is unlikely to survive for long). Most were made by Paul Ylioja, and some by the undergraduate student Paul was mentoring, Laura White, who posted a nice report of what she did here. The compound codes will allow you to find the procedures in the ELNs.


According to wisdom received from the GSK Tres Cantos and MMV guys, we should be doing a broad and shallow SAR search, which is to say we ought to be picking several points of variation in the TCAMS structures and making a small number of changes in each position rather than exhaustively changing one position. The rationale there is that we need to see that the potency varies when we change things, otherwise there are a bunch of other hit series we can look at.
I think that means the best thing for Sanjay's lab to do is to finish off making variations in the aniline of the arylpyrrole synthesis – i.e. vary the fluorine:

We've done some of this – converting the F to H, Me and CF3. Not all of these have been taken all the way through to the end yet. Our undergrad student Zoe is working with Paul Ylioja to make a 3,5-CF3 variant. But I think it's important we change the position of the F, that we change the Ph ring to, for example, a pyridine, and we bulk out the ring with something like two methyls. I also think the biphenyl would be a good one to try (i.e., use 4-phenylaniline). Which compounds are made depends on which starting materials are available. I think we need 3-4 diverse anilines taken through to the end, so 8 final compounds. Whether intermediates should be saved for screening depends on whether the "prodrug" compounds sent for evaluation above look promising.

The pyrrole esters would then need to be hydrolysed, and coupled with the TCAMS R' groups according to procedures Paul has nicely worked out. Typical procedures are given as red URLs above, but generally the chemistry can be browsed at the ELN.
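The plan above (3-4 diverse anilines, each taken through ester hydrolysis and coupling, giving 8 final compounds) amounts to a small matrix of building blocks. A minimal Python sketch of that bookkeeping follows, assuming four anilines and that each one is coupled with both TCAMS R' groups; the aniline names and R' labels are hypothetical placeholders, not choices the post has made.

```python
from itertools import product

# Hypothetical aniline choices, following the kinds of change suggested above
# (moving the F, swapping Ph for a pyridine, adding two methyls, the biphenyl).
ANILINES = [
    "3-fluoroaniline",
    "aminopyridine",
    "2,6-dimethylaniline",
    "4-phenylaniline",
]

# Placeholder labels for the R' groups of the two TCAMS hits.
R_PRIME_GROUPS = ["R' of TCMDC-123812", "R' of TCMDC-123794"]


def plan_final_compounds(anilines, r_groups):
    """Pair every aniline with every R' group (one final compound per pair)."""
    return list(product(anilines, r_groups))


plan = plan_final_compounds(ANILINES, R_PRIME_GROUPS)
for aniline, r_group in plan:
    print(f"{aniline}: pyrrole ester -> hydrolyse -> couple with {r_group}")
print(len(plan), "final compounds")
```

With three anilines instead of four the same matrix gives six compounds, which is why the aniline count drives the workload.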
It would help a great deal if Sanjay were able to use the same lab book that we are using, i.e., to start an account on LabTrove and start a separate blog on this page called something like “CDRI Synthesis of Aniline Variants” where the experiments would be posted (we can create this if it's not easy/obvious). Crucially, this is an open science project, so all data must be deposited; check out the Six Laws. Our labs will be geographically separated, so we must have full access to each other's data. This also means that readers of the project can have faith in what we're doing because they can check the raw data.
I hope this project idea sounds good as a starter, Sanjay, and your students are happy!
Biological evaluation of compounds would happen either here in Australia or, better, in India, if we are able to establish a willing venue for that. I suspect that will be no problem, but it has not yet been sorted out. The assays for the compounds made need to be similar to the ones being done elsewhere, and the same control compounds should be used. The controls should probably include one of the original TCAMS hits, and we could provide that compound if and when it's needed.

Note if you're reading this and want to take part by making some molecules, please say. You're both welcome and needed, provided you subscribe to the Six Laws. There's so much to do, we can't do it all on our own. Similarly, if you're a medicinal chemist who just can't help themselves, and think we're approaching this all wrong/right, please feel free to say why. Discussion can happen here. Project status will be most up to date on the wiki. You can tweet the project. Or you can catch up with some of us on Google+, which is a pretty useful addition to the project tools and gets us away from private email, which is generally useless for an open project.


Availability of TCMDC-123812 and TCMDC-123794

We're starting open source drug discovery for malaria. We have to start somewhere: in this case a couple of known compounds that showed good activity and have plenty of possibilities for modification - compounds contained in the open deposition of malaria data from 2010, originating from GSK's Tres Cantos lab.

Before getting too excited about these leads, we must validate them, meaning we need to obtain samples and screen. Paul Ylioja is currently making these compounds, and the chemistry is going very well, helped in part by his pyrrole wizardry. Please analyze and comment on the lab book, particularly if you're a synthetic organic chemist.

SciFinder and Google searches on the SMILES/InChIs for these structures throw up very little, but it turns out they are commercially available from a number of suppliers (Paul first spotted this). We corresponded with Felix Calderon at GSK Tres Cantos, who said that, indeed, these compounds had been bought in from the Enamine library. Tres Cantos have stock of these compounds in Madrid, and have kindly offered to look into them further if needed. Potential evaluation will be dealt with in another post elsewhere.
Given we will be wanting to modify the structures, we need to be able to synthesize them rather than buy them.

But I wonder why these compounds were made in the first place?

Biological Evaluation of Arylpyrrole Series

New Stuff:
Online lab book hosting bioactivity data is here.
First set of compounds have been evaluated (Jan 2012) - here.
The initial phase of this project is to validate the biological activity of the two Tres Cantos leads. The promise of these compounds (and others) is discussed in a paper linked here.

Original activity data for the two compounds are here and here.

For this initial phase, the question is: What kind of biological (re)evaluation is needed? (not toxicology, just activity)

For experiments, Tres Cantos (Felix Calderon) kindly offered to re-evaluate these compounds. We also have links with other labs who have expressed an interest in this project (the Eskitis Institute in Queensland or Stuart Ralph's lab in Melbourne). Question is, what data are we looking for?

In our original proposal for this project, we assumed the following assays would be needed in general. Are all these needed for validation of the current two compounds, or only later during analog evaluation?

1) A primary whole-cell parasite assay covering drug-sensitive and drug-resistant P. falciparum strains (3D7, Dd2 and W2mef). Screening for activity would use an image-based antimalarial HTS assay incorporating DAPI or SYBR Green dyes to monitor parasite growth: asexual stages and, potentially, gametocytes.

2) Assay for information on the selectivity between drug resistant and sensitive falciparum strains, as well as possible cytotoxicity on mammalian cell lines (typically HepG2 or HEK293 cells), to check for a high therapeutic ratio.

3) For compounds that inhibit growth selectively, IC50s should be determined using serial dilutions of inhibitor in 48 and 96 h assays, which will allow us to screen for promising cell-permeable inhibitors and to discern immediate and delayed parasite death – suggesting whether inhibition is of cytosolic- or apicoplast-based targets.
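Point 3 relies on IC50 values read from serial-dilution series. As a rough illustration only, the Python sketch below builds a dilution series and estimates an IC50 by linear interpolation on log concentration between the two points that bracket 50% inhibition. This is a deliberately simple stand-in: dose-response data are more usually fitted with a four-parameter logistic curve, and the top concentration, dilution factor and readings here are assumptions, not project data.

```python
import math


def serial_dilution(top_conc_nM, factor, n_points):
    """Concentrations (nM) for an n-point serial dilution starting at top_conc_nM."""
    return [top_conc_nM / factor**i for i in range(n_points)]


def ic50_by_interpolation(concs_nM, pct_inhibition):
    """Estimate the IC50 by interpolating on log10(concentration).

    Assumes inhibition rises with concentration. Returns None when no pair of
    adjacent points brackets 50% inhibition (IC50 outside the tested range).
    """
    points = sorted(zip(concs_nM, pct_inhibition))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        if y_lo <= 50.0 <= y_hi:
            if y_hi == y_lo:  # flat segment sitting exactly on 50%
                return c_lo
            frac = (50.0 - y_lo) / (y_hi - y_lo)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


# Made-up readings purely to exercise the functions (not project data).
concs = serial_dilution(10000, 3, 8)          # 10 uM top concentration, 3-fold steps
inhibition = [98, 97, 93, 80, 55, 30, 12, 4]  # listed from highest to lowest concentration
print(f"IC50 ~ {ic50_by_interpolation(concs, inhibition):.0f} nM")
```

Running the same dilution series at 48 h and 96 h and comparing the two IC50s is one simple way to look for the delayed-death shift mentioned above.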

In our correspondence with Felix, he said the following:

1) The antimalarial activity of these compounds is not affected by the presence or absence of folate in the culture medium, implying they are not inhibitors of the folate biosynthesis pathway. Is this of general significance since it steers clear of well-established resistance mechanisms? (review)

2) The compounds are neither bc1 nor DHODH inhibitors. Why is this important?

3) Felix would be happy to determine the IC50 for these compounds in the standard hypoxanthine incorporation assay (48 h). The IC50s in the original Tres Cantos dataset were measured at 72 h using the LDH assay. Is this difference in assay significant/desirable?

These questions are intentionally naive, because though there are many options, we need a consensus on what people will be looking for in validation of the existing compounds, and why.

Biological Results for First Set of Compounds

The screening data from three separate labs have been obtained for the first set of compounds on the project. Data were obtained from the Ralph Lab at the University of Melbourne, and a second data set was provided just before Christmas by the Avery Lab at Griffith University. Yesterday the third set was provided by GSK Tres Cantos in Spain, who originally discovered the hits we're starting with. The current list of available compounds in this open project is here, with those that have been evaluated by at least one lab indicated in the relevant column.

Having data on the same compounds from three labs using different screening methods is useful as it provides contrasting ways of assaying effectiveness. In any given screening experiment on this project it's going to be important to include known actives, so that we have benchmarks, and this was done in these cases. It's also very important to be 100% sure about the effectiveness of a compound before we become too attached to it...
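Benchmarks matter partly because raw plate readouts only mean something relative to controls. A minimal sketch, assuming each plate carries drug-free infected wells (taken as 0% inhibition) and wells of a known active at a fully inhibitory dose (taken as 100%); the well layout and the kind of readout (fluorescence, LDH activity and so on) are assumptions, and each lab's own normalisation may differ.

```python
from statistics import mean


def percent_inhibition(signal, drug_free_wells, full_kill_wells):
    """Convert a raw well signal to % inhibition using on-plate controls.

    drug_free_wells: signals from untreated infected wells (defines 0% inhibition).
    full_kill_wells: signals from wells with a known active at a fully inhibitory
                     dose (defines 100% inhibition).
    """
    uninhibited = mean(drug_free_wells)
    fully_inhibited = mean(full_kill_wells)
    return 100.0 * (uninhibited - signal) / (uninhibited - fully_inhibited)


# Hypothetical readings for one test well and its controls.
print(percent_inhibition(550, [1000, 980, 1020], [90, 110, 100]))
```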

The data (below, but all available through the relevant lab book) show that the original TCAMS compounds are certainly active, though perhaps not quite as active as suggested by the original screen. Paul Willis at MMV had suggested we also check out some "near neighbors" of these compounds that were in the original data set. We made a couple and one (a novel compound with the code PMY 14-1, shown below and synthesized here) has shown promising activity in all three screens, with Avery/GSK IC50s coming back as low nanomolar. (Note that this project will never involve patents or closed data, giving us the freedom to discuss the compounds freely.)

What's next? In the short term: We're waiting for confirmation of the Melbourne data via a re-run of some of the experiments. But what we need is an expert qualitative assessment of these bioactivity data by someone familiar with such screening assays. Either in comments below this post, or on G+, not by email. First item of business in the lab is to generate a few variants of PMY 14-1. We already have some new relevant compounds and are now planning others. What should we make - i.e. how ought we to change PMY 14-1? Sanjay Batra has students who are about to make steric variations in the aryl pyrrole, and these could then be employed in the synthesis of PMY 14-1 variants, for example, but shouldn't we also be interested in changes in the "upper half" of the molecule?

In the long term: It would be good to find other labs that already have compounds analogous to the actives. Paul and Zoe found a paper from the Roberts lab at Scripps describing a number of such compounds, and I will write to them to ask whether they are interested in having their compounds screened for antimalarial activity. If anyone knows of any other possible sources, that would be great, since using existing compounds saves a lot of time in the lab.


Biological Results for Second Set of Compounds

In January the first biological data for compounds from the open source drug discovery for malaria project came through. The compounds were based on two hits identified in the GSK Tres Cantos set (TCMDC-123812 and TCMDC-123794). The two originals performed well, and we also identified two other compounds that looked promising (PMY 14-1 and PMY 14-3-A). Biological data were obtained from three labs (the original GSK lab, Stuart Ralph's lab in Melbourne and Vicky Avery's lab in Brisbane) and compared to known antimalarials.

Since then Paul and Zoe have been making a second set of compounds, which we shipped last month. Details of those compounds are in this spreadsheet. They are intended to explore the most promising compounds from the first set.

The first biological data are now back - from Vicky Avery's lab. We have some super-potent compounds, which is very exciting. One is picomolar (the data below are the average of two runs on 3D7). The data are posted raw here, and are summarized below (direct link to picture file).

A few obvious points:
1) We're eagerly awaiting the data from the other two labs, to see if the activity is confirmed.
2) The SAR isn't flat - i.e. changes to the structure of the molecules make a difference to the bioactivity.
3) The aryl pyrrole appears to be needed in all sets.
4) Replacement of the ester with an amide in the original GSK compounds is seriously deleterious.

What's needed:
1) The most potent compounds have high logP. We're going to need to make them more water-soluble.
2) The best four from the first round and four from this second round are going to be shipped for basic metabolism assays to Sue Charman at Monash.
3) We're hoping to send 2-3 compounds for in vivo evaluation. Possibly the two originals, plus one of the super-potent compounds. Awaiting confirmation that we can do that.
4) The work that Sanjay Batra at CDRI is doing on installing sterically demanding groups on the aryl ring in place of F will be an important addition here.

Questions:
1) What do we do to decrease logP?
2) There have already been some good suggestions on how to change these compounds by modification of/introduction of other heterocycles. We think this is still the way to go for round three. Everyone agree?
3) Does the lack of activity for compound ZYH 23-1, and its laughable lack of reactivity towards hydride reduction, suggest we need not worry about these compounds being PAINS?

If you've any other gut feelings about these compounds, or if you'd like to play with them in your lab, or if you spot some chemistry you'd like to do to make a related scaffold, please say.

To re-state the obvious: this is open source, meaning you can join the project, or take what we've done and use it in your own research, with attribution (CC-BY-3.0).

Biological Results for Third Set of Compounds



A summary of the biological activities obtained for the third set of compounds in 2012, those arising from the consultation on which synthetic and commercial compounds were most wanted. See also some links for analysis of trends in the data.

First set (Oct 19th) Data, and these were discussed briefly in an online meeting.

Second set (Nov 8th) Data, and discussion

Third set (Dec 10) Data (essentially inactive, aside from mild activity for OSM-S-103)


Previous discussion of these data, highlighting trends.

Importance of primary amide side chain.

Impact of replacing ester with amides and amines. And impact on the two original GSK compounds.

Dramatic impact of methylation of the hit compound.

Low efficacy of pyrazoles.

Prodrug hypothesis II and III

Reminder of the efficacy of the near neighbour thiazolidinones.

Suggestion of next compounds, including hybrids, and the WANTED! compounds.



Late Stage Gametocyte Assay for Arylpyrroles



Four of the arylpyrroles/near neighbors have been tested in a late stage anti-gametocyte imaging assay, with interesting results.

This assay is less commonly run than other malaria assays because it is technically more challenging. See this paper for a clear description of the importance of the gametocyte stage of malaria. The upshot is that drugs targeting this form of the parasite (the late stage gametocyte) are particularly valuable because they could help prevent the transmission of malaria.

In fact, in the above paper many antimalarial compounds did not display activity against late stage gametocytes (LSG), with only methylene blue reaching an IC50 of 12 nM. More generally, few compounds seem to have been identified with this activity.

Interestingly the novel compounds screened from the arylpyrrole set (but not the original GSK compound OSM-S-5) were highly potent (nanomolar) in this assay. The data are posted in the lab notebook. That's pretty interesting.


Metabolic Studies on a Set of Arylpyrroles



Eight compounds (two GSK originals and six promising-looking compounds made during this project) were examined by Sue Charman's lab at Monash for metabolic degradation in vivo. In plain terms, that means they were tested (on a simplistic level) to see whether the compounds would last long in the blood or whether they would likely be metabolised. As part of these studies the solubilities of the compounds were also evaluated.

Compounds were as follows. Note the use of the new "OSM-S" notation which we're introducing to give a compound a unique ID tag, independent of source, though there will probably always be trailing synthetic-prep tags.

The results are posted here.

As is often the case, low degradation rates (good) come at the cost of low solubility. We kind of expected this based on the high logP values of some of these compounds. Presumably analogues could be re-examined once they have been made more soluble.
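Degradation rates of this kind are often summarised as a first-order half-life. The post does not say how the Monash data were processed, so the following is only a generic Python sketch: it assumes first-order (exponential) loss of parent compound and fits ln(% remaining) against time by ordinary least squares. The time points and percentages are placeholders, not the posted results.

```python
import math


def first_order_half_life(times_min, pct_remaining):
    """Fit ln(% remaining) = ln(C0) - k*t by least squares.

    Returns (k in 1/min, half-life in min). A non-positive k (no measurable
    loss) is reported as an infinite half-life.
    """
    n = len(times_min)
    logs = [math.log(p) for p in pct_remaining]
    t_mean = sum(times_min) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_min, logs))
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    k = -sxy / sxx
    t_half = math.log(2) / k if k > 0 else math.inf
    return k, t_half


# Placeholder time course: % parent compound remaining at each sampling time.
k, t_half = first_order_half_life([0, 15, 30, 45, 60], [100, 71, 50, 35, 25])
print(f"k = {k:.4f} /min, t1/2 = {t_half:.1f} min")
```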

hERG Assay for Two Potent OSM-S Compounds



One of the most potent compounds identified to date in the OSDD malaria project, the near-neighbour analog OSM-S-35 (ZYH3), was subjected to the hERG assay along with one of the original TCAMS GSK compounds, which in this project has the tag OSM-S-5.

Raw data are here, plus spreadsheet here.

These results suggest that these compounds are "misses" in this assay, implying that they, and perhaps the series as a whole, would not have cardiac side effects as drugs.

Courtesy of Paul Willis at MMV: The human ether-a-go-go related gene (hERG) encodes a potassium channel in the heart (IKr) which is involved in cardiac repolarisation. Inhibition of the hERG channel can cause ‘QT interval prolongation’ resulting in a potentially fatal ventricular tachyarrhythmia called Torsade de Pointes. A number of drugs have been withdrawn either from the market or from late stage clinical trials due to these cardiotoxic effects, so it is important to identify hERG inhibitors early in drug discovery. The fact that these compounds are inactive at hERG is therefore good news (much is understood about the pharmacophores that hit the hERG channel, so I was not expecting an issue for these compounds, but it is always good to confirm). hERG inhibition detected at an early stage is not a showstopper, but it is a clear issue that has to be addressed in the optimization process.




Druggability of the Arylpyrrole leads (TCMDC-123794 etc)

Project is starting with a couple of leads from the GSK Tres Cantos set. There is a newly-published analysis of the druggability of the compounds in the original data set. The arylpyrrole series is listed as one of the most promising (though the Aryl-F is missing in the published paper - presume that is a clerical error).
Tres Cantos are appealing for collaborators to work with them on these compounds, which is an excellent idea. That's what we're doing, except that the project hosted here is open source, meaning anyone can see what we're doing and guide the direction of the project.
The initial phase is the resynthesis of these leads and their validation. We will soon be moving to analog synthesis. The obvious first question for the community of medicinal chemists is: what should we change?
My gut feeling was to verify the need for the aryl-F and the methyls on the pyrrole. Paul Willis' gut feeling was to look at the ester. Gut feelings and half-formed thoughts are enormously welcome as comments below.

Known Near Neighbours of Initial Tres Cantos Leads

We're starting with the resynthesis of two leads from the GSK Tres Cantos dataset. The obvious question is: are there other, related structures in the dataset that might give us information on what to change next?


Paul Willis from MMV did a quick "near neighbour" search (25 Aug 2011), particularly with an eye to getting rid of the ester in the lead structures. Structures below. As he said: "The first compound is a ketone analogue of the ester lineage – it's a singleton and not an ideal group from a drug discovery perspective but indicates other groups may be tolerated.  The next set is the entire cluster output of another near neighbor I spotted – all have replacements for the ester group (again not especially drug like but possible indication that wide variation possible at this position) and interestingly some contain variations on the 4-F-Ph"






1) Are there other structures in the GSK set that are "similar"? The set is searchable at ChEMBL.

2) What do these structures tell us about what to change next in the lead compounds?

3) Importantly, can we gain access to analogous structures that have been evaluated but not reported?

Interestingly this includes compounds that are related to the TC hits above but which were perhaps assayed against other targets.

Data for the above compounds will be posted to the wiki page. Discussion of them can more easily happen in comments below.

Synthesis Strategy for Near Neighbours

The evaluation of a synthesis strategy toward "near neighbours" of TCMDC-123812 and TCMDC-123794 is underway. Current efforts are summarised below:
Near Neighbour Synthesis
The experimental details are outlined in experiments PMY 13-1, PMY 14-1 and PMY 16-1. The synthesis appears straightforward but the final hydrolysis step resulted in material that was contaminated by grease (presumably from lab glassware). I'll update when I've repeated the reaction, while avoiding introduction of grease!

Prediction of Biological Targets of Actives


Request for Help

One of the interesting features of the GSK set of antimalarial compounds that are acting as the starting point for this project is that they are whole-cell actives, meaning that though they are extremely promising hits, we don't know how they work - i.e. what the targets are. To some extent this doesn't matter - praziquantel has been used for over 30 years and nobody knows how it works. However, a combination of factors (ease of regulatory approval, the possibility of some rational drug design, sheer curiosity) means it would be nice to know what these antimalarials are actually doing. How to figure that out?

There are ways. One is predictive cheminformatics: build a model from all the known drug vs. drug-target matches and extrapolate that model to a molecule of interest. This exact part of the open malaria project was flagged in our original grant proposal as an area in which the core team had no expertise, and so one where we were going to have to appeal for help. A particularly nice extra feature of such an approach is that it can predict off-target effects, which can help make a drug more effective (for example in this tremendous paper).

Last week such help arrived. I was talking with John Overington and Iain Wallace from ChEMBL about uploading the data from our project to their database (about which more shortly). It was an extremely interesting conversation. Iain has an interest in target prediction. He'd already taken the most active compound from our last round of biological evaluation and run it through his system to predict the likely biological targets of the drug. The raw data are here. The search produced these possible targets:
1. Carboxy-terminal domain RNA polymerase II polypeptide A small phosphatase 1
2. Dihydroorotate dehydrogenase (DHODH) - MMV/GSK have run these assays, e.g. here.
3. SUMO-activating enzyme subunit 2
4. SUMO-activating enzyme subunit 1
5. Cyclin-dependent kinase 1

What can we do with this information? We can try to find someone willing to screen this compound against those targets directly, to see if they are really targets. Anyone running these assays?

Iain's method is described in the online lab book, but he says it's this in essence: "Basically, a naive bayes model is built to distinguish compounds that are known to bind a particular target in ChEMBL from all others. We repeat this procedure for ~1,300 targets creating a model for each and score a compound with each model. I then generate the reports for only malaria proteins."
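To make that one-sentence description concrete, below is a generic Bernoulli naive Bayes sketch in Python: for each target it learns how often each fingerprint bit is set in that target's known binders versus all other compounds, then scores a query compound against every target model and ranks the targets. It is not Iain's implementation (his descriptors, smoothing and scoring are not described here); fingerprints are assumed to be precomputed sets of "on" bit indices, and the tiny training set is invented for illustration.

```python
import math


def train_target_model(actives, others, n_bits, alpha=1.0):
    """Bernoulli naive Bayes for one target: known binders vs. all other compounds.

    Fingerprints are sets of 'on' bit indices. Returns the prior log-odds plus
    per-bit log-likelihood ratios for a bit being on and being off.
    """
    def bit_probabilities(fps):
        counts = [0] * n_bits
        for fp in fps:
            for bit in fp:
                counts[bit] += 1
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        return [(c + alpha) / (len(fps) + 2 * alpha) for c in counts]

    p_act = bit_probabilities(actives)
    p_oth = bit_probabilities(others)
    prior = math.log(len(actives) / len(others))
    w_on = [math.log(a / o) for a, o in zip(p_act, p_oth)]
    w_off = [math.log((1 - a) / (1 - o)) for a, o in zip(p_act, p_oth)]
    return prior, w_on, w_off


def score(model, fp):
    """Posterior log-odds of binding this target under the naive independence assumption."""
    prior, w_on, w_off = model
    return prior + sum(w_on[b] if b in fp else w_off[b] for b in range(len(w_on)))


def rank_targets(models, fp):
    """Targets ordered from most to least likely for the query fingerprint."""
    return sorted(models, key=lambda target: score(models[target], fp), reverse=True)


# Invented toy training data: target name -> fingerprints of known binders.
N_BITS = 8
binders = {
    "target_A": [{0, 1, 2}, {0, 1, 3}],
    "target_B": [{5, 6, 7}, {4, 6, 7}],
}
models = {
    t: train_target_model(
        fps, [fp for other, ofps in binders.items() if other != t for fp in ofps], N_BITS
    )
    for t, fps in binders.items()
}
print(rank_targets(models, {0, 1, 2}))
```

Building one model per target, as described for the ~1,300 ChEMBL targets, keeps each classifier simple and lets the final report be filtered to malaria proteins only.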

Iain has also repeated this analysis for the entire malaria box. This is significantly awesome work. Does this, I wonder, change the perception of which series are of particular interest?

It's important to bear in mind that these are preliminary results, as with everything in an open source project, and should be taken as work in progress. Iain understands this and wants to make sure everyone else does. Iain also points out that similar approaches have been used successfully to identify novel targets of FDA compounds (see here and here), and the Shoichet lab have a nice webserver that can be used interactively.

The other way of doing target prediction is experimental. Iain mentioned a couple of guys that might be perfect for this - Corey Nislow who runs a yeast-based assay for target ID, and Andrew Emili who is developing a proteomic-based assay. They're both at the University of Toronto, along with Gary Bader, whom Iain also suggested we contact. I'll reach out to see if they're interested. Any advice on the best approach gratefully received - chances of success here? Favoured method for target ID?

Bioisosteric transformation maps


Request for Help

It is common in drug discovery to have a highly potent hit that has to be optimised to remove undesirable characteristics such as poor oral bioavailability, poor metabolic stability or toxicity. In our case, we have a number of highly potent compounds with quite a high LogP, which is considered a warning sign both for promiscuity (i.e. binding to many targets in vitro) and for poor oral bioavailability (a LogP above 5 breaks one of the criteria of Lipinski's Rule of 5).
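For reference, Lipinski's Rule of 5 flags likely poor absorption or permeation when a compound has a molecular weight over 500, a calculated logP over 5, more than 5 hydrogen-bond donors or more than 10 hydrogen-bond acceptors. The Python sketch below simply reports which criteria a compound fails; the descriptor values are assumed to come from whatever cheminformatics package is in use, and the example numbers are hypothetical rather than values for any project compound.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float  # Da
    logp: float        # calculated logP (e.g. ALogP or ClogP)
    h_donors: int
    h_acceptors: int


def lipinski_violations(d):
    """Return the Rule-of-5 criteria this compound fails (empty list if none)."""
    rules = {
        "MW > 500": d.mol_weight > 500,
        "logP > 5": d.logp > 5,
        "HBD > 5": d.h_donors > 5,
        "HBA > 10": d.h_acceptors > 10,
    }
    return [name for name, failed in rules.items() if failed]


# Hypothetical descriptors for a potent but lipophilic hit.
print(lipinski_violations(Descriptors(420.0, 5.8, 1, 4)))
```

A common reading tolerates one violation; here a single logP violation is treated as a warning sign, as in the paragraph above.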
One approach to solving this issue is the concept of bioisosterism. From Wikipedia, "bioisosteres are substituents or groups with similar physical or chemical properties which produce broadly similar biological properties to a chemical compound. In drug design, the purpose of exchanging one bioisostere for another is to enhance the desired biological or physical properties of a compound without making significant changes in chemical structure."
One such example would be replacing a hydrogen with a fluorine at a site of metabolic oxidation. "Because the fluorine atom is similar in size to the hydrogen atom the overall topology of the molecule is not significantly affected, leaving the desired biological activity unaffected. However, with a blocked pathway for metabolism, the drug candidate may have a longer half-life."
To that end I have created bioisosteric transformations, using the Pipeline Pilot software, for the compounds synthesized as part of this project. Two different approaches are implemented to generate the transformations:
1) Classic Bioisosteres involve transforming the original molecule based on a set of ~200 commonly used transformations, such as replacing a hydroxyl with a sulfonamide.
2) Database Bioisosteres involve transformations based on an algorithm described in this paper:
M. Wagener, J. P. M. Lommerse, "The Quest for Bioisosteric Replacements", J. Chem. Inf. Model., 2006, 46(2), 677-685.
Focusing on just ZYH-3-1, I generated two reports (one for each method), showing about 20 compounds resulting from such transformations that would all have ALogP < 5. It would be interesting to know how easy they would be to synthesize, as well as which would make sense to make based on what we know about the SAR of these compounds.
All the reports and data are available

Compound Similarity Networks from Iain



More fantastic work from Iain Wallace of ChEMBL, below. These are maps of active antimalarials and predicted targets, expressed as similarity maps, i.e. with an extra level of analysis added on top. This provides a very intuitive way of walking through related compounds to compare structures. How best do we use this kind of analysis - as a target guide, or as a "prediction of what to make next" guide?

Iain says:
I have now predicted targets for all the antimalarial active compounds in ChEMBL-NTD (~20,000). I have a full report for all these compounds, but it is quite large (~90 MB and 1200 pages), so I have displayed the results as a compound similarity network (posted here). In this network, compounds are represented as nodes and very similar compounds are connected by edges. Nodes are coloured by their predicted top-scoring target. The names of the compounds can be viewed by opening the file in Illustrator and zooming in very far.
A similar map for compounds similar to zyh-72 [and one of the starting "Near Neighbour" set TCMDC 123563] in this dataset is posted here.
Also [on the two pages linked above] are two original networks that were generated using Cytoscape. If you install the Cytoscape plugin "ChemViz", you can right-click a node and view the compound structure. You can also view the target of the compound.
I think networks are a useful way of visualizing/integrating different types of information, and I would be interested to hear if you have any thoughts on how to make this type of visualization more useful. For example, the similarity measure I am using may not be finding molecules that you would expect to find.
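The network described above (compounds as nodes, edges between very similar compounds, nodes coloured by top predicted target) can be sketched in a few lines of Python. This is an illustration under assumptions, not the workflow Iain used: fingerprints are precomputed sets of on-bits (ECFP_4 or similar), similarity is Tanimoto, the edge cutoff is arbitrary, and the output is Cytoscape's simple SIF edge list plus a tab-separated node table that could drive the colouring.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(fps, threshold=0.7):
    """All compound pairs whose similarity is at or above the threshold."""
    return [
        (i, j)
        for i, j in combinations(sorted(fps), 2)
        if tanimoto(fps[i], fps[j]) >= threshold
    ]


def to_sif(nodes, edges):
    """SIF lines: 'a<TAB>similar<TAB>b' per edge, bare name for isolated nodes."""
    linked = {n for edge in edges for n in edge}
    lines = [f"{a}\tsimilar\t{b}" for a, b in edges]
    lines += [n for n in sorted(nodes) if n not in linked]
    return lines


def node_table(top_target):
    """Tab-separated node attributes (compound, top predicted target) for colouring."""
    return ["compound\ttop_target"] + [f"{c}\t{t}" for c, t in sorted(top_target.items())]


# Hypothetical fingerprints and predicted targets.
fps = {"cpd1": {1, 2, 3, 4}, "cpd2": {1, 2, 3, 5}, "cpd3": {7, 8, 9}}
top_target = {"cpd1": "target_X", "cpd2": "target_X", "cpd3": "target_Y"}
edges = similarity_edges(fps, threshold=0.5)
print("\n".join(to_sif(fps, edges)))
print("\n".join(node_table(top_target)))
```

Swapping the fingerprint or the cutoff changes which compounds end up linked, which is one concrete way to explore the concern that the similarity measure may miss expected neighbours.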

Purchasable compound similarity maps


Request for Help

There are over 5 million compounds that are available to purchase according to the meta service eMolecules.
It is worth exploring these in the context of the OSDD project, as doing so will identify compound series that are very easy to explore by purchasing analogues (i.e. SAR by catalogue), as well as compounds that are potentially more synthetically accessible than others (i.e. if there are many close neighbours, these compounds might be easier to make than others).
To this end I have generated three chemical similarity maps showing purchasable compounds that are very similar to known antimalarials.

Maps generated

1) Centered on ~40 compounds synthesized by OSDD:
This is particularly interesting, as there are many compounds similar to ZYH 3-1 that could be purchased exploring both ends of the molecule.
2) Centered on ~550 compounds from GSK, prioritized into ~40 compound series
This is interesting as it clearly identifies high-priority series that are easier to follow up than others, on the premise that if there are many close analogs available, it should be cheaper/easier to follow up on these clusters than on clusters with no close analogs available. For example, the paper highlights five clusters for follow-on work. #31 has the most similar available compounds (~100), while #18 has the fewest (just 1), which suggests to me that it is much more efficient to follow up on the compounds in cluster #31 than on any of the other high-priority clusters. There are other very interesting clusters too, which perhaps have more interesting chemistry.
3) Centered on ~400 compounds in the Malaria open box
Likewise, certain compounds in the Malaria open box have many more close neighbours than others, which could help prioritize compounds for the community.
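The prioritisation idea behind map 2 (clusters with many close, purchasable analogues are cheaper to follow up) reduces to counting, for each cluster, the catalogue compounds that fall within a similarity cutoff of any cluster member. A minimal Python sketch, assuming precomputed fingerprints as sets of on-bits, Tanimoto similarity and an arbitrary 0.6 cutoff; the cluster labels echo the text above, but the fingerprints are invented.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def count_close_analogs(cluster_fps, catalogue_fps, threshold=0.6):
    """Number of catalogue compounds similar (>= threshold) to any cluster member."""
    return sum(
        1
        for cat_fp in catalogue_fps
        if any(tanimoto(cat_fp, fp) >= threshold for fp in cluster_fps)
    )


def rank_clusters(clusters, catalogue_fps, threshold=0.6):
    """(cluster id, analog count) pairs, most purchasable analogues first."""
    counts = {
        cid: count_close_analogs(fps, catalogue_fps, threshold)
        for cid, fps in clusters.items()
    }
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


# Invented fingerprints for two clusters and a small "catalogue".
clusters = {"#31": [{1, 2, 3}], "#18": [{10, 11, 12}]}
catalogue = [{1, 2, 3, 4}, {1, 2, 3}, {1, 2, 5}, {10, 11, 12, 14}]
print(rank_clusters(clusters, catalogue))
```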

It would be great to get feedback on this approach, namely:

1) Does this type of visualization work for chemists? Is it straightforward to download Cytoscape, install ChemViz and load the network?
2) What do we know about the SAR of these compounds? This would help to prioritize/focus our search of chemical space. While I used ECFP_4 fingerprints, other similarity measures can weight features differently (e.g. if we know a particular substructure is key, then all compounds should contain it, etc.)
3) While I think the network view is great for a global overview of the compounds available (and can be overlaid with any other types of data that we can think of, such as predicted targets), perhaps there is a better visualization for a smaller number of compounds?

Proposed resynthesis strategy for TCMDC-123812 and TCMDC-123794

Below is a proposed synthesis strategy for two members of the TCMDC arylpyrrole series. Please comment with your thoughts and suggested improvements.
Experimental attempts are documented in an electronic lab notebook found here:
Proposed synthesis strategy

THE SUBCELLULAR DRUG TRANSPORT LABORATORY: Cellular pharmacokinetic analysis of antimalarials



Our research group studies the microscopic transport properties of small drug-like molecules inside cells. As an overarching hypothesis, we propose that a drug's microscopic distribution within cellular organelles is a major determinant of drug efficacy and toxicity, as important as its macroscopic distribution in the organs of the body. Experimentally, we use high-throughput microscopic imaging instruments to capture the local distribution and dynamics of small molecules inside cells. For image data analysis, we are developing computational tools and statistical strategies that combine cheminformatics and machine vision to relate the chemical structures of small molecules to their subcellular distribution. We are also developing biochemical analysis methods to study the microdistribution and cellular pharmacokinetics of small drug-like molecules. Lastly, with the information gained through experiments, we build mathematical models, based on the biophysical principles governing molecular transport at the cellular level, that simulate drug transport and distribution in single cells and in higher-order cellular organizations.



We envision a day when drugs will be designed, optimized and ultimately approved for clinical use in terms of their site of action, just as drugs today are designed, optimized and approved based on their molecular mechanism of action. Complementary to the in vivo and in vitro models used in drug discovery today, in silico models (such as the cell-based molecular transport simulations we use in our experiments) can be applied to pharmaceutical discovery and development. Yet computer simulations of drug distribution in biological systems remain largely unexplored as a tool for screening drug candidates, even though computers are becoming increasingly fast, reliable and inexpensive research tools. For drug design, we are exploring cell-based molecular transport simulations as a way to probe the role of microscopic drug transport as a determinant of drug absorption, distribution, metabolism and excretion. Within virtual environments, cell-based molecular transport simulations make it possible to observe and manipulate the distribution of large numbers of drug candidates inside cells, which would be practically impossible to do experimentally. We are already exploring how cell-based molecular transport simulations can be used, for example, to analyze the most desirable physicochemical features of molecules targeting extracellular domains of cell surface receptors: features that impart maximal tissue penetration while minimizing intracellular accumulation in non-target sites. Furthermore, by making modeling and simulation tools available for free and disseminating them via the internet, our ultimate aim is to help educate the next generation of pharmaceutical scientists and medicinal chemists throughout the world, as much as to facilitate the practical development of drugs against diseases neglected by the pharmaceutical industry, such as parasitic infections.



We were introduced to The Synaptic Leap by Rajarshi Guha from Indiana University and Jean-Claude Bradley from Drexel University, with whom we will start to collaborate on the development of falcipain-2 inhibitors. Effectively, our goal will be to become part of this Open Science project, so that others can learn to use the computational tools we are developing and help us develop new computational tools. As part of this open science project, we realize that scientific progress often relies on making many mistakes before achieving some success. Accordingly, all the results we post should be considered tentative or preliminary. Nevertheless, as we proceed with our work, we will be able to provide the antimalarial drug development community with crucial guidance in the pharmaceutical sciences that should facilitate the selection of antimalarial drug candidates with optimal pharmaceutical properties for clinical development. For example, we will be performing computational analysis of the absorption, distribution, metabolism and excretion properties of the library of candidate antimalarial agents under development, using cell-based molecular transport simulations to analyze the intracellular distribution of small drug-like molecules in the target cells (the malaria parasite) as well as in off-target cells (the cells of the human body). One goal will be to identify a subset of molecules that accumulate maximally in the subcellular compartment in which the drug target is localized: in the case of falcipain-2, the parasite's lysosomes. Another goal will be to identify those molecules that have the highest transcellular permeability in intestinal epithelial cells, so that they can be administered by the oral route. Yet another goal will be to identify those molecules that show the lowest intracellular accumulation in off-target cells, which should minimize metabolism and off-target toxicity while maximizing the concentration of drug in the blood.
As we proceed with our analysis, we will integrate our results with those of our collaborators (e.g. docking studies, biochemical screening assays, parasite cytotoxicity assays and other bioassays) to assist in prioritizing antimalarial drug candidates for advancement into clinical trials. Beyond falcipain-2 inhibitors, lysosomes are a key subcellular target of antimalarial drugs in widespread clinical use, such as chloroquine. Therefore, the increased understanding we obtain from this research project should be broadly applicable to the development of future generations of lysosome-targeted antimalarial agents.

A semi-quantitative cell-based phenotypic assay for determining the lysosomal accumulation of small molecule antimalarials

One reason scientists have not been able to develop small molecule drugs targeted to subcellular organelles is the lack of suitable assays for monitoring the absolute concentration of small molecules within those organelles. Although measuring these absolute concentrations is still very difficult experimentally, the availability of large chemical libraries of small molecules has made it possible to use semi-quantitative (or qualitative) experimental readouts to test the validity of mathematical models that predict the subcellular distribution of small molecules. In this context, Nan Zheng, a second-year PhD student who recently joined the lab, is interested in developing a bioassay that reports the accumulation of small molecules in lysosomes. Such an assay would make it possible to validate the results of cell-based molecular transport simulations as they pertain to the lysosomal accumulation of small drug-like molecules, such as the falcipain-2 inhibitors being developed as antimalarial agents (falcipain-2 is a resident lysosomal enzyme).

It has long been observed that small lysosomotropic molecules (such as the antimalarial drug chloroquine) induce a vacuolation phenotype in cells incubated with the drug. This vacuolation is the result of osmotic swelling and expansion of the lysosomal compartment as drug molecules accumulate in the organelle. In 1974, Christian de Duve described this phenomenon and attributed it to an ion-trapping mechanism. The mechanism relies on the lumen of the lysosome being acidic (pH 5) relative to the cytosol (pH 7.4). Weakly basic molecules (such as molecules containing an amine) exist to a much greater extent in the neutral form in the cytosol than in the lysosome, where they are mostly protonated and charged. Because the protonated, charged form of the molecule is largely membrane impermeant while the neutral form is membrane permeant, the pH gradient across the lysosomal membrane creates a chemical potential difference that drives the accumulation of the weakly basic molecule in the lumen of the lysosome.
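The ion-trapping argument can be made quantitative with the Henderson-Hasselbalch equation. If only the neutral form crosses the membrane, its concentration is the same on both sides at steady state, so the total concentration ratio (lysosome/cytosol) for a base with conjugate-acid pKa values pKa1 > pKa2 > ... is (1 + 10^(pKa1 - pH_lys) + 10^(pKa1 + pKa2 - 2 pH_lys) + ...) divided by the same sum evaluated at pH_cyt. The sketch below assumes ideal ion trapping (charged species completely impermeant, no membrane potential, no binding) and uses the pH values quoted above; the example pKa of 9 is hypothetical, not a measured value for any particular drug.

```python
def ionization_factor(pkas, ph):
    """Total/neutral concentration ratio for a (poly)basic molecule at a given pH.

    pkas: conjugate-acid pKa values; the highest pKa governs the first protonation.
    """
    factor, cumulative = 1.0, 0.0
    for k, pka in enumerate(sorted(pkas, reverse=True), start=1):
        cumulative += pka
        # k-fold protonated species relative to the neutral base
        factor += 10 ** (cumulative - k * ph)
    return factor


def trapping_ratio(pkas, ph_lys=5.0, ph_cyt=7.4):
    """Ideal ion-trapping ratio [total]_lysosome / [total]_cytosol at steady state,
    assuming equal neutral-form concentrations on both sides of the membrane."""
    return ionization_factor(pkas, ph_lys) / ionization_factor(pkas, ph_cyt)


print(round(trapping_ratio([9.0])))  # → 245 (hypothetical monobasic amine, pKa 9)
print(trapping_ratio([]))            # → 1.0 (neutral compound: no trapping)
```

The same function shows why a dibasic drug such as chloroquine is trapped more strongly than a monobasic one: when both pKa values lie well above the cytosolic pH, each unit of pH gradient contributes roughly a hundredfold rather than a tenfold increase.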

While one may expect that most weakly basic molecules would tend to accumulate in lysosomes, one would also expect them to do so to different extents based on differences in the ionization constants and membrane permeability of the different ionic species. But, perhaps most importantly, pH gradient across the lysosomal membrane is a generated by the lysosomal H+ATPase, a protein that uses ATP hydrolysis to pump protons from the cytosol into the lysosomal lumen. Thus, the lysosomal accumulation of small molecules should also be highly sensitive to the metabolic status of the cell. If a small molecule accumulates in mitochondria or other organelles in a non-specific fashion (in addition to accumulating in lysosomes), it will compromise the metabolic status of the cell in a manner that decreases the production of ATP, ultimately leading to the dissipation of the lysosomal pH gradient. In this manner, only those molecules that selectively accumulate in lysosomes “while not accumulating in other organelles- should be able to induce lysosomal vacuolation phenotype. Accordingly, the goal of Nans project will be to fine-tune our molecular transport simulators so as to be able to rank molecules in terms of their ability to induce lysosomal swelling, considering not only the ion trapping mechanism, but also the selectivity of the molecules for accumulating in lysosomes vs. other organelles.


Development of Falcipain-2 Inhibitors as Antimalarial Agents

The development of falcipain-2 inhibitors as antimalarial agents is a collaborative Open Science project that we are joining. Our research group studies the development of small molecule chemotherapeutic agents that are targeted to specific sites-of-action, at a microscopic level. Falcipain-2 is a lysosomal enzyme of Plasmodium, the parasite that causes malaria. Therefore, the ability to identify small molecule falcipain-2 inhibitors that accumulate in the lysosomes of the parasite while not accumulating in other parts of the human body is key if the chemotherapeutic agents under development are going to have potent antimalarial activity in vivo, with minimal side effects.

Remarkably, the design of small molecule drugs targeted to microscopic sites of action (i.e. specific organelles) within cells is in its infancy. Although the importance of the site of action in drug design and development should be obvious, current rational design strategies used in the pharmaceutical industry focus on optimizing a drug's mechanism of action (i.e. the binding of a drug to its specific molecular target and its inhibitory activity), because little is known about how to direct small molecules to their site of action. In our research group, we are developing cell-based molecular transport simulations, which predict how the chemical structure of a small molecule leads to differential accumulation in the various organelles of the cell. Therefore, one of the goals of this project is to combine the results of such simulations with data on target binding and target inhibitory activity, to determine which small molecule drug candidate is most likely to be effective and non-toxic in a cellular (and ultimately an organismic) context.
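One way to picture combining simulated distribution with target data is to estimate how much of the target would be engaged at the concentration actually reached at the site of action. The sketch below is purely illustrative and is not the group's method: it assumes a simple one-site binding model with the IC50 standing in for the binding constant, treats the accumulation ratios as if they came from a transport simulation, and uses made-up candidate names and numbers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    name: str
    ic50_um: float           # in vitro potency against the target (hypothetical)
    site_ratio: float        # simulated [drug] at target site / [drug] outside cells
    off_target_ratio: float  # simulated [drug] in an off-target compartment / outside


def site_occupancy(conc_um: float, ic50_um: float) -> float:
    """Fractional target occupancy from a one-site model, IC50 used as a proxy for Kd."""
    return conc_um / (conc_um + ic50_um)


def rank_candidates(cands: List[Candidate],
                    exposure_um: float) -> List[Tuple[str, float, float]]:
    """Return (name, occupancy at site, site/off-target selectivity), best first."""
    rows = [
        (c.name,
         site_occupancy(exposure_um * c.site_ratio, c.ic50_um),
         c.site_ratio / c.off_target_ratio)
        for c in cands
    ]
    # Higher occupancy wins; selectivity breaks ties.
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)


candidates = [
    Candidate("potent_but_excluded", ic50_um=0.1, site_ratio=1.0, off_target_ratio=5.0),
    Candidate("weaker_but_trapped", ic50_um=1.0, site_ratio=100.0, off_target_ratio=2.0),
]
for name, occ, sel in rank_candidates(candidates, exposure_um=0.1):
    print(f"{name}: occupancy {occ:.2f}, selectivity {sel:.1f}")
```

In this toy case the compound that is ten times less potent in vitro ranks first, because it is predicted to reach a hundredfold higher concentration at the target site; that is the kind of trade-off the paragraph above argues mechanism-only optimization would miss.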

The student in charge of this project will be Jason Baik, a second-year PhD student in the Department of Pharmaceutical Sciences at the University of Michigan College of Pharmacy. As part of this project, Jason will keep a blog instead of the usual laboratory notebook.

Mathematical Modeling of Cellular Pharmacokinetics: Cell-Based Molecular Transport Simulators

Every protein encoded by a genome (human or otherwise) is localized to some microscopic subcellular compartment or organelle. In the case of the malaria parasite, the parasite generally inhabits the red blood cells in the circulation. Within the red blood cells, the parasite thrives by feeding on hemoglobin, which constitutes the bulk of the protein mass of the red blood cell. To digest the hemoglobin, the parasite relies on lysosomes, digestive organelles containing proteases that chop the large hemoglobin molecules into smaller peptides and ultimately into amino acids that can be incorporated into the parasite's own metabolism to help it grow, reproduce and infect other cells. So, how does one design a small molecule that is absorbed by the body and enters the blood without being metabolized, ultimately accumulating in the parasite's lysosomes without accumulating in other parts of the body?

To enable the design of such a drug, Xinyuan Zhang, one of the graduate students in my lab, has developed a new type of computational tool that we refer to as a cell-based molecular transport simulator. Her work was recently published in a peer-reviewed research journal:

Xinyuan Zhang, Kerby Shedden, Gus Rosania (2006). A cell-based molecular transport simulator for pharmacokinetic prediction and cheminformatic exploration. Molecular Pharmaceutics 3(6): 704-716.

For orally administered drug products, dissolution of drug molecules in the gastrointestinal tract followed by transport across the epithelial cell monolayer lining the lumen of the intestine can exert a major influence on systemic drug concentration and activity. This research project will involve building computational models to simulate biochemical reactions and diffusion of small drug-like molecules inside and across intestinal epithelial cells. These cells form the barrier between the lumen of the intestine and the inside of the human body. Acting as a gateway, intestinal epithelial cells exert a major influence on drug absorption into the body, and are a key determinant of drug concentration in the blood.

For mathematical modeling, passive and active transport of drug molecules can be described in fundamental biophysical terms using several well-known differential equations. For example, Fick's equation and the Nernst-Planck equation can be used to calculate the rate of transmembrane drug transport based on the pKa of functional groups on the drug molecule and the octanol:water partition coefficients of the different ionic species in which the drug molecule can exist, as a function of the local pH microenvironment. The Michaelis-Menten equation can be used to capture the effect of transmembrane transport proteins (such as P-glycoprotein), as well as that of drug-metabolizing enzymes, on local drug concentrations and distributions within any given subcellular compartment.
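The three equations named above can be written as short rate functions. This sketch assumes the permeabilities are already known (the step that estimates them from octanol:water partition coefficients is not shown), uses the constant-field (Goldman-Hodgkin-Katz) solution of the Nernst-Planck equation for the charged species, and treats the weak base as monoprotic. The function names and the sign convention (positive = outward flux) are illustrative choices, not those of the published simulator.

```python
import math

FARADAY = 96485.332      # C/mol
GAS_CONSTANT = 8.314462  # J/(mol*K)


def neutral_fraction(pka: float, ph: float) -> float:
    """Fraction of a monoprotic weak base in the neutral form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


def fick_flux(perm: float, c_in: float, c_out: float) -> float:
    """Passive diffusion of an uncharged species (Fick); positive = outward."""
    return perm * (c_in - c_out)


def nernst_planck_flux(perm: float, z: int, c_in: float, c_out: float,
                       v_m: float, temp_k: float = 310.0) -> float:
    """Constant-field (GHK) solution of the Nernst-Planck equation for an ion of
    charge z. v_m = potential inside minus outside, in volts; positive = outward."""
    u = z * FARADAY * v_m / (GAS_CONSTANT * temp_k)
    if abs(u) < 1e-9:
        return fick_flux(perm, c_in, c_out)  # zero-potential limit reduces to Fick
    return perm * u * (c_in - c_out * math.exp(-u)) / (1.0 - math.exp(-u))


def michaelis_menten_rate(v_max: float, k_m: float, conc: float) -> float:
    """Saturable transport (e.g. P-glycoprotein efflux) or enzymatic metabolism."""
    return v_max * conc / (k_m + conc)


def weak_base_flux(p_neutral: float, p_cation: float, c_in: float, c_out: float,
                   ph_in: float, ph_out: float, pka: float, v_m: float) -> float:
    """Net outward flux of a monoprotic base: neutral and cationic forms in parallel."""
    fn_in, fn_out = neutral_fraction(pka, ph_in), neutral_fraction(pka, ph_out)
    j_neutral = fick_flux(p_neutral, fn_in * c_in, fn_out * c_out)
    j_cation = nernst_planck_flux(p_cation, +1, (1 - fn_in) * c_in,
                                  (1 - fn_out) * c_out, v_m)
    return j_neutral + j_cation


# Hypothetical inputs: pKa 8 base, cytosol ("in", pH 7.4) vs. lysosome ("out", pH 5.0).
# A positive result means net movement from the cytosol toward the lysosome.
print(weak_base_flux(1e-4, 1e-8, 1.0, 1.0, 7.4, 5.0, 8.0, v_m=0.0))
```

Splitting the flux into neutral and cationic terms mirrors the text: the pKa sets how much of each species is present on each side, and each species gets its own permeability.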

To model drug transport within and across cells, we consider the individual subcellular compartments delimited by membranes (i.e. apical and basolateral compartments; cytosolic compartment; mitochondrial compartment; lysosomal compartments; etc.). Each compartment has a characteristic pH, and the membrane delimiting each compartment has its own transmembrane electrical potential. Accordingly, we will use coupled sets of the aforementioned differential equations to describe how the different subcellular compartments selectively accumulate different drug concentrations over time, as well as the rate at which drug molecules are transported across cells (the drug's transcellular permeability), all in the presence of a transcellular drug concentration gradient that mimics intestinal absorption. To test and improve the model, the computational results are being compared with published experimental measurements, as well as with measurements we perform in the lab.
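A minimal version of the coupled-compartment idea can be integrated with explicit Euler steps. This is a toy, not the published simulator: it keeps only passive diffusion of the neutral form (so ion trapping emerges from the pH differences), ignores membrane potentials, transporters and metabolism, and uses hypothetical volumes, permeability-area products and a hypothetical pKa of 8. For explicit Euler to stay stable, the time step must be small relative to each compartment's volume divided by its permeability-area product and neutral fraction.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Compartment:
    volume: float        # arbitrary units (hypothetical)
    ph: float
    amount: float = 0.0  # total drug (neutral + protonated)

    def neutral_conc(self, pka: float) -> float:
        """Concentration of the membrane-permeant neutral form (monoprotic base)."""
        return (self.amount / self.volume) / (1.0 + 10 ** (pka - self.ph))


def simulate(comps: Dict[str, Compartment],
             membranes: List[Tuple[str, str, float]],
             pka: float, dt: float, steps: int) -> Dict[str, float]:
    """Explicit-Euler integration of neutral-form diffusion between compartments.

    membranes: (side_a, side_b, permeability * area); flux runs down the
    neutral-form gradient. Returns total concentrations at the end.
    """
    for _ in range(steps):
        change = {name: 0.0 for name in comps}
        for a, b, pa in membranes:
            flux = pa * (comps[a].neutral_conc(pka) - comps[b].neutral_conc(pka))
            change[a] -= flux * dt
            change[b] += flux * dt
        # Apply all changes together so every flux uses the same time point.
        for name, delta in change.items():
            comps[name].amount += delta
    return {name: c.amount / c.volume for name, c in comps.items()}


# Hypothetical enterocyte: drug dosed on the apical side, sampled basolaterally.
cell = {
    "apical": Compartment(volume=100.0, ph=6.5, amount=100.0),
    "cytosol": Compartment(volume=1.0, ph=7.4),
    "lysosome": Compartment(volume=0.05, ph=5.0),
    "basolateral": Compartment(volume=100.0, ph=7.4),
}
links = [("apical", "cytosol", 1.0), ("cytosol", "lysosome", 1.0),
         ("cytosol", "basolateral", 1.0)]
final = simulate(cell, links, pka=8.0, dt=0.01, steps=20000)
for name, conc in final.items():
    print(f"{name:12s} {conc:.4f}")
```

Even this toy reproduces the qualitative behaviour described in the text: the acidic lysosome concentrates the base far above the cytosolic level while drug appears on the basolateral side, and the rate of that appearance gives a crude handle on transcellular permeability.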

Complementary to the in vivo and in vitro models used in drug discovery today, the cell-based molecular transport simulations we aim to develop are promising new tools to facilitate pharmaceutical discovery and development. Moreover, computational models are inexpensive, flexible and scalable, and they can be continually improved upon by future generations of scientists. Indeed, with the aid of computational simulations of drug transport such as the one we are developing, we expect that one day drugs may be designed, optimized and ultimately approved for clinical use computationally, in terms of their site of action, as much as drugs today are designed, optimized and approved experimentally, in terms of their mechanism of action.



The GSK Aminothienopyrimidine (ATP) Series



The second strand of the open source drug discovery for malaria project involves aminothienopyrimidines (ATPs). The wiki summary page contains the current To Do list. Child pages to this one refer to specific aspects of the project.

Biological Data



Child pages to this one focus on biological evaluation in the GSK ATP project: results and the response to the results.


Biological Results for ATP Series from April 2013



Comments on Latest Biological Results - April 2013

Data may be viewed here.
General page for the whole aminothienopyrimidine series is here.

The content of this post is also viewable on a Google+ post.

The results of the first major round of synthesis and evaluation for the aminothienopyrimidines are in, following a preliminary round that validated the GSK hit, as well as an exploration of the commercial space around the hit. The design of the latest round was carried out with the following principles in mind:

1) Creating a set of structures with considerable diversity, given that the hit, OSM-S-106, arose from a phenotypic screen and so has no known target/mechanism of action.
2) Maintaining low calculated logP values, and being mindful of molecular weight.
3) Exploring whether the sulfonamide (and its meta position) is necessary. This was to some extent guided by which boronic acids/esters could be bought/made.
4) Exploring whether the amine on the pyrimidine could be substituted. A morpholine at this position was seen as being a sensible variation because of this modeling analysis. The longer side chain employed in e.g. OSM-S-137 was proposed because of mild activity for TCMDC 132385 (comparison of structures).
5) Given the quality of the hit OSM-S-106, this compound has been resynthesised and we will approach Sue Charman's lab for some solubility and basic metabolic stability assays. There is always the chance that the first hit will be the best compound.

The data show that essentially every change led not just to a reduction in potency but to a complete loss of activity. This is surprising. In several cases there appeared to be solubility issues, even though we had expected all these compounds to be highly soluble.

It would be useful to hear the thoughts of medicinal chemists reading this on what these data suggest. It would seem that the meta sulfonamide is necessary for activity, and that the morpholine is not a good substituent to pursue. The results for OSM-S-137 vs TCMDC 132385 (structures here) suggest that the amine can be varied when the sulfonamide portion is also varied (i.e. removed), which raises the possibility that we might be looking at two targets: one hit by the substituted pyrimidine and one by the aryl sulfonamide.

The project needs suggestions about what to make in the next round. Please comment below, either with your name or anonymously. The gut feelings of people with some experience of this stage of drug discovery would be very valuable.

In addition, if you happen to have in stock some thienopyrimidines that you feel might be worth evaluating, and you'd be willing to send to a local lab for biological evaluation, please say so below. Accessing relevant compounds is something the project is not doing enough of, partly because we don't yet have a Molecular Craigslist.

Short term plans:

1. Morpholine group appears to be a bad substitution. Make other substitutions on the primary amine, such as one or two methyls (dimethyl InChIKey CGTVWFYTOADSRO-UHFFFAOYSA-N). Route to that would likely be to take OSM-S-70, add dimethylamine, then brominate, then Suzuki. Action item for Althea.
2. How many more meta-substituted sulfonamides like OSM-S-106 exist in the literature? We've done this search before (link?), but let's revisit. Action item for Alice.
3. We already ordered several other meta-substituted boronic acids (link?), some of which have now come in. We need to evaluate these. Action item on Althea.
4. Are there any doubly-substituted meta-boronic acids we could employ in the coupling? Action item on Alice.
5. TCMDC-132385 should be synthesised to check its activity, as well as the fragments without the fluoroaromatic, and with other aryls. Action item on Murray.
6. Ask GSK for any information they might have on known inactives around OSM-S-106 where there is variation in that pendant amine side chain. Action item on Mat.
7. Make OSM-S-106 with substitution of the amine with a methyl ether (InChIKey RMRWXSBDYMDPPE-UHFFFAOYSA-N). Devise synthetic approach. Action item on Alice.
8. Approach Sue Charman's lab for solubility and metabolic assays on OSM-S-106. Action item on Mat.

Longer term plans are:

1. Synthesise des-amino OSM-S-106 (NAPAWLMTRGMNSY-UHFFFAOYSA-N), i.e. replace the amine on the pyrimidine with H. Who?
2. Can OSM-S-106 be derivatised further, i.e. this compound as a starting point? Who?
3. Can Murray's new bromo-regioisomer synthesis be used to make more diverse compounds? How risky is this in terms of the amount of methodology development needed? Who?
4. How easily can scaffold analogs of OSM-S-106 be accessed, e.g. furan (GSJDJXHDXORRDD-UHFFFAOYSA-N), flipped thiophene (NBFDNDWQCGJNHN-UHFFFAOYSA-N), pyrrole (XTMBLMGMAYOFBP-UHFFFAOYSA-N), indole (ZVHBDXVFFFLUQY-UHFFFAOYSA-N), pyridine vs pyrimidine (FCEYCXXGTFVWIS-UHFFFAOYSA-N)? Any takers for synthesis?