Design Features for Open Source Drug Discovery

Published by smaurer on 15 December 2005 - 10:42pm

Dear All:

I just went on the site and it looks really, really super.  Congratulations to all, and especially Synaptic Leap.  I know that people are writing software for the pilot projects and until I see it I probably shouldn't kibitz.  But it's irresistible.

1) I know that Gene Poll isn't done yet, but how we design the voting rules will govern how the community interacts and develops new insights.  Let me make some suggestions anyhow.


a) First of all, voters need to know their choices.  That means a ballot (web page) where everyone who logs on can see a complete list of candidate genes, learn what each gene is and why it might be useful, learn where to go on the site to find out more, and add their own write-ins.  Also, how will the initial list of "candidates" be submitted?  Will there be just a few?  I know a biologist who has about 50 public domain hits.  Should she submit all of them?


b) Second, the different genes should compete to see who's the best.  The starting point is obviously the Sali group's gene cards -- analogous to the kernel that gets written by a single guy before open source collaboration begins.  That provides a starting point for volunteers to form reasoned opinions about which genes ought to win/lose.


c) True open source collaboration will begin when people do investigations on their own and use the results to modify the initial gene card.  Good arguments would ideally include references and links to evidence.  There could also be a page where people could suggest what database searches ought to be done next.


d) An extension of the foregoing would be for each gene to have a recognized advocate who marshals the evidence for believing that it is a good hit, and also perhaps a "devil's advocate" who marshals the evidence against.  In practice, both roles could be in the hands of a single curator, which would more or less follow standard open source practice and also database practice in the physical sciences.


2) I think that the gene card should include the legal status of each gene.  Is it patented or not?  This is something law students can get involved in.  Volunteers might decide that they would rather work on an unpatented gene, and we should empower that decision.

3) I think that TSL's expertise categories should include "Law/Social Science" as a choice, at least if you think that contract writing/patent searches/transactions are relevant to making drug discovery work. 

4) Voting rules.  Since this is science rather than politics, voters should have no trouble changing their vote before whatever date is set for the "polls to close."  Also, do we close the polls early if there's a runaway winner?  Do we never close the polls if the winner doesn't cross some threshold?  I would like everyone to be able to give 60% of their vote to their favorite candidate, 30% to their next favorite, 5% to the next runner up, etc.  Also maybe a "Discard" vote for genes that should be "voted off the island" as no longer worth considering.  Realistically, if first place is only 5% ahead of second place, we shouldn't discard that information.  There would have to be some kind of rule to reduce everyone's first, second, and third votes to a rank ordering, but so be it.
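To make the weighted-vote idea concrete, here is a minimal sketch in Python of one possible reduction rule: add up everyone's percentages per gene and sort by the totals.  It is only an illustration, not a proposal for how Gene Poll should actually work.  The function name `tally`, the voter names, the gene names, and the choice to store one ballot per voter (so that a changed vote simply replaces the old one) are all assumptions made for the example.

```python
from collections import defaultdict


def tally(ballots):
    """Reduce weighted ballots to a rank ordering.

    ballots maps a voter name to a dict of {gene: percent}.
    Returns a list of (gene, total_percent) sorted from highest
    total to lowest, so the margins between places are preserved.
    """
    totals = defaultdict(int)
    for voter, ballot in ballots.items():
        # Assumed rule: nobody may hand out more than 100% in total.
        if sum(ballot.values()) > 100:
            raise ValueError(f"{voter} allocated more than 100%")
        for gene, percent in ballot.items():
            totals[gene] += percent
    # Sort by total (descending); ties are broken alphabetically.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical voters and genes, for illustration only.
ballots = {
    "alice": {"geneA": 60, "geneB": 30, "geneC": 5},
    "bob": {"geneB": 60, "geneA": 30, "geneC": 5},
    "carol": {"geneB": 60, "geneC": 30, "geneA": 5},
}

# Alice changes her mind before the polls close: because ballots are
# keyed by voter, the new ballot simply replaces the old one.
ballots["alice"] = {"geneC": 60, "geneA": 30, "geneB": 5}

ranking = tally(ballots)
for gene, total in ranking:
    print(gene, total)
```

Because the totals are kept alongside the ordering, a close race between first and second place stays visible instead of being flattened into "winner" and "loser."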

5) Voter restrictions.  I assume at first everyone who gets on the site will get to vote, so that only automated spam kinds of votes get excluded.  In the long run, we should consider some lowest rung of recognized membership -- similar to the people who are officially ranked smart enough to send bug reports to Apache.  This is something that biologists could legitimately put on their resumes.


6) People who contribute good ideas or lots of labor ought to be identifiable somehow.  Perhaps by naming official curators, perhaps by searches that isolate their contributions.  If they work for Pharma, their employer should also get credit.


OK, I know it's easier to ask questions than provide answers.  But hopefully some of this is helpful.  Design details matter if we want to attract lots and lots of volunteers.




Q&A and Feedback
gtaylor

RE 1 & 4) I know that Marc is loading the entire genome and making that available.  People are also to provide comments and links to references justifying why they're voting for a particular target.

What I'd like to see (and am less sure of how Marc sees it taking place) is a specific time frame for that first set of input, and then a second round of votes where people can vote on the candidates by evaluating the information others gave as justification for their recommendations.  Again, there would be a specific time frame to collect the votes, and then the structural and computational analysis would be performed on the community-driven target.

I think that we want people to continue to provide input and rankings.  And I agree people should be able to change their minds.  But I don't want to lose the comments and reference links they added - unless those were mistakes.  So really the point is to allow people to have multiple votes.  I'm hopeful that the "gene basket" will be a way for a person to keep track of all genes s/he has voted on with his/her comments.  It should sort based on the person's ranking of those genes so that the person will be motivated to keep the ranking correct. 
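As a rough sketch of what I mean by the "gene basket" (not how Marc's software actually works), something like the following Python would do.  The class names `BasketEntry` and `GeneBasket`, the `vote` and `ranked` methods, and the gene names and comments are all made up for illustration.  The point is just that changing a ranking should not throw away the comments and reference links already attached to a gene.

```python
from dataclasses import dataclass, field


@dataclass
class BasketEntry:
    """One gene in a person's basket (hypothetical structure)."""
    gene: str
    rank: int
    comment: str = ""
    references: list = field(default_factory=list)


class GeneBasket:
    """Tracks every gene one person has voted on, with their notes."""

    def __init__(self):
        self._entries = {}

    def vote(self, gene, rank, comment=None, references=None):
        """Add a gene or change its rank; existing notes are kept."""
        entry = self._entries.get(gene)
        if entry is None:
            entry = BasketEntry(gene, rank)
            self._entries[gene] = entry
        entry.rank = rank
        if comment is not None:
            entry.comment = comment
        if references:
            entry.references.extend(references)
        return entry

    def ranked(self):
        """Entries sorted by the person's own ranking (1 = favorite)."""
        return sorted(self._entries.values(), key=lambda e: (e.rank, e.gene))


# Placeholder genes, comments, and references, for illustration only.
basket = GeneBasket()
basket.vote("geneA", 1, comment="placeholder justification", references=["ref-1"])
basket.vote("geneB", 2, comment="another placeholder")
basket.vote("geneA", 3)  # mind changed: geneA is demoted, its notes remain

order = [entry.gene for entry in basket.ranked()]
kept_comment = basket.vote("geneA", 3).comment
```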

Who knows, perhaps later somebody else may decide to select a gene for study based on community rankings and discussions. 

RE 5) Voter restrictions.  Hmm, well, I have no business voting and therefore won't.  Hopefully others will exercise the same good judgement in the name of science.  If not, the good news is that with poor justification, high rankings are likely to be isolated (e.g. one person voting for a gene).  At this juncture we have no processes to keep spam registrants off the site.  However, it's not likely that automatic agents will vote on genes.  They're much more likely to create blog postings that reference their site.

Fortunately, Drupal, the software driving this site, does offer an optional module called captcha that we plan to implement later.  That should minimize the spamming activity.

RE 3) The law/social sciences selection option is an available expertise for a person's profile.  I completely believe we'll need IP and economics experts to ensure that we do things properly, and in such a way that a virtual pharma, e.g. IOWH, can carry the research forward into clinical trials.

RE 6) I completely agree that contributors need recognition.  Our members module is our first feature for that.  There are several other optional modules available with Drupal that I have on my list to investigate later.  At the moment, we still need to get the usability working on the features we have - e.g. we need email subscriptions, the formatting of this post wasn't the easiest - I had to use the "rich text editor" and the Full HTML to make it look good.  BTW, I hit those same buttons for your posting too - you obviously had written it to format well.  But the software wasn't helping you out much.  It's on our list...

Marc, I hope you respond to this and clarify/expand on the gene poll aspects.  I don't want to misrepresent your project.