Storing structures and views for search later

Published by alex_a on 3 August 2006 - 2:04am

Not sure where this should go - I have very limited experience here and it is not in my profile, so I will say this and nip off. Anyway - hats off for a fine concept and the best of luck.

The idea I want to pass on here is one I have tried on commercial research groups, but they are cautious because of deniability and litigation.

I propose that you set up mechanisms to receive actual structural files when you discuss chemical or biological structure related questions. The point is that you can permanently associate the comments, threads or blog entries (whatever) with that structure file. In this way you can later perform all manner of searches and comparisons on the stored structural data and also serve up the human comments related to wherever those structures appear. What you are actually doing is tying human information, which is difficult to search at scale (so far), to a well constrained concept that can very easily be power searched (structure graphs). Structural files are, in information terms, very highly specific, and even with very large collections of files the power of the query language (structure query) lets you produce results with very little junk - at least on the structural side (the human comments can of course be junk). Downstream, with more data, there are manifold possibilities to mine the structure files to generate knowledge, and if all goes well the text handling side shows much promise to multiply the value of the human comments.
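To make the association concrete, here is a minimal sketch in Python of storing comments against a structure record. It is only an illustration under stated assumptions: the table names, the column names and the helper functions (`open_store`, `add_comment`, `comments_for`) are all hypothetical, and the lookup is an exact match on an identifier string. Real substructure or similarity search would need a cheminformatics engine behind it.

```python
import sqlite3

def open_store(path=":memory:"):
    """Create (or open) a small store linking comments to structures."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS structure (
            id INTEGER PRIMARY KEY,
            identifier TEXT UNIQUE NOT NULL,  -- e.g. a SMILES or InChI string
            molfile TEXT                      -- raw structure file, if one was posted
        );
        CREATE TABLE IF NOT EXISTS comment (
            id INTEGER PRIMARY KEY,
            structure_id INTEGER NOT NULL REFERENCES structure(id),
            author TEXT NOT NULL,
            body TEXT NOT NULL
        );
    """)
    return conn

def add_comment(conn, identifier, author, body, molfile=None):
    """Register the structure if it is new, then attach the comment to it."""
    conn.execute(
        "INSERT OR IGNORE INTO structure (identifier, molfile) VALUES (?, ?)",
        (identifier, molfile),
    )
    row = conn.execute(
        "SELECT id FROM structure WHERE identifier = ?", (identifier,)
    ).fetchone()
    conn.execute(
        "INSERT INTO comment (structure_id, author, body) VALUES (?, ?, ?)",
        (row[0], author, body),
    )
    conn.commit()

def comments_for(conn, identifier):
    """Return (author, body) pairs for one structure, oldest first."""
    rows = conn.execute(
        "SELECT c.author, c.body FROM comment c "
        "JOIN structure s ON s.id = c.structure_id "
        "WHERE s.identifier = ? ORDER BY c.id",
        (identifier,),
    )
    return rows.fetchall()
```

The design choice worth noting is that the structure, not the thread, is the key: every later comment on the same structure lands on the same record, whichever post it came from.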

On the discussion enhancing side, I am sure having at least images of structures, and possibly 3D binding pocket views, would be cool - especially if you can save the view, publish it (say as a gif) and attach it to your posts/blog to support your comments. Of course, as it is a structure file, opponents are at liberty to open it, re-evaluate and suggest alternatives - especially if they can open the file from your website and be looking at exactly the same view and trimmings. If we factor virtual discovery surfaces in here then we could multiply the power of the forum massively.

Where is this going? Well, the sell really. Our company makes nice functional Java based cheminformatics toolkits that are already being used in chemical / drug discovery/development forum environments. I know we would support use in such non commercial environments for free, and I think that capturing human experience in a machine searchable way will save all those clever comments to be accessed in new ways. I think the technology is relevant and ready, and open minded organisations should be gathering data for future research and researchers.

Not sure how much sense this will make out there but I hope this has pinged a few semantic heads.

Cheers 

Alex 

gtaylor's picture

Hi Alex,

Thanks for the post. Great thoughts in there.

Having been in charge of the PeopleSoft portal, I have significant experience with the concepts of blending structured and unstructured content. We introduced the Intelligent Context Manager® in 2002, which created configurable mashup-like pages that could blend related data together whether it was structured or unstructured. And I completely agree there's a lot of value in it. Most bioinformatics and cheminformatics tools at this point are munching the structured data. We need to add the human element.

I've been working with Marc A. Marti-Renom on a concept to do this for genomics-based target identification. We're shaping the concepts now and will be applying for a grant to make it happen. Marc discusses his ideas on this google talk.

What company are you with?  And will you allow free access to your data and tools for TSL?

And from your post it looks to be more chemical based than genomic based. Am I reading that correctly? 

Cheers! 

Hi, yeah, I am a biologist but working (non science) with a Java based cheminformatics company, ChemAxon. The stuff would be free for non commercial web pages. The login could be an issue but I think we could bend if you wanted to put it in - the point is about free access.

About the actual technology you would need,

On the visuals side: MarvinSketch would be for small molecule editing, MarvinView for viewing multiple structures - like an SD file - link here, a nice example of using the editor as a query generator is here at the RCSB PDB site, and MarvinSpace would be the kicker for pocket analysis/modelling.

Since this stuff is all Java you can have it pop up on an HTML page, and I have been pushing in-house for the ability to 'publish' the view and other info needed to recreate exactly the view you had when you created a picture - this would really allow people to get into any image (which is linked in the background to the file and calls the relevant viewer/editor), mess with it and re-publish. By using structure linked image files instead of live Java windows you save on long downloads and larger technical overhead. I like the idea that everyone is working in the file and commenting with the file - nice short loop. I am told the functionality is coming in the next few months but if I had a champion ;)...

On the database side we have JChem Base (Cartridge if you need to program from SQL) for the management and structure search functionalities, and you would probably want to put Standardizer in there for formatting during registration and delivery - it increases the quality of search performance, though it can come later.

The only problem with our stuff (though some would argue it's a benefit ;) is that they are toolkits: everything is APIs and a few editor/viewer GUIs (which also have an API). We do have some bundled .jsp in the download just to jump start implementation but otherwise it's up to you. The good news is that it's pretty straightforward - core functionality can be up in a few days and it seems that you have some skillz here :)

Feel free to download the JChem package (this includes all the Marvins, in fact it includes everything) and work our support forum. I can confirm we will support free use for non commercial.

Alex 

I've just joined Synaptic Leap.  Your comments suggest a wider examination of cheminformatics in this enterprise.  I'm a medicinal chemist who has had to deal with this in a small company and I'd like to participate in the discussions.

MatTodd's picture

There's a lot we can talk about in terms of cheminformatics, and what kinds of tools might be useful. Manipulation of molecules in binding pockets is potentially the most complex/computationally demanding. At the other extreme, what I often find is the biggest barrier to having an online discussion of chemistry is how difficult it is to share chemical structures - still, in this day and age.

The killer app is a program that allows me to a) draw a molecule, b) copy a structure from a pdf or picture file and structurally interact with it, AND c) paste the structure onto a web page so that it is generically viewable without extra software downloads. For example, I cannot draw a molecule on this web page, paste it in the space below, and have anyone modify and re-paste easily. Does such a generic standard exist?

MDL Chime does exactly that (you'll need to register on the site, but it's free): 

http://www.mdl.com/downloads/public/chime/2.6.SP6/winnt4-2000-xp/MDLChime26SP6.jsp

MDL ISIS/Draw lets you edit molecules offline (also free).

As to grabbing molecules from pdfs and pictures - no such software exists, and, I believe, will never do.

gtaylor's picture

Are Chime and ISIS what you're looking for Matt?

Are they commonly used?

I found this list of chemical search databases. I notice several file formats / editors are named. Plus with those DBs and your request, I'm starting to smell a feature enhancement for the site.

As you know, I'm no chemist (disclaimer this could be a really bad idea :-). But check out Useful Chem's molecule blog.  

  • Suppose we made it relatively easy for people to create one of those. I get the sense that Useful Chem has students doing those behind the scenes (correct me if I'm wrong).
  • And then we ran a patent search on the molecule and provided that information for free on the page. This would be read-only.
  • And then provided a wiki like page for people to add reference links if they know of research papers on the molecule. Perhaps even pulling in information from pubmed?
  • And then allowed for contextual discussions / threads.

Is this useful? Tweaks to the idea?

I think this is along the lines of what Alex was describing originally as well as what you've been talking about with synergy with Gene Wiki. Am I right? 

 

gtaylor's picture

A sincere thanks for offering your application. At this juncture our programming volunteers are overextended on other work in our pipeline. BUT, I would still like to put this on our list for consideration, and in fact we may include this work in a grant application.

What I would like to see though is some input from other chemists on the site. Remember, I'm a software management and collaboration type geek rather than a scientist. It would be great if chemists would comment on how useful these tools are and whether they make sense to add as tools on our site. Put your comments here or email me.

My general point is that we need to prioritize the enhancements (work). 

Thanks! 

If you are interested in open source chemoinformatics, please have a look at the Blue Obelisk website at http://www.blueobelisk.org/. Several open source chemoinformatics projects share thoughts and projects on open data, open source and open standards (nicknamed ODOSOS). It's a growing community and already covers quite a lot of what commercial chemoinformatics providers have to offer. If you want to know where things are going, visit their blogosphere at http://www.blueobelisk.org/planetbo/.

MatTodd's picture

Interesting site, Egon. I've had better experiences with Jmol than with Chime, but Jchempaint was new to me. That looks very promising. So - the question remains - what needs to happen so that I can draw a molecule in this space:

 

 

 

and have you modify it and post an analogue here:

 

 

 

with the minimum of effort and so that everyone else can see what we're talking about? Presumably Ginger needs the answer, in order to adapt the page we're on?

Mat

 

gtaylor's picture

Another plug for Jmol 

A fellow named Peter Murray-Rust sent me an email also plugging Jmol:

The Blue Obelisk is a group of Open Source Chemists doing exactly what you want. See http://www.blueobelisk.org which we believe will become a major force in chemoinformatics. It's all Open, not just free (e.g. use Jmol and JSpecView rather than Chime).

Feel free to copy this to your list.

In haste - more later.

P.

Blog at http://wwmm.ch.cam.ac.uk/blogs/murrayrust I deal with Open Data, Source etc.

The Synaptic Leap's Site  

Background for others noodling on this - TSL is built with Drupal, an open source collaborative content solution built with PHP and MySQL. I picked it because of its large number of deployments (1,000's) and because of its extensible architecture. If you look at their modules page, you'll see that the community has gone nuts building extension modules.

My point is that we should be able to build a chem specific collaboration tool as an extension module for our site. AND, it could be built openly for deployment beyond The Synaptic Leap. I have no lock-in strategy here.

Is there anybody interested in building this for us? We're short on chem experts at the moment.

Do you want to collaborate on a grant together so that you get funded for your work?

Email me or make a post if you are.

 

In addition to specific tools like those from MDL or ChemAxon, Synaptic Leap collaborations will need large databases to manage chemical and biological information. Here I describe two services with which I am familiar that could serve as the repository for data and avoid the need to reinvent the wheel.

PubChem is a component of the NIH Molecular Libraries Roadmap Initiative that provides public access to chemical structures and biological data (pubchem.ncbi.nlm.nih.gov). As of today there are over 8 million chemical structures and data for 281 biological assays deposited, all searchable by structure and values. I don't search the biodata much but it's my go-to site to look up common chemicals and drugs. This site is so useful that Chemical Abstracts feels threatened.

eMolecules (www.emolecules.com) is an open access chemistry search engine that aims to be Google for chemicals. Their mission is "to discover, curate and index all of the public chemical information in the world, and make it available to the public for free". (www.emolecules.com/doc/emolecules_mission.htm) Currently the entries are primarily commercially available compounds but they are growing. I know the founders who have previously created successful commercial software.

gtaylor's picture

This is very useful. Pubchem I had heard of; eMolecules is new to me and it has a JME chemical editor.

I waffle on whether TSL should host biomedical data or not. Part of me thinks that it's best if the research organizations host that. We're not a research organization. We're a web site providing a service for researchers. TSL's value-add is to bring that information together e.g. a mashup page and to provide tools for contextual/structural collaboration amongst the scientists. Imagine a ChemBoard mashup page with threaded discussions below.

On the other hand, I get the sense that people want to be able to manually augment the information that comes from sources such as PubChem or PubMed via a wiki-like interface. As soon as we go there, we start to "compete" with the originating data sources. Maybe by competing those data sources will be encouraged to add wiki-like editing features. Or maybe we'll divide and dilute the scientific efforts - not at all what I want.

If you have thoughts on this, please post them. I need more input.

 

gtaylor's picture

Since I didn't see an overwhelming request for a more comprehensive chemical wiki, we will likely look deeper into a fairly simple one. I'm going to write up my thoughts on it and platform requirements and will post when I can.

Cheers! 

jcbradley's picture

Ginger - in terms of having an additional place to collaborate on small molecule properties, that would definitely be useful for us. We currently have a collection of molecules that are of interest to our group and it would be great if others could contribute to the information that we have on them, generated either manually or automatically. Many of these molecules are expected to be inhibitors of enoyl reductase (PfENR) and the results of docking calculations would help us prioritize our syntheses.  

I really like the ideas that Alex has been putting up and I have contacted him to pursue some of these.  Unfortunately the software he mentions does not yet do docking. 

If you need a quick assessment of the docking mode for a molecule, the best tool IMHO is ArgusLab (http://www.arguslab.com). It's free, runs under Windows and has a very good user interface. A tutorial is also supplied, so that an average person can learn the software in one evening. It can dock molecular libraries without user intervention. I also found its performance comparable to more advanced packages (Dock, AutoDock). It may not show all the 'best hits', but it definitely discards the wrong ones.

gtaylor's picture

I'm a little anxious about anything that's Windows only. Recent stats show that Mac has more than 10% of the market and growing. I actually think the numbers are a bit higher with scientists. For example (nearly?) every laptop I saw at a recent visit to UCSF was a Mac.

In summary look at the tool for ideas. But I think client/desktop independence is an important requirement. 

jcbradley's picture

Anatoly - thanks, we'll give it a try in UsefulChem

Hi, I have been going through this forum and seeing your comments on a tool to publish chemical structures and edit them collaboratively. I think this would be a really good tool for scientists collaborating on a project. I would like to take a stab at building such a tool with your help.

I looked at the applications you mentioned - only JME and Marvin have editing capabilities, while Jmol is only a viewer. JME can't handle image exports.

In terms of requirements, a) I am assuming we need to be able to view/edit both proteins (macromolecules) and simple chemical structures b) We need a tool that can export structures as images (jpegs, png) and c) Keep track of user comments for a structure 

Alex, does Marvin have an API to export structure views to an image? Marvin does a good job for simple chemical structures but had some difficulty handling macrostructures like PDB files. Jmol here is a better viewer. (try 1TTT.pdb)

If anyone has any other recommendations for a  protein/molecule viewer that has an open API, kindly post them online.

cheers

Sebastian 

 

 

 

HI Sebastian,

Good Man!

OK - yes Marvin has API for exporting images.

About macromolecules - did you try MarvinSpace? - I tried to open the "1TTT.pdb" and also got a format error, but I searched and got no result at RCSB PDB on "1TTT", is this the right name? - Anyway if it is not performing well let me know and they will improve things.

Commenting in the file is not yet available (little user pressure for this feature), we have somewhat limited annotation on atoms but I think for the moment linking will be useful enough

If you have more questions (I am not really the right guy to speak to) use the forum (it's faster than this response ;), do email me aa_aatt_chemaxon.hu if I can forward any license keys (you will need for production search and canonicalization) or blue sky offline.

Good work.

Alex 

HI,

Just a few comments:

I think the point as to whether the data should be stored here (mirroring NIH, etc etc stores) isn't all that important - it is the linkage that is important. So long as we can connect (human) commentary with 'an entity' then we need only wait for stores to allow connection (something NIH and others work on actively (cheer!)). There are many linkage forms (name, InChI, SMILES, graph etc), some more unique than others, but I personally believe seeing is important (I am amazed at how visual chemists are).

To come back to Mat Todd's comments: what you would need to have this structure/commentary ping-pong is the ability to draw or paste structures into an editor in a web page and 'publish' them. Ideally this would generate a structure file which would remember all the formatting you gave it when you drew it (including representations, surfaces, orientation, zoom, etc etc...everything) and would be added to your comment when you post.

In one variant an image file is generated on the fly and can be default loaded into the wiki when it is next served, but when the visitor clicks on the image you load the editor/viewer (maybe as a popup or a 'live window' posting page) and load the structure 'as it was drawn' - then the opponent edits, adds his comment and posts. The generation of the image file on the fly reduces the overhead on page load for surfers and the 'click on image' excuses the delay while the editor/viewer is loading. But if you accept the Java overhead on the site then go for variant 2 and just load live Java windows straight - THAT is sticky imagery to catch sci surfers.

The "remembering the view" function is the key for a smooth operation (it becomes particularly essential when arguing over 3D visualization work like in Chime or MarvinSpace, etc). About delivering this type of remembering or publishing functionality, there is stuff out there I think that will do this (not sure about all platforms/web or interoperability), but I know all the Marvins (the ChemAxon Java editor/viewer/3D visualization toolkits) will have this by the end of the year. I will report back when development is complete.
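As a rough illustration of "remembering the view", the sketch below (Python) bundles a structure with the view settings it was drawn in, serialises the bundle for posting, and rebuilds it when a reader reopens it. Everything here is an assumption made for illustration: the `PublishedView` class, its field names and the JSON layout are hypothetical and are not the format used by Marvin, Chime or any other tool mentioned in this thread.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class PublishedView:
    """A structure plus the view state it was published with (hypothetical layout)."""
    structure: str                      # structure file contents or an identifier string
    representation: str = "sticks"      # how the poster chose to display it
    zoom: float = 1.0
    rotation: list = field(default_factory=lambda: [0.0, 0.0, 0.0])  # degrees about x, y, z

def publish(view):
    """Serialise the view so it can be stored alongside a post."""
    return json.dumps(asdict(view), sort_keys=True)

def reopen(payload):
    """Rebuild the view exactly as it was published."""
    return PublishedView(**json.loads(payload))
```

A reply would then start from `reopen(...)`, change the structure or the view, and call `publish(...)` again, which is the ping-pong loop described above.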

About the machinery of getting something working - I cannot talk about other systems but I do know the ChemAxon tools are fairly simple to implement (high/low level API is available) - we have a user who has implemented in a forum and another who had the editor/viewer and search back end up in 2 days. I intend to get something in our forum just as soon as I can get a developer in a corner for a minute - though we don't foresee this being a large task (he said). I will come back when I have more experience. Comparing things, I would say that whatever happens, try to avoid technologies that must be installed on the user's machine. This can be problematic since you may need Admin privileges, which users often do not have, particularly in commercial institutions with their security conscious admins.

In response to Anatoly - recognising drawn structures - I agree that an automated system for pulling the graph out of an image is unlikely, but a system to get the majority of it out in preparation for final curation may not be impossible. I know of SymBioSys "CLiDE" and also the Fraunhofer Institute have something (any links?) under development, but I also do not see a near end to chemists capturing printed structures.

A final point about the value of "an entity". Entities captured here (with their human comments) can be valuable outside this wiki if this wiki is set up in a way to present the content in a standard form that can be easily found and explored by queries originating from outside this site - in this way you greatly multiply the value of the commentary and knowledge you capture here (immortal references? ;)

Cheers alex

MatTodd's picture

Alex - the drawing functionality you describe is spot on. Particularly the non-requirement of locally installed software. I am a fussy user of chemical drawing packages, rather than knowing anything about how they work, so will leave further comments to others.

Ginger - I would agree we don't want to mirror data, but being able to use the site as a scratch pad for ideas is important, and that usually means drawing things.

But surely picking chemical structures out of pictures is trivial. If we are soon to have image searches that are active (e.g. 'Find me a picture on the web that *contains* an apple') then making a chemical picture an active structure is a piece of cake. Let's archive this thread so that people in ten years can laugh at us derisively!

Mat

 

I am afraid it isn't trivial! I agree that when the world has sorted out all other images then maybe chemistry will start to take notice. But at present it's hard.

First the quality of the images can be awful - see my blog for examples. But even then there are serious difficulties with what the glyphs mean. For example

-

could be taken to mean a minus sign or an ethane molecule. The annotation isn't standard and numbers can be indexes, atom counts, labels, charges or page numbers.

I have spent about 1 month hacking at this. I think parts of it are tractable, but social computing may be more useful. If we can induce people to generate InChIs the problem is largely solved. 

Peter Murray-Rust http://wwmm.ch.cam.ac.uk/blogs/murrayrust

MatTodd's picture

How time flies! I wonder if anyone has any further ideas on this. Previously on this thread, Alex had mentioned CLiDE - I wondered whether anyone had used this tool/had had any luck with it?

Peter - I think InChI is the way to go (nice talk at Google, technical difficulties with your own server notwithstanding). InChI seems to be the way to make data searchable, but this does not help us rapidly talk about chemistry in the way we might with a piece of paper and a pencil to hand. Take this example. I need to post this diagram to explain something (in another post). It's a dead diagram - a useless gif. Nobody replying to me can interact with it - they have to redraw the diagram in order to tell me something. What's the simplest way a reader on this site can copy my figure, paste it in a reply, and modify it, such that this is useable by a chemist, not a chemoinformatics specialist?

Mat

 

MatTodd's picture

The problem of how to sketch molecules collaboratively online is still an issue for us. As an organic chemist, I have little idea of the technical requirements/barriers to implementing such a tool on this site (which runs off Drupal). If I want to talk to a colleague about a chemistry problem I grab a pen and paper or go to a whiteboard. I still can't do that here, which is a significant barrier to decent long-distance chemical collaboration.

I'm assuming that we need someone to code a Drupal module that integrates chemical drawing functionality? Alex and Sebastian above mentioned that this might be possible with existing stand-alone packages, but we really need someone to help implement this on this site. How?

Maybe a solution is Google's Summer of Code that Ginger previously mentioned. The 2008 round closed a couple of months back, but we could try to get a project up and running for application next year. We need a student to code a project entitled A Collaborative Chemical Drawing Tool for Open Source Websites. The closest thing on the Google Code blog I could find was this, managed by a guy called Greg Landrum. A Google search took me to the Blue Obelisk page at Sourceforge, where similar activity is going on. How can we use this activity/expertise to allow readers of this site to copy/click on existing drawings, edit them and re-post without installing any local software and without any knowledge other than organic chemistry? Given the outcomes of the 2008 GSoC, maybe we should contact the University of Moratuwa.

HI,

I am not sure I agree with making another chemical editor/viewer, there are many out there and the difficult bit is keeping it developed.

For 'a' development, "suck it and see" is a good thing, but that means implementation (resources) and that also means choices, directions and environments, so there are a lot of potential questions out there.

I would suggest start small see what the users are wanting/using and develop based on demand.

What first? I am not the best person to ask, but I think making an editor/viewer available is immediately good for users, though it raises selection questions that can stall things. Maybe start with just a text box field in the 'Reply form' for a chemical string, and generate (or not) an image in the background. Later, once you have some content, building the chemical search side becomes more relevant. However, immediately offering an automatic search of public data for that string could also be valuable.
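The "start small" option could look something like the following Python sketch: the reply form carries one extra text field for a chemical string, and an image link is derived from it only if a rendering service is configured. The function name `build_reply`, the `structure` query parameter and the idea of a separate rendering service are all assumptions made for illustration; nothing here checks that the string is chemically valid.

```python
from urllib.parse import quote

def build_reply(body, chem_string="", image_service=None):
    """Assemble a reply record from the form fields.

    chem_string   -- contents of the optional chemical text box (e.g. SMILES or InChI)
    image_service -- base URL of a hypothetical service that renders a structure image
    """
    chem = chem_string.strip()
    reply = {"body": body, "chem_string": chem or None, "image_url": None}
    if chem and image_service:
        # Percent-encode the string so characters such as '=' or '#' survive in a URL.
        reply["image_url"] = image_service + "?structure=" + quote(chem, safe="")
    return reply
```

Because the string is stored as its own field rather than buried in the post body, the search side can be added later without re-parsing old posts.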

Without diverting this - if using chemical structures to improve retrieval of posts is the aim here, this could also be approached from the text side, where you store/search/generate structures from chemical names as written in posts. Anyway.

In the meantime - I have something to show: if you visit chemaxon.com and mouse up to the gallery images, clicking the icons to the right of the images causes Marvin or other apps to be opened (Java WebStart) and the view (well, not quite) to be re-created with a live structure/project. Saving the state of a project is still unfinished with Marvin. Also the implementation is not quite completed on the site.

To move this forward in the only direction I can... For those considering implementation have a look at these Marvin code examples (for visualization/editing/image generation/file format conversion). For storage and searching the JChem code examples have JSP > ASP.NET relevant code. The FAQ will help greatly in getting the software/hardware environments - but you know that;).

If resources appear I would be happy to forward the training materials we have related to the implementation you're going for. I can also arrange some introductory discussion from our developers, let me know.

A few comments from a "cheminformatics guy".

The first is quite simple: if you are not philosophically bound to using open-source software, then I don't think you can do better than to use the ChemAxon software, as the original poster offered. They make nice software that would be free for you to use. Importantly you can get standard cheminformatics file formats out of/into their tools, so your data wouldn't be locked in.


If you want to stay in the open-source realm then I would suggest coming up with a list of specific things you'd like to be able to do with the software (unit operations, so to speak) and the community can suggest solutions for those individual pieces. This will be made easier if you are quite detailed about what you want to do. For example, above I saw discussion of including drawings of chemical structures, but then the example that was provided is a reaction. From a cheminformatics point of view, molecules and reactions can be/are different creatures.

While thinking about requirements, keep in mind that an open-source solution that displays static molecules as images (which can be clicked to download a standard format) and allows the user to upload their own molecules (in standard formats) is going to be much easier to find the pieces for than a system where the molecules are live and editable in the page.

Finally a clarification about "my" software project, the RDKit: that system does not have an interactive molecule editor; it can create static pictures from molecules but it cannot be used to graphically edit molecules.

I copy the text from this post on our forum where one of our users has rapidly implemented Marvin into MediaWiki. Although not covering the search engine/backend, the short GUI development time and media relevance should be encouraging.

Alex

Hi,

I have written, in half a day, a few lines of code to integrate MarvinView and MarvinSketch in MediaWiki.

The first extension is just some lines of code to translate <marvin> tags into HTML code (that displays compounds).

Source code is available at :

http://cheminfo.u-strasbg.fr/mediawiki/index.php/ChemAxon-Marvin_Mediawiki_Extensions

The second extension, called marvinEditButton, is to draw compounds after clicking on the toolbar of the wiki Edit page (button with character M in uppercase for Marvin).

For now, it is at the following URL (in miscellaneous tool section - marvinEditButton ) :

http://cheminfo.u-strasbg.fr/template/jd/pages/download/download.php

As I said, this code is really basic. It's a kind of skeleton. Feel free to improve it and add more options.

All instructions to install the extension are on :

http://cheminfo.u-strasbg.fr/mediawiki/index.php/ChemAxon-Marvin_Mediawiki_Extensions

Kind regards
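For readers unfamiliar with wiki tag extensions, the idea behind the first extension (turning <marvin> tags in page text into HTML) can be sketched in a few lines. This is not the extension's code, which is at the URLs given in the post; it is a Python illustration, and the output markup (a span carrying the structure in a `data-structure` attribute for some viewer script to pick up) is a hypothetical placeholder.

```python
import html
import re

# Matches <marvin>...</marvin>, including content that spans several lines.
MARVIN_TAG = re.compile(r"<marvin>(.*?)</marvin>", re.DOTALL)

def render_marvin_tags(wikitext):
    """Replace each <marvin> tag with placeholder HTML carrying the structure string."""
    def replace(match):
        # Escape the content so it is safe inside an HTML attribute.
        structure = html.escape(match.group(1).strip(), quote=True)
        return '<span class="structure" data-structure="%s"></span>' % structure
    return MARVIN_TAG.sub(replace, wikitext)
```

Text outside the tags passes through untouched, so the function can be run over a whole page.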

MatTodd's picture

The recent comments on possible ways of drawing molecules on this site are all extremely valuable. However, I'm an organic chemist, and have no idea how to implement them. Perhaps this conversation needs to happen between those suggesting solutions (above) and those with the knowledge to alter TSL, which is based on Drupal. i.e., how do we use Chemaxon/Marvin *on this site.* I can't help with that discussion at all.

The second question, raised by Greg, was to clarify what we need. I think initially we need an app that permits: The drawing of molecules while composing an entry/comment (let's leave reactions for now), and which allows the drawer to click a button to convert the sketch into a picture that is pasted in the entry. It would be useful for searching purposes if this operation also pasted an InChI string underneath the picture.

Currently I need to draw a molecule with a separate program, save it as a picture file, and insert it. I then need to manually paste in the InChI. This is a pain.

If this is do-able, that would be a considerable advance. The next phase would be to allow the copying of the picture for re-editing and re-pasting. For example, currently many chemists use Chemdraw to draw molecules. The drawings may be copied from Chemdraw and pasted into Word. These can be clicked on from within the Word document and re-edited/copied in situ. This rarely works, but it's the right idea!

Mat

jcbradley's picture

Matt,
A solution that we're using for UsefulChem to handle molecules is to use ChemSpider to do the heavy lifting of storing molecules, their spectra and enabling substructure searching. We also use InChIs and InChIKeys generated by ChemSpider to tag our wiki pages for indexing by Google.
Yes, it does require a manual cut and paste of those tags so it doesn't meet your request exactly. But I think it is a pretty good solution until you find someone to do a lot of customization on your server.
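Once InChIKeys have been pasted into pages as tags, pulling them back out for indexing is straightforward because the key has a fixed shape: 14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen and one final letter. The Python sketch below finds keys of that shape in page text. The function name `find_inchikeys` is hypothetical, and the pattern checks only the shape of a key, not whether it corresponds to a real compound.

```python
import re

# Shape of an InChIKey: 14 letters, 10 letters, 1 letter, separated by hyphens.
INCHIKEY = re.compile(r"\b[A-Z]{14}-[A-Z]{10}-[A-Z]\b")

def find_inchikeys(page_text):
    """Return the InChIKey-shaped strings in the text, in order, without repeats."""
    seen = []
    for key in INCHIKEY.findall(page_text):
        if key not in seen:
            seen.append(key)
    return seen
```

For example, a page tagged with the InChIKey for ethanol, LFQSCWFLJHTTHZ-UHFFFAOYSA-N, would return that one key however many times it was pasted.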