Tuesday, April 08, 2008

Bio::Blogs#20 - the very late edition

I said I would organize the 20th edition of Bio::Blogs here on the 1st of April, but April Fools' Day and my current workload did not allow me to get Bio::Blogs up on time.

There were a couple of interesting discussions and blog posts in March worth noting. For example, Neil mentioned a post by Jennifer Rohn that started what could be one of the longest threads on Nature Network: "In which I utterly fail to conceptualize". It started off as a small anti-Excel rant, but in the comments it turned first into a discussion of which bioinformatics tools to use, and then into a discussion of the wet versus dry mindset and how much effort one should devote to learning the other. Finally, it ended up as an exchange about collaborations and how a social networking site like Nature Network could or should help scientists find collaborators. There was even a group started by Bob O'Hara to discuss this last issue further.

I commented on the thread already but can try to expand a bit on it here. Nature Network is positioned as a social networking site for scientists. So far the best that it has to offer has been the blog posts and forum discussions. This is not very different from a "typical" forum. It facilitates the exchange of ideas around scientific topics, but NN could try to look at all the typical needs of scientists (lab books, grant management, lab management, collaborations, protocols, paper recommendations, etc.) and decide on a couple that they could work into the social network site. Ways to search for collaborators, and maybe paper recommendation engines that take advantage of your network (network + Connotea), are the most obvious and the easiest to implement. Thinking long term, tools to help manage the lab could be an interesting addition.

Another interesting discussion started from a post by Cameron Neylon on a data model for electronic lab notebooks (part I, II, III). Read also Neil's post, and Gibson's reply to Cameron on FuGE.
How much of the day-to-day activities and results needs to be structured? How heavy should this structure be to capture enough useful computer-readable information? Although I find these questions and the discussion interesting, I would guess that we are far from having this applied to any great extent. If most people are reluctant to try out new applications, they will be even less willing to convey their day-to-day practices via a structured data model. I mentioned recently the experiment under way at the journal FEBS Letters to create structured abstracts during the publishing process. As part of the announcement the editors commissioned reviews on the topic. It is worth reading the review by Florian Leitner and Alfonso Valencia on computational annotation methods. They argue for the creation of semi-automated tools that take advantage of both the automatic methods and the curators (authors or others). The problems and solutions for annotation of scientific papers are shared with digital lab notebooks. I hope that more interest in this problem will lead to easy-to-use tools that suggest annotations to users from some controlled vocabularies.

Several people blogged about the 15-year-old bug found in the BLOSUM matrices and about the uncertainty in multiple sequence alignments. See posts by Neil, Kay Lars and Mailund.
Both cases remind us of the importance of using tools critically. The flip side of this is that it is impossible to constantly question every single tool we use since this would slow our work down to a crawl.

On the topic of Open Science, the proposal drafted by Shirley Wu and Cameron Neylon for the Pacific Symposium on Biocomputing was accepted in March. It was accepted as a 3 hour workshop consisting of invited talks, demos and discussions. The call for participation is here along with the important deadlines for submissions (talk proposals due June 1st and poster abstracts due the 12th of September).

On a related note, Michael Barton has set up a research stream (explained here). He is collecting updates on his work, tagged papers and graphs posted to Flickr into one feed that gives an immediate impression of what he is working on at the present time. This is really a great setup. Even for private use within a lab, or across labs for collaboration, this would give everyone involved the capacity to tap into the interesting feeds. I would probably not like to have everyone's feeds, and maybe a supervisor should have access to some filtered set of feeds or tags to get only the important updates, but this looks like a step in the right direction. In the same way, machines could also have research feeds that I could subscribe to in order to get updates on some data source.
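
To make the idea of a filtered research stream a bit more concrete, here is a minimal Python sketch. It is not how Michael's setup works; the `Entry` fields, the tag names and the example data are all made up for illustration. It only shows the two steps described above: merging several feeds into one stream ordered by date, and giving a reader (a supervisor, say) a view restricted to a few tags.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Entry:
    """One item in a feed (all field names here are hypothetical)."""
    source: str
    title: str
    published: datetime
    tags: frozenset = field(default_factory=frozenset)


def merge_feeds(*feeds):
    """Combine several feeds into one stream, newest entry first."""
    combined = [entry for feed in feeds for entry in feed]
    return sorted(combined, key=lambda e: e.published, reverse=True)


def filter_by_tags(stream, wanted):
    """Keep only entries carrying at least one of the wanted tags."""
    wanted = set(wanted)
    return [e for e in stream if e.tags & wanted]


# Made-up example data standing in for real feeds.
notes = [
    Entry("notes", "Reran alignment", datetime(2008, 3, 10), frozenset({"result"})),
    Entry("notes", "Cleaned scripts", datetime(2008, 3, 12), frozenset({"housekeeping"})),
]
papers = [Entry("papers", "Tagged a review", datetime(2008, 3, 11), frozenset({"reading"}))]
figures = [Entry("figures", "New plot uploaded", datetime(2008, 3, 13), frozenset({"result"}))]

stream = merge_feeds(notes, papers, figures)
supervisor_view = filter_by_tags(stream, {"result"})

for entry in supervisor_view:
    print(entry.published.date(), entry.source, entry.title)
```

A real version would read the entries from actual feeds rather than hard-coded lists, but the filtering by tag is the part that would keep a shared stream from becoming overwhelming.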

Also in March, Deepak suggested we need more LEAP (Lightly Engineered Application Products) in science. He suggests that it is better to have one tool that does a job very well than one that does many somewhat well. I guess we have a few examples of this in science. Some of the most cited papers of all time are very well known cases of a tool that does one job well (e.g. BLAST).


Finally, some meta-news on Bio::Blogs. I am currently way behind on many work commitments and I don't think I can keep up the (light) editorial work required for Bio::Blogs, so I am considering stopping Bio::Blogs altogether. It has been almost two years and it has been fun and hopefully useful. The initial goal of trying to knit together the bioinformatics-related blogs and offer some form of highlighting service is still needed, but I am not sure this is the best way going forward.
Still, if anyone wants to take over from here let me know by email (bioblogs at gmail.com).