Sunday, April 01, 2007

Bio::Blogs #9 - small update

Welcome to the ninth edition of the bioinformatics blog journal Bio::Blogs, posted online on the 1st of April 2007.

Today is an exciting day for bioinformatics and open science in general. I am happy to report on an ongoing project at Nature that has been under wraps for quite a long time. It is called Nature Sherlock and it promises to turn the dream of a rich semantic web for scientists into a reality. This service is still in closed beta, but you can have a look at the site to see that the service does exist, and from the name you might get a sense of what it might do. I have been allowed to use Sherlock for some time. According to the FAQ of the main website, it has been co-developed by Google and Nature and it is one of the results of meetings that went on during the 1st Science Foo Camp (also co-organized by Google and Nature). Access to the main site requires a beta tester password, but I can say that Sherlock looks like a very promising tool. Sherlock is the code name for the main bot that is set to crawl text and databases from willing providers (current partners include Nature, EBI, NCBI and PubMed Central) to produce semantic web objects that abide by well-established standards in biology. Some of the results, especially regarding the text mining, are of lower accuracy (details can be found on the help pages), but overall it looks like an amazing tool. I hope that they get this out soon.

In this month's Bio::Blogs I have included many posts that were not submitted but that I thought were interesting and worth mentioning. This might be a more biased selection, but in this way I can make up for the current low number of submissions. As in the last edition, the blog posts mentioned were converted into PDF for anyone interested in downloading and reading Bio::Blogs offline (anyway you might enjoy this). There are many interesting comments on the online blog posts that I did not include in the PDF, so if you read this offline and find something interesting, go online for the discussion.

News and Views
This month saw the announcement of the main findings coming from the Global Ocean Sampling Expedition. Several articles published in PLoS Biology detail the main conclusions of Craig Venter's efforts to sequence the microbial diversity of the oceans. Both Konrad and Roland Krause blogged some comments on this metagenomics initiative.

I will start this section by highlighting Stew's post on software availability. Testing around 111 resources taken from the Application Notes published in the March issues of Bioinformatics shows that between 11% and 17% (depending on the year) of these resources are no longer available. Even considering that bioinformatics research runs at a very fast pace and that some of these resources might be outdated by now, there is no reason why these resources should not be available (as was required for publication).
RPG from Evolgen submitted a post entitled “I Got Your Distribution Right Here” where he analyzes the variation of genome sizes among birds. He concludes by noting that the variability of genome sizes in Aves is smaller than in Squamata (lizards and snakes) and Testudines (turtles, tortoises, and terrapins). An interesting question might then be why birds have a narrower distribution of genome sizes. Is there a selection pressure?
Barry Mahfood submitted a blog post where he asks the question: “Is Death Really Necessary?”. Looking at human life expectancy in different periods of time and thinking about what might determine the self, Barry thinks that eternal life is achievable in the very near future.

Semantic web/Mash-up/web-services series
This month there were several blog posts regarding mash-ups, web services and the semantic web. All of these relate to the ease of accessing data online and combining data and services to produce useful and interesting outcomes.
Freebase has great potential to bring some of the semantic web concepts closer to reality. Deepak sent in a link to his description of Freebase and the potential usefulness of the site for scientists. I had the good fortune to receive an invitation to test the service, but I have not yet had time to fully explore it.
I hope you saw through my April Fools' introduction to Nature Sherlock. Even if Nature Sherlock does not really exist (it is a service to look for similar articles), it is clear that the Nature Publishing Group is the most active science publisher on the web. Tony Hammond at Nascent gave, in a recent blog post, a brief description of some of the tools Nature is working on.
While we are waiting for web services and data to become easier to work with, we can speed up the process by using web scraping technologies like openKapow (described by me) or dapper (explained by Andrew Perry). These tools can help you create an interface to services that do not provide APIs.
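For readers who have not tried scraping before, here is a minimal sketch of the general idea in Python. It is not how openKapow or dapper work internally; it only illustrates pulling structured records out of a page that offers no API. The sample page, the LinkExtractor class and the scrape_links function are all made up for this illustration, and only the Python standard library is used.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (link text, href) pairs for every <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, href) records
        self._href = None    # href of the link currently being read, if any
        self._text = []      # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def scrape_links(html):
    """Turn raw HTML into a list of (text, href) tuples."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page standing in for a service that has no API.
SAMPLE_PAGE = """
<html><body>
<ul>
<li><a href="/tools/aligner">Sequence aligner</a></li>
<li><a href="/tools/primer">Primer designer</a></li>
</ul>
</body></html>
"""

if __name__ == "__main__":
    for title, url in scrape_links(SAMPLE_PAGE):
        print(title, url)
```

In practice the HTML would come from a downloaded page rather than a string, and a scraper like this breaks whenever the page layout changes, which is one reason proper web services are still worth waiting for.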

Tips and Tricks

I will end this month's edition with a collection of tips for bioinformatics. Bosco wrote an interesting post, “Notes to a young computational biologist”, where he collects a series of useful tips for anyone working in bioinformatics. There is a long thread of comments with other people's ideas, making it a useful resource. On a similar note, Keith Robison wrote about error messages and the typical traps that might take a long time to debug if we are not familiar with them. (Update) In reply to a recent editorial in PLoS Computational Biology, Chris sent in some tips for collaborative work.
From Neil Saunders' blog comes a tutorial on setting up a reference manager system using LaTeX. I work mostly on a Windows machine and I am happy with Word plus Endnote, but I will keep this in mind if I try to change to a Linux setup.
Finally, I end this month's edition with a submission from Suresh Kumar on “Designing primer through computational approach”. It is a nice summary of things to keep in mind for primer design, along with useful links to tools and websites that might come in handy.

Update - Just to be sure, Nature Sherlock is as real as the new Google TiSP wifi service.