Wednesday, October 26, 2005

Google Base and Bioinformatics

Google is creating a new service called Google Base. It looks like a general database service. I cannot log in yet, but from the discussions on the blogs it seems we will be able to define content types and populate the database with our own content. I don't know how much space will be allocated to each user, but I guess it will be at least the disk space of our Gmail accounts (around 2.5 GB currently and growing).
Can the bioinformatics community take advantage of this?
Well, one of the most boring tasks that we usually have to perform is cross-referencing databases. This usually means downloading some flat files and spending some time writing scripts to parse and join them. Of course, some of the main databases take up way more than 2.5 GB, but we could imagine that having all databases under the same hosting service would help us. Google Base will probably have a nice standard API that would come in handy for accessing all sorts of different data.
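To give an idea of the kind of scripting I mean, here is a minimal sketch in Python of cross-referencing two tab-delimited flat files on a shared accession column. Everything in it is made up for illustration: the accessions (ACC1, ACC2, ...), the gene and structure names, the two-column layout, and the helper names load_mapping and cross_reference. Real database dumps have their own formats and would need their own parsing.

```python
import csv
import io


def load_mapping(handle, key_col=0, value_col=1):
    """Read a tab-delimited flat file into a dict of key -> list of values."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        mapping.setdefault(row[key_col], []).append(row[value_col])
    return mapping


def cross_reference(first, second):
    """Keep only the keys present in both mappings and pair up their values."""
    return {key: (first[key], second[key]) for key in first if key in second}


# Hypothetical contents standing in for two downloaded flat files.
genes_file = io.StringIO("ACC1\tgeneA\nACC2\tgeneB\nACC3\tgeneC\n")
structures_file = io.StringIO(
    "# accession\tstructure\nACC1\tstruct1\nACC3\tstruct2\nACC3\tstruct3\n"
)

genes = load_mapping(genes_file)
structures = load_mapping(structures_file)
joined = cross_reference(genes, structures)

for accession in sorted(joined):
    names, structs = joined[accession]
    print(accession, ",".join(names), ",".join(structs))
```

With real downloads the io.StringIO objects would be replaced by open() calls on the files. The point is that every pair of databases needs its own little parser like this one, which is exactly the chore a common hosting service with a standard API could remove.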
The next step would be the ability to do some processing on the data right on their servers. Please, Google, set up some clusters with some standard software and queuing systems. We have clusters here at EMBL, but Google would do a lot of researchers a favor by "selling" computer processing time for some ads :).