Linking out (blogs@nature && workflows)
Rolf Apweiler called bloggers exhibitionists in a recent news special in Nature: "I have my doubts that blogging reduces information overload, but blogging will survive as it appeals to all the exhibitionists." I hope this simplistic opinion is backed by more reasoning that was left out of the news piece for lack of space. Blogging lowers the barrier to creating content; it makes it easier for people to have a voice. What gets people's attention is how good (or bad) the content is, not personal connections or any other bias. This makes blogs one of the most democratic content media I am aware of (compare them to newspapers, radio or TV). Discussion in Notes from the Biomass.
Check out some interesting posts on workflows at Hublog and Flags and Lollipops.
Saturday, December 10, 2005
Back to roots
I like bioinformatics because it is so useful for pointing out the next experiments worth doing and for helping to extract knowledge from your data. This is why I think it is both possible and useful to do experimental work alongside computational work.
I have spent the last week back at the bench doing some biochemistry. I usually don't do much bench work, although I have a biochemistry degree. At the moment it is not easy to keep up with my computational work while doing the lab work, but before the end of my PhD I want to find a way to keep doing both at the same time. I should divide my time between the two mindsets, but I am not sure of the best way.
Any ideas?
Wednesday, November 30, 2005
Firefox 1.5
A quick post to promote the release of a new version of Firefox. If you already have it, go get the new version here. If you don't have it yet, give it a try: it takes one or two minutes to install and has nice advantages compared to some other popular browsers (just an example off the top of my head... it is better than Internet Explorer :) ).
There are going to be some potentially funny films to see at Spreadfirefox.com.
I am still playing around with it, but the first surprise was immediate: it is much quicker to move between tabs. You can now re-order the tabs by dragging and dropping them. New features are listed here.

Monday, November 28, 2005
Meta Blogging
If for some strange reason you are searching for blogs to read, allow me to make a suggestion. Via the NYTimes I found a site called TravelBlog, for people blogging while travelling. From the site: "Travel Blog is a collection of travel journals, diaries, stories and photos from all around the world, ordinary people doing extraordinary things. For travellers, this site includes lots of features that help keep family and friends back home up to date with your adventure."
I would not put any of these in my usual reads, but maybe I will check back on this page before my next long holidays... umm... sometime after I finish my PhD.
Sunday, November 27, 2005
SyntheticBiology@Nature.com
This week Nature has a special issue on <buzz>Synthetic Biology</buzz>. I currently have a kind of love/hate relationship with trends in biology. It is easy to track them (in the recent past: genomics, proteomics, bioinformatics, systems biology, nanotechnology, synthetic biology) and it is somehow fascinating to follow them and watch them propagate. It holds for me the same fascination as seeing a meme propagate on the web. Someone will one day write a thesis on how a kid was able to put up a webpage like this one and make a truckload of money selling pixels just because he ignited people's curiosity on a global scale.
There is always a reason behind each rising trend in biology, but they are clearly too short-lived to deliver on their expectations, so what is the point? Why do these waves of buzz exist in research? The engineering mentality in biology is not new, so why the recent interest in synthetic biology?
I am too young to know if it has always been like this, but I am inclined to think that it is just the product of increasing competition for resources (grant applications). Every once in a while scientists have to re-invent the pressing reasons why society has to invest in them: the big projects that will galvanize the masses, the next genome project.
I personally like the engineering approach to biology. Much of the work done in the lab where I am doing my PhD is engineering oriented. Synthetic biology (or whatever it was called in the past and will be called in the future) could deliver things like cheap energy (biological solar panels), cheaper chemicals (optimized production systems), cheap food (GMOs or even weirder tissue cultures), clean water, improved immune systems, etc. A quick look at the two reviews in this week's issue of Nature will tell you that we are still far from all of this.
The review by David Sprinzak and Michael Elowitz tries to cover broadly what has been achieved in engineering biological systems over the last couple of years (references range from 2000 to 2005). Apart from the reference to a paper on the engineering of a mevalonate pathway in Escherichia coli, most of the work done in the field so far is preliminary. People have been trying to assemble simple systems and end up learning new things along the way.
The second review is authored by Drew Endy and is basically synthetic biology evangelism :). Drew Endy has been one of the loudest voices in support of this field and in pushing for standardization and open exchange of information and materials (some notes from the biomass). The only thing he says in this review that I have not heard from him before is a short paragraph on evolution. We are used to engineering things that do not replicate (cars, computers, TV sets, etc.) and the field will have to start thinking about the consequences of evolution for the systems it tinkers with. Are the systems sustainable? Will they change within their useful lifetime?
There is one accompanying research paper reporting on a chimeric light-sensing protein that is de-phosphorylated in the presence of red light. The bacteria produce lacZ in the dark and production decreases with increasing amounts of red light. You can make funny pictures with these bacteria, but as for the real scientific value of this discovery I can link to two comments on Slashdot. Maybe that is exaggerated; making chimeric protein receptors that work can be tricky, and it is very nice that something started by college students can end up as a Nature paper.
Last but not least, there is a comic! The fantastic "Adventures in Synthetic Biology". OK, here is where I draw the line :) Who is this for? Since when do teens read Nature? How would they have access to this? I like comics, I do... but this is clearly not properly targeted.
Monday, November 21, 2005
BIND database runs out of funding
I only noticed today that BIND has run out of funding. They say so on the home page, and there are links to several papers regarding the issue of sustainable database funding (as of 16 November 2005).
From the front page of BIND:
"Finally, I would like to reiterate my conviction that public databases are essential requirements for the future of life sciences research. The question arises will these be free or will they require a subscription. Should BIND/Blueprint be sustained as a public-funded open-access database and service provider? "
I am not sure what would actually be a good way out for BIND. They could try to charge for institutional access, like Faculty1000 or ISI. The other possibility would be to try to secure support from a place like the NCBI or the EBI. The problem is that there are several other databases available that do the same thing (MINT, DIP, GRID, IntAct, etc.), so why should we pay for this service? Why don't the protein-interaction databases merge, for example? I know that they agreed to share the data in the same format, so maybe there is not enough space for so many different databases developing new tools. The question is then probably more about the curation effort. Who should pay for curation? The users of the databases? The major institutions? The journals (they could at least force authors to submit interaction data directly)?
There is also a link to the blog of Christopher Hogue, called BioImplement, where he expresses his views on the problem.
Saturday, November 19, 2005
Google Base simple tricks
I was playing with Gbase, just browsing for content, and I noticed that when you search for content that already has a lot of entries you can restrict the outcome of the search very much like you would in a structured database. For example, when you look for jobs you notice that at the top you have "Refine your search", and if you click, say, "job type" and select "permanent", you get something like all jobs where job type is permanent. It is all up there in the URL, so it is very simple to mess around until you can guess what most of those parameters are doing.
From this:
http://base.google.com/base/search?q=jobs&a_r=1&nd=0&scoring=r&us=0&a_n194=job+type&a_y194=1&a_s194=0&a_o194=0&a_v194=permanent&a_v194=
You really just need:
http://base.google.com/base/search?a_n194=job+type&a_y194=1&a_o194=0&a_v194=permanent
to get the same effect. Basically this gets all entries where "job type" equals "permanent". The 194 is not even important, as long as the number is the same in all of the variables.
So this also gives the same:
http://base.google.com/base/search?a_n1=job+type&a_y1=1&a_o1=0&a_v1=permanent
a_n[identifier]=NAME
a_v[identifier]=VALUE
a_y[identifier]= ? (I think it is a boolean of some sort)
a_o[identifier]= how to compare the value (0 = equal, 1 = less than, 2 = greater than)
You can chain constructions like this to get an AND, but so far I have not found an equivalent for an OR. This is almost good enough to work with.
So all protein sequences from S. cerevisiae would be:
http://base.google.com/base/search?a_n1=sequence+type&a_y1=1&a_o1=0&a_v1=protein&a_n2=species&a_y2=1&a_o2=0&a_v2=s.cerevisiae
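I have only poked at this in the browser, but as a minimal sketch (assuming the parameter semantics guessed above actually hold), building these AND queries in Perl could look something like this:
use strict;
use warnings;
use LWP::UserAgent;
use URI::Escape;

# Build a Google Base query URL from name/value pairs, ANDing all conditions.
# The a_n/a_v/a_y/a_o semantics are just my guesses from the URLs above.
sub base_query_url {
    my @conditions = @_;    # list of [attribute name, value] pairs
    my @parts;
    my $i = 1;
    foreach my $c (@conditions) {
        my ($name, $value) = @$c;
        push @parts, "a_n$i=" . uri_escape($name), "a_y$i=1", "a_o$i=0",
                     "a_v$i=" . uri_escape($value);
        $i++;
    }
    return "http://base.google.com/base/search?" . join("&", @parts);
}

# All protein sequences from S. cerevisiae, as in the last URL above
my $url = base_query_url(["sequence type", "protein"], ["species", "s.cerevisiae"]);
my $ua  = LWP::UserAgent->new();
print $ua->get($url)->content;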
Thursday, November 17, 2005
Google Base and Bioinformatics II
The Google Base service is officially open in beta (as usual). It is mostly disappointing because you can do nothing with it really (read the previous post). You can load tons of data very rapidly, although they take a long time to process the bulk uploads. Maybe this will speed up in the future. The problem is that once you have your structured data in Google Base you cannot do anything with it apart from searching and looking at it with the browser. I uploaded a couple of protein sequences just for fun. I called the item "biological sequence" and gave it very simple attributes like sequence, id and type. The upload failed because I did not have a title, so I added a title and just copied the id field. Not very exciting, right?
I guess you can scrape the data off it automatically, but that is not very nice. This, for example, gets the object ids for the biological sequences I uploaded:
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Query Google Base for the "biological sequence" items I uploaded
my $url = "http://base.google.com/base/search?q=biological+sequence";
my $ua  = LWP::UserAgent->new();
my $req = HTTP::Request->new('GET', $url);
my $res = $ua->request($req);

# Dump the result page to a temporary file
open(my $out, '>', 'google.base.temp') or die "output file didn't open: $!";
print $out $res->content;
close $out;

# Scrape the object ids (and the item titles) back out of the saved page
my %data;
open(my $in, '<', 'google.base.temp') or die "error in input: $!";
while (my $line = <$in>) {
    $data{$1} = $2 if $line =~ /oid=([0-9]+)">(\S+)</;
}
close $in;

foreach my $id (keys %data) { print $id, "\n"; }
With the object ids you can then do the same to get the sequences.
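Something along these lines should do it. This is only a sketch: the per-item URL pattern and the regular expression for the sequence attribute are guesses from eyeballing the item pages, so adjust them to whatever the pages actually contain.
use strict;
use warnings;
use LWP::UserAgent;

# For each object id collected above, fetch the item page and scrape out the
# sequence attribute, printing the result in a FASTA-like format.
my @oids = @ARGV;    # pass the object ids on the command line
my $ua = LWP::UserAgent->new();
foreach my $oid (@oids) {
    my $page = $ua->get("http://base.google.com/base/search?oid=$oid")->content;
    if ($page =~ /sequence[^>]*>\s*([A-Za-z]+)/) {
        print ">$oid\n$1\n";
    }
}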
Anyway, everybody is half expecting that one day Google will release an API to do this properly. So, coming back to scientific research, is this useful for anything? Even with a proper API this is just a database. It will make it easy for people to rapidly set up a database, and maybe Google can make a simple template webpage service to display the content of the structured database. It would be a nice add-on to Blogger, for example. You could get a tile to put in your blog with an easy way to display the content of your structured database.
For virtual online collaborative research (aka science 2.0 :)?) this is potentially useful because you get a free tool to set up a database for a given project. Apart from that I don't see potential applications, but like the name says, it is just the base for something.
Monday, November 14, 2005
The Human Puppet
One of the current trends on our changing internet is the phenomenon of "collective intelligence" (web 2.0 buzz), where the rise and ease of individual participation can result in amazing collective efforts. The usual example of collective intelligence is the success of Wikipedia, but more examples are sure to follow.
This sets the ground for a possibly strange scenario in a kind of sci-fi "what if" game. What if a human being decided that he/she did not want to decide anymore? (funny paradox :) - "I'll be a vessel for the collective intelligence of the web, I'll be the human puppet". Taken to the extreme, this someone would walk around with a webcam and with easy tools to constantly interact with the web. The ultimate Big Brother, but voluntary. The masses on the web would constantly discuss and decide the life of the puppet. This someone would benefit from the knowledge and experience of a huge group of people and could, in theory, really stand on the shoulders of giants.
Of course this is an extreme scenario that might not come to pass, but sci-fi is useful for thinking about the possible consequences of a trend. Lighter versions of this scenario probably occur already in the blogosphere, when people talk online about their daily lives and receive counsel from anonymous people.
Would someone ever give up their individuality to be directed by a collective intelligence? Would a group of people be attracted by the chance of directing someone's life?
Thursday, November 10, 2005
In the latest issue of Current Biology there is a short two-page interview (sub-only) with Ronald Plasterk, current director of the Hubrecht Laboratory in Utrecht.
He had some very funny things to say about systems biology:
"The fundamental misconception of systems biology advocates is that one could create a virtual cell, and use big computers to model life and make discoveries. None of these modellers ever predicted that small microRNAs would play a role. One makes discoveries by watching, working, checking. They want to be Darwin, but do not want to waste years on the Beagle. They want sex but no love, icing but no cake. Scientific pornography."
I had a great laugh with this one :), however I happen to be working in a lab that is making software to do exactly this, and I disagree with the analogy. Of course you cannot use your model to discover something about biological mechanisms we know nothing about, but modeling approaches can certainly help guide experimental work. If your model fails to explain an observation, you can use the model to guide your next experiment. You then go on perfecting the model based on the results, and so on. These cycles are not very different from what biologists have been doing intuitively, but I think few people would disagree that formalizing this process with the help of computational tools is a good idea.
Sunday, November 06, 2005
The internet strategies of scientific journals
After a post on Nodalpoint about Nature's podcast, I was left thinking a bit about the different responses of the well-known science journals to the increase in internet usage and the changes in the technologies available.
I took a quick look at the publishing houses behind Nature (Nature Publishing Group), Cell (Cell Press), Science (AAAS), PLoS and the BMC journals. There are a lot more publishers, but these are sufficient to make the point.
What is the first impression? Only a fraction of these take the portal approach (mostly Nature and the BMC journals), with content on the front page and gateways to specialized content. The rest have almost no real content apart from links to the respective journals.
What if we try to dig further? Well, they all have an RSS feed for their content. Funnily enough, almost all of them have a jobs listing (except PLoS). Almost all have a list of most-accessed articles (except Science).
Only Science and Nature produce news content for the general public, which is good for attracting people other than researchers to their sites. The equivalent at BMC would be the content of The Scientist that they have on the site, and at PLoS the synopses that come with all papers.
How many allow comments? Only the two most recent publishers (BMC and PLoS), although PLoS is a bit more formal about it, and Science allows comments online.
Then it comes down to some particular content and services. BMC has several potentially interesting services like the Peoples Archive, images MD and Primers in Biology. Then there is Nature with Connotea, the Nature podcast, Nature products and Nature events.
So what is the point? In the tech world it was first all about portals and creating content to keep people coming back. Nowadays it seems to be more about free services, and very few of these publishers are following the trend. Good services build brand and attract visitors.
The simple conclusion is that only Nature and BMC are building their sites and playing with new services the way a tech company would, and although the impact at the present time is minimal, when researchers start using more online services these sites will have a head start.
Thursday, November 03, 2005
Recent reads - two useful applications of bioinformatics
Is bioinformatics actually producing any useful tools or discovering anything new? I would like to think so :). Here is a table from The Scientist showing the top ten cited papers of the last 2 years, the last 10 years and of all time. BLAST and Clustal are among the top ten cited papers of the last 10 years, and MFold is within the top ten cited papers of the last two years.
Keeping with the spirit of practical applications of computational biology, here are two recent papers I read.
One is about the computational design of ribozymes. The authors computationally designed different ribozymes that could perform different logical functions. For example, they were able to design an AND ribozyme that would self-cleave only in the presence of two particular oligos, and they validated the results experimentally in vitro. These ribozymes can be combined to make more complicated circuits and could ultimately be used inside cells to interfere with the networks in a rational manner, or maybe to act as sensors, etc. They don't discuss how applicable these results are to in-vivo studies, since ion content, pH and a lot of other things cannot be controlled in the same way.
Another interesting paper is about predicting the specificity of protein-DNA binding using structural models. They did this by developing a model for the free energy of protein-DNA interactions. With this model they can calculate the binding energy for structures of proteins bound to DNA, and for any such complex after changing the bases at the DNA sites in contact with the protein. This results in a position-specific scoring matrix that tells us the preferred nucleotides at each position for a particular DNA-binding protein domain.
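To make the idea concrete, here is a small sketch (not the authors' code, and with made-up energy values) of the usual Boltzmann-weighting step that turns per-base binding energies at each position into such a matrix:
use strict;
use warnings;

# Hypothetical binding energies (kcal/mol, made-up numbers) for each base at two
# DNA positions in contact with the protein; lower energy means tighter binding.
my @energies = (
    { A => 0.0, C => 1.2, G => 2.0, T => 1.5 },    # position 1
    { A => 1.8, C => 0.1, G => 1.9, T => 2.2 },    # position 2
);
my $kT = 0.6;    # roughly RT at room temperature, in kcal/mol

# Boltzmann-weight the energies and normalise at each position to get the base
# probabilities that make up the position-specific scoring matrix.
foreach my $pos (0 .. $#energies) {
    my %weight;
    my $sum = 0;
    foreach my $base (keys %{ $energies[$pos] }) {
        $weight{$base} = exp(-$energies[$pos]{$base} / $kT);
        $sum += $weight{$base};
    }
    printf "pos %d:", $pos + 1;
    printf " %s=%.2f", $_, $weight{$_} / $sum for sort keys %weight;
    print "\n";
}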
The protein-DNA interaction module is incorporated into the ROSETTA package. The authors provide all the experimental datasets used in the supplementary material, which other people might use to compare with other methods. The lab I am working in has a similar software package called Fold-X.
Assuming that the structural coverage of biological parts keeps up its current growth, these structure-based methods will become even more useful, since one can in principle apply them by modeling the domain of interest by homology.
Tuesday, November 01, 2005
Our collective mind
As I sit here quietly blogging my thoughts away, you are there listening. One click and I share this with the world. Millions of clicks sharing their feelings, showing what they are seeing, calling out for attention, collectively understanding the world. Amazing conversations are being automatically tracked around the whole world, and we can participate. People think that one day we will see emergent properties in the web, something like it becoming alive. What do you mean... one day? One more click and another neuron fires, another pulse in the live wires connecting us all. We are just waking up.
Wednesday, October 26, 2005
Google Base and Bioinformatics
Google is creating a new service called Google Base. It looks like a general database service. I cannot log in yet, but from the discussions around the blogs it seems we will be able to define content types and populate the database with our own content. I don't know how much space will be allocated to each user, but I would guess at least the disk space of our Gmail accounts (around 2.5 GB currently, and growing).
Can the bioinformatics community take advantage of this?
Well, one of the most boring tasks we usually have to perform is cross-referencing databases. This usually means downloading some flat files and spending some time scripting up some glue. Of course some of the main databases take up way more than the 2.5 GB, but we could imagine that having all databases under the same hosting service would help us. Google Base will probably have a nice standard API that would come in handy for accessing all sorts of different data.
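Just to illustrate the kind of flat-file glue scripting I mean, here is a minimal sketch with made-up file names and column layouts:
use strict;
use warnings;

# Join two hypothetical tab-delimited flat files on a shared identifier:
# ids.tab    maps an accession to a gene name (accession<TAB>gene)
# values.tab holds some measurement per accession (accession<TAB>value)
my %gene;
open(my $ids, '<', 'ids.tab') or die "cannot open ids.tab: $!";
while (<$ids>) {
    chomp;
    my ($acc, $gene_name) = split /\t/;
    $gene{$acc} = $gene_name;
}
close $ids;

open(my $vals, '<', 'values.tab') or die "cannot open values.tab: $!";
while (<$vals>) {
    chomp;
    my ($acc, $value) = split /\t/;
    print join("\t", $acc, $gene{$acc} || 'NA', $value), "\n";
}
close $vals;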
The next step would be the ability to do some processing on the data right on their servers. Please, Google, set up some clusters with some standard software and queuing systems. We have clusters here at EMBL, but Google would do a lot of researchers a favor by "selling" computer processing time for some ads :).
Protein Modules Consortium & Synthetic Biology
I have become a member of the Protein Modules Consortium, along with all participants in the FEBS course on modular protein domains that I attended recently. The aim of the consortium is the "promotion of scientific knowledge concerning the structure and function of protein modules, as well as the dissemination of scientific knowledge acquired by various means of communication".
Modular protein domains are "parts" inside a protein that can be regarded as modules. In this sense, one could try to understand the function of a protein by understanding how the modular parts behave in the context of the whole protein. Another useful interpretation is that we should be able to create a database of modules that we understand, and then create proteins with a predetermined function by copying and pasting the parts in the right way. Here are two short reviews on the subject. What would be the most efficient way of creating a database of protein parts that can be combined? They should all be cloned into vectors in the same way, and there should be tested protocols for rapidly combining the parts. One of the future goals of the consortium, discussed in the FEBS course, is exactly to promote a set of cloning standards that could be used to this effect.
One possible strategy would be to use the Gateway cloning system. This is an in-vitro cloning method used, for example, by Marc Vidal's lab in the C. elegans ORFeome project. It is a reliable system, especially for small protein domains, and it is very fast. Compared to traditional cloning strategies it could be a bit more expensive, but not much more if you consider the cost of the restriction/ligase enzymes. Creating an "entry" vector can be done with a PCR reaction followed by a recombination reaction (~2 h) (followed by the usual transformation and sequencing steps), and this entry vector could then be stored in the databank. The biggest disadvantage mentioned for this cloning strategy is the reported low efficiency in cloning big proteins, but this would not be a problem for protein domains, since the average protein domain is around 100 amino acids long.
For reference, here is a paper where the authors compare different recombination systems, and another where the authors show a proof of principle experiment on how to use Gateway recombination to assemble functional proteins from cloned parts.
Monday, October 17, 2005
Your identity "aura"
I was thinking today about some possible future trends on our way to man-machine integration (known to some as the singularity :). More precisely, I was thinking of all the recent moves in portable devices, like the speed at which Apple is sending new iPods to the market and the Palm-Microsoft deal. The idea is simple and probably not very new: wouldn't it be nice to carry your identity around in a machine-readable format? It does not really matter how; it could, for example, be a device with a wireless connection of a certain radius that you could turn on and off whenever you wished (any recent palm/cell-phone thing will have this nowadays). Now imagine you walk into a bar and the bar recognizes your identity, takes your list of music preferences from your music player or from the net, and feeds them into the statistical DJ choosing the music. This way the music the bar plays will be a balanced mix of the tastes of the majority of the people inside. In the same way, you could pass by any social place and check out the most-used tags of the people inside to decide if this is the type of place for you. People broadcasting their identities would bring the same type of web 2.0 mesh/innovations to the social places around us in the real world.
Wednesday, October 12, 2005
In support of text mining
There is a commentary in Nature Biotech where the authors used text mining to look at how knowledge about molecular interactions grows over time. To do this, they used time-stamped statements about molecular interactions taken from full-text articles in 60 journals from 1999 to 2002. They describe how knowledge mostly expands from known "old" interactions instead of "jumping" to areas of the interaction space that are totally unconnected from previous knowledge. Since this work is based on statements about interactions, I guess the authors did not take into account data coming from high-throughput methods that is not described in papers but is deposited in databases. In fact, in a recent effort to map the human protein-protein interaction network there was very little overlap between the known interactions and the new set of proposed interactions. What we might conclude from this is that although high-throughput methods are more error-prone than small-scale experiments, they help us jump into unexplored knowledge space.
The other two main conclusions of the commentary are that some facts are restricted to "knowledge pockets" and that only a small part of the network is growing at any given time. In general they try to make a case for the use of text mining, but they do not go into the details of how this should be implemented. They do not talk about the possible roles of databases, tagging, journals, funding agencies, etc. in this process of knowledge growth. Databases should help solve the problem of the knowledge pockets the authors mention. Tagging can eliminate the need for mining the data, and journals/funding agencies have the power to force authors to deposit data in databases or tag their research along with the paper.
Without wanting to attract the wrath of people working on text mining, my opinion is that at least an equal amount of effort should be dedicated to making the knowledge discovered in the future easier to recollect.
Saturday, October 08, 2005
Biology Direct
I am just propagating the announcement of a new journal. You can also read about it in Propeller Twist and in Notes from the Biomass. There are tons of new journals coming up, so what is so interesting about this one? Well, they claim they will implement a novel system of peer review where the author must find three board members to peer review the article. The paper is rejected if the author cannot get the board members to referee the work. Another interesting idea is that the referees can write comments to be published along with the paper. They plan to cover the field of biology broadly, but they say they will start off with Genomics, Bioinformatics and Systems Biology. The editorial board is full of very well-known people from these areas, so I assume this is actually a journal to keep an eye on in the future.
Connotea and tags
I have finally started using Connotea, from Nature Publishing Group. I'm not a big user of these types of "social" web services, like del.icio.us or Flickr, but I thought I would give this one a try, since I do a lot of reading and I would like a nice way of keeping my scientific reading organized. Here is my Connotea library.
When I first started downloading PDF files of interesting papers (some years ago) I used to put them neatly into folders organized by subject. Then, when Google Desktop Search started indexing PDFs, I started just putting everything in one folder and searching for a paper when I want it back. Both ways work OK, but the second ends up being faster.
So why should I use a web-based reference manager to keep track of the papers I am interested in? For one, because it takes almost no time at all. This was one of the nicest things about it: just highlight the paper's DOI with the mouse and click a bookmarklet, put in a couple of tags to describe the paper, and it's done.
One other advantage is the possibility of sharing the load of finding interesting papers with other people on the site, guilt by association style.
I would like to see two tools added to Connotea: one is label clusters, like you see in Flickr, and the other would be a graph of related papers or authors, like the one you see when you click a news item on the CNET news site.
In general I think that the tag/label concept is presently one of the best user-driven ways of organizing knowledge. It takes the individual very little time to help out, and the outcome is a vast amount of organized information. It is also probably a standard by now, which means that a lot of tools will be built to take advantage of it. Right now the tagging efforts sit behind walls, but there is no reason not to fuse "tag space" across different domains. Instead of an RSS aggregator we could have tag readers across different services. There is already a nice "tag reader" for del.icio.us called direc.tor.
Another useful tool would be a program to automatically label a document according to my labeling practices (or someone else's habits). The program could scan through all the stuff I have labeled in the past and learn how to label, or at least suggest labels for, a new document. It could therefore also label whatever is on my computer. It would be close to indexing, but more personalized :).
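A crude sketch of what I mean, with made-up tags and training text, and nothing fancier than word counts behind it:
use strict;
use warnings;

# Suggest tags for a new document by counting word overlap with documents that
# have already been tagged. The tags and the training text are invented here.
my %tagged = (
    'protein_interactions' => 'yeast two hybrid interaction network binding partners',
    'text_mining'          => 'literature abstracts named entity recognition corpus',
);

my $new_doc = do { local $/; <STDIN> };    # read the new document from stdin

my %score;
foreach my $tag (keys %tagged) {
    my %vocab;
    $vocab{lc $_} = 1 for split /\W+/, $tagged{$tag};
    $score{$tag} = 0;
    foreach my $word (split /\W+/, $new_doc) {
        $score{$tag}++ if $vocab{lc $word};
    }
}

# Print the candidate tags, best match first
foreach my $tag (sort { $score{$b} <=> $score{$a} } keys %score) {
    print "$tag\t$score{$tag}\n";
}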
Further reading on the subject? Start here.
Monday, October 03, 2005
Recent reads
I am doing some boring repetitive jobs that take some time to run (I am so glad to have a cluster to work with) and in between the job runs I took some time to catch up on some paper reading. So here is some of the interesting stuff:
Number one goes to a provocative review/opinion from Evelyn Fox Keller called "Revisiting 'scale-free' networks." There is a comment about it in Faculty 1000. The author puts power-law distributions in historical perspective, removing some of the exaggerated hype and the maybe overly optimistic notion that the observations about scale-free networks contain some sort of "universal" truth about complex networks.
I talked before about the work of Rama Ranganathan when I went to a FEBS course on modular protein domains. I said that he had talked about PDZ domains but it was actually WW domains :). Anyway, what he talked about in the meeting has now been published in two papers in Nature. They are worth a look, especially as a good example of the combination of computational and experimental work. This work exemplifies what I consider a nice role for computational biology: guiding the experimental work. They suggest what the necessary constraints for a protein fold are and then build proteins that satisfy those constraints to test their folding and activity experimentally.
Small is beautiful? I am interested in protein network evolution and this small report by Naama Barkai's group caught my eye. It is a very simple piece of work: they show an example where a cis-regulatory motif sequence was dropped in several genes during evolution of the Saccharomyces lineage. I usually like small interesting ideas demonstrated nicely, but I dare say that maybe this one is slightly too simple :).
There is also a paper that I disliked. The paper is about "The binding properties and evolution of homodimers in protein-protein interaction networks", but most of the conclusions look obvious or misleading. They say, for example, that a protein that has self-interactions has a higher average number of neighbors than a random protein. The comparison is not fair, because a protein that has self-interactions has, in their analysis, two or more interactions (including the self-interaction), while a random protein has one or more interactions. The fair comparison would be to compare homodimers with proteins in the network that have at least two interactions.
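To illustrate the point, here is a toy calculation of the comparison I have in mind. The interaction list is made up and has nothing to do with the data in the paper; it only shows how the "at least two interactions" baseline shrinks the apparent advantage of self-interacting proteins.

from collections import defaultdict

# Undirected interaction pairs; ("A", "A") is a self-interaction -- invented data.
interactions = [
    ("A", "A"), ("A", "B"), ("A", "F"),
    ("B", "C"), ("C", "D"), ("D", "E"), ("E", "E"),
]

neighbors = defaultdict(set)
for p, q in interactions:
    neighbors[p].add(q)
    neighbors[q].add(p)

def avg_degree(proteins):
    """Average number of neighbors (a self-interaction counts as one neighbor)."""
    return sum(len(neighbors[p]) for p in proteins) / len(proteins)

self_interacting = {p for p, q in interactions if p == q}
at_least_two = {p for p in neighbors if len(neighbors[p]) >= 2}

print("self-interactors:", avg_degree(self_interacting))
print("proteins with >= 2 interactions (fair baseline):", avg_degree(at_least_two))
print("all proteins (the unfair baseline):", avg_degree(set(neighbors)))

On this toy network the self-interactors still look well connected against all proteins, but the gap narrows once you only compare them against proteins that, like them, have at least two interactions.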