Saturday, June 28, 2008

Capturing biology one model at a time

Mathematical and computational modeling is (I hope) a well-accepted requirement in biology. These tools allow us to formalize and study systems of higher complexity that are hard to conceptualize with logical thinking alone. There have been great advances in our capacity to model different biological systems, from single components to cellular functions and tissues. Many of these efforts have been ongoing separately, each one dealing with a particular layer of abstraction (atoms, interactions, cells, etc.), and some of them are now reaching a level of accuracy that rivals some experimental methods. I will try to summarize, in a series of blog posts, the main advances behind some of these models and examples of integration between them, with particular emphasis on proteins and cellular networks. I invite others to post about models in their areas of interest to be collected for a review.

From sequence to fold
RNA and proteins, once produced, adopt structures that have different functional roles. In principle, all the information required to determine the structure is in the DNA sequence that encodes the RNA/protein. Although there has been some success in the prediction of RNA structure from sequence, ab-initio protein folding remains a difficult challenge (see review by R. Das and D. Baker). A more pragmatic approach has been to use the increasing structural and sequence data made available in public databases to develop sequence-based models for protein domains. In this way, for well-studied protein folds it is possible to ask the reverse question: what sequences are likely to fold this way?
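As a rough illustration of what a sequence-based domain model can look like, here is a minimal sketch of a position-specific scoring matrix built from a handful of aligned sequences. This is only one simple kind of model (domain databases typically rely on richer profile models), and everything in it is an assumption made for the example: the function names `build_pssm` and `score`, the uniform background frequency, the pseudocount, and the requirement that the alignment is ungapped.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Build log-odds scores per column from an ungapped alignment.

    Assumes every sequence has the same length and uses only the
    20 standard amino acids. The background is uniform (1/20),
    which is a simplification.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned_seqs) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_seqs)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score(pssm, seq):
    """Sum the per-column log-odds of a candidate sequence."""
    if len(seq) != len(pssm):
        raise ValueError("sequence length must match the model length")
    return sum(column[aa] for column, aa in zip(pssm, seq))


# Toy alignment, made up for illustration only.
model = build_pssm(["ACD", "ACD", "ACE"])
family_like = score(model, "ACD")
unrelated = score(model, "WWW")
```

A sequence resembling the alignment gets a higher score than an unrelated one, which is the "reverse question" in miniature: given a model of a fold, how well does a candidate sequence fit it?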
(To be expanded in a future post, volunteers welcome)

Protein binding models

I am particularly interested in how proteins interact with other components (mainly other proteins and DNA) and in trying to model these interactions from sequence to function. I will leave protein-compound interactions and metabolic networks for more knowledgeable people.
As mentioned above, even without a complete ab-initio folding model, it is possible to predict the structure of some sequences, or to determine which protein/domain family a sequence belongs to, from comparative genomics analysis. This by itself might not be very informative from a cellular perspective. We need to know how cellular components interact and how these interconnected components create useful functions in a cell.

Trying to understand and predict how two proteins interact in a complex has been the challenge of structural computational biology for more than two decades. The initial attempt to understand protein interactions from computational analysis of structural data (what is known today as docking) was published by Wodak and Janin in 1978. In this seminal study, the authors established a computational procedure to reconstitute a protein complex from simplified models of the two interacting proteins. In the decades that have followed, the complexity and accuracy of docking methods have steadily increased, but docking still faces difficult hurdles (see reviews Bonvin et al. 2006, Gray, 2006). Docking methods start from the knowledge that two proteins interact and aim at predicting the most likely binding interfaces and conformation of these proteins in a 3D model of the complex. Ultimately, docking approaches might one day also predict new interactions for a protein by exhaustively docking all other proteins in the proteome of the species, but at the moment this is still not feasible.

Interaction types
It should still be possible to use the 3D structures of protein complexes to understand at least particular interaction types. In a recent study, Aloy and Russell have shown that it is possible to transfer structural information on protein-protein interactions by homology to other proteins with similar sequences (Aloy and Russell 2002). In this approach the homologous proteins are aligned to the sequences of the proteins in the 3D complex structure. Mutations in the homologous sequences are evaluated with an empirical potential to determine the likelihood of binding. A similar approach was described soon after by Lu and colleagues, and both have been applied in large-scale genomic studies (Aloy and Russell 2003; Lu et al. 2003). As with any other functional annotation by homology, this method is limited by how much the target proteins have diverged from the templates. Aloy and Russell estimated that interaction modeling is reliable above 30% sequence identity (Aloy et al. 2003). Substitutions can also be evaluated with more sophisticated energy potentials after a homology model of the interface under study is created. Examples of tools that can be used to evaluate the impact of mutations on binding propensity include Rosetta and FoldX.
Although the methods described above were mostly developed for domain-domain protein interactions, similar approaches have been developed for protein-peptide interactions (see for example McLaughlin et al. 2006) and protein-DNA interactions (see for example Kaplan et al. 2005).
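To make the 30% identity rule of thumb concrete, here is a minimal sketch of the first filtering step one might apply before transferring an interaction by homology. It only checks sequence identity against the two template chains and does not attempt the empirical potential scoring of interface residues described above. The function names, the input format (pre-aligned strings with "-" for gaps) and the choice to compute identity over gap-free columns are all assumptions for the example, not the published method.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap.

    Other conventions exist (for example dividing by alignment
    length); this one is chosen only for illustration.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def can_transfer_interaction(alignment_a, alignment_b, threshold=30.0):
    """Check both partners against their template chains.

    Each argument is a (target, template) pair of aligned strings.
    The interaction is considered transferable only if both targets
    are at or above the identity threshold.
    """
    return all(
        percent_identity(target, template) >= threshold
        for target, template in (alignment_a, alignment_b)
    )


# Toy alignments, made up for illustration only.
close_pair = ("ACDEFGHIKL", "ACDEFGHIKM")    # 90% identical
distant_pair = ("ACDEFGHIKL", "AWWWWWWWWW")  # 10% identical
```

With these toy inputs, `can_transfer_interaction(close_pair, close_pair)` passes the filter while `can_transfer_interaction(close_pair, distant_pair)` does not, since one of the two partners has diverged too far from its template.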

In summary, the accumulation of protein-protein and protein-DNA interaction information, along with structures of complexes and the ever-increasing coverage of sequence space, allows us to develop models that describe binding for some domain families. In a future blog post I will try to review the different domain families that are well covered by these binding models.

Previous mini-reviews
Protein sequence evolution

Thursday, June 12, 2008


(caution, fiction ahead)

I wake up in the middle of the night startled by some noise. Pulse racing, I try to focus my attention outwards. Something breaking, glass shattering? Is someone out there? I reach out with my senses but an awkward feeling nags at me, bubbling up to my consciousness. I try hard to focus. It is coming from outside the room; someone is inside my house. I close my eyes but vertigo takes over and weightlessness empowers me. I am in the living room cleaning the floor, picking up a broken glass. The nagging feeling finally assaults me fully. I am moving but I am not in control. Panic rises quickly as I watch, helpless, the simple and quiet actions of someone else. I stop picking up glass and I feel curious, only it is not exactly me; the feeling is there beside me.
- Hi, who are you?
The voice catches me by surprise and my fear goes beyond rational control. All I can think of is to escape, to go away from here. For a second time I find myself floating as if searching for a way out. When I open my eyes again I am by the beach and I breathe a sigh of relief. The constant sound of the waves calms me down for a few seconds until my eyes start drifting to the side. No, stay there, I am in control! I look into the eyes of a total stranger that smiles back at me in recognition. Two voices ask me if I am enjoying the view and I can only scream back in confusion.

I wake up in the middle of the night startled by some noise. I immediately flex my hands in front of my eyes to make sure it was nothing but a nightmare, trying hard to calm down. What a dream. I get up and check on the noise coming from the living room, realizing that it was just the storm outside. Feeling better, I fire up my laptop and grab a glass of water from the kitchen. I open Twitter and type away:
- I had the strangest dream! (cursor blinking) Our senses were all connected (enter)
I get up to open the window, drinking another sip of water. After a couple of steps I feel a jabbing headache forcing me to stop, and bright spots of light blur my vision. I close my eyes in pain and the voices of some unseen crowd thunder in my ears:
- I had the same dream - they all say in unison
The sound of glass shattering on the floor is the last thing I remember before collapsing.

I wake up in the middle of the night startled by some noise (...)

(Twistori was the main motivation for this post)

Previous fiction:
The Fortune Cookie Genome

Tuesday, June 10, 2008

Why does FriendFeed work?

I have been using FriendFeed for a while and I have to say that it works surprisingly well. It is hard to define what FriendFeed is so the only real way of understanding it is to try it for a while.

One common way to define FF would be as a life-stream aggregator. Each user defines a set of feeds (blog, Flickr, Twitter, bookmarks, comments, etc.), providing all other users with a single view of all of that user's online activities. Anyone can select how much to share (even nothing at all) and subscribe to a number of other users. Each item (photo, blog post, bookmark) can then serve as a spark for discussion. You can mark items as interesting or comment on them, and this propagates to everyone who subscribes to you. In addition, you can hide individual sources if there is a particular part of a user's activities you don't enjoy. All of this creates a very personalized view of whoever you elect to interact with online.

I still find it striking that there are so many long threads of discussions around items that we share in FriendFeed, sometimes more than in the original site. A couple of examples:
Google code as a science repository (discussion in FF, blog post)
Into the Wonderful (discussion in FF, slideshare site)
Bursty work (discussion in FF, blog post)

Why does it work so well? One possible reason could be that a group of early adopter scientists happened to get together around this website, creating the critical mass required to start the discussions. Still, most of those commenting were already participating on blogs, so that might not be it. There might be something about the interface: maybe it is the ease of adding comments, and the fact that these comments can be edited, that increases participation. Ongoing discussions get bumped higher in the view, so every new comment brings the item back to your attention. In this way you know who saw the item and who is thinking about it. A bit like talking about a movie you saw or a book you read with a bunch of friends.
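The "bumping" behaviour is easy to picture as a sort by most recent activity. Here is a minimal sketch of that idea. I have no knowledge of how FriendFeed actually ranks items, so the `Item` class, its fields and the `feed_order` function are all hypothetical names invented for the example, with timestamps as plain integers.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A shared item with the time it was posted and its comment times."""
    title: str
    posted_at: int
    comment_times: list = field(default_factory=list)

    def last_activity(self):
        # The latest of the original post time and any comment time.
        return max([self.posted_at, *self.comment_times])


def feed_order(items):
    """Return items with the most recently active first."""
    return sorted(items, key=lambda item: item.last_activity(), reverse=True)


# Toy data, made up for illustration only.
old_but_discussed = Item("Bursty work", posted_at=1, comment_times=[4, 10])
new_but_quiet = Item("Into the Wonderful", posted_at=5)
ordered = feed_order([new_but_quiet, old_but_discussed])
```

The older item ends up on top because its latest comment is more recent than the newer item's post time, which is exactly why a lively thread keeps coming back into view.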

Anyone interested in the science aspects of it should check out the Life Scientists room, which currently has around 85 subscribers. Here is an introduction to some of these people, in particular to what they work on. Connecting to other scientists in this way lets you see which articles they find interesting and discuss current scientific news. You might even start a couple of side projects for the fun of it.

Monday, June 09, 2008

Evaluation metrics and Pubmed Faceoff

I have recently been reading a lot about evaluation metrics for papers and authors. It started with a blog post in Action Potential (Nature Neuroscience's blog) showing a correlation between the number of downloads of a paper and its citations. From the comments on that blog post I found out about a forum in Nature Network about Citation in Science and also the recently published group of perspectives on "The use and misuse of bibliometric indices in evaluating scholarly performance".

It could have been a coincidence, but Pierre sparked a long discussion in FriendFeed when he suggested it would be nice to be able to sort Pubmed queries by the impact factor of the journal. In reaction to this, Euan set up a very creative interface to Pubmed that he named Pubmed Faceoff. He took several different factors into account (citations from Scopus, eigenfactor of the journal, the time the paper was published), and for each paper returned from a Pubmed query the interface creates a face that describes the paper. The idea for the visualization is based on Chernoff Faces. It is really a creative idea and I wish Pubmed could spend more resources on coming up with alternative interfaces like this, something like a "labs" section where they could play with ideas or allow others to create interfaces that they would host.
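For anyone curious about how a Chernoff-style mapping works in general, here is a minimal sketch: each metric is rescaled to a value between 0 and 1, and that value then drives one facial feature. This is not Euan's implementation. The choice of which metric controls which feature, the feature names, the value ranges and the function names are all assumptions made up for the example.

```python
def scale(value, low, high):
    """Rescale value to the range 0..1, clamping anything outside."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def face_for_paper(citations, eigenfactor, age_years):
    """Map three paper metrics to three hypothetical face features.

    The ranges below are arbitrary illustration values, not
    calibrated against any real data.
    """
    return {
        "smile": scale(citations, 0, 100),        # more citations, bigger smile
        "face_size": scale(eigenfactor, 0.0, 1.0),
        "wrinkles": scale(age_years, 0, 20),      # older paper, older face
    }


# Toy paper, made up for illustration only.
face = face_for_paper(citations=50, eigenfactor=0.5, age_years=40)
```

A drawing routine would then turn each 0 to 1 value into a curve, a size or a number of lines. The appeal of the approach is that we are very good at reading faces, so several metrics can be taken in at a glance.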

I won't go into the whole debate about evaluation metrics here, since there is already a lot of discussion going on in some of the links I mentioned.