Cellular Consequences of Genetic variation: structures


Thursday, March 03, 2011

Structure based prediction of kinase interactions

About a year ago Ben Turk's lab published a large-scale experimental effort to determine the substrate recognition preferences of most yeast kinases (Mok et al. Sci. Signal. 2010). They used a peptide screening approach to analyze 61 of about 122 known S. cerevisiae kinases in order to derive, for each one, a position specific scoring matrix (PSSM) describing its substrate recognition preference. In the figure below I show an example for the Hog1 MAPK, where it is clear that this kinase prefers to phosphorylate peptides that have a proline next to the S/T that is going to be phosphorylated.

Figure 1 - Example of Hog1 substrate recognition preference derived from peptide screens. Each spot in the array contains a mixture of peptides that are randomized at all positions except the marked position (-5 to +4 relative to the phosphorylatable residue). A strong signal correlates with a preference for phosphorylating peptides containing that amino acid at the fixed position.

As was previously known, most kinases don't appear to have very striking substrate binding preferences. Still, these matrices should allow for significant predictions of kinase-site interactions. They should also allow us to benchmark previous efforts by Neil and other members of the Kobe lab on the structure-based prediction of kinase substrate recognition. For this, I obtained the predicted substrate recognition matrices from the Predikin server and known kinase-site interactions from the PhosphoGrid database. I used these data to compare the predictive power of the experimentally determined kinase matrices (Mok et al.) with the predicted matrices from Predikin. This analysis was done about a year ago when the Mok et al. paper was published, but I don't think PhosphoGrid has been significantly updated since then.
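As a rough illustration of how a matrix like this can be used to score a candidate phosphosite, here is a minimal Python sketch. It assumes the matrix is stored as a dictionary from relative position to per-residue scores and that a site's score is simply the sum over flanking positions. The function name `score_site` and the toy values are hypothetical and are not taken from Mok et al. or Predikin.

```python
def score_site(protein_seq, site_index, pssm, default=0.0):
    """Sum matrix scores over the positions flanking a phosphosite.

    pssm maps a relative position (e.g. -2, +1) to {amino_acid: score}.
    Flanking positions that fall outside the sequence are skipped, and
    residues missing from a column contribute `default`.
    """
    total = 0.0
    for offset, column in pssm.items():
        i = site_index + offset
        if 0 <= i < len(protein_seq):
            total += column.get(protein_seq[i], default)
    return total


# Hypothetical toy matrix (made-up values): proline favored at +1,
# loosely mimicking a proline-directed kinase.
toy_pssm = {
    -2: {"P": 0.5},
    +1: {"P": 2.0, "G": -1.0},
}

seq = "AAPLSPKK"  # the serine at index 4 is the candidate phosphosite
print(score_site(seq, 4, toy_pssm))
```

A real matrix would cover every position from -5 to +4 and all 20 amino acids, but the scoring step would look much the same.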

PhosphoGrid had 422 kinase-site interactions for the 61 kinases analyzed in Mok et al., of which ~50% have in vivo evidence for kinase recognition. As expected, the known kinase-site interactions have a stronger experimental matrix score than random kinase-site assignments (Fig 2).

Figure 2 - The set of kinase-site interactions used, broken down according to the kinases with higher representation. These sites were scored using the experimental matrices along with other randomly selected phosphosites, and the scores of both populations are summarized in the boxplots.

A random set of kinase-phosphosite interactions of equal size was used to quantify the predictive power of the experimental and the Predikin matrices with a ROC curve (Fig 3).

Figure 3 - Area under the ROC curve values for kinase-site predictions using both types of matrices.
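For readers who have not computed one before, the area under the ROC curve can be obtained directly from the two score populations. The sketch below is a minimal pure-Python version based on the pairwise (Mann-Whitney) formulation. It is not the code used for the analysis above, and the scores in the example are made up.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (known
    kinase-site pair) scores higher than a randomly chosen negative
    (random pair), counting ties as one half.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Made-up matrix scores for known and randomly assigned kinase-site pairs.
known_sites = [2.5, 1.0, 0.5]
random_sites = [0.5, -1.0, 0.0]
print(round(roc_auc(known_sites, random_sites), 3))
```

A value of 0.5 means the matrix does no better than chance at separating known from random pairs, and 1.0 means perfect separation. The quadratic loop is fine for a few hundred sites.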

Overall, the accuracy of the predicted matrices from Predikin matched that of the matrices derived from the peptide array experiments reasonably well, with only a small difference in AROC values. I broke down the predictions for individual kinases with at least 10 known sites. Benchmarking with such low numbers becomes very unreliable, but apart from the Cka1 kinase, the performance of the Predikin matrices matched the experimental results reasonably well.

I am assuming here that Predikin was not updated with any information from the Mok et al. study to derive their predictions. If this is true, it would mean that structure-based prediction of kinase recognition preferences, as implemented in Predikin, is almost as accurate as preferences derived from peptide library approaches.

Tuesday, April 08, 2008

Structure based prediction of SH2 targets

One of the last few things I worked on during the PhD is now available in PLoS Comp Bio. It is about the structure based prediction of binding of SH2 domains to phospho-peptide targets.

The SH2 domain (Src homology 2 domain) is a small domain of around 100 amino acids that has a strong preference to bind peptides that have phosphorylated tyrosines. The selectivity of each domain is typically further restricted by variable surfaces near the phospho-tyrosine binding pocket. See figure below:

The binding preference of each domain can be experimentally determined using, for example, peptide library screening, phage display or protein arrays. Alternatively, we should be able to analyze the increasing amount of structural information and predict the binding specificity of peptide binding domains.
We tried to show here that, given a structure of an SH2 domain in complex with a peptide, it is possible to predict the binding specificity of this domain. It is also possible, to some extent, to predict how mutations in these domains might affect their binding preferences. Finally, combining predictions of specificity with known human phosphosites allows for very reasonable predictions of in vivo SH2-target interactions.

The obvious limitation here is that we need to start with a structure of the domain. We know from some unpublished work that, for families with good structural coverage, homology models can produce specificity predictions that are as accurate as those from x-ray structures. The other limitation is that, given the lack of dynamics, only a single conformation of the interaction is modeled, even though the dynamics should in part help determine the binding specificity. One possible solution to this problem that we have used with some success is to model different peptide conformations for each binding domain.

I should make clear that, although I think there is an improvement over previous work, there is already a considerable amount of research on this topic that we tried to cite in the introduction and discussion. I would say that some of the best previous work on structure-based predictions of domain-peptide interactions has come from Wei Wang's lab (see for example McLaughlin et al. or Hou et al.).

This manuscript was the first (and only so far) I collaborated on with Google Docs. It worked well and I recommend it to anyone that needs to co-write a manuscript with other people. It saves a lot of emails and annotations on top of annotations.

Saturday, September 08, 2007

The Biology of Modular Protein Domains

From tomorrow on I will be in Austria for a small conference on the biology of protein domains. I might post some short notes about the meeting in the next few days. I'll get a chance to present some of the things I have been working on about the prediction of domain-peptide interactions from structural data.

Here is one of these modular protein domains, an SH3 domain, in complex with a peptide:

The very short summary of it is that it is possible to take the structure of one of these domains in complex with a peptide (e.g. SH3, phospho binding domains, kinases, etc) and predict its binding specificity. To some extent it is also possible to take a sequence, obtain a model (depends on structural coverage) and determine its specificity. I'll talk more about the details (hopefully) soon.

Thursday, June 21, 2007

Structures in Systems Biology (a double bill)

Once in a while I get to write about what I have been working on. The last time it was about the evolution of protein interaction networks. This time it is about two papers that I contributed to: a review about the use of structures in systems biology and an article about structure-based prediction of Ras/RBD interactions. I am sorry to say that both require a subscription (pedrobeltrao *at* gmail).

Main conclusions
Structural data can be used to predict Ras/RBD interactions with approximately 80% accuracy
We can and should use structural information to understand the main molecular properties before abstracting away the atomic details. Structural genomics can serve as a bridge between the abstract network view and the atomic detail.

The Making of
Although I am not the first author of the article, I think it is safe to say that the main inspiration for the line of work done by Kiel (see also previous publication) is the work by Aloy and Russell, where they first showed that it was possible to use a protein complex to predict if similar proteins would be able to interact in a similar way. What Kiel showed is that more accurate predictions can be made by modeling the protein domains under test onto the complex and evaluating the binding energy using a protein design program under development in the lab (FoldX). She used pull-down experiments and available information on Ras/RBD interactions to benchmark the predictions.

The predicted binding energies inform us about the probability that the two protein domains would bind in vitro. Inside the cell there are many other factors contributing to the likelihood of binding (gene expression, localization, complex formation, post-translational modifications, etc). To try to add some of this knowledge to the predictions I contributed with a Naive Bayes predictor that combines information on gene expression, GO functions, conserved physical/genetic interactions in other species and shared binding partners. The likelihood score obtained can be used to further rank the predicted interactions according to the likelihood that these are occurring inside the cell. In supplementary information there are the methods and tables with individual likelihood scores that can be used to reproduce the Naive Bayes predictor.
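To give a feel for how a Naive Bayes predictor of this kind combines evidence, here is a minimal Python sketch. It assumes each evidence type has already been turned into a likelihood ratio (how much more often that evidence is seen for true interactions than for non-interactions) and that the evidence types are treated as independent. The prior odds, the evidence names and all numbers are made up for illustration. The actual values are in the supplementary tables of the paper.

```python
import math


def naive_bayes_log_odds(prior_odds, likelihood_ratios):
    """Posterior log odds = log(prior odds) + sum of log likelihood ratios.

    Treating the evidence types as independent is the "naive" assumption.
    """
    log_odds = math.log(prior_odds)
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    return log_odds


def log_odds_to_probability(log_odds):
    """Convert log odds into a probability between 0 and 1."""
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


# Hypothetical likelihood ratios for one candidate Ras/RBD pair.
evidence = {
    "gene_expression": 4.0,
    "go_function": 2.0,
    "conserved_interaction": 5.0,
    "shared_partners": 2.5,
}
prior_odds = 0.01  # made-up prior odds that a random pair interacts

score = naive_bayes_log_odds(prior_odds, evidence.values())
print(score, log_odds_to_probability(score))
```

With these made-up numbers the four pieces of evidence multiply to a factor of 100, which takes the prior odds of 0.01 up to roughly even odds. Ranking candidate pairs by this score is what allows the in vitro binding predictions to be reordered by how likely they are to occur inside the cell.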

From atoms to nodes and edges
I think one of the main goals of the review was to show the current progress that has been made in using structural information to obtain the fundamental properties (binding sites, catalytic sites, protein dynamics, etc) of cellular components that may allow us to create models of cellular functions. There has been some work in approximating the very abstract "nodes and edges" view of cellular interactions to a more traditional pathway model. This has typically been done by searching for modules and particular node roles that depend on the patterns of intra- or inter-module interactions (see Guimera et al). We should be able to automatically decorate interaction networks (and the pathway modules) with structural data that can further help to computationally generate meaningful models of cellular functions.

The picture was obtained from Beltrao et al.; it is Copyright © 2007 Elsevier Ltd and is used here, hopefully, under fair use.

In the pipeline
There are several important details to iron out before we can just apply this structure-based prediction of protein interactions to any protein that we can model onto complexes. We are in the process of testing the approach with other domain types. Some of it I have been more directly involved in, and we have now started the submission process. I tried to get everyone to agree to submit it to a preprint server but not everyone was comfortable with the idea.