Cellular Consequences of Genetic variation: evolution

Thursday, July 19, 2012

Evolution and Function of Post-translational Modifications

A significant portion of my postdoctoral work is finally out in the last issue of Cell (link to paper). In this study we have tried to assign a function to post-translational modifications (PTMs) that are derived from mass-spectrometry (MS). This follows directly from previous work where we looked at the evolution of phosphorylation in three fungal species (paper, blog post). We (and other groups) have seen that phosphorylation sites diverge rapidly but we don't really know if this divergence of phosphosites results in meaningful functional consequences. In order to address this we need to know the function of post-translational modifications (if they have any). Since these MS studies now routinely report several thousand PTMs per analysis we have a severe bottleneck in the functional analysis of PTMs. These issues are the motivations for this last work. We collected previously published PTMs (close to 200,000) and obtained some novel ubiquitylation sites for S. cerevisiae (in collaboration with Judit Villen's lab). We revisited the evolutionary analysis and we set up a couple of methods to prioritize those modifications that we think are more likely to be functionally important.

As an example, we have tried to assign function to PTMs by annotating those that likely occur at interface residues. One approach that turned out to be useful was to look for conservation of the modification sites within PFAM domain families. For example, in the figure above and under "Regulation of domain activity", I am depicting a kinase domain. Over 50% of the phosphorylation sites that we find in the kinase domain family occur in the well known activation loop (arrow), suggesting that this is an important regulatory region. We already know that the activation loop is an important regulatory region but we think that this conservation approach will be useful to study the regulation of many other domains. In the article we give other examples and an experimental validation using the HSP70 domain family (in collaboration with the Frydman lab).
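The domain-family idea can be illustrated with a minimal sketch, assuming each observed phosphosite has already been mapped to a column of the domain family alignment. The function names, the input format and the toy column indices are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def column_hotspots(site_columns, min_fraction=0.1):
    """Return {alignment_column: fraction_of_sites} for enriched columns.

    site_columns: alignment column indices, one per observed phosphosite
    mapped onto the domain family alignment (hypothetical input format).
    """
    counts = Counter(site_columns)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        col: n / total
        for col, n in counts.items()
        if n / total >= min_fraction
    }


def region_fraction(site_columns, start, end):
    """Fraction of sites that fall in alignment columns start..end (inclusive)."""
    cols = list(site_columns)
    if not cols:
        return 0.0
    return sum(start <= c <= end for c in cols) / len(cols)


# Toy data: made-up column indices, not values from the paper.
toy_sites = [150, 151, 151, 152, 153, 40, 151, 210, 152, 150]
print(region_fraction(toy_sites, 148, 155))
print(column_hotspots(toy_sites, min_fraction=0.2))
```

A region that collects a large share of the sites across many family members, as the activation loop does for kinases, would stand out in this kind of count.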

I won't describe in detail the work as you can (hopefully) read the paper. Leave a comment or send me an email if you can't and/or if you have any questions regarding the paper or analysis. I also put up the predictions in a database (PTMfunc) for those who want to look at specific proteins. It is still very alpha, I apologize for the bugs and I will try to improve it as quickly as possible. If you want access to the underlying data just ask and I'll send the files. I am also very keen on collaborations with anyone collecting MS data or interested in the post-translational regulation of specific proteins, complexes or domain families.

Blogging and open science
Having a blog means I can also give you some of the thoughts that don't fit in a paper or press release. You can stop reading if you came for the sciency bits. One of the cool things I realized was that I have discussed in this blog three papers in the same research line, that run through my PhD and postdoc. It is fun to be able to go back not just to the papers but to the way I was thinking about these ideas at the time. Unfortunately, although I try to use this blog to promote open science this project was yet-another-failed open science project. Failed in the sense that it started with a blog post and a lot of ambition but never gained any momentum as an online collaboration. Eventually I stopped trying to push it online and as experimental collaborators joined the project I gave up on the open science side of it. I guess I will keep trying whenever it makes sense. This post closes project 1 (P1) but if you are interested in online collaborations have a look at project 2 (P2).

Publishable units and postdoc blues
This work took most of my attention during the past two years and it is probably the longest project I have worked on. Two years is not particularly long but it has certainly made me think about what is an acceptable publishable unit. As I described in the last blog post, this concept is very hard to define. While we probably all agree that a factoid in a tweet is not something I should put on my CV we allow and even cheer for publishing outlets that accept very incremental papers. The work I described above could have easily been sliced into smaller chunks but would it have the same value? We would have put out the main ideas much faster but it could have been impossible to convince someone to test them. I feel that the combination of the different analyses and experiments has more value as a single story but an incremental approach would have been more transparent. Maybe the ideal situation would be to have the increments online in blogs, wikis and repositories and collect them in stories for publication. Maybe, just maybe, these thoughts are the consequence of postdoc blues. As I was trying to finish and publish this project I was also jumping through the academic track hoops but I will leave that for a separate post.

Sunday, January 03, 2010

Stitching different web tools to organize a project

A little over a year ago I mentioned a project I was working on about prediction and evolution of E3 ligase targets (aka P1). As I said back then, I am free to risk as much as I want in sharing ongoing results and Nir London just asked me how the project is going via the comments of that blog post so I decided to give a bit of an update.

Essentially, the project quickly deviated from course since I realized that predicting E3 specificity and experimentally determining ubiquitylation sites in fungal species (without having to resort to strain manipulation) were not going to be easy tasks.
So, since the goal was to use these data to study the co-evolution of phosphorylation switches (phosphorylation regulating ubiquitylation) it makes little sense to restrain the analysis specifically to one form of post-translational modification (PTM). After a failed attempt to purify ubiquitylated substrates the goal has been to come up with ways to predict the functional consequences of phosphorylation. We will still need to take ubiquitylation into account but that will be a part of the whole picture.

With this goal in mind we have been collecting data for multiple species on phosphorylation as well as other forms of PTMs from databases and the literature and we have been trying to come up with ways to predict the function of these phosphorylation events. These predictions can be broken down mostly into three types:
- phosphorylation regulating domain activity
- phosphorylation regulating domain-domain interactions (globular domain interfaces)
- phosphorylation regulating linear motif interactions (phosphorylation switches in disordered regions)

We have set up a notebook where we will be putting some of the results and ways to access the datasets. Any new experimental data and results from the analysis will be posted with a significant delay both to give us some protection against scooping and also to try to guarantee that we don't push out things that are obviously wrong. This brings us to a disclaimer... all data and analysis in that notebook is to be considered preliminary and not peer reviewed, it probably contains mistakes and can change quickly.

I am currently collaborating with Raik Gruenberg on this project and we are open to collaborators that bring new skills to the project. We are particularly interested in experimentalists working in cell biology and cell signalling that could be interested in testing some of the predictions we are getting out of this study.

I won't talk much (yet) about the results we have so far but instead mention some of the tools we are using or planning to use:
- The notebook of the project hosted in openwetware
- The datasets/files are shared via Dropbox
- If need arises code will be shared via Google Code (currently empty)
- Literature will be shared via a Zotero group library
- The papers and other items can be discussed in a Friendfeed group

This will be all for now. I think we are getting interesting results from this analysis on the evolution of the functional consequences of phosphorylation events but we will update the notebook when we are a bit more confident that we ruled out most of the potential artifacts. I think the hardest part about exposing ongoing projects is having to explain to potential collaborators that we intend to do so. This still scares people away.

I'll end with a pretty picture. This is an image of a homology model for the Tup1-Hhf1 interaction. Highlighted are two residues that are predicted by the model to be in the interface and are phosphorylated in two different fungal species. This exemplifies how the functional consequence of a phosphorylation event can be conserved although the individual phosphorylation sites (apparently) are not.

Tuesday, August 11, 2009

Translationally optimal codons do not appear to significantly associate with phosphorylation sites

I recently read an interesting paper about codon bias at structurally important sites that sent me on a small detour from my usual activities. Tong Zhou, Mason Weems and Claus Wilke described how translationally optimal codons are associated with structurally important sites in proteins, such as the protein core (Zhou et al. MBE 2009). This work is a continuation of the work from this same lab on what constrains protein evolution. I have written here before a short review of the literature on the subject. As a reminder, it was observed that the expression level is the strongest constraint on a protein's rate of change, with highly expressed genes coding for proteins that diverge slower than lowly expressed ones (Drummond et al. MBE 2006). It is currently believed that selection against translation errors is the main driving force restricting this rate of change (Drummond et al. PNAS 2005, Drummond et al. Cell 2008). It has been previously shown that translation errors are introduced, on average, at a rate of about 1 to 5 per 10000 codons and that different codons can differ in their error rates by 4 to 9 fold, influenced by translational properties like the availability of their tRNAs (Kramer et al. RNA 2007).

Given this background of information, what Zhou and colleagues set out to do was test if codons that are associated with highly expressed genes tend to be over-represented at structurally important sites. The idea is that such codons, defined as "optimal codons", are less error prone and therefore should be preferred at positions that, when mistranslated, could destabilize proteins. In this work they defined a measure of codon optimality as the odds ratio of codon usage between highly and lowly expressed genes. Without going into many details, they showed, in different ways and for different species, that codon optimality is indeed correlated with the odds of being at a structurally important site.
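As a rough illustration of an odds-ratio style optimality score, here is a minimal sketch. It assumes codon counts have already been tallied separately for highly and lowly expressed genes; the function name, the pseudocount and the toy counts are assumptions, and the exact definition used by Zhou and colleagues may differ (for example in how synonymous codons are grouped).

```python
import math


def codon_odds_ratio(high_counts, low_counts, codon, pseudocount=0.5):
    """Odds ratio of using `codon` in highly vs lowly expressed genes.

    high_counts / low_counts: dicts mapping codon -> count within the
    comparison set (hypothetical input format). A pseudocount avoids
    division by zero for rare codons.
    """
    h = high_counts.get(codon, 0)
    l = low_counts.get(codon, 0)
    h_other = sum(high_counts.values()) - h
    l_other = sum(low_counts.values()) - l
    odds_high = (h + pseudocount) / (h_other + pseudocount)
    odds_low = (l + pseudocount) / (l_other + pseudocount)
    return odds_high / odds_low


# Toy counts for two synonymous alanine codons (made-up numbers).
high = {"GCT": 30, "GCC": 10}
low = {"GCT": 10, "GCC": 30}
for codon in ("GCT", "GCC"):
    score = codon_odds_ratio(high, low, codon)
    print(codon, round(math.log(score), 3))
```

A positive log odds ratio marks a codon that is used more often in highly expressed genes, which is the sense of "optimal" used in the text.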

I decided to test if I could also see a significant association between codon optimality and sites of post-translational modifications. I defined a window of plus or minus 2 amino-acids surrounding a phosphorylation site (of S. cerevisiae) as associated with post-translational modification. The rationale would be that selection for translational robustness could constrain codon usage near a phosphorylation site when compared with other Serine or Threonine sites. For simplicity I mostly ignored tyrosine phosphorylation, which in S. cerevisiae is a very small fraction of the total phosphorylation observed to date.
For each codon I calculated its over representation at these phosphorylation windows compared to similar windows around all other S/T sites and plotted this value against the log of the codon optimality score calculated by Zhou and colleagues.
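The window-based comparison described above can be sketched as follows. This is a simplified illustration for a single coding sequence, assuming the sequence is given as a list of codons and that site positions are 0-based codon indices; the function names and the toy sequence are hypothetical, and the original analysis pooled windows over all proteins.

```python
from collections import Counter


def window_codons(codons, center, flank=2):
    """Codons within +/- flank positions of center, clipped at the ends."""
    lo = max(0, center - flank)
    hi = min(len(codons), center + flank + 1)
    return codons[lo:hi]


def codon_over_representation(codons, phospho_sites, other_st_sites, flank=2):
    """Ratio of codon frequency in phosphosite windows vs other S/T windows."""
    fg, bg = Counter(), Counter()
    for pos in phospho_sites:
        fg.update(window_codons(codons, pos, flank))
    for pos in other_st_sites:
        bg.update(window_codons(codons, pos, flank))
    fg_total, bg_total = sum(fg.values()), sum(bg.values())
    ratios = {}
    for codon in fg:
        if bg[codon] == 0:
            continue  # ratio undefined without background observations
        ratios[codon] = (fg[codon] / fg_total) / (bg[codon] / bg_total)
    return ratios


# Toy sequence: TCT (Ser) at index 2 is phosphorylated, ACC (Thr) at 6 is not.
toy = ["ATG", "GCT", "TCT", "GCT", "AAA", "GGC", "ACC", "GGC", "TAA"]
print(codon_over_representation(toy, phospho_sites=[2], other_st_sites=[6]))
```

The resulting per-codon ratios are what would then be plotted against the log of the codon optimality score.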

Figure 1 - Over-representation of optimal codons at phosphosites

At first impression it would appear that there is a significant correlation between codon optimality and phosphorylation sites. However, as I will try to describe below, this is mostly due to differences in gene expression. Given the relatively small number of phosphorylation sites per protein, it is hard to test this association for each protein independently as was done by Zhou and colleagues for the structurally important sites. The alternative is therefore to try to take into account the differences in gene expression. I first checked if phosphorylated proteins tend to be coded by highly expressed genes.

Figure 2 - Distribution of gene expression of phosphorylated proteins

In figure 2 I plot the distribution of gene expression for phosphorylated and non-phosphorylated proteins. There is only a very small difference, with phosphoproteins having a marginally higher median gene expression when compared to other proteins. A KS test does not rule out that they are drawn from the same distribution.
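For readers unfamiliar with the KS test, here is a minimal standard-library sketch of the two-sample KS statistic with a permutation p-value. The permutation approach is an assumption made to keep the example self-contained; it is not necessarily how the test was run for figure 2, and `scipy.stats.ks_2samp` provides a ready-made implementation. The toy expression values are made up.

```python
import random


def ks_statistic(a, b):
    """Maximum distance between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare CDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_permutation_pvalue(a, b, n_perm=1000, seed=0):
    """Share of label shufflings with a KS statistic >= the observed one."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy expression values (made up) for phospho and non-phospho proteins.
phospho = [2.1, 3.4, 2.8, 4.0, 3.1, 2.5]
others = [2.0, 3.3, 2.9, 3.8, 3.0, 2.4]
print(ks_statistic(phospho, others), ks_permutation_pvalue(phospho, others))
```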

The next possible expression related explanation for the observed correlation would be that highly expressed genes tend to have more phosphorylation sites. Although there is no significant correlation between the gene expression level and the absolute number of phosphorylation sites, what I observed was that highly expressed proteins tend to be smaller in size. This means that there is a significant positive correlation between the fraction of phosphorylated Serine and Threonine sites and gene expression.

Figure 3 - Expression level correlates with fraction of phosphorylated ST sites

Unfortunately, I believe this correlation explains the result observed in figure 1. In order to properly control for this observation I calculated the correlation observed in figure 1 randomizing the phosphorylation sites within each phosphoprotein. To compare I also randomized the phosphorylation sites keeping the total number of phosphorylation sites fixed but not restricting the number of phosphorylation sites within each specific phosphoprotein.

Figure 4 - Distribution of R-squared for randomized phosphorylation sites

When randomizing the phosphorylation sites within each phosphoprotein, keeping the number of phosphorylation sites in each specific phosphoprotein constant, the average R-squared is higher than that observed with the experimentally determined phosphorylation sites (pink curve). This would mean that the correlation observed in figure 1 is not due to functional constraints acting on the phosphorylation sites but instead is probably due to the correlation observed in figure 3 between the expression level and the fraction of phosphorylated S/T residues.
The observed correlation would appear to be significantly higher than random if we allow the random phosphorylation sites to be drawn from any phosphoprotein without constraining the number of phosphorylation sites in each specific protein (blue curve). I added this because I thought it was a striking example of how a relatively subtle change in assumptions can change the significance of a score.
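The difference between the two null models can be made concrete with a minimal sketch. It assumes two dictionaries keyed by protein name, one listing all S/T positions and one listing the observed phosphosites; these names and the toy data are hypothetical and are not the original analysis code.

```python
import random


def shuffle_within_proteins(st_positions, phosphosites, rng):
    """Null model 1: keep the number of sites per protein fixed."""
    return {
        prot: sorted(rng.sample(st_positions[prot], len(sites)))
        for prot, sites in phosphosites.items()
    }


def shuffle_across_proteins(st_positions, phosphosites, rng):
    """Null model 2: keep only the total number of sites fixed."""
    pool = [(prot, pos) for prot in phosphosites for pos in st_positions[prot]]
    total = sum(len(sites) for sites in phosphosites.values())
    randomized = {prot: [] for prot in phosphosites}
    for prot, pos in rng.sample(pool, total):
        randomized[prot].append(pos)
    return {prot: sorted(sites) for prot, sites in randomized.items()}


# Toy data (made up): S/T positions and observed phosphosites per protein.
st_positions = {"P1": [3, 8, 15, 22, 40], "P2": [5, 9, 30]}
phosphosites = {"P1": [8, 22], "P2": [30]}

rng = random.Random(0)
print(shuffle_within_proteins(st_positions, phosphosites, rng))
print(shuffle_across_proteins(st_positions, phosphosites, rng))
```

Repeating either shuffle many times and recomputing the figure 1 correlation each time gives the two R-squared distributions being compared. Only the first scheme preserves the link between a protein's expression level and its fraction of phosphorylated S/T sites.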

I also tested if conserved phosphorylation sites tend to be coded by optimal codons when compared with non-conserved phosphorylation sites. For each phosphorylation site I summed over the codon optimality in a window around the site and compared the distribution of this sum for phosphorylation sites that are conserved in zero, one or more than one species. The conservation was defined based on an alignment window of +/- 10AAs of S. cerevisiae proteins against orthologs in C. albicans, S. pombe, D. melanogaster and H. sapiens.

Figure 5 - Distribution of codon optimality scores versus phospho-site conservation

I observe a higher sum of codon optimality for conserved phosphorylation sites (fig 5A) but this difference is not maintained if the codon optimality score of each peptide is normalized by the expression level of the source protein (fig 5B).

In summary, when the gene expression levels are taken into account, there does not appear to be an association between translationally optimal codons and the region around phosphorylation sites. This is consistent with the weak functional constraints observed in the analysis performed by Landry and colleagues.

Tuesday, June 23, 2009

Comparative analysis of phosphoproteins in yeast species

My first postdoctoral project has just appeared online in PLoS Biology. It is about the evolution of phosphoregulation in yeast species. This analysis follows from a previous work I had done during my PhD on the evolution of protein-protein interactions after gene duplication (paper / blog post). One of the conclusions from that previous work was that interactions of lower specificity, such as those mediated by short peptides, would be more prone to change. In fact, one of the protein domains that we found associated with high rates of change of protein-protein interactions was the kinase domain.
Given that the substrate specificity of a kinase is usually determined by a few key amino-acids surrounding the target phosphosite it is easy to imagine how kinase-substrate interactions can be easily created and destroyed with few mutations. It is also well known that these phosphorylation events can have important functional consequences. We therefore postulated that changes in phosphorylation are an important source of phenotypic diversity.

To test this, we collected by mass-spectrometry in vivo phosphorylation sites for 3 yeast species (S. cerevisiae, C. albicans and S. pombe). These were compared in order to estimate the rate of change of kinase-substrate interactions. Since changes in gene expression are generally regarded as one of the main sources of phenotypic diversity we compared these estimates with similar calculations for the rate of change of transcription factor (TF) interactions to promoters. Depending on how we define a divergence of phosphorylation we estimate that kinase-substrate interactions change either at similar rates or at most 2 orders of magnitude slower than TF-promoter interactions.

Although these changes in kinase-substrate interactions appear to be fast, groups of functionally related proteins tend to maintain the same levels of phosphorylation across broad time scales. We could identify a few functional groups and protein complexes with a significant divergence in phosphorylation and we tried to predict the most likely kinases responsible for these changes.

Finally we compiled recently published genetic interaction data for S. pombe (from Assen Roguev's work) and for S. cerevisiae (from Dorothea Fiedler's work) in addition to some novel genetic data produced for this work. We used this information to study the relative conservation of genetic interactions for protein kinases and transcription factors. We observed that both protein kinases and TFs show a lower than average conservation of genetic interactions.

We think these observations strongly support the initial hypothesis that divergence in kinase-substrate interactions contributes significantly to phenotypic diversity.

Technology opening doors
For me personally it really feels like I was in the right place at the right time. Many of the experimental methods we used are still under heavy development but I was lucky to be very literally next door to the right people. I had the chance to collaborate with Jonathan Trinidad who works for the UCSF Mass Spectrometry Facility directed by Alma Burlingame. I also arrived at a time when the Krogan lab, more specifically Assen Roguev (twitter feed), has been working to develop genetic interaction assays for S. pombe (Roguev A 2007). As we describe in the introduction, these technological developments really allow us to map out the functional and physical interactions of a cell at an incredible rate. What I am hoping for is that soon they are seen in much the same light as genome sequencing. We can and should be using these tools to study, simultaneously, groups of species and not just the same usual model organisms that diverged from each other more than 1 billion years ago.

Evolution of signalling
There are many more protein interactions that are determined by short linear peptide motifs (Neduva PLoS Bio 2005). A large fraction of these determine protein post-translational modifications and are crucial for signal transduction systems. For the next couple of years I will try to continue to study the evolution of signal transduction systems. There are certainly many experimental and computational challenges to address. I am particularly interested in looking at the co-regulation by combinations of post-translational modifications and their co-evolution. I will do my best to share some of that work as it happens here in the blog.

Wednesday, April 16, 2008

The shuffle project

Most of my work in the last few years was computational, either looking at the evolution of protein-protein interactions or at the prediction of domain-peptide interactions. The nice thing about working in a lab where a lot of people were doing wet lab experiments was that I had the opportunity to, once in a while, grab some pipettes and participate in some of the work that was going on. One project that worked out well was published today (not open access sorry). My contribution to this project was small but it was a lot of fun and I am very interested in the topic that we worked on. We called it the shuffle project in the lab.

The main objective of this work was to study how the addition of gene regulatory interactions impacts a cell's fitness. We introduced different combinations of existing E. coli promoters and transcription/sigma factors either as plasmids or integrated in the genome. In effect, each construct mimics a duplication of one of E. coli's sigma factors or transcription factors with a change in its promoter. We then tested the impact on fitness by measuring growth curves under different conditions or performing competition assays.

There were a couple of interesting findings but the two that I found most interesting were:
- The vast majority of the constructs had no measurable impact on growth even by testing different experimental conditions.
- A few constructs could out-compete the control in competition assays (stationary phase survival or passaging experiments in rich medium).

Both of these suggest that the gene regulatory network of E. coli is very tolerant to the addition of novel regulatory interactions. This is important because it tells us that regulatory networks are free to explore new interactions given that there is a limited impact on fitness. From this we could also argue that if there are many equivalent (nearly neutral) ways of regulating gene expression we can't expect to see individual gene regulatory interactions conserved across different species. There are several recent studies, particularly in eukaryotic species, showing that there is in fact a fast divergence of transcription factor binding sites (see recent review by Brian B. Tuch and colleagues) and many other examples that show that although the selectable phenotype is found to be conserved the underlying interactions or regulations have diverged in different species (see Tsong et al. and Lars Juhl Jensen et al.).

There are a couple of questions that come from these and other related works. What is the fraction of cellular interactions that are simply biologically irrelevant? Is it possible to predict to what degree purifying selection restricts changes at different levels of cellular organization? What is the extent of change in protein-protein interactions?

Having previously worked on the evolution of protein-protein interactions this is the direction that most interests me. This is why I am currently looking at the evolution of phospho-regulation and signaling in eukaryotic species.

Sunday, March 02, 2008

Design, mutate and freeze

Drew Endy talked about engineering biology for Edge. Most of the emphasis is still on standardization of biological parts and the importance of simplifying the process of creating a biological function. Still it would be nice to hear from him some new ideas about establishing processes of engineering biology. His whole speech seems focused on creating the hacker culture in biology: transposing all the same concepts that would allow us to re-create the explosive growth of tinkering and production that we saw for electronics and computer programming within the biological sciences.

I agree with most of what he says, that we should: 1) focus on method development; 2) work on a registry of parts and 3) foster an "open source"/hacker culture in synthetic biology. In this text he did not mention for example the importance of modeling but it is implicit in the standardization of parts. Once you have a computer simulation of the process you wish to engineer, you should be able to reach into the parts list to implement it. The problem with this concept of standardized parts is the complexity that Drew Endy dislikes so much. There is still no way around it. We can take a part that has been very well defined in E. coli, plug it into a yeast plasmid and it might not work at all.

If we are still far away from the ideal plug and play maybe we could try to take advantage of what biology can do very well, to evolve to a suitable solution. I would argue that we should develop engineering protocols that could take advantage of the evolutionary process.

<insert rambling>
Let's say we want to implement a function and I know beforehand that I will not be able to get perfect parts to implement it. Can we design this function in a way that it will have a large funnel of attraction for the design properties that I am interested in? Are there biological parts that are more amenable to a directed evolutionary experiment to reach that design goal? How can I increase the mutation rate for a controlled period of time and only for the stretch of DNA that I want to evolve? Maybe it is possible to place the parts in a plasmid and have the replication of this plasmid be under a different polymerase that is more error prone?
</insert rambling>

If we could answer some of these questions (maybe we have already), we could design the function of interest (modeling), pull parts that would be close to the solution, mutate/select until the best design is achieved and then freeze it by reducing the generation of diversity in some way.

Further reading:
Synthetic biology: promises and challenges
Molecular Systems Biology 3 Article number: 158 doi:10.1038/msb4100202

Wednesday, December 05, 2007

Open Science project on domain family expansion

Some domain families of similar function have expanded more than others during evolution. Different domain families might have significantly different constraints imposed by their fold that could explain these differences. This project aims to understand what properties determine these differences focusing in particular on peptide binding domains. Examples of constraints to explore include average cost of production or capacity to generate binding diversity for the domain family.

This project is also a test for using Google Code as a research project management system for open science (see here for project home). Wiki pages will be used to collect previous research and milestone discoveries during the project development and to write the final manuscript towards the end of the project. Issue tracking system can be used to organize the required project tasks and assign them to participants. The file repository can hold the datasets and code used to derive any result.

I plan to use the blog as a notebook for the project (tag: domainevolution) and the project home at Google Code as the repository and organization center. The next few posts regarding the project will be dedicated to explaining better why I am interested in the question and to developing further some of my expectations. Anyone interested in contributing is more than welcome to join in along the way. I should say that I am not in any hurry and that this is something for my 20% time ;).

Tuesday, May 15, 2007

Protein evolution

What constrains and determines the rate of protein evolution? This topic has received a great deal of attention in bioinformatics. Many reports have found significant correlations between protein evolutionary rate and expression levels, codon adaptation index (CAI), protein interactions (see below), protein length, protein dispensability and centrality in protein interaction networks. To complicate matters further, there are known cross correlations between some of the factors. For example it has been observed that the number of protein interactions correlates with protein length (weakly) and the probability that a protein is essential to the cell.

This highlights the importance of thinking about the amount of variance explained by the correlation and controlling for possible cross correlations. In fact it has been shown that, when controlling for gene expression, some of the other factors have a weaker correlation (or none at all) with the rate of protein evolution (Csaba Pál et al 2003). Using principal component regression, Drummond and colleagues have shown that a single component dominated by expression, CAI and protein abundance accounted for 43% of the variance of the non-synonymous mutation rate (dN). The other known factors account only for a few percent of the observed variance in dN.
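One simple way to see what "controlling for gene expression" means is a first-order partial correlation. The sketch below is an illustration of that idea only, using made-up numbers constructed so that expression drives both variables; it is not the principal component regression used by Drummond and colleagues, and the variable names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Toy data (made up): expression drives both the number of interactions
# and dN, and the remaining noise terms are unrelated to each other.
expression = [1, 2, 3, 4, 5, 6, 7, 8]
noise_a = [1, -1, -1, 1, 1, -1, -1, 1]
noise_b = [1, 1, -1, -1, -1, -1, 1, 1]
interactions = [e + n for e, n in zip(expression, noise_a)]
dn = [-e + n for e, n in zip(expression, noise_b)]

print(pearson(interactions, dn))
print(partial_correlation(interactions, dn, expression))
```

In this constructed example the raw correlation between interactions and dN is strongly negative, yet it vanishes once expression is accounted for, which is the kind of cross correlation the paragraph above warns about.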

Two questions might come to mind when thinking about these observations. One is why expression values, CAI and protein abundance would constrain protein evolution. The other is why the number of protein interactions explains so little (or none at all) of the variance in protein evolutionary rates. Intuitively, the number of protein interactions is related to the functional density of a protein and proteins with high functional density should have a lower dN.

Drummond and colleagues proposed in a PNAS paper an explanation for the first question. They first list three possible reasons for why expression levels should have such a strong effect on protein evolution: functional loss, translational efficiency and translational robustness. Functional loss, postulated by Rocha and Danchin, hypothesizes that highly expressed proteins have lower dN because they are under strong selection to minimize the impact of mistranslation that would create a large pool of inefficient proteins and reduce the fitness of the cells. A second hypothesis, proposed by Akashi, links protein evolutionary rates with gene expression through efficiency of translation. Highly expressed proteins have optimal codon usage for efficient translation and therefore a lower dN and dS. Drummond and colleagues added a third hypothesis that they called translational robustness. Given the costs of misfolding and aggregation, the higher the number of errors in translation that might lead to misfolding and aggregation the higher the cost for the cell. Therefore there might be a strong selection for keeping highly expressed genes robust against mistranslation.

The difference between translational robustness and functional loss is that the first implies that the number of translation events is the important factor while the second puts emphasis on the protein concentration. Using protein abundance and mRNA expression the authors showed that translational robustness seems to be the most important factor determining the rate of protein evolution.

In fact, in a recent paper (Tartaglia et al, 2007) a correlation between in vitro aggregation rates and in vivo expression levels was discovered. Highly expressed proteins tend to have a lower aggregation rate measured in vitro (r=0.97, N=12). The number of proteins analyzed was small and the rates of aggregation were not always obtained in the same conditions but it does fit with the translational robustness hypothesis.

Even if the number of translational events is such a strong constraint, one would expect to still see an effect of functional density on protein evolution after accounting for it. Yet the correlation between a proxy for functional density (the number of protein interactions) and dN has been under strong debate. (yes there is, no there isn't, yes, no, yes, maybe, ...)

The answer to this dispute might in the end be that the number of protein interactions is not a good proxy for functional density. A protein might have many protein interactions using a single interface. This is why the work of Kim and colleagues from the Gerstein lab is important. Using structural information, they predicted the most likely interface for protein interactions in S. cerevisiae. They could then show that protein evolutionary rate correlates better with adjusted interface surface area than with the number of protein interactions. Also, the relationship between interface surface area and evolutionary rate appears to be independent of protein expression level.
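One simple way to ask whether two measures remain correlated once a third is accounted for is a first-order partial correlation. The snippet below is only a minimal sketch of that idea, not the analysis used in any of the papers above; the function names and the variable roles (x as evolutionary rate, y as interface surface area, z as expression level) are my own assumptions for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z.

    Hypothetical roles: x = evolutionary rate (e.g. dN),
    y = interface surface area, z = expression level.
    """
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

If the partial correlation stays close to the plain correlation between x and y, the relationship is largely independent of z. Real rate and expression data are far from normally distributed, so in practice one would probably apply this to ranks or log-transformed values.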

The overall picture so far seems to be that translational robustness is the main driving force shaping protein evolutionary rates. Functional constraints are also important but are much more localized, explaining a smaller fraction of the overall variance across whole proteins.

Where can we go from here? As I mentioned above, translational robustness predicts that expression levels should correlate with overall stability, designability (the number of sequences that fit the structure) and avoidance of aggregation-prone sequences. Bloom and colleagues have shown that the density of inter-residue contacts (a proxy for designability) does not correlate with expression, but the study was limited to roughly 200 proteins so this might not be the final answer.

So, a clear hypothesis is that a computational measure that sums a protein's stability, tendency for aggregation and designability should correlate with gene expression levels.
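To make the hypothesis a bit more concrete, here is a rough sketch of how such a combined measure could be tested. Everything in it is an assumption on my part: the equal weighting of the three terms, the sign convention (aggregation propensity counts against the score) and the use of a rank correlation. The per-protein values would have to come from whatever predictors one trusts for each property.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize values so the three properties are on a comparable scale."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def composite_score(stability, aggregation, designability):
    """Equal-weight sum (an assumption): stability and designability add,
    aggregation propensity subtracts."""
    zs, za, zd = zscores(stability), zscores(aggregation), zscores(designability)
    return [s - a + d for s, a, d in zip(zs, za, zd)]

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation on ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den
```

Calling `spearman(composite_score(stability, aggregation, designability), expression)` on matched per-protein lists would then give a single number to compare against the correlation obtained for each property on its own.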

Further reading:
An integrated view of protein evolution (Nature Reviews Genetics)

Tuesday, April 24, 2007

Cellular adaptation to unforeseeable events

How do cells react to changes in external conditions? It has been noted before that in many cases the immediate transcriptional response includes unspecific changes in gene expression for a large group of genes (Gasch et al, 2000). Fong and colleagues have shown that in E. coli, 20 to 40 days after the initial changes, most genes return to the expression levels they had before the change in environment. The genes that remain differentially expressed at this stage are situation specific but not necessarily always the same. In this same paper, the gene expression changes were followed for different independent populations evolving under the same change in conditions. Out of ~1100 gene expression changes (on average) that were possibly adaptive to the new conditions, only 70 were common to all 7 parallel populations.
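The comparison across parallel populations boils down to a set intersection. As a minimal sketch (the function names and the input format, one list of gene identifiers per population, are assumptions of mine and not taken from the paper):

```python
def shared_changes(populations):
    """Genes whose expression changed in every one of the populations."""
    sets = [set(genes) for genes in populations]
    return set.intersection(*sets) if sets else set()

def shared_fraction(populations):
    """Shared changes relative to the average number of changes per population."""
    sets = [set(genes) for genes in populations]
    average_size = sum(len(s) for s in sets) / len(sets)
    return len(shared_changes(populations)) / average_size
```

With the numbers quoted above, 70 shared changes out of ~1100 on average works out to roughly 6%.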

A new study published in MSB adds more information to these interesting findings. In this study the authors tried to challenge S. cerevisiae with a perturbation that these cells should not have seen during their evolutionary history. They used a his3 deletion strain with a plasmid carrying HIS3 under the GAL1 promoter. In these cells the essential HIS3 gene should be efficiently turned off in a glucose medium. They then tracked the gene expression changes over time when the medium was changed from galactose to glucose. The cells adapted to these conditions within around 10-20 generations. Again the initial gene expression changes involved a large number of genes (~1000-1600 genes with a >2-fold change), with most of them (65%-70%) returning to their original expression levels in 10-20 generations. Again, different populations had different genes differentially expressed in response to the transition from galactose to glucose.

There is a detailed analysis in the paper regarding the functional classes of the genes, but for me these general trends were by themselves very interesting. How does the cell cope with unforeseeable events?

Maybe there is a general mechanism that senses discrepancies between metabolic requirements and the current cellular state and, in the absence of a programmed response, drives an almost chaotic search for plausible solutions? If there is such a sensing mechanism, it could provide the necessary feedback for the selection of cellular states at a physiological time scale. In an environment where frequent unpredictable changes occur, such a system could possibly be selected for.

For further reading, have a look at the News and Views by Eugene Koonin.