Cellular Consequences of Genetic variation

Monday, April 10, 2017

17 years of systems biology

I know that 17 years is not a very round number. It is also fairly arbitrary, as I am assuming systems biology started around 2000 (see below). Last week I was in Portugal, where every year for the past 8 years I have been teaching a week-long course on Systems and Synthetic Biology to the GABBA PhD program. This might be the last year I take part in this course and so I felt it would be a good time to try to put some thoughts in a blog post. This course has been co-organised from the beginning with Silvia Santos and we have had several guests throughout the years, including Mol Sys Bio editors Thomas Lemberger and Maria Polychronidou and other PIs: Julio Saez-Rodriguez, Andre Brown, Hyun Youk and Paulo Aguiar. Some of what I write below has certainly been influenced by discussions with them. This is not meant as an extensive review so apologies in advance for missing references.

Where did systems biology come from?

It is not contentious to say that systems biology came about in response to the ever narrower view of reductionist approaches in biology. Reductionism is still extremely important and I assume that, as a movement, it was an opposition to the idea that biology was animated by some magical force that could never be comprehended. Since the beginning of the course we have asked students to read the essay “Can a biologist fix a radio?” by Yuri Lazebnik (2002). The article captures well the limitations of reductionist research. The more we know about a system, apoptosis in Yuri's case, the more complex and non-intuitive some observations may seem. Yuri's description of how a biologist would try to understand how a radio works is comical and still very apt today:

We would “remove components one at a time or to use a variation of the method, in which a radio is shot at a close range with metal particles. In the latter case radios that malfunction (have a “phenotype”) are selected to identify the component whose damage causes the phenotype. Although removing some components will have only an attenuating effect, a lucky postdoc will accidentally find a wire whose deficiency will stop the music completely. The jubilant fellow will name the wire Serendipitously Recovered Component (Src) and then find that Src is required because it is the only link between a long extendable object and the rest of the radio.”

One of the driving forces for the advent of systems biology was this limitation, so brilliantly captured by Yuri, that reductionism can fail when we are overwhelmed with large systems of interconnected components.

Around the time that Yuri wrote this article our capacity to make measurements of biological objects was undergoing a revolution we generally call omics today. In 2001 the first drafts of the human genome were published (Lander et al. 2001; Venter et al. 2001). Between 2000 and 2002 we had the first descriptions of large-scale protein-protein (Uetz et al. 2000; Ito et al. 2001; Gavin et al. 2002) and genetic interaction mapping (Tong et al. 2001). The capacity to systematically measure all aspects of biology appeared to be within our grasp. The interaction network representation of nodes connected by edges is now an icon in biology, even if not as recognisable as the double helix. This ever increasing capacity to systematically measure biology was, alongside the complexity of highly connected components, the second major driving force for the advent of systems biology.

What is systems biology?

So around 2000 biology was faced with this upcoming flood of data and highly complex nonlinear systems. Reductionism was failing because mental models were insufficient to cope with the information available. The reaction was a call for increased formalism, better ways to see how the sum of the parts really works. Perspectives were written (Kitano 2002) and institutes were born (Institute for Systems Biology). Within the apparent complexity of biology there might be emergent principles that we were not seeing simply because we were looking too narrowly and could not combine information in a formal way. Whatever the system of interest (e.g. proteins, cells, organisms, ecosystems) there must be ways to take information from one level of abstraction (e.g. proteins) and understand the relevant system's features of the abstraction layer above it (e.g. cell behaviours). This comes closest to a definition of systems biology put forward by Tony Hyman (Hyman 2011) but many others have defined it in vaguely similar ways, or maybe in similarly vague ways.

Power laws and the perils of searching for universal principles

When introducing systems biology I have been giving two examples of work that illustrate some of the benefits (network motifs) but also some of the perils (power law networks) of trying to find universal principles in biology. One of these examples was the research on the organisation of biological networks. As soon as different networks were starting to be assembled, such as protein-protein, genetic and metabolic networks, an observation was made that the distribution of interactions per gene/protein is not what would be expected from a random network (of the kind studied by Paul Erdős). Most proteins have very few interactions while some rare proteins have a disproportionately large number of interactions – dubbed “hubs”. Barabasi and many others had a series of papers describing these non-random distributions, called power-law networks (Jeong et al. 2000), in all sorts of biological networks. Analogies were drawn to other non-biological networks with similar properties and it is not an exaggeration to say that there was some hype around this. The hope was that by thinking of the common processes that can give rise to such networks (e.g. preferential attachment) we would know, in some deep way, how biology is organised. I will just say that I don’t think this went very far. Modelling biological networks as nodes and edges allowed the application of graph theory approaches to biology, which has indeed been a very useful inheritance from this work. However, we didn't find deep meaning in the analogies drawn between the different biological and man-made networks, although I am sure some will disagree.
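To make the contrast between a random network and a hub-dominated one concrete, here is a minimal sketch in Python. It grows one network by preferential attachment and builds a second one by placing the same number of edges at random, then compares the most connected node in each. The function names, network size and number of links per new node are my own illustrative choices, not taken from any of the papers cited above.

```python
import random
from collections import Counter


def preferential_attachment_degrees(n_nodes, links_per_node, rng):
    """Grow a network node by node; each new node links to existing nodes
    picked with probability proportional to their current degree."""
    degree = Counter()
    # One entry per edge end: sampling uniformly from this list is the same
    # as sampling nodes in proportion to their degree.
    stubs = []
    # Seed network: a small fully connected core.
    core = links_per_node + 1
    for i in range(core):
        for j in range(i + 1, core):
            degree[i] += 1
            degree[j] += 1
            stubs += [i, j]
    for new in range(core, n_nodes):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(stubs))
        for target in targets:
            degree[new] += 1
            degree[target] += 1
            stubs += [new, target]
    return degree


def random_graph_degrees(n_nodes, n_edges, rng):
    """Place n_edges distinct edges between uniformly chosen node pairs."""
    degree = Counter({i: 0 for i in range(n_nodes)})
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(range(n_nodes), 2)
        edge = (min(a, b), max(a, b))
        if edge not in edges:
            edges.add(edge)
            degree[a] += 1
            degree[b] += 1
    return degree


if __name__ == "__main__":
    rng = random.Random(0)
    pa = preferential_attachment_degrees(2000, 2, rng)
    n_edges = sum(pa.values()) // 2
    er = random_graph_degrees(2000, n_edges, rng)
    print("largest hub, preferential attachment:", max(pa.values()))
    print("largest hub, random network:", max(er.values()))
```

Both networks have the same number of nodes and edges, so any difference in the largest degree comes from how the edges were placed and not from how many there are.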

Network motifs, buzzers and blinkers

Around the same time, the group of Uri Alon published very influential work describing recurring network motifs in directed networks (Milo et al. 2002; Shen-Orr et al. 2002). For example, in the E. coli transcriptional network they found some regulatory relationships between 3 different genes/operons that occurred more often than expected by chance. One example, illustrated to the right, was named a coherent feedforward loop where an activating signal was sent from an “upstream” element X to a “downstream” element Z both directly and indirectly via an intermediate third element. The observation raises the question of the usefulness of such an arrangement (Mangan and Alon 2003; Kalir et al. 2005). This has been generalised to studying the relation between any set of such directed interactions with specific reaction parameters – defined as the topology – and their potential functions. In a great review Tyson, Chen and Novak summarise some of these ideas of how regulatory networks can act, among other things, as “sniffers, buzzers, toggles and blinkers” (Tyson et al. 2003).
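One function often discussed for the coherent feedforward loop is filtering out brief input signals. The sketch below is a toy simulation of that idea, assuming Z is only produced when X is on AND the intermediate element (called Y here) has accumulated above a threshold. The rate constants, the threshold and the simple Euler integration are all illustrative assumptions, not parameters from the cited papers.

```python
def simulate_coherent_ffl(pulse_length, t_end=10.0, dt=0.01,
                          production=1.0, decay=1.0, threshold=0.5):
    """Euler simulation of X -> Y, X -> Z and Y -> Z with AND logic at Z.

    X is switched on at t = 0 and off at t = pulse_length.
    Returns the time course of Z as a list with one value per time step.
    """
    y = 0.0
    z = 0.0
    z_trace = []
    steps = int(round(t_end / dt))
    for step in range(steps):
        t = step * dt
        x_on = t < pulse_length
        # Z needs both the direct signal (X) and the indirect one (Y).
        z_input = x_on and y > threshold
        y += dt * (production * x_on - decay * y)
        z += dt * (production * z_input - decay * z)
        z_trace.append(z)
    return z_trace


if __name__ == "__main__":
    brief = simulate_coherent_ffl(pulse_length=0.3)
    sustained = simulate_coherent_ffl(pulse_length=5.0)
    print("peak Z after a brief pulse:", max(brief))
    print("peak Z after a sustained pulse:", max(sustained))
```

With these made-up parameters the brief pulse ends before Y crosses the threshold, so Z never switches on, while the sustained pulse activates Z after a delay.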

These and other similar works showed that, within the complexity of regulatory networks, design principles can be found that encapsulate the core relationships giving rise to a behaviour. Once these rules are known, an observed behaviour will constrain the possible space of topologies that can explain it. This has led researchers to search for missing regulatory interactions that are needed to satisfy such expected constraints. For example, Holt and colleagues searched for a positive feedback that would be expected to exist for the switch-like dissolution of the sister-chromatid cohesion at the start of anaphase (Holt et al. 2008). This mapping between regulatory networks and their function can be applied to any system of interest and at any scale. The same types of regulatory interactions are used for termites building spatially organised mounds and for growing neurons seeking to form connections (as illustrated in a review by Dehmelt and Bastiaens). Different communities of scientists can come together in systems biology meetings and talk in the same language of design principles. This elegance of finding “universal” rules that seemingly explain complex behaviours across different systems and disciplines has been a great gift of systems biology. It is of course important to point out that such ideas have a much longer history from homoeostasis in biology and control theory in engineering.

Bottom-up network models

Alongside the search for design principles in regulatory interactions the formal mathematical and computational modelling of biological systems gained prominence (e.g. Bhalla and Iyengar 1999). Mathematical models are much older than systems biology but they started to be used more extensively and visibly with the rise of systems biology. Formalising all of the past knowledge of a system was shown to be a useful way to test if what is known was sufficient to explain the behaviour of the system. Models were also perturbed in silico to find the most relevant parameters and generate novel hypotheses to be tested experimentally. This model refinement cycle has been used with success, for example, in the modelling of the cell cycle (Novak and Tyson 1993; Tyson and Novak 2001; among many others) or the circadian clock (Locke et al. 2005; Locke et al. 2010; Pokhilko et al. 2012). However, this iteration between formal modelling and experiments has not really taken off across many other systems. The reason for the lack of excitement is not clear to me, although I have the impression that often the models are not used extensively beyond asking if what we know about a system sufficiently explains all of the observed outcomes and perturbations.

Top-down systems biology and everything in between

From the start there has been a division between the researchers that identified themselves as part of the systems biology community. Bottom-up researchers have been focused on the formal modelling of systems, the discovery of design principles and emerging behaviours. Top-down researchers would argue that a truly comprehensive view of a system is needed. These scientists have been more focused on further developing and applying methods to systematically measure biological systems. The emphasis in this camp has been on developing generalizable strategies that can take large-scale observations and identify rules, regardless of the system of interest. I would say that these works, my own included, have been less powerful in identifying elegant universal rules. By this I mean, for example, those initial attempts to find common principles across biological and man-made networks. Instead of principles, what have been readily transposed across systems have been approaches such as machine learning methods. Drug screens with behavioural phenotypes, genetic interaction networks or developmental defect screens with gene knock-downs can all be analysed in the same ways. Such systematic studies have driven costs down (per observation) and contrary to “representative” experiments in small scale studies, the large-scale measurements tend to be properly benchmarked for accuracy and coverage.

What is still missing are ways to bridge the divide between these two camps: ways to start from large-scale measurements and arrive at models that can be studied for design features. Studies that include perturbation experiments come closer to achieving this. Examples of network reconstruction methods have shown that it should be possible to achieve this but we are not quite there yet (Hill et al. 2016).

From systems biology to systems everything

As a scientific movement systems biology started in cell biology, as far as I can tell, but has since then permeated many other areas of research. As examples, I have heard of systems genetics, systems neuroscience, systems medicine, evolutionary systems biology and systems structural biology. In 2017 we still face a flood of data and highly complex nonlinear systems. However, the reductionist approaches now typically go hand-in-hand with attempts to formalise knowledge in quantitative ways to identify the key relationships that explain the function of interest. In a sense, the movement of systems biology has succeeded to such an extent that it seems less exciting to me as a field in itself. It is a fantastic approach that is currently being used across most of biology but there are fewer developments that alter how we do science. I am curious as to what other researchers that identify themselves with doing systems biology think. What have been the great achievements of systems biology? What are the great challenges that are not simply applications of systems biology? Questions to think about for the (equally arbitrary) celebration of the 20 years of the field in 2020.

Friday, February 10, 2017

Predicting E3 or protease targets with paired protein & gene expression data (negative result)

Cancer datasets as a resource to study cell biology

The amazing resources that have been developed in the context of cancer biology can serve as tools to study "normal" cell biology. The genetic perturbations that happen in cancer can be viewed almost as natural experiments that we can use to ask varied questions. Different cancer consortia have produced, for the same patient samples or the same cancer cell lines, data that ranges from genomic information, such as exome sequencing, to molecular, cellular and disease traits including gene expression, protein abundance, patient survival and drug responses. These datasets are not just useful to study cancer biology but more globally to study cell biology processes. If we were interested in asking what is the impact of knocking out a gene we could look into these data to have, at least, an approximate guess of what could happen if this gene is perturbed. We can do this because it is likely that almost any given gene will have changes in copy number or deleterious mutations given a sufficiently large sample of tumours or cell lines. Of course, there will be a whole range of technical issues to deal with since it would not be a "clean" experiment comparing the KO with a control.

Studying complex assembly using protein abundance data

More recently the CPTAC consortium and other groups have released proteomics measurements for some of the reference cancer samples. Given the work that we have been doing in studying post-translational control we started a few projects making use of these data. One idea that we tried and have recently made available online via a pre-print was to study gene dosage compensation. When there are copy number changes, how often are these propagated to changes in gene expression and then to protein level? This was work done by Emanuel Gonçalves (@emanuelvgo), jointly with Julio Saez-Rodriguez's lab. There were several interesting findings from this project. One of these was that we could identify members of protein complexes that indirectly control the degradation of other complex subunits. This was done by measuring, in each sample, how much of the protein abundance change is not explained by the corresponding gene expression change. This residual abundance change is most likely explained either by changes in the translation or degradation rate of the protein (or noise). We think that, for protein complex subunits, this residual mainly reflects degradation rates. Emanuel then searched for complex members that had copy number changes that predicted the "degradation" rate of other subunits of the same complex. We think this is a very robust way to identify such subunits that act as rate-limiting factors for complex assembly.
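The residual idea described above can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the analysis used in the pre-print: it assumes a plain least-squares line relating protein abundance to gene expression across samples, and all the numbers and variable names are made up.

```python
import math


def linear_residuals(x, y):
    """Residuals of y after fitting y = intercept + slope * x by least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def pearson(a, b):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Hypothetical values across five samples (log fold changes).
    subunit_b_mrna = [0.0, 1.0, 2.0, 3.0, 4.0]
    subunit_a_copy_number = [1.0, -1.0, 0.0, -1.0, 1.0]
    # Toy protein levels: follow the mRNA, plus an effect of subunit A dosage.
    subunit_b_protein = [m + 0.5 * c
                         for m, c in zip(subunit_b_mrna, subunit_a_copy_number)]

    residual = linear_residuals(subunit_b_mrna, subunit_b_protein)
    print("association of residual with subunit A copy number:",
          pearson(residual, subunit_a_copy_number))
```

In this toy case the residual of subunit B tracks the copy number of subunit A, which is the kind of association that would flag A as a candidate rate-limiting subunit.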

Predicting E3 or protease targets

If what I described above works to find some subunits that control the "degradation" of other subunits of a complex then why not use the exact same approach to find the targets of E3 ligases or proteases? Emanuel gave this idea a try but in some (fairly quick) tests we could not see a strong predictive signal. We collected putative E3 targets from a few studies in the literature (Kim et al. Mol Cell Biol. 2015; Burande et al. Mol Cell Proteomics. 2009; Lee et al. J Biol Chem. 2011; Coyaud et al. Mol Cell Proteomics. 2015; Emanuele MJ et al. Cell 2011). We also collected protease targets from the MEROPS database. We then tried to find a significant association between the copy number or gene expression changes of a given E3 and the proxy for degradation, as described above, of any other protein. Using the significance of the association as the predictor, we would expect a stronger association between an E3 and its putative substrates than with other random genes. Using a ROC curve as a descriptor of the predictive power, we didn't really see robust signals. The figure above shows the results when using gene expression changes in the E3 to associate with the residuals (i.e. abundance change not explained by gene expression change) of the putative targets. The best result in this case was obtained for CUL4A (AUC=0.59) but overall the predictions are close to random.
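For readers less familiar with the AUC values quoted above, here is a small sketch of how the area under a ROC curve can be computed from a list of association scores and a list of known targets. It uses the rank interpretation of the AUC (the probability that a randomly chosen true target scores higher than a randomly chosen non-target). The scores and labels are invented for illustration and have nothing to do with the actual E3 data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from scores and binary labels.

    Computed as the fraction of (positive, negative) pairs in which the
    positive has the higher score, counting ties as half.
    """
    positives = [s for s, is_target in zip(scores, labels) if is_target]
    negatives = [s for s, is_target in zip(scores, labels) if not is_target]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical association scores (e.g. -log10 p-values) for four
    # proteins, the first two being putative substrates of the E3.
    scores = [0.9, 0.4, 0.6, 0.1]
    is_putative_target = [1, 1, 0, 0]
    print("AUC:", roc_auc(scores, is_putative_target))
```

An AUC of 0.5 means the scores rank true targets no better than chance, which is why a best value of 0.59 reads as a weak signal.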

A similarly poor result was generally observed for protease targets from the MEROPS database, although we didn't really make a strong effort to properly map the MEROPS interactions to all human proteins. Emanuel tried a couple of variations. For the E3s he tried restricting the potential target list to proteins that are known to be ubiquitylated in human cells but that did not improve the results. Also, surprisingly, the genes listed as putative targets of these E3s are not very enriched in genes that increase in ubiquitylation after proteasome inhibition (from Kim et al. Mol Cell. 2011), with the clearest signal observed in the E3 targets proposed by Emanuele MJ and colleagues (Emanuele MJ et al. Cell 2011).

Why doesn't it work?

There are many possible reasons for the lack of capacity to predict E3/protease targets in this way. The residuals that we calculate across samples may reflect a mixture of effects and degradation may be only a small component. The regulation of degradation is complex and, as we have shown for the complex members, it may be dependent on other factors besides the availability of the E3s/proteases. It is possible that the E3s/proteases are highly regulated and/or redundant such that we would not expect to see a simple relationship between changing the expression of one E3/protease and the abundance level of the putative substrate. The list of E3/protease targets may contain false positives and, of course, we may not have found the best way to find such associations in these data. In any case, we thought it could be useful to provide this information in some format for others that may be trying similar things.

Friday, January 13, 2017

State of the lab 4 – the one before the four year review

It has been 4 years since I started as a group leader at the EMBL-EBI (see past yearly reports – 1, 2 and 3). This year the group composition has been mostly stable, with the exception of interns that have rotated through the group. We had Bruno Ariano (twitter) visiting us for 6 months working on a project to build an improved functional interaction network for Plasmodium. Matteo Martinis has joined the group for a few months and is working with David Ochoa on comparing in-vivo effects of kinase inhibitors with their known in-vitro kinase inhibition effects. Finally, Areeb Jawed has joined Cristina and Bede, for some months, in their efforts to develop genetic methods to study protein modification sites. I think we had a great year in terms of publishing and I had the luxury of not trying to apply for additional funding. That luxury is short lived as we have funding that is ending this year that I will try to replace.

The one before the four year review

All EMBL units are evaluated every 4 years by a panel of external reviewers. The next review for the research at EMBL-EBI is coming up now in March and we have been preparing the required documentation for this. Naturally this forced me to think about what we have achieved as a group for the past 4 years and what we aim to do for the next 4. It is impossible to go through this process without being drawn into some introspection and without comparing our performance with that of those around me. I think we did well in this period of time: we got two significant grants funded (HFSP and ERC) and published some articles that I feel have been significant contributions towards the study of kinase signaling. I remember my interview for this position when they asked me what I would expect to achieve in the next 5 years. My first thought was: “Really? That question?”, but I think we did achieve what I had hoped at the time. Still, at EMBL-EBI we are surrounded by some fantastic colleagues that keep the bar really high. It is hard to be satisfied and I am certainly motivated by our research environment to try to help our group to keep up the good work. This review will also determine if our group receives a 4 year extension after the first 5 years. I am confident but still apprehensive and curious about what the reviewers will say.

Studying cell signaling states using phosphoproteomics

During the past four years we have worked on several aspects of kinase based cell signaling. I mentioned before our work on trying to describe the evolutionary history of protein phosphorylation (blog post) and to predict kinase specificity from interaction networks and phosphoproteomic data (blog post). I haven't described yet our work on studying cell signaling states that was published a few months ago. When David Ochoa started in the group around 3.5 years ago we reasoned that, by collecting information on how phosphosite abundances change across a large number of conditions, we would be able to use the profile of co-regulation across conditions to learn about how cell signaling systems work. This is copying by analogy what has been done in gene expression studies since Mike Eisen's work. David made use of published conditional phosphoproteomic studies to compile a very large compendium of different conditions. There are issues related to the incomplete coverage of mass spectrometry measurements and potential batch effects of the different studies. David tried to work around these, primarily by focusing the analysis on groups of phosphosites (e.g. targets of the same kinase) instead of individual positions. Using this data he derived an atlas of changes in activities for around 200 human kinases across nearly 400 different conditions. We show in this work how this can be used to advance our knowledge of kinase signaling (Ochoa et al. Mol Sys Bio 2016, and the phosfate webserver).
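To illustrate what working with groups of phosphosites can look like, here is a minimal sketch that scores a kinase in one condition by asking whether its known target sites change more than phosphosites in general. It is a simple z-score of the mean, shown only as an example of the general idea and not as the method used in the paper. The site names, values and the function name are hypothetical.

```python
import math


def kinase_activity_z(site_changes, kinase_targets):
    """Z-score of the mean change of a kinase's target sites in one condition.

    site_changes: dict mapping phosphosite id -> log fold change.
    kinase_targets: iterable of phosphosite ids assigned to the kinase.
    Returns None if no target was quantified or if there is no variation.
    """
    values = list(site_changes.values())
    n_all = len(values)
    if n_all < 2:
        return None
    mean_all = sum(values) / n_all
    var_all = sum((v - mean_all) ** 2 for v in values) / (n_all - 1)
    sd_all = math.sqrt(var_all)
    # Only targets that were actually measured can contribute, which is
    # one way to tolerate the incomplete coverage of mass spectrometry.
    target_values = [site_changes[s] for s in kinase_targets
                     if s in site_changes]
    if not target_values or sd_all == 0.0:
        return None
    mean_targets = sum(target_values) / len(target_values)
    return (mean_targets - mean_all) * math.sqrt(len(target_values)) / sd_all


if __name__ == "__main__":
    # Hypothetical log fold changes for six phosphosites in one condition.
    condition = {"siteA": 2.0, "siteB": 1.5, "siteC": -0.5,
                 "siteD": 0.0, "siteE": -1.0, "siteF": 0.0}
    print("kinase 1:", kinase_activity_z(condition, {"siteA", "siteB"}))
    print("kinase 2:", kinase_activity_z(condition, {"siteE", "siteX"}))
```

Averaging over several sites means a kinase can still be scored when some of its targets are missing from a given experiment.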

For me this work was the first time we could measure a large number of cell signaling states and see what the structure of this state-space is and what we can learn from it. What kinases are most often regulated? What kinases define particular signaling states? What states act as “opposing” states and how may we use this information to promote or inhibit specific states or transitions through the state-space? These are all questions that we can address with this atlas. The fact that the data was collected from different publications, using different protocols and machines, will certainly have an impact on the accuracy and resolution of this atlas. However, the quality and coverage of these types of experiments will only improve and I think this direction of research will continue to be exciting for a long period of time.

Since this work we have also tried to benchmark different approaches to predict the changes of kinase activity from phosphoproteomic information (preprint). In collaboration with Julio Saez-Rodriguez's lab we also used some of the same concepts to relate the changes of metabolism with predicted changes in kinase, phosphatase and transcription factor activities (Gonçalves et al. PLOS Comp Bio 2016).

Onto the next four years

If I do get my contract extension, we will continue our current main research focus on studying cell signaling through the next four years. Although we will certainly continue to study long term evolutionary trends, such as the evolution of kinase specificity, we will complement this with trying to understand the impact of genetic variation for individuals of the same species with a strong focus on E. coli, S. cerevisiae and H. sapiens (mutfunc). We have started to make use of cancer data as a genetic resource to study human cell biology (preprint). We won't necessarily try to study cancer as a disease but I think that datasets for primary tumors and cancer cell lines are amazing resources to learn about how human cell biology and cell signaling work. The group will have its first big turnover of group members over the next 1 or 2 years, which will be challenging professionally and personally. However, this turnover will also allow for and shape future directions of the group, which will also be exciting.


Sunday, November 13, 2016

When only truthiness matters

Like many others out there I am still trying to process the result of the US elections. I don’t usually write about politics but I think this does have relevance to science. The result brought me flashbacks of the outcome of the Brexit vote. On both occasions I woke up to a result that I found shocking and disheartening. Both times I went to work in a dazed state of denial trying to come to terms with the fact that so many people have viewpoints that are so different from mine. Personally, I find it repugnant that both elections were so much about racism and fomenting protectionist and anti-immigration movements. There are many political and social issues around these elections that I am not going to touch on. The important point for science and scientists here is that these elections were won using many false statements and arguments. I know I am biased because these were not the outcomes I was hoping for. Still, I don’t think I am exaggerating when I say that the winning sides of both elections used a similar strategy of inventing a suitable reality that they pushed to their advantage. I am used to politicians bending the truth and making promises that they don’t keep but more and more they simply lie. As someone trained to be rational and critical of flaws in arguments I live through it in complete disbelief. Trump did this all the time but one particular interview in the US elections really brought this point home to me:

Newt Gingrich clearly states it here – it does not matter what the truth is, it matters what people feel the truth is. This is what Stephen Colbert termed truthiness, a joke that he should be thinking a lot about these days. The rise of truthiness is a danger to society. To get your way you no longer have to find arguments based on the present reality, you just have to be able to warp reality in your favor.

Filter bubbles and confirmation bias

The internet, with its immediate access to information and its global reach, should be a weapon in favor of reason. Instead it has actually increased our isolation as we sort ourselves by affinity to beliefs. I am this shocked by the results of these elections because I barely interact with those who share the beliefs of the winning groups. We live in filter bubbles (book, TED talk) in all the media that we consume and even in the places where we live. This affinity-based social sorting is amoral. The same ease of access that allows scientists to collaborate globally is bringing together any other like-minded group of people. In the book “Here Comes Everybody” Clay Shirky gives examples of groups of bulimics teaching each other techniques to avoid eating and how the internet may help terrorist groups. It is hard to break into these echo chambers because people tend to perceive as true whatever confirms their beliefs. This well-known phenomenon of confirmation bias gets magnified by communal reinforcement within the filter bubbles. Savvy social manipulators don’t have to change the opinions of those in these echo chambers, they can try to connect with and shepherd those within.

What do we do when truth and reason no longer matter? Scientific findings are no longer facts but just opinions and values. People can be for or against vaccination, for example. This is starting to have very serious and concrete consequences (e.g. global warming) and looks to be getting worse. Although in both elections the younger generations were less likely to have voted for the winning outcomes, I don’t think that echo chambers and the attack on reason are a generational problem. Maybe scientists should be having a more active role in promoting the importance of rational thought or maybe it is a challenge that can only be solved by improving the education system.

Thursday, October 20, 2016

Group member profile - Marco Galardini

Marco Galardini (webpage, Gscholar, twitter, EMBL-EBI page), a postdoc in the group, is the next member that kindly volunteered to write a group profile page. He is currently one of the few people in the group that is not working directly with protein PTM regulation but is looking instead more generally at the consequences of mutations on cellular growth phenotypes.

What was the path that brought you to the group? Where are you from and what did you work on before arriving in the group?

I like to think of my career so far as a simulated annealing process, where the temperature parameter is substituted by curiosity. I started by studying applied chemistry in high school; we had to spend lots of time in the lab and we got plenty of opportunities to get our hands dirty with both inorganic and organic chemistry. The latter is probably the reason why I then pursued a bachelor's degree in biotechnology at the University of Florence, with a focus on industrial and environmental processes; during that time I also got interested in microbiology, mostly because of the great diversity and versatility of the bacterial kingdom. When I discovered that the University of Bologna was offering a master's degree in Bioinformatics I jumped into it with great enthusiasm, eventually combining it with my interest in microbiology during an internship at Nijmegen university.

After a short break as a software developer in a company I started a PhD in Florence, carrying out a comparative genomics study in the nitrogen-fixing plant symbiont Sinorhizobium meliloti (PhD thesis). Since this project combined computational biology, microbiology and the impact on the environment, I can say that it succeeded in combining the various academic interests I had developed during the years. Following the simulated annealing analogy I can say that I sometimes felt like I was in a local optimum. Under the supervision of Marco Bazzicalupo, Emanuele Biondi and Alessio Mengoni (lab page) I was lucky enough to ride the wave of genomics at a moment when getting bacterial genomes was becoming increasingly easy; I was therefore able to describe the interesting functional and evolutionary features of the (relatively) complex genome of S. meliloti, while developing some computational methods on the side.

What are you currently working on?

I'm currently two years into a very exciting project that aims to develop models to predict phenotypes for the Escherichia coli species, in close collaboration with Nassos Typas (EMBL). Bacterial species are known to harbor striking genetic variability between strains, both in the form of point mutations and with respect to their gene content (the so-called pangenome), due to recombination and lateral gene transfer. Understanding how this variability translates to differences in phenotypes has therefore been the focus of this project. This has proven to be both a challenging and valuable experience, as we had to build a strain collection from scratch, phenotype it in different growth conditions and sequence a large fraction of those strains.

For this I owe a great deal of gratitude to various members of the Typas group who have helped me out in running the wet-lab experiments, namely Lucia Herrera and Anja Telzerow. I am now in the process of testing the predictive models, which have shown very promising results, with potential applications to other species, in and outside the bacterial kingdom.

What are some of the areas of research that excite you right now?

Despite the common claim that no great discoveries are made anymore, I think that science is moving faster and getting bigger every day; if we want to be optimistic it should only be a matter of time before this starts to have an impact on our everyday lives. Some examples involving microbiology include real-time tracking of infectious diseases (e.g. WGSA, or NextFlu) and microbial communities as environmental sensors (e.g. Smith et al. mBio 2015). I'm therefore very excited to see how the lag time between a discovery and its application shrinks; there are legitimate concerns of course (e.g. laws not catching up, democratization of new technologies), but I can't help being thrilled about it. I also enjoy reading about how human activities are becoming a new powerful selective pressure in evolution; antibiotic resistance is the best known example, but there are also positive examples like the reports of bacterial species evolving the ability to degrade plastic. This shows that the natural world is still worth exploring and that evolution can also act on very short time-scales.

What sort of things do you like outside of the science?

I used to be quite active in photography, with a preference for analogue media such as black and white films and polaroids; despite not being very active right now, I still pack my camera when going on a short trip. I also have an interest in small DIY projects involving music; I have built some experimental synths running on Arduino, which were used in a band I used to play in. Apart from that, I enjoy reading and watching movies, going to contemporary art exhibitions, and a bit of cycling.

Thursday, October 13, 2016

Phylogenetic history of fungal protein phosphorylation – the anti-press release

I have long been interested in studying the rate at which protein interactions change during evolution. A new chapter in this ongoing research agenda has been published this week (article & perspective) in collaboration with the group of Judit Villén at the University of Washington and with many contributions from the labs of Maitreya J. Dunham, Eulàlia de Nadal and Francesc Posas. For the first time I tried to engage with the press by putting out a press release and it was interesting to work with Mary Todd Bergman at EMBL-EBI to distil the work to its core message. However, to atone for my sins of not being able to give sufficient context and credit to the work that has come before this, I decided that I could use this blog to write a sort of anti-press release. Grab some coffee, get comfy and don’t expect a punchy fast message here because this manuscript has a long and branched root.

Cue flashback …

For me, this started 15 years ago (gasp, I can't believe it has been this long) when Andreas Wagner published some work trying to measure the conservation of protein interactions after gene duplication. This in turn was made possible by the first protein interaction mapping efforts. In my PhD lab I was using conservation to predict interactions for SH3 domains that bind short linear proline-rich peptides. Influenced by Andreas Wagner’s papers, “linear-motif” research at EMBL and the field of evolution of gene expression, I hypothesized that domain-peptide interactions could be poorly conserved since they are mediated only by a few residues in a linear unstructured peptide. This idea was first reported in the literature in a perspective by Neduva and Russell, also at the EMBL at the time. I tried to generalize the concept that specificity and evolvability could be related such that very unspecific interactions may be more prone to change during evolution (article, blog post). Other groups have also shown that linear motif interactions can be fast evolving (e.g. Chica et al., Edwards et al.).

Mass spectrometry to the rescue

The problem with trying to compare protein interactions is that you need to measure them first. The domain-peptide interactions mediated by linear motifs are particularly hard to identify because they are usually of low affinity. So, the work described above was based on predicted interface sites for linear motifs. At this point, improvements in mass spectrometry and enrichment strategies really made a difference. The identification of protein phosphorylation sites made it possible to find, at large scale, thousands of sites that represent high-confidence interaction sites. The back-story that resulted in these developments in MS is a story our collaborator Judit Villén has been a part of and that I can’t tell as well.

Kinase-target interactions are also linear motif interactions and, if the previous linear motif research was correct, the phosphosites that represent these interactions should be rapidly evolving. That was exactly what I ended up testing when I started my postdoc. We were just one of several groups working on it and in 2009 several papers got published on the topic including our work (Beltrao et al., blog post) and others (Landry et al., Tan et al., Holt et al., Amoutzias et al.). All of these together made a really strong case for the fast divergence of protein phosphorylation, although other articles followed to also note the constraints (Nguyen Ba & Moses and Gray & Kumar). At this point the conversation was also shifting to the consequences of these evolutionary changes. Mirroring similar discussions around the consequences of changes in gene expression, there was a sense that some of these phosphosites, and therefore kinase-substrate interactions, do not play a functional role (Gustav E. Lienhard 2008, Landry et al. 2009). I also tried to contribute to the debate on functional relevance by trying to assign functions to PTM sites computationally and extending the conservation analysis to other PTMs (Beltrao et al. 2013, blog post).

What was left to find then?

Most of the studies mentioned so far have relied on pairwise species comparisons. What we tried to do in this more recent study was to obtain a phylogenetic history of protein phosphorylation across a very broad phylogeny. For this, Judit’s lab obtained phosphorylation data for 18 fungal species that shared a common ancestor hundreds of millions of years ago. Romain Studer in our group then tried to combine the phosphorylation observations, which are known to be incomplete, with sequence based predictions of phosphorylation potential and the species phylogenetic tree. This allowed us to predict a likely evolutionary history for thousands of phosphosites.
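To give a flavour of what reconstructing a history on a tree involves, here is a minimal sketch using Fitch parsimony on a toy example. This is not the method used in the paper, which combined incomplete observations with sequence-based predictions; the four-species tree, the species names and the phosphorylation calls below are all made up for illustration.

```python
def fitch_sets(node, leaf_states, out):
    """Bottom-up Fitch pass: store the candidate state set for every node."""
    if isinstance(node, str):
        # Leaf: the observed state for that species.
        states = {leaf_states[node]}
    else:
        # Internal node, written as a (left_child, right_child) tuple.
        left = fitch_sets(node[0], leaf_states, out)
        right = fitch_sets(node[1], leaf_states, out)
        # Keep the intersection if the children agree on a state, else the union.
        states = (left & right) or (left | right)
    out[node] = states
    return states


# Hypothetical tree and phosphorylation calls (1 = phosphorylated, 0 = not).
tree = (("sp1", "sp2"), ("sp3", "sp4"))
observed = {"sp1": 1, "sp2": 1, "sp3": 1, "sp4": 0}

ancestral = {}
root_states = fitch_sets(tree, observed, ancestral)
print("root:", root_states)
print("ancestor of sp3/sp4:", ancestral[("sp3", "sp4")])
```

In this toy case the root is inferred to be phosphorylated, which would make the site ancient and the absence in sp4 a loss. A real analysis has to deal with missing observations, which is why a simple parsimony pass like this one is only a starting point.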

If you happen to have kept up with the literature that I mentioned above then you might expect some of the findings we observed next - most phosphosites are recent acquisitions and the small fraction of ancient phosphosites is enriched in functionally relevant sites. From the ancient sites we tested a few cases for fitness and functional consequences and we think these serve as a great resource for future cell signaling studies (and yes we are chasing that). Given the breadth of species we studied we could also measure the changes in phosphorylation “motifs” that are found across species. Kinases recognize their target sites, in part, by the sequences around the phospho-acceptor residue, so-called kinase target motifs. We could observe that the types of target motifs used across species showed changes that we think relate to changes in the types of kinases or their activities. We are now interested in better understanding what determines kinase specificity so that we can study their evolution - what did the first protein kinase look like?
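As a rough illustration of what comparing motif usage between species can look like, the sketch below sorts 7-residue windows centred on the phospho-acceptor into a few broad classes and reports the fraction of each. The three motif definitions are deliberately simplified assumptions, and the example windows are invented; this is not the analysis from the paper.

```python
import re
from collections import Counter

# Simplified motif classes on 7-residue windows; the acceptor sits at index 3.
# These patterns are illustrative assumptions, not curated kinase motifs.
MOTIFS = {
    "proline-directed": re.compile(r"^...[ST]P"),    # S/T followed by P at +1
    "basophilic": re.compile(r"^R..[ST]"),           # R at -3
    "acidophilic": re.compile(r"^...[ST]..[DE]"),    # D/E at +3
}


def classify(window):
    """Return the first motif class that matches, or 'other'."""
    for name, pattern in MOTIFS.items():
        if pattern.match(window):
            return name
    return "other"


def motif_fractions(windows):
    """Fraction of windows falling in each motif class."""
    counts = Counter(classify(w) for w in windows)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Hypothetical phosphosite windows from one species.
species_a = ["AKLSPEG", "RRASVAG", "GGGTPLK", "AELSDDE"]
print(motif_fractions(species_a))
```

Running the same function on the windows of each species gives a table of motif fractions that can then be compared along the phylogeny.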

So who the hell cares?

Many of the methods we are working on are useful to better understand the impact of mutations related to these signaling circuits in cancer or other diseases. We are working on this too but I care about this because I want to know how nature comes up with all these beautiful diverse mechanisms and forms. Coming up with a history of how these phosphosites have been changing across species is really just the first step. We have almost no clue as to what the thousands of observed phosphosites are doing, if anything. Are the signaling pathways changing in a neutral way that conserves the functional outcomes?

On a personal note, it is fantastic to be able to connect this work to things that I did all the way back in my first PhD paper, and to connect this blog post to a chain of several other blog posts covering the research I have done and that our research group is doing now.

Friday, June 17, 2016

Group member profile - Romain Studer

Next up in this series of group member profiles is Romain Studer (blog, scholar profile, twitter), a postdoc in the group who is very interested in protein evolution, combining sequence and structural information.

What was the path that brought you to the group? Where are you from and what did you work on before arriving in the group?

My main interest in biology is the study of proteins in a broad diversity of organisms. My PhD work, as well as my postdoctoral research, was focused on protein evolution, at the primary sequence level and at the tertiary structure level.

I did my undergraduate studies and PhD work at University of Lausanne, Switzerland. My undergraduate studies were focused on immunology and biochemistry, with a dash of bioinformatics. My PhD research, with Prof. Marc Robinson-Rechavi, was more on evolution and mainly focused on the comparison between paralogs (i.e. genes that diverged after a duplication event) and orthologs (i.e. genes that diverged after a speciation event). Positive selection can be used as a mechanism to fix advantageous mutations between paralogs, as well as between orthologous genes. The conclusion of my analyses was threefold: (1) positive selection affects diverse phylogenetic branches and diverse gene categories during vertebrate evolution; (2) positive selection concerns only a small proportion of sites (1%-5%); and (3) whole genome duplication had no detectable impact on the prevalence of this positive selection (Studer RA et al. 2008, Studer RA et al 2010).

After my PhD, I stayed a few months in Lausanne to work with Prof. Bernard C. Rossier to explore the evolution of sodium pumps and channels, involved in the regulation of blood pressure. I found that the sequential emergence of the different subunits of these proteins could be directly linked to the emergence of multicellularity in animals (Studer RA et al. 2011; Rossier BC et al. 2015).

In 2010, I then obtained two successive fellowship grants from the SNSF to move to the UK. I worked in the group of Prof. Christine Orengo, where I explored in more detail the influence of structure on protein evolution. I contributed to the evolutionary aspect of the CATH database, a classification of protein domains. I also explored the evolution of RubisCO, the enzyme responsible for photosynthesis. I reconstructed the ancestral 3D structures of RubisCO and estimated the stability effect (ΔΔG) of mutations during evolution. The essential conclusion of this work was that mutations providing an increase in catalytic rate tend to be destabilising, but are rapidly followed by stabilising mutations during the course of evolution (Studer et al 2014).

My SNSF funding finished by the end of Summer 2013 and I then started to work as a senior postdoctoral fellow with Pedro Beltrao at the European Bioinformatics Institute (EMBL-EBI).

What are you currently working on?

My current project is to estimate the level of conservation of posttranslational modifications (PTMs) in proteins, in particular phosphorylation. Phosphorylation is an important mechanism to quickly regulate protein function. Combining phylogenomics methods and experimental phosphoproteomics data, I am evaluating the replacement rate of phosphorylated residues during the evolution of multiple yeast species. I found that (1) most phosphosites are quite recent, (2) ancient phosphosites are very likely to be important for function and (3) motif preferences have diverged across species.

What are some of the areas of research that excite you right now?

One interesting field is the application of experimental analyses to ancestral characters, such as ancestral amino acid mutations, phosphorylation states or whole ancestral proteins. Evolutionary frameworks allow the prediction of ancestral sequences with good accuracy. Such sequences can then be modelled in 3D by homology modelling, or can even be resurrected in vitro by protein synthesis. These ancient proteins can be subjected to the same analyses as their modern counterparts to explore how they differ over time. This framework has the potential to reveal important properties.

Monday, January 11, 2016

State of lab, year 3 - the first group outcomes

Lab poster made by Omar for the EMBL lab day

This is the third blog post of what I hope will be a very long series. Even in just three years it is fun to go back and read the past yearly entries (year 1 and year 2). I am sure I will enjoy reading back over 5 and 10 of these yearly reports. This report marks the end of the third year of the lab. I have to stop thinking of how quickly a year goes by. We will have a review in March 2017 that will likely dictate our extension after the first 5 years and if extended the group has then an additional 4 years before having to leave the EMBL-EBI (after a maximum of 9 years).

During the third year we said goodbye to Juan A Cordero Varela (master student, linkedIn). Marta Strumillo, who was doing an internship, stayed on to do her PhD in the group. Towards the very end of last year we were joined by two additional postdoctoral fellows, Bede Busby and Cristina Vieitez. As I had mentioned last year they will be working at the Genome Biology unit in Heidelberg in a close partnership with Nassos Typas' lab. Bede and Cristina are setting up yeast genetics methods to study protein modifications. This year I also started a blog series on our group members and I will try to get everyone to participate.

Group size and grant applications

At least for one year I have let myself apply for fewer funding opportunities. The group now has 12 members with one additional person joining this March. I am not sure what the best strategy is to manage the size of a group. Most grants and fellowships have very low success rates (10% to 30%) and if the objective is to maintain a specific group size then one would have to be very lucky to get just enough funding to stay at steady-state. I suspect that many group leaders just keep applying to all available funding and let the group size increase and collapse according to the success of the applications. I would be curious to hear from others what their thoughts are on this. My current impression is that somewhere between 5 and 15 people is a manageable and efficient group size but does anyone limit growth to stabilize group size?
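To put a number on the "very lucky" part, here is a quick back-of-the-envelope sketch. It assumes that applications succeed independently with the same probability, which is certainly a simplification; the 5 applications and the 20% success rate are illustrative values picked from within the range quoted above.

```python
from math import comb


def prob_exactly(k, n, p):
    """Binomial probability of exactly k successes out of n independent applications."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_applications = 5   # assumed number of applications in a year
success_rate = 0.2   # assumed success rate per application

for k in range(n_applications + 1):
    print(k, "funded:", round(prob_exactly(k, n_applications, success_rate), 3))
```

With these assumed numbers, the chance of landing exactly one grant is about 41%, the chance of getting none is about 33%, and the rest is the chance of getting more than one. Hitting any exact target is therefore closer to a coin flip than to a plan.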

To be, or not to be, an experimental group (revisited)

Cristina and Bede at the visitor lab space in EMBL-EBI

As described in the first year report, we don't have lab space at the EMBL-EBI. To be able to have access to lab space my initial solution was to co-supervise group members with experimental groups. This has been useful, particularly in creating closer collaborations with some of the groups involved. Haruna and Brandon have worked with Jyoti Choudhary to have access to mass-spec instruments. Sheriff has been working in London in the lab of Silvia Santos where he has contributed to some microscopy experiments and Marco spent some time in Nassos Typas' lab learning how to do chemical genetic screens. In all of these projects the group members are spending >50% of the time analysing the data. Bede and Cristina will be the first group members that will be primarily dedicated to experimental work, although I am sure they will also have an opportunity to further develop their computational skills. So far, these arrangements have been working out scientifically. However, I am now sure that, when I move out of the EMBL-EBI, I will aim to have access to lab space.

Projects as science, stories and publishable units

As I had mentioned in the second year report, I am no longer working on a research project myself. I had two periods of time last year where I emptied my to-do list but it didn't stay down long enough to be able to pick up a project. I am more at ease with the management role in the sense that I have convinced myself that it is actual work. It took me a while not to feel guilty about just doing management tasks. It is actually great to be able to help guide the flow of the projects of all of the lab members. From the inception, through the initial stumbles, turns in direction, building up the promising results, up until there is enough progress to be worth communicating it. This also means deciding to quit an idea when the research direction is no longer promising. In this process of managing a large set of projects I have felt a very clear temptation to focus on the publishable units as the outcomes. Although science is nothing if not communicated there is a risk of losing track of the priority of moving science forward. Asking questions and gathering evidence happens always in a scientific context. This context or story is also important for properly communicating your results to others. The problem is when the focus shifts too much into thinking about what are the experiments that are needed to write a paper instead of what are the best experiments to answer the scientific question at hand. These two things are hopefully aligned but the publishable unit should not be the goal in itself.

The first group outcomes

In the past year I finally managed to publish the last papers still involving my postdoctoral lab. The two articles reflect the two strands of research in our group. One paper describes a set of phosphorylation sites collected for X. laevis and an analysis of their conservation and structural features. We found that the degree of conservation of phosphosites and putative kinase-protein interactions is predictive of functionally relevant sites and interactions. We also describe a potential way to identify PTM sites that may control protein conformations. The second article is a large effort to identify conditional genetic interactions in S. cerevisiae. The main message of that work was that a substantial number of genetic interactions are condition specific. These conditional genetic data allowed us to identify novel roles for yeast genes in the cell wall integrity pathway. Besides these studies we also published the first articles from work that was started within the group. I mentioned before Omar's method to predict kinase specificity from interaction networks. In addition to this we also published a news and views article highlighting recent work from Stelzl's lab and a review on the feasibility of using rational design strategies to create novel PTM regulatory sites in proteins of interest. I was anxious about the time it was taking to get the group to this point. Three years to have research outputs coming from the group feels slow but when talking with others it is apparently not unusual.

Preprints and open science

We have two additional manuscripts that are now making their way through journals: David's project on a map of human signalling states based on conditional phosphoproteomics data and Romain's phylogeny-based analysis of fungal phosphorylation sites. I am personally very much in favour of preprint servers. Although I think I have been ahead of others in suggesting the use of preprints in biology (blog post 2006) I have been slow to actually do it. My current policy in the lab is to first ask the authors in the group if they want to submit and then make sure all collaborators are ok with it. Unfortunately, so far, there has been no consensus among the authors. I will start to push more strongly for future manuscripts to be submitted to preprint servers. When possible, we will also experiment with making a project's data and initial analysis available online before the preprints.

Tuesday, December 15, 2015

Group member profile - Haruna Imamura

Here is the second entry into what I hope will be a very long series where I introduce our lab's members. Next up is Haruna Imamura (pubmed), an interdisciplinary postdoc with experience in mass-spectrometry and informatics.

What was the path that brought you to the group? Where are you from and what did you work on before arriving in the group?

I first joined the biological network analysis group in my undergraduate course in the lab of Masaru Tomita at Keio University (Japan). I launched a project that applied the concept of network analysis to a dataset of phosphorylation dynamics. Because of this experience, I grew increasingly interested in resolving the biological importance of phosphorylation in the context of signal transduction and began to study phosphoproteomics. For my master’s course, I joined the proteome group led by Yasushi Ishihama, in the same university, and learned proteomics-related experimental skills, including phosphorylation enrichment and mass spectrometry (MS) manipulation. As Prof. Ishihama moved to Kyoto University (Japan), I also moved and started my PhD course there. My PhD project was to determine the selectivity of protein kinases towards their substrates (Imamura et al. 2014, Imami et al. 2012, Imamura et al. 2012). We analysed lysates after in vitro kinase reactions and identified phosphorylation sites with MS to obtain kinase/substrate relationships in a high-throughput manner. The information obtained in the study would allow connecting already accumulated phosphorylation data to kinases.

As MS has been improved dramatically, nowadays there are more research studies coming up with a long list of identified phosphopeptides. However, it is revealing that only a small fraction of modification sites seem to have an important function in biological systems. So the next challenge in this field is mining functionally important phosphorylation among the pool of ‘junk’ phosphorylation. In this context, I mainly had three wishes for my post-doc project: (1) to be able to contribute through proteome experience, (2) to learn more about informatics, and (3) to reveal important phosphorylation in biological systems. I found Pedro’s group to be a great environment for it, and I asked him for position availability. Fortunately, there was a project that matched my background, and here I am.

What are you currently working on?
I am working on a project to study how phosphorylation in host cells is changed by Salmonella infection. Salmonella is a facultative intracellular pathogen that is one cause of diarrhoea in humans. The process of infection is like a series of offensive and defensive battles between Salmonella and the host cells. Salmonella tries to hijack and utilise the host’s cellular system for its proliferation, while the host cells try to eliminate it by activating an immune response. Among various changes happening in the cells, post-translational modifications, including phosphorylation, play important roles.

We use Salmonella enterica serovar Typhimurium as a model system and study host cell lines that have been infected in a time-course. Their phosphoproteomes are analysed using MS, and the experimental dataset is combined with other publicly available information to find key regulatory events in Salmonella infection. I am a fellow of the EMBL Interdisciplinary Postdocs (EIPOD) programme, which promotes interdisciplinary research at EMBL. This work is a collaboration with the Typas lab in EMBL-Heidelberg, which has expertise in microbiology and genetic interactions. The MS analysis has been done with the help of the Proteome core facility led by Jyoti Choudhary at the Sanger Institute.

What are some of the areas of research that excite you right now?
With the current technology, phosphoproteome analysis with MS still requires a group of cells. It means the outcome would be averaged among a variety of cell populations. So I am interested in some projects attempting to do single-cell whole-proteome (or even phosphoproteome). Also, cellular imaging interests me, as it would be a complementary technology to MS. For example, mass cytometry could capture and quantify phosphorylation at the single-cell resolution in a systematic way, which enables the study of phosphorylation signalling on intercellular communications.

Besides, out of curiosity, I am interested in research which raises doubt regarding ‘self-consciousness’. For example, at the molecular scale, ‘behavioural epigenetics’ is one of the attractive topics for me, which describes how nurture shapes nature. Also, the ‘gut-brain axis’ is gaining more attention, as it has been shown that the gut microbiota communicate with the central nervous system and influence the brain. How true is ‘you are what you eat’? Finally, at the macro scale, one of my favourite videos from TED (Suicidal crickets, zombie roaches and other parasite tales) talks about some surprising instances where parasites control the host brain and can change its behaviour.

What sort of things do you like outside science?
I took up horse riding a year ago and I am having fun with it. I had always wanted to do it when I was in Japan, and the environment here inspired me to start. The stable is about 15 mins by bike from the institute, so I can go there to have a class once a week after work. It is good exercise and riding horses is relaxing. Also, it is fun to talk with people there who love horses and to learn hands-on biology. I am trying to build a better relationship with the horses, who have a variety of personalities. In daily life, I usually go to the gym for running. It has become a routine in my life since I began as a PhD student. It helps me to clear my mind and gives my brain a chance to refresh. Running a marathon is one of the things on my bucket list, but I have to put more effort in to achieve it.

Monday, December 14, 2015

Replace journals with recommendation engines

There was another round of interesting discussions on twitter after Mike Eisen decided to scrub all journal titles from his lab's publication list. Part of the discussion was summarized in this Nature news story. The general idea is that our science should not be evaluated by where it is published but should stand on its own merit. We all want this to happen but unfortunately we don't have infinite time to read papers. Megajournal and open access advocates often dismiss this problem. They will often say that journal rankings are not adequate filters and that we should be able to form our own opinions and to search for whatever we want to read. This is the line of argument that just drives me crazy. It is basically implying that any defence of journal rankings is an admission of inability to evaluate science. The biomedical scientific community is producing over 100 thousand articles every month. Any suggestion that we don't need some sort of filtering mechanism is in turn an admission that you are not aware of the extent of science that is being produced. If you are not scanning tables of contents yourself, you are being fed suggestions by someone who does.

Imagine a world without any science journals. Just a single pot where all articles are deposited. I think that the spread of knowledge would slow down. I can barely keep track of advances and authors that are closely related to my work by using keyword searches. I would not think one day to just search for "clustered regularly interspaced short palindromic repeats" or to have a curious look into advances in cryoEM. In the absence of good filters we would risk becoming even more isolated in our small little corners of science and miss out on cross-fertilization. We would tend to focus even more on the science of a few labs that we knew from past works or from personal contact. I would not know where to look for important new discoveries in other fields that could impact my own. The current system of journals serves this role of trying to assign a piece of science to a target audience. If nothing else, journals can filter through self-selection of topics at submission for specific communities. Less specific journals try to promote the advances in science that should reach a broader audience. I think that we are not even aware of how much the current system of journals facilitates the exchange of information within and across fields. In my opinion, the best way and probably the only way to get rid of the current system is to replace it with something that can do the equivalent job.

One way to replace the current system with something less frustrating would be to use automated recommendation engines. I have tried Google Scholar recommendations and Pubchase and both work really well. If we want to get rid of journals we need to figure out a way for such automated systems to mimic the journal's transfer of knowledge within and across communities. I can easily imagine the steps needed to come up with article similarity metrics and clustering of users and so on. One can also easily imagine that the recommendation engines can react to user feedback such that a niche community will "bump up" - for example by click-through counts - the perceived value of a piece of science to such an extent that it gets recommended to a wider community. This would require a hierarchical recommendation engine that is widely used. The biggest advantage of such a system would be that it can work post publication on top of megajournals. Scientists could stop focusing their energy on submitting to journal X and just focus on producing good science that would spread widely. I am convinced that the fastest way to get to a world without journals is to come up with this replacement. If we really want to get rid of impact factors and journal rankings we need to start talking about what we will do instead.
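For the curious, the simplest possible version of an article similarity metric looks something like the sketch below: a bag-of-words cosine similarity between the titles someone has read and a list of candidate titles. This is a toy under obvious assumptions (titles only, no weighting of rare words, no user feedback), the titles are invented, and it is not how Google Scholar or Pubchase work.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case word counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(read_titles, candidates, top_n=1):
    """Rank candidate titles by similarity to everything the user has read."""
    profile = Counter()
    for title in read_titles:
        profile.update(bag_of_words(title))
    ranked = sorted(
        candidates, key=lambda t: cosine(profile, bag_of_words(t)), reverse=True
    )
    return ranked[:top_n]


# Hypothetical reading history and candidate articles.
history = [
    "kinase specificity from phosphoproteomics",
    "evolution of protein phosphorylation",
]
new_articles = [
    "cryo electron microscopy of ribosomes",
    "phosphorylation and kinase evolution in fungi",
]
print(recommend(history, new_articles))
```

Note that a metric like this only ever recommends more of the same, which is exactly the isolation problem described above. The hard part is the second half: letting feedback from one community push an article out to another.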

One thing we won't be able to change - we don't have enough time to read all of the science in the world. Unfortunately we don't even have enough time to read all of the articles of job applicants. It is not hard to predict that any other solution that replaces journal rankings will too often be used to make hiring decisions.

Friday, November 27, 2015

Predicting PTM specificities from MS data and interaction networks

Around four years ago I wrote this blog post where I suggested that it might be possible to combine protein interaction data with phosphosites from mass-spectrometry (MS) data to infer the specificity of protein kinases. I did a very simple pilot test and invited others to contribute to the idea. Nobody really picked up on it until Omar Wagih, a PhD student in the group, decided to test the limits of the approach. To his credit I didn't even ask him to do it, his main project was supposed to be on individual genomics. I am glad that he deviated long enough to get some interesting results that have now been published.

As I described four years ago, the main inspiration for this project was the work of Neduva and colleagues. They showed that motif enrichment applied to the interaction partners of peptide binding domains can reveal the binding specificity of the domain. One step of their method was to filter out regions of proteins that were unlikely to be target sequences before doing motif identification. For PTM enzymes or binding domains we should be able to take advantage of the MS derived PTM data to select the peptides for motif identification by just taking the peptide sequences around the PTM sites. This was exactly what Omar set out to do by focusing on human kinases as a test case.
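The core of the idea can be sketched in a few lines: take the sequence windows around the PTM sites found in the interaction partners of an enzyme and ask which residues are over-represented at each position relative to a background set of sites. The code below is a simplified stand-in for a proper motif finder (it is not motif-x), and the protein names, sequences, site positions and background windows are all invented for illustration.

```python
import math


def site_window(sequence, position, flank=3):
    """Residues around a 1-based PTM position, padded with '_' at the termini."""
    padded = "_" * flank + sequence + "_" * flank
    start = position - 1  # index of the window start in the padded string
    return padded[start : start + 2 * flank + 1]


def residue_frequency(windows, offset, residue, flank=3):
    """Fraction of windows with `residue` at `offset` relative to the PTM site."""
    column = [w[flank + offset] for w in windows]
    return column.count(residue) / len(column)


def log2_enrichment(foreground, background, offset, residue, pseudo=0.01):
    """Log2 ratio of foreground to background frequency, with a small pseudocount."""
    fg = residue_frequency(foreground, offset, residue)
    bg = residue_frequency(background, offset, residue)
    return math.log2((fg + pseudo) / (bg + pseudo))


# Hypothetical interaction partners of one kinase: name -> (sequence, phosphosites).
partners = {
    "P1": ("MKRRASVAGK", [6]),
    "P2": ("GGRKLSPEAA", [6]),
}
foreground = [
    site_window(seq, pos) for seq, sites in partners.values() for pos in sites
]
# Hypothetical windows from phosphosites elsewhere in the proteome.
background = ["AELSDDE", "GGGTPLK", "AKLSPEG", "RQASLDG"]

print(foreground)
print("R at -3:", round(log2_enrichment(foreground, background, -3, "R"), 2))
```

Here an arginine at position -3 comes out as enriched among the partners' sites, which is the kind of signal a motif finder would then assemble into a specificity model.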

To summarize the outcome of this project: the method works, with some limitations. For around a third of the human kinases that could be benchmarked he got very good predictions (AUC>0.7). For some kinase families the predictions are better than for others and we think this is due to how specific the kinase is for the residues around the target site. It is known that kinases find their targets via multiple mechanisms (e.g. docking sites, shared interactions, co-localization, etc). This specificity prediction approach will work better for kinases that find their targets mostly by recognizing amino-acids near the phosphosite. With the help of Naoyuki Sugiyama in Yasushi Ishihama's lab we validated the specificity predictions for 4 understudied human kinases. One advantage of using this approach is that it could be very general. Omar also tried it on 14-3-3 domains, which bind phosphosites, and on a bromodomain-containing protein that is known to bind acetylated peptides. Finally, we also tried to use this to compare kinase specificity between human and mouse, but given the current limitations of the method I don't think it is possible to use these predictions alone to find divergent cases of specificity.
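For readers not familiar with the AUC, it can be read as the probability that a randomly chosen true target site gets a higher score than a randomly chosen non-target site, with 0.5 being random and 1 being perfect. The sketch below computes it directly from that definition. The scores are invented for illustration and the function name `roc_auc` is my own; the exact choice of positive and negative sites used in the benchmark is not described in this post, so treat that part as an assumption.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Invented scores: predicted motif matched against known targets (positives)
# and against other phosphosites assumed not to be targets (negatives)
known_targets = [0.9, 0.8, 0.4]
other_sites = [0.7, 0.3, 0.2]

print(round(roc_auc(known_targets, other_sites), 3))
```

This pairwise version is quadratic in the number of sites, which is fine for a toy example; for large benchmarks a rank-based implementation from a statistics library would be the more sensible choice.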

The predictions for human kinase specificity can be found here and a tutorial on how to repeat these predictions is here. The motif enrichment was done using the motif-x algorithm. Given that we could not really use the web version, Omar implemented the algorithm in R and a package is available here.

There are many other ways to predict specificities for PTM enzymes and binding domains. If you have many known target sites the best way is to train a predictor such as NetPhorest or GPS. There is also the possibility of using the known target sites in conjunction with structural data to infer rules about specificity and the specificity determining residues. A great example of this is Predikin and more recently KINspect. Ongoing work in the group now aims to combine what Omar did with some aspects of Predikin to study the evolution of kinase specificity.

Going back to the beginning of the post, this idea was my second attempt at an open science project. The first attempt was a project on the evolution and function of protein phosphorylation (described here). This ended up being one of the main projects of my postdoc and is now the main focus of the group. I am still curious to know if distributed open science projects will ever take off. I don't mean big project consortia but smaller scale research where several people could easily contribute with their expertise almost as "spare cycles". Often when you are an expert in some analysis or method you could easily add a contribution with little effort. However, there was much more excitement about open science a few years ago whereas now most of the discussions have shifted to pre-prints and doing away with the traditional publishing system. Maybe we just don't have time to pay attention or to contribute to such open projects.

Thursday, August 13, 2015

EBPOD postdoctoral fellowship to study mutational properties in human cancers

Applications are open for the EMBL-EBI / Cambridge Computational Biomedical Postdoctoral Fellowships (EBPOD programme). This programme, now in its second year, aims to foster collaborations between the EMBL-EBI, the NIHR Cambridge Biomedical Research Centre (BRC) and the University of Cambridge’s School of the Biological Sciences (SBS). Every year, groups from these institutions devise potential collaborative research areas and a set of project ideas is put forward. This year there are 8 projects to which applicants can apply. The deadline is the 3rd of September and applications should be sent via the Cambridge jobs website (http://www.jobs.cam.ac.uk/job/7770/).

This year our group is teaming up with Martin Miller's and Pippa Corrie's groups to study the mutational properties that associate with anti-tumour immune response and immunotherapy in human cancers. The full project description is available here. We are looking for applicants with a background in bioinformatics and an interest in genomics, DNA and protein evolution, sequence analysis and cancer biology. Extensive past experience (PhD) in bioinformatics is required.

We welcome any queries regarding the project, including potential other directions that relate to the theme of the described project.

Wednesday, April 08, 2015

Positions available to study the functional relevance of protein phosphorylation

Photo by leg0fenris. Disclaimer: this photo should not be taken as implicit support for the actions of the empire

In the past few years, thanks to advances in mass-spectrometry, tens to hundreds of thousands of phosphorylation sites have been discovered across different species. However, even for very well studied model organisms like yeast we know the function of only a very small number of these. Along with other groups, we have shown that these modifications can diverge quickly (Landry, Beltrao, Tan), leading to the hypothesis that some of these phosphorylation sites might even serve no purpose in extant species. Given these evolutionary observations and the large number of sites that are now routinely identified per study, how do we go about identifying which ones are indeed functionally relevant? In what environmental contexts? How many might be "non-functional"? If these questions sound interesting then we have two posts (postdoc and technician) currently open to develop genetic approaches that we think are going to be important to answer these questions. The work will be conducted at the EMBL Genome Biology unit in Heidelberg (Germany) in collaboration with the Typas lab.

Answering these questions will take a combination of different approaches ranging from proteomics to genetics and bioinformatics. These positions, although focused on the genetics aspects, will offer the possibility to explore and learn from the other areas of expertise. The deadline for applications is the 17th of May. Additional information about our group can be found at the EBI webpage and we welcome informal questions about the project and positions by email.