Wednesday, January 10, 2018

State of the lab 5 – in the flow with 4 years to go

This blog post is part of a yearly series and marks the end of the 5th year as a group leader at EBI. In March we had an external evaluation of all research groups at EMBL-EBI. It was an interesting experience and overall it was judged a great success for EBI. For our group it was also part of the evaluation towards the standard renewal of contract where I got the 4 year extension. Since there is essentially no tenure at EMBL this also means that I have 4 years until I have to find a senior PI position. This is still a long time but it will increasingly be on my mind going forward. I am not particularly worried but I feel like there are many more places now in Europe with fixed term junior group leader positions. The postdoc bubble will turn into the junior PI bubble and we will have another big barrier and competition in the transition between junior and senior positions.

Personally it is almost strange to stay in the same place after 5 years since I have been typically staying 4-5 years in each place during university (Coimbra), PhD (Heidelberg) and postdoc (San Francisco). It looks like I will have to find some other excuse to thin out my pile of papers on the desk instead of simply moving to a new country and trashing everything.

The end of a cycle
Last year was our most productive year so far, as measured by the number of publications. This year is going to top it based on the manuscripts that I should be working on at the moment instead of writing this post (sorry guys). The research in the group is just flowing with more synergies among the group members. Just when everything is working so well is when so many in the group are leaving. Last year our first PhD student finished (Omar, now at DeepGenomics) and two postdocs have left (Romain moved to benevolentAI and Sheriff is now a project leader at EBI). This year there will be even more people potentially leaving. It is going to be a new challenge to try to keep the science going through the turnover. On the other hand, new arrivals signal the start of new projects and are an opportunity to move the group in new directions. Just at the end of the year, we had 3 new members starting: Allistair (PhD student), Inigo (postdoc) and Abel (visiting PhD student). Abel and Inigo will be working on the impact of mutations in protein interactions and control of protein abundance while Allistair will likely work on the evolution of regulatory networks.

Highlight from 2017 – Predicting condition specific phenotypes from genomes
Most of the work in the group is focused on understanding the function and impact of genetic variants on protein post-translational regulation, in particular for phosphorylation and ubiquitylation. However, we have also been working more generally on the genotype to phenotype problem. I think these analyses could make more use of prior knowledge and we are trying to contribute in this direction.

Part of this work, led by Marco (GScholar, Twitter) and in collaboration with the Typas lab in Heidelberg, was finally published at the end of this year. The question we wanted to address was to what extent we can predict condition specific phenotypes of a strain of E. coli based on its genome and what we know from the well-studied E. coli K-12 lab strain. This is inspired by work that Rob Jelier and Ben Lehner did in S. cerevisiae but on a larger scale. To set the project up, imagine we know that a given gene X of E. coli is required for growth under high heat. Then, if that gene X is not present or severely mutated in a strain of E. coli, we would expect that this mutated strain should not survive well in high heat. To test this at large scale we assembled a panel of hundreds of strains of E. coli for which we obtained genomes and fitness measurements under many conditions. We modelled the consequence of mutations using different methods and we collected prior knowledge of which genes are supposed to be important for each condition. In the end we could only predict which strains would tend to grow poorly for around 40% of conditions. This level of success may not be surprising since we didn't take into account issues such as gene expression levels or compensation by new genes. It could be that gene function is a lot more plastic than currently assumed but to prove this we will need different experiments.
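
As a rough illustration of the logic in the gene X example, here is a minimal sketch in Python. It is not the method used in the paper: the gene names, conditions, disruption probabilities and the independence assumption are all hypothetical and chosen only to make the idea concrete.

```python
# Toy illustration only: all names and numbers below are hypothetical.

# Prior knowledge from the reference strain: genes required in each condition.
required_genes = {
    "high_heat": {"geneX", "geneY"},
    "drug_A": {"geneZ"},
}

# Per-strain probability that each gene is non-functional
# (0 = intact, 1 = absent or severely mutated).
strain_disruption = {
    "strain1": {"geneX": 0.9, "geneY": 0.0, "geneZ": 0.1},
    "strain2": {"geneX": 0.0, "geneY": 0.0, "geneZ": 0.8},
}


def growth_defect_score(disruption, genes):
    """Probability that at least one required gene is non-functional,
    treating genes as independent (a simplifying assumption)."""
    p_all_functional = 1.0
    for gene in genes:
        p_all_functional *= 1.0 - disruption.get(gene, 0.0)
    return 1.0 - p_all_functional


def predict(condition):
    """Score every strain for an expected growth defect in one condition."""
    genes = required_genes[condition]
    return {
        strain: growth_defect_score(disruption, genes)
        for strain, disruption in strain_disruption.items()
    }


if __name__ == "__main__":
    for condition in required_genes:
        print(condition, predict(condition))
```

A higher score means the strain is expected to grow poorly in that condition. Scores like these could then be compared with measured fitness, which is where things such as gene expression or compensation by other genes would show up as wrong predictions.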

Besides testing the central question expressed above, this collection of E. coli strains with associated data will hopefully serve as a resource for future studies. Any additional layer of molecular data (e.g. gene expression) or phenotype (e.g. motility) we measure can make use of all of the pre-existing information. We could ask, for example, if motility correlates with the growth under several of the drugs we tested. All of the resources for this collection are freely available and of course this would not be possible without the hard work of the scientists who collected the strains to begin with (listed here).

Highlights for the year ahead
We have 3 different projects that are close to completion that relate to the functional relevance of protein phosphorylation. This is probably going to be our biggest contribution of 2018. We continue to work with the cancer related datasets, primarily using these data to study protein post-translational regulation. Not necessarily to better understand cancer but making use of the large genetic and molecular variation that exists in cancer to better understand the regulatory processes of normal cells. Additionally we will have some progress to report on the evolution of protein kinases and potentially the evolution and regulation of ubiquitylation.

Friday, January 05, 2018

Group member profile - Omar Wagih

The latest instalment of this blog post series is by Omar Wagih (@omarwagih, GScholar) who has just last month successfully defended his PhD. Along with Marco, Omar has been part of the group working on studying how DNA variants relate to phenotypes. He developed the mutfunc resource and the fantastic guess the correlation game.

What was the path that brought you to the group? Where are you from and what did you work on before arriving in the group?
My love of genetics is, in more ways than one, biologically ingrained. Growing up in a family of scientists, I was always surrounded by a wealth of information which I instinctively sought to organise. For this, I pursued my undergraduate and masters degree at the University of Toronto, majoring in computational biology and computer science, respectively. Along the way, I was fortunate to work in some of the leading computational biology labs in Canada including those of Gary Bader, Philip Kim, Charlie Boone, Brenda Andrews, Andrew Fraser and Andrew Emili. I worked on projects that ranged from analysing images of genetic screens of yeast to determining the impact of disease mutations on kinase-substrate phosphorylation. These experiences led me to develop an interest in understanding how changes in the genome translate to variability in cellular physiology, and ultimately phenotype, which prompted me to pursue my PhD.

What are you currently working on?
My current project involves working towards a deeper understanding of how changes in the genome propagate to phenotypic variability by predicting which cellular mechanisms are likely to be impacted. For the past several years I have been developing and using computational methodologies to assess the mechanistic impact of natural and disease-causing mutations. I have been applying these to yeast, human and bacteria models in hopes of streamlining hypothesis-driven variant annotation. I have also been utilising these predictions to assess the overall burden these mutations impose on gene function and putting such information towards conducting gene-phenotype associations.

What are some of the areas of research that excite you right now?
I'm intrigued by novel mutagenesis technologies that are allowing us to experimentally assess the impact of genetic variants on cellular fitness and function in a massively parallel fashion. Technologies like deep mutational scanning and CRISPR are becoming increasingly common in achieving this and their off-target effects are steadily being reduced.

With such massive amounts of mutagenesis data, I'm also interested in how machine learning methodologies such as deep learning can be applied to learn how mutations collectively impinge on cellular function and ultimately phenotype. This would significantly improve the precision of variant impact predictors and, in my opinion, will have crucial roles in shaping the development of novel and personalised drug therapies.

What sort of things do you like outside of the science?
Whether I'm skiing, hiking, camping or exploring the city, you'll more likely than not find me outdoors. I often partake in sports. During my time in Cambridge, I rowed for my college and was part of the university boxing team.

I have been fascinated by drones for a while and own a DJI Phantom 3, which I often use for aerial filming. I also enjoy landscape and portrait photography, particularly with my 50mm lens. If I still have extra time on my hands, you'll find me implementing silly ideas that come to mind into apps or games. Here are a few I've made: genewords, pubtex, and guess the correlation.

Monday, June 26, 2017

Building rockets in academia - big goals from individual projects

SpaceX just launched and landed another two rockets over the weekend. I don’t get tired of watching those images of re-entry and landing. The precision is mesmerizing and extremely inspiring. Leading a research group in academia I often look at research intensive companies and wonder about the differences and similarities between how research is done in both. I have never worked in such a company environment so these thoughts are certainly from the perspective of academia. 

The big goals and peripheral bets  

From reading about big tech companies and start-ups I can relate to how they appear to organize their product portfolio into a small number of main goals – their core product(s) – while at the same time experimenting with peripheral goals/products. Tesla started as a car company but may end up being a large battery company with a small side of car manufacturing. As another example, most major tech companies are today experimenting with virtual reality. In these experiments, those involved face similar questions about uncertain outcomes and timeliness of their steps as we do in academia. One of the thrills in academia is that leap into the unknown where it is crucial to ask the right question just at the right time. The speed of progress in research can be very uneven with times spent floundering in the dark and times where you just happen to walk in the right direction and find big riches. Sometimes those explorations will lead you to unintended directions, away from your core research, where it might be worth moving additional resources. Aiming in the right direction at the right time is a rare skill that a researcher must have but that we don’t spend enough time training for. Also, the balance between focusing on the core and exploring other areas of interest is difficult to set. In academia it seems easier to obtain funding to keep working on your core than to move to new areas. I wonder how companies deal with these issues. I am extremely thankful to be working in a research institute where I get core funding that, although I have to justify, I get to use to explore ideas outside the core of what we do. Such flexibility could be a bigger part of how research funding gets distributed.

Individualized contributions to group goals

While setting a big goal and exploring peripheral objectives might have a lot in common between academia and companies, there is one aspect of how we work that appears very different. In setting the big overarching questions we have to accommodate the fact that each individual group member will have to stand out. PhD students are working on their theses and postdocs are building the work on which they will stand as future group leaders. Each project has to brilliantly stand on its own while simultaneously fitting together with other group projects, contributing to an even greater goal. As each research project can be an unpredictable grasp in the dark, as a group leader I feel like I have to build an alluring house of cards, projecting how several research projects might move forward and creating an illusory image of how they fit together to solve THE big question. Not only will we build the rocket that will save mankind but every single contribution from each team member has to solve an important problem. It is obvious that the overarching goal will have to shift with time as some projects move to their potential unintended outcomes. In the context of being flexible to follow peripheral bets, maintaining the big picture goal may be challenging. I would not be the first to propose more career tracks in academia where professional researchers don’t have to move into management roles to keep working in academic science. It would be interesting to try it out in some research institutions to see the effect it would have on how research agendas would be organized.


Friday, April 28, 2017

Postdoc positions on context dependent cell signalling (wet and/or dry)

Why do some mutations cause cancer in some tissues and not others? What happens to the cell signalling pathways during differentiation? Why are some genes essential in some cell types and not others or why are some drugs more effective at killing some cell types than others?

We think that this is a great time to be asking these questions of how the genetic background or tissue of origin changes cell states. More precisely for us, how this re-wires cell signalling. It has become routine to measure changes in phosphorylation across different conditions, including different cancer types. The Sanger and others are establishing panels of human cell lines that are being profiled with an increasing array of omics technologies with drug sensitivity and CRISPR based gene essentiality information. These panels offer a great opportunity to address these questions.

We want to combine the work we have been doing in studying human signalling with phosphoproteomic data, with variant effect predictors, microscopy based studies of cell signalling and network modelling to address this question of context dependent changes in cell signalling.

To support this research we have 2 postdoc positions available: one would be primarily computational and would involve image analysis and network modelling in collaboration with microscopy groups (see here for project and application); the second would be primarily experimental with a focus on microscopy. The latter would be available via the ESPOD fellowship scheme in collaboration with Leopold Parts' group at Sanger (see here for project description and here to apply). The split between computational and experimental is open and candidates with a mixed wet/dry profile are also encouraged to apply to both.

These projects complement existing work in the group using cancer Omics data to study the genetic determinants of changes in protein abundance and phosphorylation and will be in collaboration with work developed by the Petsalaki group at EBI that is also recruiting. Email me if you have any questions/concerns about the positions.

Monday, April 10, 2017

17 years of systems biology

I know that 17 years is not a very round number. It is also fairly arbitrary as I am assuming systems biology started around 2000 (see below). I was in Portugal last week, where every year for the past 8 years I have been teaching a week long course on Systems and Synthetic Biology to the GABBA PhD program. This might be the last year I take part in this course and so I felt it would be a good time to try to put some thoughts in a blog post. This course has been co-organised from the beginning with Silvia Santos and we had several guests throughout the years including Mol Sys Bio editors Thomas Lemberger and Maria Polychronidou and other PIs: Julio Saez-Rodriguez, Andre Brown, Hyun Youk and Paulo Aguiar. Some of what I write below has certainly been influenced by discussions with them. This is not meant as an extensive review so apologies in advance for missing references.

Where did systems biology come from?
It is not contentious to say that systems biology came about in response to the ever narrower view of reductionist approaches in biology. Reductionism is still extremely important and I assume that, as a movement, it was an opposition to the idea that biology was animated by some magical force that could never be comprehended. Since the beginning of the course we have asked students to read the essay “Can a biologist fix a radio?” by Yuri Lazebnik (2002). The article captures well the limitations of reductionist research. The more we know about a system, apoptosis in Yuri's case, the more complex and non-intuitive some observations may seem. Yuri's description of how a biologist would try to understand how a radio works is comical and still very apt today:

We would “remove components one at a time or to use a variation of the method, in which a radio is shot at a close range with metal particles. In the latter case radios that malfunction (have a “phenotype”) are selected to identify the component whose damage causes the phenotype. Although removing some components will have only an attenuating effect, a lucky postdoc will accidentally find a wire whose deficiency will stop the music completely. The jubilant fellow will name the wire Serendipitously Recovered Component (Src) and then find that Src is required because it is the only link between a long extendable object and the rest of the radio.”

One of the driving forces for the advent of systems biology was this limitation, so brilliantly captured by Yuri, that reductionism can fail when we are overwhelmed with large systems of interconnected components.

Around the time that Yuri wrote this article our capacity to make measurements of biological objects was undergoing a revolution we generally call omics today. In 2001 the first drafts of the human genome were published (Lander et al. 2001; Venter et al. 2001). Between 2000 and 2002 we had the first descriptions of large scale protein-protein (Uetz et al. 2000; Ito et al. 2001; Gavin et al. 2002) and genetic interactions mapping (Tong et al. 2001). The capacity to systematically measure all aspects of biology appeared to be within our grasp. The interaction network representation of nodes connected by edges is now an icon in biology, even if not as recognisable as the double helix. This ever increasing capacity to systematically measure biology was, alongside the complexity of highly connected components, the second major driving force for the advent of systems biology.

What is systems biology?
So around 2000 biology was faced with this upcoming flood of data and highly complex nonlinear systems. Reductionism was failing because mental models were insufficient to cope with the information available. The reaction was a call for increased formalism, better ways to see how the sum of the parts really works. Perspectives were written (Kitano 2002) and institutes were born (Institute for Systems Biology). Within the apparent complexity of biology there might be emergent principles that we were not seeing simply because we were looking too narrowly and could not combine information in a formal way. Whatever the system of interest (e.g. proteins, cells, organisms, ecosystems) there must be ways to take information from one level of abstraction (e.g. proteins) and understand the relevant system's features of the abstraction layer above it (e.g. cell behaviours). This comes closest to a definition of systems biology put forward by Tony Hyman (Hyman 2011) but many others have defined it in vaguely similar ways, or maybe in similarly vague ways.

Power laws and the perils of searching for universal principles
When introducing systems biology I have been giving two examples of work that illustrate some of the benefits (network motifs) but also some of the perils (power law networks) of trying to find universal principles in biology. One of these examples was the research on the organisation of biological networks. As soon as different networks were starting to be assembled, such as protein-protein, genetic and metabolic networks, an observation was made that the distribution of interactions per gene/protein does not look like that of a random network (of the kind studied by Paul Erdős). Most proteins have very few interactions while some rare proteins have a disproportionately large number of interactions – dubbed “hubs”. Barabasi and many others had a series of papers describing these non-random distributions, called power-law networks (Jeong et al. 2000), in all sorts of biological networks. Analogies were drawn to other non-biological networks with similar properties and it is not an understatement to say that there was some hype around this. The hope was that by thinking of the common processes that can give rise to such networks (e.g. preferential attachment) we would know, in some deep way, how biology is organised. I will just say that I don’t think this went very far. Modelling biological networks as nodes and edges allowed the application of graph theory approaches to biology, which has indeed been a very useful inheritance from this work. However, we didn't find deep meaning in the analogies drawn between the different biological and man-made networks, although I am sure some will disagree.
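
To make the contrast between "hubs" and a random network concrete, here is a small Python sketch. It grows one toy network by preferential attachment and builds a second one with the same number of nodes and edges by picking edges uniformly at random, then compares the largest number of interactions per node. The network sizes, seeds and function names are arbitrary choices for illustration and are not taken from any of the papers mentioned above.

```python
import random
from collections import Counter


def preferential_attachment(n, m, rng):
    """Grow a graph where each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    # Start from a small fully connected seed of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears here once per edge endpoint, so a uniform draw from
    # this list is a degree-proportional draw over nodes.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges


def random_graph(n, n_edges, rng):
    """Pick n_edges distinct edges uniformly at random among n nodes."""
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:
            edges.add((min(a, b), max(a, b)))
    return list(edges)


def degrees(edges):
    """Count the number of interactions of each node."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


if __name__ == "__main__":
    pa_edges = preferential_attachment(2000, 2, random.Random(0))
    er_edges = random_graph(2000, len(pa_edges), random.Random(0))
    print("largest degree, preferential attachment:", max(degrees(pa_edges).values()))
    print("largest degree, random graph:", max(degrees(er_edges).values()))
```

With the same average number of interactions, the preferential attachment network ends up with a few nodes that are far more connected than anything in the random network, which is the "hub" observation described above.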

Network motifs, buzzers and blinkers
Around the same time, the group of Uri Alon published very influential work describing recurring network motifs in directed networks (Milo et al. 2002; Shen-Orr et al. 2002). For example, in the E. coli transcriptional network they found some regulatory relationships between 3 different genes/operons that occurred more often than expected by chance. One example, illustrated to the right, was named a coherent feedforward loop where an activating signal was sent from an “upstream” element X to a “downstream” element Z both directly and indirectly via an intermediate third element. The observation raises the question of the usefulness of such an arrangement (Mangan and Alon 2003; Kalir et al. 2005). This has been generalised to studying the relation between any set of such directed interactions with specific reaction parameters – defined as the topology – and their potential functions. In a great review Tyson, Chen and Novak summarise some of these ideas of how regulatory networks can act, among other things, as “sniffers, buzzers, toggles and blinkers” (Tyson et al. 2003).
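
One behaviour often discussed for the coherent feedforward loop (for example in Mangan and Alon 2003) is that, when Z needs both X and the intermediate Y to be active, a brief pulse of X is filtered out while a sustained signal gets through. Below is a minimal Python sketch of that idea. The AND logic, the unit production and decay rates, the threshold and the simple Euler integration are all assumptions made for illustration, not parameters from those papers.

```python
def simulate_ffl(pulse_length, t_end=20.0, dt=0.01, threshold=0.5):
    """Euler simulation of a toy coherent feedforward loop.

    X activates Y, and Z is produced only while both X and Y are above
    `threshold` (AND logic). Production and decay rates are all set to 1.
    X is an input that is on from t = 0 until `pulse_length`.
    Returns the highest level of Z reached during the simulation.
    """
    y = 0.0
    z = 0.0
    z_max = 0.0
    for step in range(int(t_end / dt)):
        t = step * dt
        x = 1.0 if t < pulse_length else 0.0
        y_production = 1.0 if x > threshold else 0.0
        z_production = 1.0 if (x > threshold and y > threshold) else 0.0
        y += dt * (y_production - y)
        z += dt * (z_production - z)
        z_max = max(z_max, z)
    return z_max


if __name__ == "__main__":
    print("peak Z after a short pulse of X:", simulate_ffl(0.3))
    print("peak Z after a long pulse of X:", simulate_ffl(5.0))
```

In this toy version Y needs some time to accumulate before Z can respond, so the short pulse ends before Z is ever produced, while the long pulse drives Z close to its maximum.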

These and other similar works showed that, within the complexity of regulatory networks, design principles can be found that encapsulate the core relationships giving rise to a behaviour. Once these rules are known, an observed behaviour will constrain the possible space of topologies that can explain it. This has led researchers to search for missing regulatory interactions that are needed to satisfy such expected constraints. For example, Holt and colleagues searched for a positive feedback that would be expected to exist for the switch-like dissolution of the sister-chromatid cohesion at the start of anaphase (Holt et al. 2008). This mapping between regulatory networks and their function can be applied to any system of interest and at any scale. The same types of regulatory interactions are used for termites building spatially organised mounds and for growing neurons seeking to form connections (as illustrated in a review by Dehmelt and Bastiaens). Different communities of scientists can come together in systems biology meetings and talk in the same language of design principles. This elegance of finding “universal” rules that seemingly explain complex behaviours across different systems and disciplines has been a great gift of systems biology. It is of course important to point out that such ideas have a much longer history from homoeostasis in biology and control theory in engineering.

Bottom-up network models
Alongside the search for design principles in regulatory interactions the formal mathematical and computational modelling of biological systems gained prominence (e.g. Bhalla and Iyengar 1999). Mathematical models are much older than systems biology but they started to be used more extensively and visibly with the rise of systems biology. Formalising all of the past knowledge of a system was shown to be a useful way to test if what is known was sufficient to explain the behaviour of the system. Models were also perturbed in silico to find the most relevant parameters and generate novel hypotheses to be tested experimentally. This model refinement cycle has been used with success for example in the modelling of the cell cycle (Novak and Tyson 1993; Tyson and Novak 2001; among many others) or the circadian clock (Locke et al. 2005; Locke et al. 2010; Pokhilko et al. 2012). However, this iteration between formal modelling and experiments has not really taken off across many other systems. The reason for the lack of excitement is not clear to me although I have the impression that often the models are not used extensively beyond asking if what we know about a system sufficiently explains all of the observed outcomes and perturbations.

Top-down systems biology and everything in between
From the start there has been a division between the researchers that identified themselves as part of the systems biology community. Bottom-up researchers have been focused on the formal modelling of systems, the discovery of design principles and emerging behaviours. Top-down researchers would argue that a truly comprehensive view of a system is needed. These scientists have been more focused on further developing and applying methods to systematically measure biological systems. The emphasis in this camp has been on developing generalizable strategies that can take large-scale observations and identify rules, regardless of the system of interest. I would say that these works, my own included, have been less powerful in identifying elegant universal rules. By this I mean, for example, those initial attempts to find common principles across biological and man-made networks. Instead of principles, what have been readily transposed across systems have been approaches such as machine learning methods. Drug screens with behavioural phenotypes, genetic interaction networks or developmental defect screens with gene knock-downs can all be analysed in the same ways.  Such systematic studies have driven costs down (per observation) and contrary to “representative” experiments in small scale studies, the large-scale measurements tend to be properly benchmarked for accuracy and coverage. 

What is still missing are ways to bridge the divide between these two camps. Ways to start from large-scale measurements that result in models that can be studied for design features. Studies that include perturbation experiments come closer to achieving this. Examples for network reconstruction methods have shown that it should be possible to achieve this but we are not quite there yet (Hill et al. 2016).

From systems biology to systems everything
As a scientific movement systems biology started in cell biology, as far as I can tell, but has since then permeated many other areas of research. As examples, I have heard of systems genetics, systems neuroscience, systems medicine, evolutionary systems biology and systems structural biology. In 2017 we still face a flood of data and highly complex nonlinear systems. However, the reductionist approaches now typically go hand-in-hand with attempts to formalise knowledge in quantitative ways to identify the key relationships that explain the function of interest. In a sense, the movement of systems biology has succeeded to such an extent that it seems less exciting to me as a field in itself. It is a fantastic approach that is currently being used across most of biology but there are fewer developments that alter how we do science. I am curious as to what other researchers that identify themselves with doing systems biology think - What have been great achievements of systems biology? What are the great challenges that are not simply applications of systems biology? Questions to think about for the (equally arbitrary) celebration of the 20 years of the field in 2020.

Friday, February 10, 2017

Predicting E3 or protease targets with paired protein & gene expression data (negative result)

Cancer datasets as a resource to study cell biology

The amazing resources that have been developed in the context of cancer biology can serve as tools to study "normal" cell biology. The genetic perturbations that happen in cancer can be viewed almost as natural experiments that we can use to ask varied questions. Different cancer consortia have produced, for the same patient samples or the same cancer cell lines, data that ranges from genomic information, such as exome sequencing, to molecular, cellular and disease traits including gene expression, protein abundance, patient survival and drug responses. These datasets are not just useful to study cancer biology but more globally to study cell biology processes. If we were interested in asking what is the impact of knocking out a gene we could look into these data to have, at least, an approximate guess of what could happen if this gene is perturbed. We can do this because it is likely that almost any given gene will have changes in copy number or deleterious mutations given a sufficiently large sample of tumours or cell lines. Of course, there will be a whole range of technical issues to deal with since it would not be a "clean" experiment comparing the KO with a control.

Studying complex assembly using protein abundance data

More recently the CPTAC consortium and other groups have released proteomics measurements for some of the reference cancer samples. Given the work that we have been doing in studying post-translational control we started a few projects making use of these data. One idea that we tried and have recently made available online via a pre-print was to study gene dosage compensation. When there are copy number changes, how often are these propagated to changes in gene expression and then to protein level? This was work done by Emanuel Gonçalves (@emanuelvgo), jointly with the Julio Saez-Rodriguez lab. There were several interesting findings from this project; one of these was that we could identify members of protein complexes that indirectly control the degradation of other complex subunits. This was done by measuring, in each sample, how much of the protein abundance change is not explained by the gene expression change. This residual abundance change is most likely explained either by changes in the translation or degradation rate of the protein (or noise). We think that, for protein complex subunits, this residual mainly reflects degradation rates. Emanuel then searched for complex members that had copy number changes that predicted the "degradation" rate of other subunits of the same complex. We think this is a very robust way to identify such subunits that act as rate-limiting factors for complex assembly.
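
The residual idea can be sketched in a few lines of Python. This is only a simplified illustration and not the analysis from the pre-print: it assumes a plain least-squares fit of protein abundance on gene expression across samples, and all the numbers and variable names are made up.

```python
from statistics import mean


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(x, y):
    """Part of y that is not explained by a linear fit on x."""
    slope, intercept = linear_fit(x, y)
    return [b - (slope * a + intercept) for a, b in zip(x, y)]


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical values across six samples for two subunits of one complex.
expression_b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # gene expression of subunit B
copy_number_a = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]   # copy number change of subunit A
# In this toy data the protein level of B follows its expression, plus a
# contribution from the copy number of A.
protein_b = [e + 0.5 * c for e, c in zip(expression_b, copy_number_a)]

# Protein abundance change of B not explained by its own expression.
residual_b = residuals(expression_b, protein_b)

if __name__ == "__main__":
    print("correlation of B residual with copy number of A:",
          pearson(residual_b, copy_number_a))
```

In the toy data the residual of subunit B tracks the copy number of subunit A, which is the kind of association that would flag A as a candidate rate-limiting subunit.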

Predicting E3 or protease targets

If what I described above works to find some subunits that control the "degradation" of other subunits of a complex then why not use the exact same approach to find the targets of E3 ligases or proteases? Emanuel gave this idea a try but in some (fairly quick) tests we could not see a strong predictive signal. We collected putative E3 targets from a few studies in the literature (Kim et al. Mol Cell Biol. 2015; Burande et al. Mol Cell Proteomics. 2009; Lee et al. J Biol Chem. 2011; Coyaud et al. Mol Cell Proteomics. 2015; Emanuele MJ et al. Cell 2011). We also collected protease targets from the MEROPS database. We then tried to find a significant association between the copy number or gene expression changes of a given E3 and the proxy for degradation, as described above, of any other protein. Using the significance of the association as the predictor, we would expect a stronger association between an E3 and its putative substrates than with other random genes. Using a ROC curve as a descriptor of the predictive power, we didn't really see robust signals. The figure above shows the results when using gene expression changes in the E3 to associate with the residuals (i.e. abundance change not explained by gene expression change) of the putative targets. The best result in this case was obtained for CUL4A (AUC=0.59) but overall the predictions are close to random.
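
For readers less familiar with the AUC values quoted here, this is a small Python sketch of how such a number can be computed from association scores. It uses the rank interpretation of the AUC (the chance that a known target scores higher than a non-target) and entirely hypothetical scores; it is not the code used for the figure.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive has the higher score.
    Ties count as half. `labels` are truthy for putative targets."""
    positives = [s for s, is_target in zip(scores, labels) if is_target]
    negatives = [s for s, is_target in zip(scores, labels) if not is_target]
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical association scores (e.g. -log10 p-values) between one E3 and
# six proteins, with a made-up list of which proteins are putative targets.
association_scores = [3.2, 0.4, 1.1, 0.2, 2.5, 0.9]
is_putative_target = [True, False, True, False, False, False]

if __name__ == "__main__":
    print("AUC:", roc_auc(association_scores, is_putative_target))
```

An AUC of 1 would mean every putative target ranks above every other protein, while 0.5 is what a random ranking gives on average, which puts the 0.59 above in context.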

A similarly poor result was generally observed for protease targets from the MEROPS database, although we didn't really make a strong effort to properly map the MEROPS interactions to all human proteins. Emanuel tried a couple of variations. For the E3s he tried restricting the potential target list to proteins that are known to be ubiquitylated in human cells but that did not improve the results. Also, surprisingly, the genes listed as putative targets of these E3s are not very enriched in genes that increase in ubiquitylation after proteasome inhibition (from Kim et al. Mol Cell. 2011), with the clearest signal observed in the E3 targets proposed by Emanuele MJ and colleagues (Emanuele MJ et al. Cell 2011).

Why doesn't it work?

There are many possible reasons for the lack of capacity to predict E3/protease targets in this way. The residuals that we calculate across samples may reflect a mixture of effects and degradation may be only a small component. The regulation of degradation is complex and, as we have shown for the complex members, it may be dependent on other factors besides the availability of the E3s/proteases. It is possible that the E3s/proteases are highly regulated and/or redundant such that we would not expect to see a simple relationship between changing the expression of one E3/protease and the abundance level of the putative substrate. The list of E3/protease targets may contain false positives and, of course, we may not have found the best way to find such associations in these data. In any case, we thought it could be useful to provide this information in some format for others that may be trying similar things.

Friday, January 13, 2017

State of the lab 4 – the one before the four year review

It has been 4 years since I started as a group leader at the EMBL-EBI (see past yearly reports – 1, 2 and 3). This year the group composition has been mostly stable, with the exception of interns that have rotated through the group. We had Bruno Ariano (twitter) visiting us for 6 months working on a project to build an improved functional interaction network for Plasmodium. Matteo Martinis has joined the group for a few months and is working with David Ochoa on comparing in vivo effects of kinase inhibitors with their known in vitro kinase inhibition effects. Finally, Areeb Jawed has joined Cristina and Bede, for some months, in their efforts to develop genetic methods to study protein modification sites. I think we had a great year in terms of publishing and I had the luxury of not having to apply for additional funding. That luxury is short-lived, as we have funding ending this year that I will try to replace.

The one before the four year review
All EMBL units are evaluated every 4 years by a panel of external reviewers. The next review for the research at EMBL-EBI is coming up now in March and we have been preparing the required documentation for this. Naturally this forced me to think about what we have achieved as a group for the past 4 years and what we aim to do for the next 4. It is impossible to go through this process without being drawn into some introspection and without comparing our performance with that of those around us. I think we did well in this period of time: we got two significant grants funded (HFSP and ERC) and published some articles that I feel have been significant contributions towards the study of kinase signaling. I remember my interview for this position when they asked me what I would expect to achieve in the next 5 years. My first thought was: “Really? That question?”, but I think we did achieve what I had hoped at the time. Still, at EMBL-EBI we are surrounded by some fantastic colleagues that keep the bar really high. It is hard to be satisfied and I am certainly motivated by our research environment to try to help our group to keep up the good work. This review will also determine if our group receives a 4 year extension after the first 5 years. I am confident but still apprehensive and curious about what the reviewers will say.

Studying cell signaling states using phosphoproteomics

During the past four years we have worked on several aspects of kinase based cell signaling. I mentioned before our work on trying to describe the evolutionary history of protein phosphorylation (blog post) and to predict the kinase specificity from interaction networks and phosphoproteomic data (blog post). I haven't described yet our work on studying cell signaling states that was published a few months ago. When David Ochoa started in the group around 3.5 years ago we reasoned that, by collecting information on how phosphosite abundances change across a large number of conditions, we would be able to use the profile of co-regulation across conditions to learn about how cell signaling systems work. This is copying by analogy what has been done in gene expression studies since Mike Eisen's work. David made use of published conditional phosphoproteomic studies to compile a very large compendium of different conditions. There are issues related to the incomplete coverage of mass spectrometry measurements and potential batch effects of the different studies. David tried to work around these, primarily by focusing the analysis on groups of phosphosites (e.g. targets of the same kinase) instead of individual positions. Using these data he derived an atlas of changes in activities for around 200 human kinases across nearly 400 different conditions. We show in this work how this can be used to advance our knowledge of kinase signaling (Ochoa et al. Mol Sys Bio 2016, and the phosfate webserver).
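As a rough illustration of why working with groups of phosphosites helps, here is a small Python sketch that scores a kinase in one condition from the fold changes of its known substrate sites, using a simple z-test against all measured sites. This is only one of several possible scoring schemes and is not necessarily the one used in the paper; the function name, site identifiers and values are all hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def kinase_activity_z(site_log2fc, substrates):
    """Z-score of the mean fold change of a kinase's substrate sites.

    site_log2fc: dict mapping phosphosite id -> log2 fold change in one condition.
    substrates: ids of known target sites of the kinase.
    Substrate sites that were not measured in this condition are skipped,
    which is how the group-level score tolerates incomplete coverage.
    Returns None when no substrate site was measured.
    """
    values = [site_log2fc[s] for s in substrates if s in site_log2fc]
    if not values:
        return None
    background = list(site_log2fc.values())
    sd = pstdev(background)
    if sd == 0:
        return 0.0
    return (mean(values) - mean(background)) * sqrt(len(values)) / sd


# Hypothetical measurements for one condition.
condition = {
    "A_S10": 2.0, "B_T5": 1.5, "C_S3": -0.5,
    "D_Y7": -1.0, "E_S1": 0.0, "F_S2": -2.0,
}
# Hypothetical substrate set; "G_S9" was not measured in this condition.
kinase_substrates = {"A_S10", "B_T5", "G_S9"}
print(kinase_activity_z(condition, kinase_substrates))
```

Repeating such a calculation for every kinase and every condition gives a kinase-by-condition matrix, which is the kind of object the atlas describes.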

For me this work was the first time we could measure a large number of cell signaling states, to see what the structure of this state-space is and what we can learn from it. What kinases are most often regulated? What kinases define particular signaling states? What states act as “opposing” states and how may we use this information to promote or inhibit specific states or transitions through the state-space? These are all questions that we can address with this atlas. The fact that the data was collected from different publications, using different protocols and machines, will certainly have an impact on the accuracy and resolution of this atlas. However, the quality and coverage of these types of experiments will only improve and I think this direction of research will continue to be exciting for a long period of time.

Since this work we have also tried to benchmark different approaches to predict the changes of kinase activity from phosphoproteomic information (preprint). In collaboration with Julio Saez-Rodriguez's lab we also used some of the same concepts to relate the changes of metabolism with predicted changes in kinase, phosphatase and transcription factor activities (Gonçalves et al. PLOS Comp Bio 2016).

Onto the next four years
If I do get my contract extension, we will continue our current main research focus on studying cell signaling through the next four years. Although we will certainly continue to study long term evolutionary trends, such as the evolution of kinase specificity, we will complement this with trying to understand the impact of genetic variation for individuals of the same species with a strong focus on E. coli, S. cerevisiae and H. sapiens (mutfunc). We have started to make use of cancer data as a genetic resource to study human cell biology (preprint). We won't necessarily try to study cancer as a disease but I think that datasets for primary tumors and cancer cell lines are amazing resources to learn about how human cell biology and cell signaling work. The group will have its first big turnover of group members over the next 1 or 2 years, which will be challenging professionally and personally. However, this turnover will also allow for and shape future directions of the group, which will also be exciting.