Cellular Consequences of Genetic variation: academia


Sunday, March 16, 2025

State of the lab 12 - Becoming an established scientist

This blog post is part of a (nearly) yearly series on running a research group in academia. This post summarizes year 12, the 3rd year after moving to ETH Zurich. In the last blog post I wrote down some of our overall research directions for the first 5 years of the group at ETH and I will wait another year or two before reflecting back on those commitments. This time, I wanted to try to write down some thoughts I have been having about essentially becoming more established in academia. This includes a longer term perception of group turnover, the time and resources needed to achieve research objectives and some activities that go beyond the management of the research group.

Group member turnover cycles

With 12 years of managing a research group, I have gotten used to some of the broader rhythms of turnover of the lab. Our lab is now almost totally renewed with just 1 lab member who came with the lab from EMBL. While this turnover was somewhat enforced by the move from EMBL to ETH, the turnover of lab members is a constant in academia given the short term nature of the lab members’ positions. In our group PhD students have typically stayed for around 4 years and postdocs have typically stayed for up to 5 years. Since there is some degree of clustering of the hires there tend to be some periods of higher turnover. We have had something like 2 to 3 periods where the lab has seen a large change. In the group, I try to hire from diverse backgrounds (e.g. biology, CS and math) and we work with a range of experimental and computational approaches, including for example yeast genetics, proteomics, structural bioinformatics, machine learning, etc. This creates a nice dynamic of group members building up their projects, while at the same time learning about the capabilities of the rest of the lab. The projects are usually meant to be somewhat synergistic, trying to address bigger goals from the individual problems (see past blog post on this). This means we have had windows of around 3 years when things click together before the turnover starts again. We are just around that exciting stage in the cycle and I am really looking forward to making the best of it. I still don’t enjoy what comes next, when the group will inevitably turn over again. I have accepted that it is an opportunity to steer the ship into new directions but sometimes it is disappointing to change the group just around the time it feels like we can take on almost any challenge.

Longer term view of science

One thing that has been on my mind is that I am sometimes weary of the time it can take to achieve a research goal. I am not talking here about an individual research project which tends to take on the order of 2 to 3 years on average. In our group we have tried to address some bigger research goals, such as trying to understand the evolution of protein phosphorylation or the functional relevance of individual phosphosites. These kinds of challenges take multiple independent projects and over 10 years of time to make a meaningful dent in. These days I will look at a potential long term research goal and I will think about the many different types of methods and steps that will be needed and this can distract me from the excitement of figuring those things out. I should say that I am by no means jaded about doing research. I still get such a thrill discussing the day-to-day results with lab members, being at the frontier and trying to figure things out. It is just when I pause to think about the longer term view, either in the past or trying to project into the future, that I sometimes wish things could just move faster. I have taken part in a couple of large multi-PI projects that have moved very quickly and from these I can see the temptation of trying to have large labs.

From junior to “established” PI

There is no point in time when a switch happens and someone is no longer considered a junior PI but after 12 years I can safely assume that label no longer applies to me. This has brought some relatively small changes in my job, one simple one being that I no longer think about tenure. For most of my career I was on fixed term positions, including my first group leader position at EMBL which had a time limit of 9 years. I joined ETH 3 years ago on a tenured contract and not having to think about my next job has left me with a tiny post-tenure slump - what am I aiming for? Related to the previous section, I have considered that I could enjoy overseeing science at a higher level than as a group leader. As one example, I organized an application for a National Centre of Competence in Research (NCCR) with 19 PIs interested in human genetics in Switzerland. While the application failed, I was really keen and excited to co-direct the center if it had been funded.

Another aspect of my job that has changed somewhat is a higher commitment to activities outside the lab, such as taking part in committees, advisory panels or formal and informal mentorship of junior PIs. I don’t feel particularly overwhelmed by these activities but that might change if I am required to take part in more committees within ETH. Not everything is an additional burden to an already busy job. I have felt that being more visible and connected in international science comes with benefits, including it being easier to at least discuss collaborations or having labs interested in joint grant applications.

Scientists that have worked in academia for longer than I have might find some of these things funny and I am certainly curious about what it will feel like reading this 10 years and more from now. In fact, the blog is now a bit over 20 years old with posts starting in my PhD. While I don’t post much these days I aim to continue at least this yearly series while I feel there are some new things to say beyond the progress in our science.

Monday, November 13, 2023

State of the lab 10 and 11 - the first years at ETH Zurich

Yet another lake by a mountain in Switzerland

This blog post is part of a (nearly) yearly series on running a research group in academia. This post summarizes years 10 and 11, the first 2 years after moving to ETH Zurich. It also marks the end of the first decade as a research group leader, which is meaningful only because we have ten fingers and use 10 as a base for counting but I digress. There has been a lot to adapt to in moving to a new country including all the basics of moving, re-building the group and starting teaching. It was a lot easier than the first time around since I didn't have to set up the group from zero. Some people came with me, some stayed at EMBL-EBI with funding that couldn't be moved and generally speaking we could continue several computational related projects without much interruption. If we were primarily lab based then I think the interruption would have been more dramatic. Unexpectedly, there were more periods of high stress than I typically have. There was no particular reason for the stress but just a combination of multiple small things and probably due mostly to the adaptation to a new place. I will cover here some of the biggest things I am having to adapt to and also some of the research directions planned for the first 5 years of the group at ETH. One aspect that I will not cover is networking and getting to know the Swiss research landscape, but I will come to it in a later post.

The Swiss style of leadership

The EMBL, where I was before, has a very top-down leadership. EMBL is funded by different countries that are represented in the EMBL council. There is a director general who is appointed by the council and has a lot of control. Of course, there is a hierarchical support structure with a senior management team, heads of research units and a group of "senior scientists" that support the director in decision making. I am still figuring out ETH but there is a very different feel to it, both in size and style of leadership. EMBL employs around 2000 people while ETH has around 12,000. Organizationally, ETH is divided into 16 departments, and each department is further split into different institutes. For example, I am in the Department of Biology, which has 6 institutes, and I am in the Institute of Molecular Systems Biology (IMSB). As leadership, there is an executive board, including the president of ETH, then the Department heads, and in each department there is the meeting of heads of institute and the professorial conferences (i.e. all votes from professors). At least in the Department of Biology the heads of the institutes and the leadership of the Department are meant to rotate every 2 years. At these levels - institute and department - the leadership feels highly representative with lots and lots (!) of voting. This representative rotational leadership feels very different from EMBL and I think mirrors more broadly a Swiss way of doing things. The obvious consequence of this is that any change requires deep consensus and therefore radical change is less likely but it is too early to say much more.

Teaching at undergraduate level

During 9 years at EMBL I had almost zero teaching duties. I voluntarily taught some classes in the GABBA PhD program in Portugal and not much more. At ETH teaching is now an important part of my job. I am teaching courses in Bioinformatics and Systems Biology, primarily to biology students, which are all very familiar topics and close to my area of research. I don't particularly enjoy the act of teaching, in particular standing in front of 70-100 students and trying to explain things. As an introvert I am more comfortable with 1-on-1 or small group discussions and I get very tired with the interaction of teaching in a classroom setting. I have always said that Biology students should learn more computational skills so at least I have the opportunity now to influence that at ETH. In fact, the biology curriculum was changed right when I was joining to add more bioinformatics and they do have the chance to learn it with multiple lectures that cover bioinformatics and machine learning. Despite it being a mixed bag for me I am privileged in that I have a very low teaching load in topics that I like. Teaching is an area where I feel I could do more and where it could have an impact, in particular if we made it open to anyone. However, it is still something that I find difficult to fully devote myself to given the research role.

Our research at ETH during the first 5 years

The start of the research group at ETH has been fantastic. There was another big turnover of the group members during the transition, the second major turnover since the group started 11 years ago. I am really happy with the team we have here and having done this sort of turnover before, I can already see the growing potential of many projects that have started here. So the next 2-3 years are going to be about building up these projects and trying to coordinate them such that they interact and feed off each other. We have very generous stable funding, as do all other tenured professor positions at ETH - so called endowed professorships in the US or positions with core funding for the European researchers. Surprisingly, there is not a lot of oversight on this research funding which is a big difference from EMBL where the units, and their group leaders, are reviewed every 4 years. So I thought I could at least write down our commitment for research over the first 5 years here, in the spirit of disclosing what we are doing with this public research funding.

Human genetics research - mechanisms linking genotype to phenotype

Human genetics is an area that we started working on in the last 3-4 years or so of EMBL. Some of these things are already visible in recently published articles, including some protein-interaction network-based analyses of trait-associated genes. We continue to actively work on this and one direction of focus is to try to build interaction networks that are specific to different tissues or cell types. We are working on a manuscript on this and it is an area to continue to build upon, to be able to study the differences in cell biology of different cells/tissues and how genetic changes manifest differently in these. A second direction of focus here is to study the relation between common and rare variants linked to related traits using networks.

From cells to proteins - we are finishing a project where we are using protein structures to annotate functional residues in proteins to study mechanisms of pathogenicity. One aspect of this that will need further development is expanding on the prediction of structural modelling of protein interactions with other proteins and other molecules. Finally, we are interested in how genetic variation controls protein levels and ideally how to build computational models that can integrate the impact of genetic variation through control of protein levels, interactions, organs and organismal traits, ideally without a black-box modelling approach. All of these things are actively ongoing and I expect to have progress to report in the coming years.

Post-translational regulation - large scale studies of kinase signalling

There are over 100,000 phosphosites discovered in human proteins and over 20,000 found in budding yeast proteins. We don't have good methods to study the functional role of these phosphosites nor to reconstruct the kinase/phosphatase-substrate signalling network of different cells. About half of the group is continuing to work on these problems and here at ETH we managed to consolidate the computational and experimental parts of our group which used to run in different locations while I was at EMBL. Because we are doing more of the experimental work now, this part of the group had a slower start but things are now moving along very well. Some of the problems that we are working on include the prediction of the biological process regulated by phosphosites; studying the impact of phosphorylation on protein conformational change; experimental methods to map kinase-substrate interactions and large scale mutational studies of PTMs. The thought has crossed my mind to phase down this area of research a bit, or at least to move more into mammalian systems in our experimental work to make it more complementary to the human genetics side of the lab.

Structural bioinformatics, protein evolution and other

We have been having a lot of fun with AlphaFold2! With the current fast pace of change in protein related bioinformatics methods I am sure we will continue to play with these methods as they come. It is not likely that we will do a lot of method development ourselves, as it is not our way, but I think we are very good partners for method developers to help make the bridge to applications. Protein structures, protein design and evolution models are all things we will likely be playing around with in the coming years.

Wednesday, January 19, 2022

State of the lab 9 - an informal report on the 9 years of EMBL-EBI

This blog post is part of a yearly series (or close to yearly) on running a research lab in academia. 2021 was the last of 9 years as a group leader at EMBL-EBI, which is the standard time given to group leaders to establish and run their labs at EMBL. For this year's blog post I thought it was a good time to look back at the full 9 years and I am going to (briefly) cover the time at EMBL with some numbers including giving an approximate account of the finances. This is something that I do with the group at the start of every year but it still feels strange to make financial numbers public.

The scientists

A lot has happened during 9 years. Starting with the people, we have had 7 PhD students, 1 of whom was co-supervised, 13 postdocs and 10 interns/visiting lab members. The total group size was around 10 for the majority of the time which, as a manager, feels about right in what I can do as a direct line manager. It is fair to say that science is a very social activity and working with different people with different personalities, through the good and bad, is really enriching. Not to get all corny but the personal interactions are some of the things that stick with me the most over the time. It is always in those extremes - the "unfairly" rejected paper or unexpected positive response, individual personal and work difficulties that are overcome or sometimes not. Mental well being is an example of such difficulties that across the broader society we are not good at dealing with and that have also not always been easy for me to deal with as a manager.

From these 30 lab members there are 7 that will continue with the group over the next few years: Cristina (senior scientist), Jurgen (postdoc) and Miguel (postdoc) have joined me at ETH and Eirini (PhD student), David (postdoc), Inigo (postdoc) and Danish (postdoc) will remain at EMBL-EBI with funding that cannot be moved. From the PhD students and postdocs that have left all but 2 have left with published papers as first or co-first authors. One PhD student decided not to continue the PhD and one postdoc left after several years without a first author paper. In both cases I feel some blame as the project ended up being difficult and the results were just not very positive.

The publications and science

In total we published 45 original research papers, 3 review articles and 2 news&views over the course of 9 years. This includes only research that was really done after starting the group and also includes 8 preprints that have not yet been published in a journal after peer-review. This is split into 27 papers where I am listed as co-corresponding author and I also think our group played an important role in the final outcome, plus 18 on which our group had some input. I am showing on the figure the distribution of these papers along the 9 years. The first paper from our group only came at year 3 with the first real significant set of publications coming at years 4 and 5. In regards to the non-tenure track system, even by this crude metric it is easy to see how different it would be if I had to apply to the job market at year 6-7 vs year 8-9. Of course, note that the numbers for 2021 in particular are inflated by preprints that will ultimately be published in a journal most likely in 2022. Another clear trend that feels true to me is the increase of small collaboration efforts where our group just helped out in some modest way. I think this is a reflection of just being more integrated into the local and broader academic networks.

I am not going to go into the scientific outcomes of the 9 years in any detail. I think some of the strongest work we did was on the evolution and functional importance of protein phosphorylation with multiple publications that have built on each other and where I think our contributions move this field forward. There was also a smaller line of research on the genetics of trait variation that I wouldn't consider to be at the cutting edge but it has been fun to work on. In particular it has been interesting to step closer to the fields of human genetics and genetics of human disease where making advances requires the interactions between people with such different ways of viewing science. Just the language barriers between human genetics, cell biology, biochemistry and chemical biology have been fascinating to get into.

The funding

So now something that feels less comfortable or at least less common to discuss - the funding. Before going into any numbers, I should caveat this by saying that these are very rough approximations that of course should not be considered an actual financial statement. These numbers also don't take into account the money spent on the whole infrastructure (administration, grants, IT, etc) but are just the funding spent on research lab members, including my salary, and consumables. With that out of the way, over 9 years we spent approximately 5.7 million euros as broken down per year on the figure. Although we have had a small wet lab running in the last 6 years, I would say that 90% of this was on salaries. Of these around 2.7 million were from external grant funding, plus ~450k from competitive internal postdoc fellowships. This of course just shows how amazing it is to work in a place with core funding. I ended up being very successful early on with 2 million funded in years 2 and 3 and this made me too careless about applying for grants later on which I now consider a real error on my part. I applied in total to 13 external grants with 6 being successful.

So a number that is immediately easy to get but that is probably quite meaningless is the money spent per research paper. We spent a total of ~127k euros per paper or 210k if we only count those where I am listed as co-corresponding. Of course this varies so much per paper really, with my very rough estimates of the bounds being something like between 25k and 1 million. Given that we mostly spend the budget on salaries this simply reflects the amount of people time spent on a project.
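For readers who want to check the arithmetic, here is a minimal sketch of the per-paper calculation, using only the rough totals quoted in this post (5.7 million euros, 45 research papers, 27 of them co-corresponding). The variable and function names are just illustrative, and the unrounded second figure comes out slightly above the ~210k quoted above.

```python
# Rough figures quoted in the post (euros, approximate).
total_spent = 5_700_000
papers_total = 45            # original research papers over 9 years
papers_corresponding = 27    # papers with co-corresponding authorship


def cost_per_paper(total, n_papers):
    """Average spend per paper, in thousands of euros, rounded."""
    return round(total / n_papers / 1000)


per_paper_all = cost_per_paper(total_spent, papers_total)
per_paper_corresponding = cost_per_paper(total_spent, papers_corresponding)

print(f"All papers: ~{per_paper_all}k euros per paper")
print(f"Co-corresponding only: ~{per_paper_corresponding}k euros per paper")
```

Since around 90% of the budget went to salaries, these averages are mostly a proxy for person-time per paper rather than a cost of the science itself.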

To new beginnings

This is a somewhat dry recap of the 9 years of EMBL but I thought it would be interesting, at least to me, to have these things written down. Even if these are just numbers, I am curious to see what the next 9-10 years look like. I am sitting in my new office at ETH, just close to two weeks after arriving in Zurich. There is a lot to adapt to, including teaching material that I should be preparing right now. I am curious to see how long it will take me to get into the local academic network and how much the move will impact on our capacity to do work. The lab work is really the part that will take the longest as I don't think we will run any experiment before middle of the year and although we have the budget for an MS instrument that will take even longer to get going. In any case, I am excited about the new beginning here.

Friday, May 21, 2021

Lab move to ETH Zurich, the job search and fixed term PI positions

ETH Zurich (credit)

Next January, after 9 years at the EMBL, I will be joining ETH Zurich as a tenured faculty member of the Department of Biology with my research group hosted at the Institute of Molecular Systems Biology (IMSB). I am really excited about this move and I think the IMSB is a perfect fit for the type of research that we do. We primarily use computational approaches to study the relation between genotype and phenotype with a specific focus on post-translational regulatory systems (more on the EBI website or my GScholar page). IMSB has a long tradition of method development in large scale measurements of biological systems with a current interest in mechanistically explaining trait variation. The smaller experimental component of our group uses yeast genetics which is also a great fit for the groups around including our future neighbours in the Institute of Biochemistry. Research wise the group will remain focused on: studying the evolution and functional importance of post-translational regulation; determining the regulatory networks of a cell, and how they change under different conditions including disease. More broadly we also study the mechanisms that underlie trait variation across individuals of the same species. In terms of methods it will remain primarily computational with around 30% of the group devoted to lab work. The lab will be fully equipped for large scale yeast genetics with the exciting addition of having funding for an MS instrument for the proteomics.

Teaching, scientific integration and group structure

With any move there are always some thoughts about the challenges ahead. Professionally, the types of things on my mind are that I will need to set up the group, integrate myself scientifically and prepare myself for teaching. Setting up the group and integrating myself within the local environment won't be new experiences. I feel I was too slow with both of these things when I first joined EMBL-EBI so I am curious if I will be able to move things along faster this time. Coming from EMBL and the local EBI/Sanger campus I have the impression that ETH is less collaborative but there were clearly many people interested in collaborating just from the small sample I got during interviews. There is an interesting difference in group structure between EMBL and ETH where at ETH a group can have sub-groups with junior PIs that can have varying degrees of independence as per the decision of the more senior PI. Organising a lab in this way will be something new. Finally, I will have to teach at the undergraduate level for the first time. I have always said that students coming out of biology or related topics need to have better training in bioinformatics. While daunting this will be my chance to contribute to this training directly.

The interview process and decisions

For those less familiar with the EMBL, group leaders are hired for a maximal period of 9 years with only a few exceptions (around 10%) that end up having an open-ended contract. We get generous core funding and get to tap into a great scientific network which more than compensates for the lack of tenure. This means that around year 7 your thoughts start moving into the future. At faculty presentations I would often write how many years I had left in the title slide as a personal reminder. Towards the end of year 7 I started applying and spent most of year 8 applying and interviewing. The first time I applied for PI positions it was all very unidirectional, with myself looking broadly for possible places. This time it felt more like dating a potential future university/institute with expressions of interest on both sides. One of the issues in going into this is that I didn't really know what my value would be in the market. I knew I had a good CV and would certainly find a job, I just didn't know where I could aim for in terms of seniority and resources. That became clearer only after the first interview and the expression of interest of places I felt were really fantastic.

The second half of 2020 then became about trying to find the best place professionally and personally. I ended up applying to 10 places, interviewed in 8 and received 5 offers. I tried to find a job in my home country (Portugal) but from the two places I was interested in, one picked another candidate and the other could not make an offer that was not fixed term. The decision ended up being among 3 places with the major differentiation factor being between 2 offers that had less core funding but higher management responsibilities and ETH with incredibly generous core funding and the best scientific fit (but less seniority). Personally the decisions were about staying in the UK or moving to France or Switzerland. There is quite a lot to be said about this choice (safety, adventure, integration, kid friendly, jobs for partner, etc) and in the end we went with Switzerland. While excited I am also anxious about yet another move to what will be my 5th home country, the now almost familiar sense of uprooting and new beginnings. But this is not yet time for goodbyes.

Non-tenure group leader positions (in Europe)

I don't know who invented the fixed term, non tenure track, group leader positions in academia. It may have been EMBL and this model has clearly spread across Europe with many research institutes having some form of junior positions that have a variable number of years (5 to 12) to set up a group and then necessarily need to move on to a different place. EMBL does this because it is funded by many member state countries to train the next generation of "academic leaders" that will lead research groups across the member states. The obvious advantage of hosting these positions is that it keeps the institute forever young if you manage the turnover well. I think these positions can work well if they remain a relatively small proportion of the total PI/faculty positions; there is some level of support to at least kick start the group; and the positions last a sufficient number of years. Having gone through this at EMBL my impression is that 7 years would be the bare minimum and 9-10 years would be ideal. This also depends on the level of support beyond the PI salary. If conditions are not met then it is not worth setting up people for failure with the selfish goal of using the higher turnover to bring in new ideas/methods. Don't give people super postdoc positions for 3-5 years with no funding and no chances of tenure just because you want fresher ideas around. If there is some mechanism for tenure or open ended contract then it should be crystal clear from the start how (un)likely this is and what are the transparent criteria for achieving it.

Friday, December 04, 2020

A year of SARS-CoV-2 research

This post may be premature but I feel like writing down some thoughts about the roller coaster that this year has been. At the start of the year, with the number of reported cases rising in Europe, the EMBL and our institute (EMBL-EBI) decided to send everyone home as a precautionary measure. As most of our group is computational, this has meant we have been working from home for most of this year. Early on, somewhat frustrated by not being able to help, I emailed a few people that could be working on the virus. Nevan Krogan replied saying our help would be useful and we joined the global effort to contribute to solving this crisis.

Science at science fiction speed

Over the course of 9 months we took part in 4 projects, some of these being the most thrilling science I have ever taken part in. We condensed what would easily be a 3 to 5 year research project into something done in 3-4 months, involving typically 10-20 research groups with a few key people helping to direct the research. We were collecting data, analysing and suggesting new experiments in the span of days with some of the best scientists in the world. Contributing to the direction of this level of resources has been an amazing experience that I wish every scientist could try at least once in their life. These projects were all geared towards studying how SARS-CoV-2 takes control of its target cells to be able to suggest human targeting drugs that could counter the infection. Several of the compounds identified in these studies are in clinical trials for COVID-19 so I feel the projects met their main objective.

While this has been my perspective from working on these specific projects we are all aware of the amazing scientific progress that has been made over the course of this year. I remember seeing the movie Contagion and almost laughing at the unrealistically fast pace of research in the movie. However, SARS-CoV-2 research has in fact happened at an incredibly fast pace that probably matches the movie.

Why don't we do this for disease X?

One discussion point that has come up often is if we can learn from this period to apply it to research into other diseases. Science is an international endeavour but the degree of collaboration for SARS-CoV-2 research has been higher than usual. The effort put into this was also high among the projects I have seen personally but this eventually results in some exhaustion and it is not sustainable. I don't think this is easy to repeat for other diseases without the same external sense of urgency. Most scientists won't just drop what they are working on to fully focus on some other research question. Maybe it is an argument for an even higher degree of collaboration, in particular between academia and biotech/pharma. There may be some small increase in productivity of collaborations through the use of online tools like Slack and Zoom but overall I don't see that the way we do science has been dramatically changed going forward.

The case for higher spending in research

I'm gonna have to science the s**t out of this

Jeremy Farrar has often said that science is our exit strategy for this crisis, from testing and tracking the spread to treatments and vaccines. It is this single-minded effort of so much of the world's research capacity that will lead to a long lasting solution. This already looks to be within reach with some treatment options, new ways of testing and critically, what appear to be effective vaccines. Soon enough we will be looking back and asking ourselves if there is something we could have done better. As trained scientists our reflex is to pause and think carefully about all the things that could have worked better. Were we efficient ? Did we deal well with the deluge of studies ? Was the peer-review too shallow and quick ? It is our instinct to be critical but maybe we should be more vocal about how amazing the response of the scientific community has been. More importantly, this is the time to demand higher funding rates. If society can't see how important science is during a pandemic, when are we going to make our case ? This is the capacity of a research infrastructure that is funded by 1-2% of national budgets, what could humanity achieve if we were to double it ?

Over the last 10 years academic science budgets have been squeezed and a lot has been said about how academic science needs to be more applied and how much we should justify the investment that is being made. This week, DeepMind, a private research institute funded by what is essentially an advertising company (Alphabet/Google), has made headlines with their impressive research into predicting the structure of a protein from its sequence. An advertising company finds the money to invest in what are fundamental biological problems and, in the middle of a pandemic that is being solved by a global scientific infrastructure, we can't get the EU science budget to increase. We should be ready to make our case over the course of the next months.

Monday, June 26, 2017

Building rockets in academia - big goals from individual projects

SpaceX just launched and landed another two rockets over the weekend. I don’t get tired of watching those images of re-entry and landing. The precision is mesmerizing and extremely inspiring. Leading a research group in academia I often look at research intensive companies and wonder about the differences and similarities between how research is done in both. I have never worked in such a company environment so these thoughts are certainly from the perspective of academia.

The big goals and peripheral bets

From reading about big tech companies and start-ups I can relate to how they appear to organize their product portfolio into a small number of main goals, their core product(s), while at the same time experimenting with peripheral goals/products. Tesla started as a car company but may end up being a large battery company with a small side of car manufacturing. As another example, most major tech companies are today experimenting with virtual reality. In these experiments, those involved face similar questions about uncertain outcomes and timeliness of their steps as we do in academia. One of the thrills in academia is that leap into the unknown where it is crucial to ask the right question just at the right time. The speed of progress in research can be very uneven with times spent floundering in the dark and times where you just happen to walk in the right direction and find big riches. Sometimes those explorations will lead you to unintended directions, away from your core research, where it might be worth moving additional resources. Aiming in the right direction at the right time is a rare skill that a researcher must have but that we don’t spend enough time training for. Also, the balance between focusing on the core and exploring other areas of interest is difficult to set. In academia it seems easier to obtain funding to keep working on your core than to move to new areas. I wonder how companies deal with these issues. I am extremely thankful to be working in a research institute where I get core funding that, although I have to justify, I get to use to explore ideas outside the core of what we do. Such flexibility could be a bigger part of how research funding gets distributed.

Individualized contributions to group goals

While setting a big goal and exploring peripheral objectives might have a lot in common between academia and companies, there is one aspect of how we work that appears very different. In setting the big overarching questions we have to accommodate the fact that each individual group member will have to stand out. PhD students are working on their theses and postdocs are building the work on which they will stand as future group leaders. Each project has to brilliantly stand on its own while simultaneously fitting together with other group projects, contributing to an even greater goal. As each research project can be an unpredictable grasp in the dark, as a group leader I feel like I have to build an alluring house of cards, projecting how several research projects might move forward and creating an illusionary image of how they fit together to solve THE big question. Not only will we build the rocket that will save mankind but every single contribution from each team member has to solve an important problem. It is obvious that the overarching goal will have to shift with time as some projects move to their potential unintended outcomes. In the context of being flexible to follow peripheral bets, maintaining the big picture goal may be challenging. I would not be the first to propose more career tracks in academia where professional researchers don’t have to move into management roles to keep working in academic science. It would be interesting to try it out in some research institutions to see the effect it would have on how research agendas are organized.

Monday, January 11, 2016

State of lab, year 3 - the first group outcomes

Lab poster made by Omar for the EMBL lab day

This is the third blog post of what I hope will be a very long series. Even in just three years it is fun to go back and read the past yearly entries (year 1 and year 2). I am sure I will enjoy reading back over 5 and 10 of these yearly reports. This report marks the end of the third year of the lab. I have to stop thinking of how quickly a year goes by. We will have a review in March 2017 that will likely dictate our extension after the first 5 years and if extended the group has then an additional 4 years before having to leave the EMBL-EBI (after a maximum of 9 years).

During the third year we said goodbye to Juan A Cordero Varela (master student, linkedin). Marta Strumillo, who was doing an internship, stayed on to do her PhD in the group. Towards the very end of last year we were joined by two additional postdoctoral fellows, Bede Busby and Cristina Vieitez. As I had mentioned last year they will be working at the Genome Biology unit in Heidelberg in a close partnership with Nassos Typas' lab. Bede and Cristina are setting up yeast genetics methods to study protein modifications. This year I also started a blog series on our group members and I will try to get everyone to participate.

Group size and grant applications

At least for one year I have let myself apply for fewer funding opportunities. The group has now 12 members with one additional person joining this March. I am not sure what is the best strategy to manage the size of a group. Most grants and fellowships have very low success rate (10% to 30%) and if the objective is to maintain a specific group size then one would have to be very lucky to get just enough funding to stay at steady-state. I suspect that many group leaders just keep applying to all available funding and let the group size increase and collapse according to the success of the applications. I would be curious to hear from others what their thoughts are on this. My current impression is that somewhere between 5-15 people is a manageable and efficient group size but does anyone limit growth to stabilize group size ?
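To put rough numbers on the steady-state problem above, here is a minimal sketch of how spread out the number of funded applications is for a given success rate. It assumes each application succeeds independently with the same probability, which is a simplification, and the 6 applications and 20% rate in the example are hypothetical values picked from within the 10% to 30% range mentioned above.

```python
from math import comb


def funded_distribution(n_applications, success_rate):
    """Binomial probability of exactly k funded applications, for k = 0..n.

    Assumes independent applications with an identical success rate
    (a simplifying assumption, not a description of any real funder).
    """
    return [
        comb(n_applications, k)
        * success_rate ** k
        * (1 - success_rate) ** (n_applications - k)
        for k in range(n_applications + 1)
    ]


# Hypothetical scenario: 6 applications in a year at a 20% success rate.
dist = funded_distribution(6, 0.2)
expected_funded = sum(k * p for k, p in enumerate(dist))

for k, p in enumerate(dist):
    print(f"{k} funded: {p:.3f}")
print(f"expected number funded: {expected_funded:.2f}")
```

Under these assumptions the expected number of awards is 1.2, but getting nothing at all and getting two or more are both common outcomes, which is why aiming for exactly enough funding to hold a group at a fixed size takes luck.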

To be, or not to be, an experimental group (revisited)

Cristina and Bede at the visitor lab space in EMBL-EBI

As described in the first year report, we don't have lab space at the EMBL-EBI. To be able to have access to lab space my initial solution was to co-supervise group members with experimental groups. This has been useful, particularly in creating closer collaborations with some of the groups involved. Haruna and Brandon have worked with Jyoti Choudhary to have access to mass-spec instruments. Sheriff has been working in London in the lab of Silvia Santos where he has contributed to some microscopy experiments and Marco spent some time in Nassos Typas' lab learning how to do chemical genetic screens. In all of these projects the group members are spending >50% of the time analysing the data. Bede and Cristina will be the first group members that will be primarily dedicated to experimental work, although I am sure they will also have an opportunity to further develop their computational skills. So far, these arrangements have been working out scientifically. However, I am now sure that, when I move out of the EMBL-EBI, I will aim to have access to lab space.

Projects as science, stories and publishable units

As I had mentioned in the second year report, I am no longer working on a research project myself. I had two periods of time last year where I emptied my to-do list but it didn't stay down long enough to be able to pick up a project. I am more at ease with the management role in the sense that I have convinced myself that it is actual work. It took me a while not to feel guilty about just doing management tasks. It is actually great to be able to help guide the flow of the projects of all of the lab members. From the inception, through the initial stumbles, turns in direction, building up the promising results, up until there is enough progress to be worth communicating it. This also means deciding to quit an idea when the research direction is no longer promising. In this process of managing a large set of projects I have felt a very clear temptation to focus on the publishable units as the outcomes. Although science is nothing if not communicated there is a risk of losing track of the priority of moving science forward. Asking questions and gathering evidence happens always in a scientific context. This context or story is also important for properly communicating your results to others. The problem is when the focus shifts too much into thinking about what are the experiments that are needed to write a paper instead of what are the best experiments to answer the scientific question at hand. These two things are hopefully aligned but the publishable unit should not be the goal in itself.

The first group outcomes

In the past year I finally managed to publish the last papers still involving my postdoctoral lab. The two articles reflect the two strands of research in our group. One paper describes a set of phosphorylation sites collected for X. laevis and an analysis of its conservation and structural features. We found that the degree of conservation of phosphosites and putative kinase-protein interactions is predictive of functionally relevant sites and interactions. We also describe a potential way to identify PTM sites that may control protein conformations. The second article is a large effort to identify conditional genetic interactions in S. cerevisiae. The main message of that work was that there is a substantial amount of genetic interactions that are condition specific. These conditional genetic data allowed us to identify novel roles for yeast genes in the cell wall integrity pathway. Besides these studies we also published the first articles from work that was started within the group. I mentioned before Omar's method to predict kinase specificity from interaction networks. In addition to this we also published a news and views article highlighting recent work from Stelzl's lab and a review on the feasibility of using rational design strategies to create novel PTM regulatory sites in proteins of interest. I was anxious with the time it was taking to get the group to this point. Three years to have research outputs coming from the group feels slow but when talking with others it is apparently not unusual.

Preprints and open science

We have two additional manuscripts that are now making their way through journals: David's project on a map of human signalling states based on conditional phosphoproteomics data and Romain's phylogenetic based analysis of fungal phosphorylation sites. I am personally very much in favour of preprint servers. Although I think I have been ahead of others in suggesting the use of preprints in biology (blog post 2006) I have been slow to actually do it. My current policy in the lab is to first ask the authors in the group if they want to submit and then make sure all collaborators are ok with it. Unfortunately, so far, there was no consensus among the authors. I will start to push more strongly for future manuscripts to be submitted to preprint servers. When possible, we will also experiment with making a project's data and initial analysis available online before the preprints.

Friday, November 27, 2015

Predicting PTM specificities from MS data and interaction networks

Around four years ago I wrote this blog post where I suggested that it might be possible to combine protein interaction data with phosphosites from mass-spectrometry (MS) data to infer the specificity of protein kinases. I did a very simple pilot test and invited others to contribute to the idea. Nobody really picked up on it until Omar Wagih, a PhD student in the group, decided to test the limits of the approach. To his credit I didn't even ask him to do it, his main project was supposed to be on individual genomics. I am glad that he deviated long enough to get some interesting results that have now been published.

As I described four years ago, the main inspiration for this project was the work of Neduva and colleagues. They showed that motif enrichment applied to the interaction partners of peptide binding domains can reveal the binding specificity of the domain. One step of their method was to filter out regions of proteins that were unlikely to be target sequences before doing motif identification. For PTM enzymes or binding domains we should be able to take advantage of the MS derived PTM data to select the peptides for motif identification by just taking the peptide sequences around the PTM sites. This was exactly what Omar set out to do by focusing on human kinases as a test case.
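The core idea above, taking the peptide sequences around PTM sites in a kinase's interaction partners and looking for residues enriched relative to a background, can be sketched as follows. This is only an illustration: it uses a simple per-position log-odds score with pseudocounts rather than the motif-x algorithm used in the actual work, and the function names and toy peptides are hypothetical.

```python
from collections import Counter
from math import log2


def site_windows(sequence, positions, flank=3):
    """Peptides of length 2*flank+1 centred on each 0-based site, padded with '_'."""
    padded = "_" * flank + sequence + "_" * flank
    return [padded[p : p + 2 * flank + 1] for p in positions]


def position_counts(peptides):
    """One residue Counter per peptide position (peptides must share a length)."""
    length = len(peptides[0])
    return [Counter(pep[i] for pep in peptides) for i in range(length)]


def log_odds(foreground, background, pseudocount=1.0):
    """Per-position log2 enrichment of each residue in foreground vs background."""
    fg = position_counts(foreground)
    bg = position_counts(background)
    alphabet = set().union(*fg, *bg)
    scores = []
    for fg_col, bg_col in zip(fg, bg):
        fg_total = sum(fg_col.values()) + pseudocount * len(alphabet)
        bg_total = sum(bg_col.values()) + pseudocount * len(alphabet)
        scores.append({
            aa: log2(((fg_col[aa] + pseudocount) / fg_total)
                     / ((bg_col[aa] + pseudocount) / bg_total))
            for aa in alphabet
        })
    return scores


# Toy example (made-up sequences): windows around phosphosites found in the
# interaction partners of one kinase, compared with windows around all sites.
window = site_windows("AARRASLPKK", [5])[0]          # site at the serine
foreground = [window, "RRGSVAE"]
background = [window, "AEGSPKD", "LLDSEEV", "GGGSAAA"]
scores = log_odds(foreground, background)
print(window, round(scores[0]["R"], 2))
```

In this toy data arginine at the first position of the window gets a positive score because it is more frequent among the partners' sites than in the background. A real analysis would need many more sites per kinase and a proper significance test, which is what motif enrichment tools provide.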

To summarize the outcome of this project, the method works with some limitations. For around a third of human kinases that could be benchmarked he got very good predictions (AUC>0.7). For some kinase families the predictions are better than others and we think it is due to how specific the kinase is for the residues around the target site. It is known that kinases find their targets via multiple mechanisms (e.g. docking sites, shared interactions, co-localization, etc). This specificity prediction approach will work better for kinases that find their targets mostly by recognizing amino-acids near the phosphosite. With the help of Naoyuki Sugiyama in Yasushi Ishihama's lab we validated the specificity predictions for 4 understudied human kinases. One advantage of using this approach is that it could be very general. Omar also tried it on 14-3-3 domains, which bind phosphosites, and on a bromodomain containing protein that is known to bind acetylated peptides. Finally, we also tried to use this to compare kinase specificity between human and mouse but given the current limitations of the method I don't think it is possible to use these predictions alone to find divergent cases of specificity.
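For readers less familiar with the AUC figure quoted above, here is a minimal sketch of how such a value can be computed from prediction scores. It assumes the benchmark compares scores given to known target sites of a kinase against scores given to other sites; the exact benchmark set-up of the published work is not described in this post, and the scores below are made up.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    in which the positive scores higher, counting ties as half."""
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Made-up scores: known target sites vs other sites for one kinase.
known_targets = [0.9, 0.8, 0.4]
other_sites = [0.7, 0.3, 0.2]
example_auc = auc(known_targets, other_sites)
print(f"AUC = {example_auc:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a threshold of 0.7 marks predictions that rank known targets above other sites clearly more often than chance.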

The predictions for human kinase specificity can be found here and a tutorial on how to repeat these predictions is here. The motif enrichment was done using the motif-x algorithm. Given that we could not really use the web version Omar implemented the algorithm in R and a package is available here.

There are many other ways to predict specificities for PTM enzymes and binding domains. If you have many known target sites the best way is to train a predictor such as NetPhorest or GPS. There is also the possibility of using the known target sites in conjunction with structural data to infer rules about specificity and the specificity determining residues. A great example of this is Predikin and more recently KINspect. Ongoing work in the group now aims to combine what Omar did with some aspects of Predikin to study the evolution of kinase specificity.

Going back to the beginning of the post, this idea was my second attempt at an open science project. The first attempt was a project on the evolution and function of protein phosphorylation (described here). This ended up being one of the main projects of my postdoc and is now the main focus of the group. I am still curious to know if distributed open science projects will ever take off. I don't mean big project consortia but smaller scale research where several people could easily contribute with their expertise almost as "spare cycles". Often when you are an expert in some analysis or method you could easily add a contribution with little effort. However, there was much more excitement about open science a few years ago whereas now most of the discussions have shifted to pre-prints and doing away with the traditional publishing system. Maybe we just don't have time to pay attention or to contribute to such open projects.

Saturday, December 20, 2014

State of the lab, year 2 – reaching steady state

CC BY, Jason Paul Smith

At the end of last year I wrote up a short description of what it was like to start a group at the EMBL-EBI. I thought it would be interesting to try to make it a yearly event so here is the second installment. It is always scary how fast a year passes by and it is interesting to note how my perspective of managing a research group is changing.

During this year we said our first goodbyes as Vicky Kostiou (linkedin) finished her internship. We also welcomed several new members including Rahuman Sheriff (postdoc, linkedin), Haruna Imamura (postdoc, pubmed), Marta Strumillo (intern, linkedin) and Juan A Cordero Varela (master student, linkedin). Sheriff is working on a collaboration with Silvia Santos' group at the MRC-CSC in London to study cell-cycle regulation. Haruna came initially on a 1 postdoc fellowship in collaboration with Yasushi Ishihama's lab (Kyoto University, Japan) and she has recently been awarded an EIPOD postdoc fellowship to study post-translational regulation of Salmonella in collaboration with Nassos Typas and Jeroen Krijgsveld at the EMBL-Heidelberg. Marta is studying the functional role of PTMs in the context of protein structural information and Juan is participating in a project led by Marco Galardini (postdoc, @mgalactus, webpage) to model and predict bacterial phenotypes from sequence. These new members join the group of people that I already mentioned last year: David Ochoa (postdoc, @d0choa, webpage), Romain Studer (postdoc, @RomainStuder, blog), Brandon Invergo (postdoc, webpage) and Omar Wagih (PhD student, @omarwagih).

Shaking off that postdoc feeling

In the first year my concerns were dominated by the stress of facing an empty room that I needed to fill. It was a mistake to take 6 months to find the first person since I felt like I was wasting time. This year I had to come to terms with the fact that I no longer have time to do my own research projects. After over 10 years of measuring my own productivity by the progress in my research projects it is strange to try to let it go. I am certainly doing work that I enjoy. The progress in the group has been fantastic this year but it took me time to accept that the management activities I am doing are something I should count internally as productive work.

Reaching steady-state

Any new group, especially one that starts in a place like EMBL with very generous core funding, will grow to occupy a space in research. Any movement from this position will then only happen with a slower turnover of projects and people. That seems to be one of the trade-offs of managing research as a group versus as an individual. Changing directions for a whole group has to be slower than for an individual. However, as a group it is still possible to explore opportunities while maintaining a common theme of research underway. This year I think we have reached this steady-state. Although we got significant new funding starting next year I don't expect the group to grow much larger. I am curious to see how the research theme of the group will change with time.

The bad and the good of 2014

So I will start off by summarizing some of the aspects I wish had been different this year. Above all I had hoped to publish the first article(s) from the group in 2014. I am happy with the progress of the projects so far (see below) but I am still amazed at how long it takes to get a group up-and-running. Most of the group joined towards the end of last year so it has not been that much time objectively. The second aspect I think we could have done better was to communicate more online about what we have been up to. This has been one of the years with the fewest blog posts since I started blogging about 11 years ago. We should do better than this, both because we are publicly funded and because the people (and projects) in the group deserve better exposure. So I will try to change this next year.

On a more positive note this has been a great scientific year for me and the group even if not very visible to the outside. The two last papers that started still at UCSF are finally under revision and should come out next year. One is about studying the function and evolution of X. laevis phosphosites (biorxiv) and the second about conditional genetic interactions in S. cerevisiae. We also have 3 projects that are getting close to being finished from Omar, David and Romain that I hope we will submit early next year. If possible we will put them up on biorxiv as well before submission. It is obviously a great privilege to see this work take shape and I hope some of you will also be excited about it when we make it public.

Regarding funding, I had mentioned already that Haruna got an EIPOD fellowship. In addition we were awarded a 5 year ERC starting grant. I am very excited about the starting grant since this will allow us to start doing yeast genetics work to complement the proteomics and genome analysis we have been doing. This will feed in and complement almost every project in the group so I really have to thank the committee for this opportunity. For this purpose we will be hiring for 2 positions (postdoc and/or technician) early next year. Since the EBI does not have lab space, the work will be done at the Genome Biology unit in Heidelberg. This means I will be traveling (even) more to Heidelberg next year. Those hired to these positions will have the opportunity to interact with the Typas lab, which conducts similar genetics studies in bacterial species. If you know anyone looking for jobs with PhD and/or postdoc experience in yeast genetics please do let them know about these positions.

Wednesday, December 18, 2013

State of the lab, year 1 – setting up

I have used this blog in the past to keep track of my academic life where I can give a less formal perspective on papers I have published or ideas I am working on. Starting a group has made me think a bit about what I blog about. I have more responsibilities towards the people that have decided to work with me, towards the institution that has hired me (EMBL-EBI) and funding sources that support our work. At least for now I have decided to keep on sharing my personal view and in that context I thought it could be interesting to write down my path as group leader in academia. This might become a yearly “thing”.

I started at the EMBL-EBI on January 7 and in the blink of an eye one year has gone by. I have just arrived in Portugal for a conference and holidays and I said goodbye to four people that very courageously decided to work with an unknown newbie group leader. I could sum up what happened in this first year by saying that the group-leader title now makes sense: I am coordinating an actual group. Most of this year was spent applying for funding, recruiting and trying to know more about the different groups working on campus.

From an empty room to a research group

EMBL-EBI is really a great place to start a group. For those that don't know the EMBL system, group leaders are given very generous core funding to work for 5 years, plus an additional 4 years after a review process. The chances of failing the review are small but there is essentially no tenure. Core funding and additional “internal” postdoc fellowships are sufficient to run a small group without external grants. We are encouraged to apply for funding but money is not the most immediate source of stress. So for me, since I started recruiting only after arriving in January, facing that empty room where a group should be working was the first thing on my mind. Recruiting postdocs for an unknown and empty group is particularly challenging. I tried to do some of the obvious things like emailing related groups that could have people about to finish the PhD and promoting the vacancies at conferences. It is hard to quantify but I do have the impression that my online presence has been an advantage in this. Once the first couple of people started and group meetings made sense the empty room stress went away. I know people starting experimental labs right now and I have to say that computational people have it way too easy. We can buy a few computers and the “lab” is set up.

I spent a considerable amount of time applying for funding, which is always somewhat frustrating. I don't mind writing grants but I am happier doing actual research. Around 6 months into the job I managed to re-start doing research and I have managed to keep working on a fairly constant basis. I hope I will keep having/making time for research for as long as possible.

Meet the gang

This year we got an HFSP CDA and an ESPOD fellowship which together with the core funding allowed me to grow the group fairly quickly. The first to join was David Ochoa (postdoc, @d0choa, webpage) who will be working initially on PTM dynamics under different conditions. He also introduced me to the amazing Black Mirror series, the best fiction I have seen in a long time. Vicky Kostiou (intern) joined after and is doing a great job of improving the PTMfunc website which should be updated late January (stay tuned). The most recent arrivals were Romain Studer (postdoc, @RomainStuder, blog) and Brandon Invergo (postdoc, webpage). Romain will be using his phylogenetic and structural experience to study PTM evolution and Brandon was awarded the ESPOD fellowship to work with Jyoti Choudhary and malaria groups at Sanger on Plasmodium PTMs. Omar Wagih (@omarwagih) will be the first PhD student, joining in January. Finally, although we have still not signed a contract, Marco Galardini (@mgalactus, webpage) will likely join in February to work on a collaborative project with Nassos Typas' group at the EMBL-Heidelberg.

To be, or not to be, an experimental group

One of my concerns when I joined the EMBL-EBI was that, although the Sanger is just next door, EBI is a purely computational institute. Doing computational work is pretty amazing but progress can often be limited by lack of data. High-throughput research is somewhat removing this limitation since there are probably more observations made than we can all analyze. Still, if you are really interested in going in a specific direction then an experimental group simply has more power to make the right observations. My solution for this problem, for now, will be to co-supervise people with experimental groups including Brandon's EIPOD project, Marco's project with Nassos Typas and a future hire with Silvia Santos' lab in London. This is an experiment in itself and I guess in 2 to 3 years I will be able to evaluate how practical this is. One alternative is to make use of research services such as the ones listed in Science Exchange. I have discussed with a couple of companies what the prices would be for some of the work I am interested in doing. These are fairly expensive but might be a good complement to the collaborations.

Summary

So overall, the group is off to a good start. It is funded for a few years at a reasonable level and we have collaborations with other groups that share some common interests. There were some things I wish could have gone better. I didn't get all the funding I applied to, which is expected. I also didn't manage to submit the two last manuscripts that still contain work from my postdoc. It would have been great to start the second year with that off my back. Still, I am happy with how things look for the next few years. It is a privilege to be able to coordinate this group of people and level of resources around topics that I find so interesting.

Tuesday, November 06, 2012

Scholarly metrics with a heart

Last week I attended the PLOS workshop on Article Level Metrics (ALM). As a disclaimer, I am part of the PLOS ALM advisory Technical Working Group (not sure why :). Alternative article level metrics refer to any set of indicators that might be used to judge the value of a scientific work (or researcher or institution, etc). As a simple example, an article that is read more than average might correlate with scientific interest or popularity of the work. There are many interesting questions around ALMs, starting even with the simplest - do we need any metrics ? The only clear observation is that more of the scientific process is captured online and measured so we should at least explore the uses of this information.

Do we need metrics ? What are ALMs good for

Like any researcher I dislike the fact that I am often evaluated by the impact factor (IF) of the journals I publish in. When a position has hundreds of applicants it is not practical to read each candidate's research and carefully evaluate them. As a shortcut, the evaluators (wrongly) estimate the quality of a researcher's work by the IFs of the journals. I won't discuss the merit of this practice since even Nature journal has spoken out against the value of IFs. So one of the driving forces behind the development of ALMs is this frustration with the current metrics of evaluation. If we cannot have a careful peer evaluation of our work then the hope is that we can at least have better metrics that reflect the value/interest/quality of our work. This is really an open research question and as part of the ALMs meeting, PLOS announced a PLOS ONE collection of research articles on ALMs. The collection includes a very useful introduction to ALMs by Jason Priem, Paul Groth and Dario Taraborelli.

Beyond the need for evaluation metrics, ALMs should also be more broadly useful to develop filtering tools. A few years ago I noticed that articles that were being bookmarked or mentioned in blog posts had an above average number of citations. This has now been studied in much detail. Even if you are not persuaded by the value of quantitative metrics (number of mentions, PDF downloads, etc) you might be interested instead in referrals from trustworthy sources. ALMs might be useful by tracking the identity of those reading, downloading or bookmarking an article. There are several researchers I follow on social media sites because they mention articles that I consistently find interesting. In relation to identity, I also learned in the meeting that the ORCID author ID initiative finally has a (somewhat buggy) website that you can use to claim an ID. Also, ALMs might be useful for filtering if they can be used, along with natural language processing methods, to improve automatic classification of an article's topic. This last point, on the importance of categorization, was brought up in the meeting by Jevin West who had some very interesting ideas on the topic (e.g. clustering, automatic semantic labeling, tracking ideas over time). If the trend for the growth of mega-journals (PLOS ONE, Scientific Reports, etc) continues, we will need these filtering tools to find the content that matters to us.

Where are we now with ALMs ?

In order to work with different metrics of impact we need to be able to measure them and they need to be made available. From the publishers' side PLOS has led the way in making several metrics available through an API and there is some hope that other publishers will follow PLOS. Nature for example has recently made public a few of the same metrics for 20 of their journals although, as far as I know, they cannot be automatically queried. The availability of this information has allowed for research on the topic (see PLOS ONE collection) and even the creation of several companies/non-profits that develop ALM products (Altmetrics, ImpactStory, Plum Analytics, among others). Other established players have also been in the news recently. For example, the reference management tool Mendeley has recently announced that they have reached 2 million users whose actions can be tracked via their API and Springer announced the acquisition of Mekentosj, the company behind the reference manager Papers. The interest surrounding ALMs is clearly on the rise as publishers, companies and funders try their best to gauge the usefulness of these metrics and position themselves to have an advantage in using them.

The main topics at the PLOS meeting

It was in this context that we got together in San Francisco last week. I enjoyed the meeting format with a mix of loose topics but strict requirements for deliverables. It was worth attending even just for that and the people I met. After some introductions we got together in groups and quickly jotted down on post-its the sort of questions/problems we thought were worth discussing. The post-its were clustered on the walls by commonality and a set of broad problems was defined (see the list here).

Problems for discussion included:

how do we increase awareness for ALMs ?
how to prevent the gaming (i.e. cheating to increase the metrics of my papers) ?
what can be and is worth measuring ?
how to exchange metrics across providers/users (standards) ?
how to give context/meaning/story to the metrics ?

We were then divided into parallel sessions where we further distilled these problems into more specific action lists and very concrete steps that can be taken right now.

Metrics with a heart

From my own subjective view of the meeting it felt like we spent a considerable amount of time discussing how to give more meaning to the metrics. I think it was Ian Mulvany who wrote on the board in one of the sessions: "What does 14 mean ?". The idea of context came up several times and from different viewpoints. We have some understanding of what a citation means and from our own experience we can make some sense of what 10 or 100 citations mean (for different fields etc). We lack a similar sense for any other metric. As far as I know, ImpactStory is the only one trying to give context to the metrics shown by comparing the metrics of your papers with random sets of papers from the same year. Much more can be done along these same lines. We arrived at a similar discussion from the point of view of how we present ourselves as researchers to the rest of the world. Ethan Perlstein talked about how engaging his audience through social media and giving feedback on how his papers were being read and mentioned by others was enough to tell a story that increased interest in his work. The context and story (e.g. who is talking about my work) is more important than the number of views. We came back to the same sort of discussion when we talked about tracking and using the semantic meaning or identity/source of the metrics. For most use cases of ALMs we can think of, we would benefit from, or downright need, more context and this is likely to drive the next developments and research in this area.
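To make the "What does 14 mean ?" question a bit more concrete, here is a minimal sketch of the kind of comparison described above: placing one paper's count within a reference sample of papers from the same year. This is only an illustration and not how ImpactStory actually computes its context; the function name and the bookmark counts are made up for the example.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, reference):
    """Percent of the reference sample below `value`, counting ties as half."""
    if not reference:
        raise ValueError("reference sample must not be empty")
    ordered = sorted(reference)
    below = bisect_left(ordered, value)          # values strictly smaller
    ties = bisect_right(ordered, value) - below  # values equal to `value`
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical bookmark counts for a random sample of papers from the same year.
reference_bookmarks = [0, 0, 1, 2, 2, 3, 5, 8, 14, 40]

# A paper with 14 bookmarks, placed in the context of that sample.
print(percentile_rank(14, reference_bookmarks))  # 85.0
```

With this toy sample, 14 bookmarks puts a paper around the 85th percentile, which says a lot more than the raw number. The hard part in practice is choosing the reference set (same year, same field, same journal ?), which is exactly the context problem discussed at the meeting.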

The devil we don't know

Heather Piwowar asked me at some point if I had any reservations about ALMs. In particular from the point of view of evaluation (and to a lesser extent filtering) it might turn out that we are substituting a poor evaluation metric (journal impact factor) with an equally poor evaluation criterion: our capacity to project influence online. In this context it is interesting to follow some experiments that are being done in scientific crowdfunding. Ethan Perlstein has one running right now with a very catchy title: "Crowdfund my meth lab, yo". Success in crowdfunding should depend mostly on the capacity to project your influence or "brand" online. An exercise in personal marketing. Crowdfunding is an extreme scenario where researchers are trying to side-step the grant system and get funding directly from the wider public. However, I fear that evaluation by ALMs will tend to reward exactly the sort of skills that relate to online branding. This is not to say that personal marketing is not important already, which is why researchers network at conferences and get to know editors, but ALMs might reward personal (online) branding to an even greater degree.

Thursday, July 19, 2012

Evolution and Function of Post-translational Modifications

A significant portion of my postdoctoral work is finally out in the last issue of Cell (link to paper). In this study we have tried to assign a function to post-translational modifications (PTMs) that are derived from mass-spectrometry (MS). This follows directly from previous work where we looked at the evolution of phosphorylation in three fungal species (paper, blog post). We (and other groups) have seen that phosphorylation sites diverge rapidly but we don't really know if this divergence of phosphosites results in meaningful functional consequences. In order to address this we need to know the function of post-translational modifications (if they have any). Since these MS studies now routinely report several thousand PTMs per analysis we have a severe bottleneck in the functional analysis of PTMs. These issues are the motivations for this latest work. We collected previously published PTMs (close to 200,000) and obtained some novel ubiquitylation sites for S. cerevisiae (in collaboration with Judit Villen's lab). We revisited the evolutionary analysis and we set up a couple of methods to prioritize those modifications that we think are more likely to be functionally important.

As an example, we have tried to assign function to PTMs by annotating those that likely occur at interface residues. One approach that turned out to be useful was to look for conservation of the modification sites within PFAM domain families. For example, in the figure above and under "Regulation of domain activity", I am depicting a kinase domain. Over 50% of the phosphorylation sites that we find in the kinase domain family occur in the well known activation loop (arrow), suggesting that this is an important regulatory region. We already know that the activation loop is an important regulatory region but we think that this conservation approach will be useful to study the regulation of many other domains. In the article we give other examples and an experimental validation using the HSP70 domain family (in collaboration with the Frydman lab).
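The general idea behind the domain conservation approach can be sketched in a few lines: map each modified residue to its column in the domain family alignment and ask which columns collect a large share of the sites. This is a toy illustration and not the pipeline used in the paper; the protein names, positions and alignment columns below are invented, and a real analysis would need the actual PFAM alignments and some statistical test for enrichment.

```python
from collections import Counter


def site_hotspots(sites_by_protein, column_of):
    """Fraction of mapped PTM sites that fall in each alignment column.

    sites_by_protein: {protein_id: [residue positions carrying a PTM]}
    column_of: {protein_id: {residue position: domain alignment column}}
    """
    counts = Counter()
    for protein, positions in sites_by_protein.items():
        mapping = column_of.get(protein, {})
        for pos in positions:
            col = mapping.get(pos)
            if col is not None:  # ignore sites outside the aligned domain
                counts[col] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {col: n / total for col, n in counts.items()}


# Hypothetical phosphosites in three members of one domain family.
sites = {"kinA": [10, 55], "kinB": [12], "kinC": [9, 80]}
# Hypothetical mapping of those residues to alignment columns.
columns = {
    "kinA": {10: 3, 55: 20},
    "kinB": {12: 3},
    "kinC": {9: 3, 80: 31},
}

hotspots = site_hotspots(sites, columns)
for col, fraction in sorted(hotspots.items(), key=lambda item: -item[1]):
    print(col, round(fraction, 2))
```

In this made-up example, column 3 collects three of the five sites even though the residue numbers differ in each protein. This is the sort of signal that, in the real data, points to the activation loop of the kinase domain.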

I won't describe in detail the work as you can (hopefully) read the paper. Leave a comment or send me an email if you can't and/or if you have any questions regarding the paper or analysis. I also put up the predictions in a database (PTMfunc) for those who want to look at specific proteins. It is still very alpha, I apologize for the bugs and I will try to improve it as quickly as possible. If you want access to the underlying data just ask and I'll send the files. I am also very keen on collaborations with anyone collecting MS data or interested in the post-translational regulation of specific proteins, complexes or domain families.

Blogging and open science
Having a blog means I can also give you some of the thoughts that don't fit in a paper or press release. You can stop reading if you came for the sciency bits. One of the cool things I realized was that I have discussed in this blog three papers in the same research line, which runs through my PhD and postdoc. It is fun to be able to go back not just to the papers but to the way I was thinking about these ideas at the time. Unfortunately, although I try to use this blog to promote open science, this project was yet-another-failed open science project. Failed in the sense that it started with a blog post and a lot of ambition but never gained any momentum as an online collaboration. Eventually I stopped trying to push it online and as experimental collaborators joined the project I gave up on the open science side of it. I guess I will keep trying whenever it makes sense. This post closes project 1 (P1) but if you are interested in online collaborations have a look at project 2 (P2).

Publishable units and postdoc blues
This work took most of my attention during the past two years and it is probably the longest project I have worked on. Two years is not particularly long but it has certainly made me think about what is an acceptable publishable unit. As I described in the last blog post, this concept is very hard to define. While we probably all agree that a factoid in a tweet is not something I should put on my CV, we allow and even cheer for publishing outlets that accept very incremental papers. The work I described above could easily have been sliced into smaller chunks but would it have the same value ? We would have put out the main ideas much faster but it could have been impossible to convince someone to test them. I feel that the combination of the different analyses and experiments has more value as a single story but an incremental approach would have been more transparent. Maybe the ideal situation would be to have the increments online in blogs, wikis and repositories and collect them in stories for publication. Maybe, just maybe, these thoughts are the consequence of postdoc blues. As I was trying to finish and publish this project I was also jumping through the academic track hoops but I will leave that for a separate post.