Tuesday, November 18, 2025

AI "peer" review - the impact on scientific publishing

For the first time in the second half of this year, I am not urgently trying to deal with something. So, instead of working on some manuscripts from the lab (sorry!), I took some time to look in more detail at the outputs of two recently announced AI "assistants" dedicated to scientific publishing: the q.e.d. science peer review system and the Nature Research Assistant tool. This was not a very rigorous or quantitative assessment; instead, I had a look at the tools' outputs on 3 manuscripts from our lab - 2 recent preprints and 1 manuscript that we are still working on.

If you haven't tried it yet, q.e.d. tries to identify and list the claims made in a scientific paper and then flag any major or minor gaps in those claims. Visually, it is presented as a hierarchical tree with a main message for the whole manuscript, main claims and related (sub-)claims. It is refreshing and positive to me that they decided to present this in a way that differs from the standard text peer review format but, in essence, this is very much the type of information obtained in a peer review report. In addition, a "What's new" section provides a description of what the model believes is most novel about the work and what may have already been done in some form by other studies.

Before getting into more detail about the output of q.e.d., I also tried the same 3 manuscripts in the Nature Research Assistant tool. This is clearly more conservative in scope and it provides a series of suggestions primarily focused on improving the text. The tool does provide a list of identified "overstated claims", which comes closer to the idea of finding gaps in scientific claims/statements as done in q.e.d. science.

How good is the output of these AI assistants?

Regarding the output of these tools, I am really impressed by the level of detail of q.e.d. For every gap, it has a written explanation of the identified issue and suggestions for additional work or text changes to mitigate it. Many of the identified gaps require quite detailed technical knowledge. In one particular example, the tool found a very non-trivial gap in the null model of a statistical test that required knowledge of proteomics, evolution and bioinformatics. The 3 manuscripts are very computational, which the q.e.d. developers indicate is not an area they have focused on during development. One of the manuscripts was flagged as coming from a domain that does not fit their current set of domain areas. Still, I could expect to see many of these comments in a human peer review report. Is there something in these gaps that we never considered before, and that I absolutely need to act on? Not really, but that can be said honestly of a significant fraction of all peer-review comments. I would generally rank these AI-generated comments as about average: not among the most useful peer-review comments but certainly better than many we have received over the years.

The output of Nature's research assistant is much more what you would expect of a tool dedicated to improving the text of a manuscript. I think it is most useful for finding parts of the text that could benefit from improved clarity. The way the information is presented also encourages authors to revise the sections themselves, deciding whether or not to use the tool's suggestions, instead of simply feeding the whole text through an LLM. It is more of an assistant than a replacement for writing. I don't think I would give money to a tool like this over, say, a general LLM chatbot.

For comparison purposes, I tried to recreate the output of q.e.d. using a standard LLM chatbot (Gemini Pro in this case). I took one of the manuscripts and tried to formulate the prompt so as to also get a list of claims, gaps and suggested changes. The output was not as good as q.e.d.'s: some of the gaps were the same, but overall it seemed qualitatively a bit more superficial.
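
For anyone who wants to try something similar, here is a minimal sketch of the kind of prompt I mean. This is an illustration only, not the exact prompt I used, and it assumes the google-generativeai Python SDK with a placeholder model name; pasting the same prompt text into any chatbot interface works just as well.

```python
# Illustrative sketch only: not the exact prompt used here.
# Assumes the google-generativeai SDK and a placeholder model name.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")          # placeholder key
model = genai.GenerativeModel("gemini-pro")      # placeholder model name

prompt_template = """You are acting as a scientific peer reviewer.
Read the manuscript below and return:
1. The main message of the paper in one sentence.
2. A hierarchical list of the main claims and their sub-claims.
3. For each claim, any major or minor gaps in the supporting evidence.
4. For each gap, suggestions for additional analyses or text changes.

Manuscript text:
{manuscript_text}
"""

with open("manuscript.txt") as fh:               # plain-text export of the manuscript
    manuscript_text = fh.read()

response = model.generate_content(prompt_template.format(manuscript_text=manuscript_text))
print(response.text)
```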

AI "peer" review is here to stay

Whether we want it or not, these tools are now reaching a point where they can be used to identify gaps in a scientific manuscript in a way that could pass as a human (peer) review report. There are many ways these tools can be used and abused. The most positive outcome might be that authors take advantage of them as assistants to help improve the clarity of their manuscripts before making them public. The most obvious negative outcome is that they will be used for lazy human reviewing, with reports simply copy-pasted to satisfy the ever-growing need to peer review our ever-growing production of scientific papers every year. Given that these reports can be generated quickly, potentially as part of the submission process, a good way to preempt their misuse by human peer reviewers might be for the journal to provide them to the reviewers as part of the request for assessment. This would make clear that the editor/journal is already aware of the things an automated report would bring up and discourage reviewers from simply passing off such a report as their own. Finally, there is also a likely scenario in which editors of scientific journals start to integrate these reports into their initial editorial decisions. In particular, from the editorial perspective, these tools might end up serving as biased and lazy assessments of novelty and impact.

As a peer reviewer, I don't think these automated reports would reduce the level of work I need to do. I would still need to spend the time to read through a paper, consider the methods used and try to figure out if there are issues that the authors might have missed and if the claims and interpretation make sense relative to the data. Still, having such automated reports, along with a list of related published papers, might be a useful addition.

Perhaps one aspect that is not strongly emphasized in q.e.d. but is more obvious in Nature's research assistant and other tools is the connection of a given manuscript with the broader scientific literature. As scientists, I think it is fair to admit that it is hard to be fully aware of all of the work that has been published in a field. Sometimes the connections between our work and the existing literature are less obvious because they happen through analogy and/or shared methods. Having an easy way to surface such connections in the process of writing up a manuscript would be particularly useful.

Science and scientific publishing in the age of AI slop

We were already drowning in scientific papers before ChatGPT and co. Now there is growing evidence of papers being produced by AI and quite a lot of buzz around the concept of fully automated AI scientists. So it is unfortunately unavoidable that this is going to translate into an even stronger increase in the number of publications and added pressure on the scientific publishing system. One optimistic take is that the added publications will be easy-to-ignore crap that won't affect our productivity, but at the very least this is likely to mean more wasted money feeding the already rich publishing industry. Unfortunately, I think this will also hurt attempts to move away from our expensive and inefficient traditional publishing system as scientists worry more about "high-quality" science. The current (bad) proxies for quality (i.e. high-impact-factor journals) can't easily be changed to something else in an environment where many scientists will rightfully be even more worried about scientific rigor.

Wednesday, July 23, 2025

Why do we still publish in scientific journals?

We publish in scientific journals to disclose our discoveries, such that others can build upon them. But we now have preprint servers and can quickly make our discoveries available to others. So maybe we publish in scientific journals because we value the peer review that is organized by them. However, we also now have journal-independent peer review systems, like Review Commons, which allow us to perform peer review on top of preprints in a way that does not require subsequent submission to a scientific journal. So why do we still publish in scientific journals?

Once in a while someone online complains about the cost of open access publication fees, the so-called article processing charge (APC). Looking at this simplistically, it does seem ridiculous that a journal might ask authors $5-10k USD to publish a paper when all the work is apparently done by the scientists who write and review the articles. Of course, the APC is a lot more complicated than this and there is historical context and background knowledge needed to discuss it. In reality, a lot of the cost goes into sustaining the editorial salaries of journals with high rejection rates. I covered this in detail in a previous blog post discussing the costs from EMBO Press. In addition to the editorial salary costs of journals with high rejection rates, we also don't have a free market, since we don't pick journals based on price and service quality but on how publishing in certain journals will be perceived by others.

So, for many reasons, the major costs of scientific publishing are not the act of peer review and making knowledge public. If I had to guess, the actual cost of publishing a peer-reviewed article with a near 0% rejection rate would be below $500 USD per paper if done in high volume. The main costs of publishing are primarily those linked to the system of filtering scientific publications into tiers of perceived "impact". For a long time it was nearly impossible to evolve scientific publishing, and I have argued for almost 20 years that we needed to split the publishing process into modular bits that would allow for much more innovation. With the rise of preprints, social media and dedicated peer-review services, I think we could now work towards getting rid of scientific journals. Or at least, we now have a clear direction of focus on what is missing in this potential alternative system - a new reward infrastructure.

The reward infrastructure in science

So why do we still publish in scientific journals? The reality is that people still want to chase high-impact journals. Pretending that we don't is not going to change anything. Despite having tenure and secure funding for my group, I feel that I cannot stop trying to publish in some journals because of what it means for the careers of my lab members; for how my peers perceive and evaluate our work; and for establishing new collaborations and applying for additional funding. So how are we going to change this and what could the consequences be?

Unfortunately, there is no incentive for any single individual to change the reward system. At least as of now, this would require a large number of labs within a sub-field to jointly commit to a change in practice, perhaps assisted by some external entity. We could assume that social media, conferences and recommendation engines (e.g. Google Scholar) are enough to spread knowledge and that within a specific sub-field it is possible to evaluate each other without the need for journal proxies. I am not sure this is really true, but if we accepted it, then a number of labs in a field could commit to no longer publishing in scientific journals. This could be assisted by, at the same time, creating an overlay journal for the field where academic editors would select a subset of peer-reviewed preprints that represent a particularly strong advance in the field.

Unfortunately, this idea is unlikely to work because it relies on collective action by a majority of groups within a field. I don't have better ideas, but this is for me the last remaining barrier. We still need to work out how we would pay for the peer-review service, but ideas that would help change the reward system in a way that does not require collective action are what is needed now.

What could go wrong if it happened

Despite all that we complain about in our current system of tiered journals, they do aim to improve science. They might not work as intended, but they aim to filter science by accuracy and perceived value to others. If we managed to get rid of them, we could have an even worse problem with the sheer number and quality of scientific outputs. As an almost anecdotal piece of evidence, our group has become a lot worse at working through the revisions of our papers in a timely fashion. If our manuscripts were not out as preprints, I think we would be in much more of a hurry to do the revisions.

The other important caveat is that time and attention are always limiting. There will always be a need to filter and evaluate science by proxies. If we didn't have science journals, we might be complaining about how attention on social media is being used as a bad proxy for the value of research.

I am truly curious to know how scientific interactions would change without scientific journals. Would people still want to apply to our group, collaborate on projects, or invite us to conferences if our outputs were essentially peer-reviewed preprints? For my lab members that might read this - don't worry, this is not a declaration of intention.

Sunday, March 16, 2025

State of the lab 12 - Becoming an established scientist

This blog post is part of a (nearly) yearly series on running a research group in academia. This post summarizes year 12, the 3rd year after moving to ETH Zurich. In the last blog post I wrote down some of our overall research directions for the first 5 years of the group at ETH and I will wait another year or two before reflecting back on those commitments. This time, I wanted to try to write down some thoughts I have been having about becoming more established in academia. This includes a longer-term perspective on group turnover, the time and resources needed to achieve research objectives, and some activities that go beyond the management of the research group.


Group member turnover cycles

With 12 years of managing a research group, I have gotten used to some of the broader rhythms of lab turnover. Our lab is now almost totally renewed, with just 1 lab member who came with the lab from EMBL. While this turnover was somewhat enforced by the move from EMBL to ETH, the turnover of lab members is a constant in academia given the short-term nature of their positions. In our group, PhD students have typically stayed for around 4 years and postdocs for up to 5 years. Since there is some degree of clustering of the hires, there tend to be periods of higher turnover. We have had something like 2 to 3 periods where the lab has seen a large change. In the group, I try to hire from diverse backgrounds (e.g. biology, CS and math) and we work with a range of experimental and computational approaches, including for example yeast genetics, proteomics, structural bioinformatics and machine learning. This creates a nice dynamic of group members building up their projects while at the same time learning about the capabilities of the rest of the lab. The projects are usually meant to be somewhat synergistic, trying to address bigger goals through the individual problems (see a past blog post on this). This means we have had windows of around 3 years when things click together before the turnover starts again. We are just around that exciting stage in the cycle and I am really looking forward to making the best of it. I still don't enjoy what comes next, when the group will inevitably turn over again. I have accepted that it is an opportunity to steer the ship in new directions, but sometimes it is disappointing to change the group just around the time it feels like we can take on almost any challenge.


Longer term view of science

One thing that has been on my mind is that I sometimes grow weary of the time it can take to achieve a research goal. I am not talking here about an individual research project, which tends to take on the order of 2 to 3 years on average. In our group we have tried to address some bigger research goals, such as trying to understand the evolution of protein phosphorylation or the functional relevance of individual phosphosites. These kinds of challenges take multiple independent projects and over 10 years to make a meaningful dent in. These days I will look at a potential long-term research goal and think about the many different types of methods and steps that will be needed, and this can distract me from the excitement of figuring those things out. I should say that I am by no means jaded about doing research. I still get such a thrill discussing the day-to-day results with lab members, being at the frontier and trying to figure things out. It is just that when I pause to think about the longer-term view, either looking back or trying to project into the future, I sometimes wish things could just move faster. I have taken part in a couple of large multi-PI projects that have moved very quickly and from these I can see the temptation of trying to have a large lab.


From junior to “established” PI

There is no point in time when a switch happens and someone is no longer considered a junior PI, but after 12 years I can safely assume that label no longer applies to me. This has brought some relatively small changes to my job, one simple one being that I no longer think about tenure. For most of my career I was on fixed-term positions, including my first group leader position at EMBL which had a time limit of 9 years. I joined ETH 3 years ago on a tenured contract and not having to think about my next job has left me with a tiny post-tenure slump - what am I aiming for? Related to the previous section, I have considered that I could enjoy overseeing science at a higher level than as a group leader. As one example, I organized an application for a National Centre of Competence in Research (NCCR) with 19 PIs interested in human genetics in Switzerland. While the application failed, I would have been really keen and excited to co-direct the center had it been funded.


Another aspect of my job that has changed somewhat is a higher commitment to activities outside the lab, such as taking part in committees, advisory panels or formal and informal mentorship of junior PIs. I don't feel particularly overwhelmed by these activities, but that might change if I am required to take part in more committees within ETH. Not everything is an additional burden to an already busy job. I have felt that being more visible and connected in international science comes with benefits, including making it easier to at least discuss collaborations or to find labs interested in joint grant applications.


Scientists that have worked in academia for longer than I have might find some of these things funny, and I am certainly curious about what it will feel like to read this 10 or more years from now. In fact, the blog is now a bit over 20 years old, with posts going back to my PhD. While I don't post much these days, I aim to continue at least this yearly series as long as I feel there are new things to say beyond the progress in our science.

Monday, November 13, 2023

State of the lab 10 and 11 - the first years at ETH Zurich

[Photo: Yet another lake by a mountain in Switzerland]
This blog post is part of a (nearly) yearly series on running a research group in academia. This post summarizes years 10 and 11, the first 2 years after moving to ETH Zurich. It also marks the end of the first decade as a research group leader, which is meaningful only because we have ten fingers and use 10 as a base for counting, but I digress. There has been a lot to adapt to in moving to a new country, including all the basics of moving, re-building the group and starting to teach. It was a lot easier than the first time around since I didn't have to set up the group from zero. Some people came with me, some stayed at EMBL-EBI with funding that couldn't be moved, and generally speaking we could continue several computational projects without much interruption. If we were primarily lab-based then I think the interruption would have been more dramatic. Unexpectedly, there were more periods of high stress than I typically have. There was no particular reason for the stress, just a combination of multiple small things, probably due mostly to the adaptation to a new place. I will cover here some of the biggest things I am having to adapt to and also some of the research directions planned for the first 5 years of the group at ETH. One aspect that I will not cover is networking and getting to know the Swiss research landscape, but I will come to it in a later post.

The Swiss style of leadership

The EMBL, where I was before, has a very top-down leadership. EMBL is funded by different countries that are represented in the EMBL council. There is a director general who is appointed by the council and has a lot of control. Of course, there is a hierarchical support structure with a senior management team, heads of research units and a group of "senior scientists" that support the director in decision making. I am still figuring out ETH but there is a very different feel to it, both in size and style of leadership. EMBL employs around 2,000 people while ETH has around 12,000. Organizationally, ETH is divided into 16 departments, and each department is further split into different institutes. For example, I am in the Department of Biology, which has 6 institutes, and I am in the Institute of Molecular Systems Biology (IMSB). In terms of leadership, there is an executive board, including the president of ETH, then the department heads, and within each department there are the meetings of institute heads and the professorial conference (i.e. votes from all professors). At least in the Department of Biology, the heads of the institutes and the leadership of the department are meant to rotate every 2 years. At these levels - institute and department - the leadership feels highly representative, with lots and lots (!) of voting. This representative, rotational leadership feels very different from EMBL and I think mirrors more broadly a Swiss way of doing things. The obvious consequence is that any change requires deep consensus and therefore radical change is less likely, but it is too early to say much more.

Teaching at undergraduate level

During 9 years at EMBL I had almost zero teaching duties. I voluntarily taught some classes in the GABBA PhD program in Portugal and not much more. At ETH teaching is now an important part of my job. I am teaching courses in Bioinformatics and Systems Biology, primarily to biology students, which are all very familiar topics close to my area of research. I don't particularly enjoy the act of teaching, in particular standing in front of 70-100 students and trying to explain things. As an introvert I am more comfortable with 1-on-1 or small group discussions and I get very tired from the interaction of teaching in a classroom setting. I have always said that biology students should learn more computational skills, so at least I now have the opportunity to influence that at ETH. In fact, the biology curriculum was changed right when I was joining to add more bioinformatics, and students do have the chance to learn it with multiple lectures that cover bioinformatics and machine learning. Despite it being a mixed bag for me, I am privileged in that I have a very low teaching load in topics that I like. Teaching is an area that I feel I could do more for and where it could have an impact, in particular if we made it open to anyone. However, it is still something that I find difficult to fully devote myself to given the research role.

Our research at ETH during the first 5 years

The start of the research group at ETH has been fantastic. There was another big turnover of group members during the transition, the second major turnover since the group started 11 years ago. I am really happy with the team we have here and, having done this sort of turnover before, I can already see the growing potential of many projects that have started here. So the next 2-3 years are going to be about building up these projects and trying to coordinate them such that they interact and feed off each other. We have very generous stable funding, as do all other tenured professor positions at ETH - the equivalent of endowed professorships in the US or positions with core funding for European researchers. Surprisingly, there is not a lot of oversight of this research funding, which is a big difference from EMBL where the units, and their group leaders, are reviewed every 4 years. So I thought I could at least write down our research commitments for the first 5 years here, in the spirit of disclosing what we are doing with this public research funding.

Human genetics research - mechanisms linking genotype to phenotype

Human genetics is an area that we started working on in the last 3-4 years or so at EMBL. Some of this is already visible in recently published articles, including some protein-interaction network-based analyses of trait-associated genes. We continue to actively work on this, and one direction of focus is to try to build interaction networks that are specific to different tissues or cell types. We are working on a manuscript on this and it is an area to continue to build upon, to be able to study the differences in cell biology of different cells/tissues and how genetic changes manifest differently in them. A second direction of focus is to study the relation between common and rare variants linked to related traits using networks.

From cells to proteins - we are finishing a project where we use protein structures to annotate functional residues in proteins in order to study mechanisms of pathogenicity. One aspect of this that will need further development is expanding the structural modelling of protein interactions with other proteins and other molecules. Finally, we are interested in how genetic variation controls protein levels and in building computational models that can integrate the impact of genetic variation through protein levels, interactions, organs and organismal traits, ideally without a black-box modelling approach. All of these things are actively ongoing and I expect to have progress to report in the coming years.

Post-translational regulation - large scale studies of kinase signalling

There are over 100,000 phosphosites discovered in human proteins and over 20,000 found in budding yeast proteins. We don't have good methods to study the functional role of these phosphosites nor to reconstruct the kinase/phosphatase-substrate signalling network of different cells. About half of the group is continuing to work on these problems, and here at ETH we managed to consolidate the computational and experimental parts of our group, which used to run in different locations while I was at EMBL. Because we are doing more of the experimental work now, this part of the group had a slower start, but things are now moving along very well. Some of the problems that we are working on include the prediction of the biological processes regulated by phosphosites; studying the impact of phosphorylation on protein conformational change; experimental methods to map kinase-substrate interactions; and large-scale mutational studies of PTMs. The thought has crossed my mind to phase down this area of research a bit, or at least to move more into mammalian systems in our experimental work to make it more complementary to the human genetics side of the lab.

Structural bioinformatics, protein evolution and other

We have been having a lot of fun with AlphaFold2! With the current fast pace of change in protein-related bioinformatics methods, I am sure we will continue to play with these methods as they come. It is not likely that we will do a lot of method development ourselves, it is not our way, but I think we are very good partners for method developers to help make the bridge to applications. Protein structures, protein design and evolution models are all things we will likely be playing around with in the coming years.

Wednesday, November 16, 2022

20 years of open science or how we haven't radically changed the way we do science online

Around 20 years ago I was a starting PhD student and it was an exciting time for the internet. It was the time of blogs, wikis and a large increase in public participation with more user-generated content, in what is commonly known as the start of Web 2.0. These were the times of web-based online communities such as the now defunct Kuro5hin or the great survivor slashdot.org. I started this blog 19 years ago and I was also "hanging out" in an online community called Nodalpoint. Nodalpoint no longer exists but it was a discussion forum/wiki for bioinformatics, with some of these discussions still preserved thanks to the magic of the Wayback Machine.

Around 2002-2006 all of the excitement around Web 2.0 was also infecting academia, with many discussions around open science. I know that open science is a vague term that can mean many different things, including open access, citizen science, open source and many others. One specific aspect that I want to focus on is the idea of organizing research in a way that is not based on local group structures. In 2005 I wrote a Nodalpoint post on "Virtual collaborative research", which is similar in spirit to open source software development but with a focus on discovery rather than tool development. Part of this would mean surfacing more of our ongoing research and taking part in research projects that are not organized by traditional research group structures. The idea of being extremely open about ongoing research activities was advocated by others under the term "open notebook science".

Over the following years I made a few attempts at starting such open research projects, with blog posts where I tried to set up tools and ideas that others could take part in (see posts from 2007, 2008 and 2010). The last project idea I tried to propose in this way ended up being one of the major projects of my postdoc and basically one of the research lines I am still working on. In the end, none of these attempts really took off as open collaborative research projects. In hindsight, I am not surprised it didn't work. Even within the local structures of research institutes and university departments there is so much discussion about incentives for local collaborations. While I think the traditional structures for organizing research do work, as a PhD student and postdoc I was very frustrated by the apparent difficulty of making the most of everyone's expertise. As a group leader I have more capacity to establish collaborations, but I still think we aren't using the internet to its full potential.

So what happened in the decade from 2010 to 2020? Blogs and online communities mostly died out and Web 2.0 was swallowed by corporations. One major change was the rise of large social networks and the standardization of the stream as the way for people to share information and interact. Academia started participating in social networks around the time of FriendFeed (2007-2015) and such participation became mainstream with the popularization of Twitter. I honestly would never have predicted the rise of academic Twitter and it is truly a sign of how the geeks have inherited the earth.

The reason I am even thinking about open science these days is that over the past couple of years we have been involved in projects that have illustrated this potential of large collaborations empowered by the internet. I wanted to write this down also to have something to come back to in the future. The first project was a study of phosphorylation changes during SARS-CoV-2 infection. Like many others, when the pandemic sent our research group home, I thought about what we could do to help and sent emails to a few people that could be working on the topic. Nevan Krogan, my former postdoc supervisor, was very keen to involve us, which led to several projects including this study of protein phosphorylation. This was probably one of the most exciting projects I have been involved with and included a very spontaneous collaboration among a large international team coordinated by a few people through Slack. In this case the network of interactions was provided by Nevan, and it was possible because everyone was pushing in the same direction, triggered by a catastrophe. I wish everyone could feel the sense of power that I think we felt during this project. There was so much scientific capacity at the disposal of this single project and we could iterate through experiments and data analysis at an incredible pace. It is even hard to express how it felt to be able to just get things done when you had the world experts for whatever was required at every step.

A second, even more interesting, example was a community effort to study the value of AlphaFold2 in a series of applications. When AlphaFold2 was released, several scientists started sharing their early observations of how AlphaFold2 and predicted structures could be used for different applications. I thought all of these examples were really exciting and that we could structure this output into a manuscript. So I just contacted people that were doing this and also asked on social media if anyone else wanted to participate. In the end every contribution was quite modular and it was easy to integrate everything into a manuscript with a few meetings and a Google Doc to put things together. Perhaps the most unusual thing that happened was receiving actual results through Twitter chat.

I think both of these examples required a trigger - the pandemic and the release of AlphaFold2 - that led to many scientists moving in the same direction. In both cases I think we achieved in a few months what would take a single group potentially one to several years to do. Yet, these interactions remain difficult to set up. Perhaps simply because we are just too busy with our own research questions, or more likely because of the importance of credit and evaluation systems in academia. These days I am actually less in favor of radical sharing of ongoing research in the spirit of open notebook science. I don't think we have the attention span for it. It would be too difficult to navigate and may lead to more "group think" instead of divergent thinking and ideas. Maybe the simple existence of social networks like Twitter is already a good step forward. I certainly get to know more people and what they may be up to via them. Let's see what the next 20 years bring.





Tuesday, March 08, 2022

Independent evaluation of AlphaFold-Multimer

AlphaFold2 has been widely reported as a fantastic leap forward in the prediction of protein structures from sequence, when the sequence has enough homologs to build a reasonable multiple sequence alignment. When AlphaFold2 was released (Jumper et al. 2021) there were several independent reports of how it could also be used for the prediction of structures of protein complexes despite the fact that it was not trained to do so (Bryant et al., 2021; Ko and Lee, 2021; Mirdita et al. 2022). Together with the lab of Arne Elofsson, in work led by David Burke in our group and Patrick Bryant in Arne's group, we have shown that it can be applied at reasonably large scale to predict structures of protein complexes for known human interactions (Burke et al. 2021). There is still a lot to investigate, but it is clear that this is an extremely exciting direction of research since it can lead to major advances in the structural analysis of cell biology, evolution, biotechnology, etc.

Soon after these first reports, DeepMind released an AlphaFold version that was re-trained specifically for the prediction of structures of protein complexes - AlphaFold-Multimer (Evans et al. 2021). Given that they reported an even higher success rate with this specifically trained model, we were quite excited to give it a try. David Burke selected a set of 650 pairs of human proteins from the Hu.MAP dataset, known to physically interact and for which the experimental structure has been solved. A structure was predicted using AF v2.1.1 (AF-multimer) with default settings and the model_1_multimer parameter set. A second model was predicted using AF with the model1 monomer parameter set and the FoldDock pipeline. For each model, DockQ scores were computed, which reflect the similarity of the predicted structure to the experimental structure with a specific focus on the interaction interface residues. A DockQ score below 0.23 can be considered an essentially incorrect or random model.

Below we show a direct comparison between the two AlphaFold2 models, with AF2-Multimer showing a very significant improvement based on DockQ scores. Of all predictions tested, 51% were above DockQ > 0.23 with AF2-Multimer versus 40% with "standard" AlphaFold2. This improvement (+11%) is not as large as that reported by the DeepMind team (+25%) on their own test set. There could be several reasons for the difference but, more importantly, this would be more than enough to justify using Multimer for the prediction of protein complexes.
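
For readers who want to summarize their own benchmark in the same way, here is a minimal sketch. It is not the actual analysis code from this work; it just assumes a hypothetical CSV file with one DockQ score per protein pair for each of the two models.

```python
# Sketch of summarising a DockQ benchmark; the file name and the column
# names (dockq_multimer, dockq_folddock) are hypothetical.
import pandas as pd

DOCKQ_ACCEPTABLE = 0.23  # below this, a model is essentially incorrect/random

scores = pd.read_csv("dockq_scores.csv")  # one row per protein pair

for column in ["dockq_multimer", "dockq_folddock"]:
    frac_ok = (scores[column] > DOCKQ_ACCEPTABLE).mean()
    print(f"{column}: {frac_ok:.1%} of predictions above DockQ {DOCKQ_ACCEPTABLE}")

# Paired, per-complex view of the improvement rather than overall fractions
improvement = scores["dockq_multimer"] - scores["dockq_folddock"]
print(f"median per-complex DockQ change: {improvement.median():.2f}")
```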


However, David quickly realised that there were many examples of clashes at the predicted interface in the AF2-Multimer models. In the figure below we show just one example of this which, despite the high DockQ score (0.85), clearly has several overlapping residues. That is, while the interface region is likely to be correct, the model at the interface has serious errors.


These clashes in predicted structures are quite frequent, with 69% of predictions having some clash. The clashes can be quite extreme, with several involving a very high fraction of the total length of the protein, as shown in the distribution below. Such clashes are essentially not seen in the predictions made with the earlier version of AlphaFold2.
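
If you want to check your own AF-multimer models for this issue, a rough sketch of counting inter-chain clashes with Biopython is shown below. The 2 Å heavy-atom distance cutoff and the file name are illustrative choices for this sketch, not the exact criteria used in our analysis.

```python
# Rough sketch: count inter-chain atom pairs closer than a cutoff.
# The cutoff (2.0 A) and file name are illustrative, not those used in the study.
from Bio.PDB import PDBParser, NeighborSearch

CLASH_CUTOFF = 2.0  # Angstrom, between heavy atoms of different chains

structure = PDBParser(QUIET=True).get_structure("model", "ranked_0.pdb")
atoms = [a for a in structure.get_atoms() if a.element != "H"]

ns = NeighborSearch(atoms)
close_pairs = ns.search_all(CLASH_CUTOFF, level="A")  # all atom pairs within the cutoff

# Keep only pairs where the two atoms belong to different chains
clashes = [
    (a1, a2)
    for a1, a2 in close_pairs
    if a1.get_parent().get_parent().id != a2.get_parent().get_parent().id
]
print(f"{len(clashes)} inter-chain atom pairs closer than {CLASH_CUTOFF} A")
```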

While there may be some cases where the clashes could be minimised, as it stands the models produced by AF-multimer may not be usable for a large fraction of cases. However, these issues are of course easy to spot. DeepMind has in fact been aware of this bug since around November and has said they are working on it. From the point of view of predicting the regions of the proteins where the interaction will occur, AF-multimer may still be usable as it is, and hopefully DeepMind will find a fix for this problem.



Wednesday, February 02, 2022

A closer look at the costs of EMBO publishing

There has been a lot of discussion on social media about the prices that some publishers are coming up with for publishing a paper in their journals - the so-called article processing charges (APCs) - with some journals asking for values on the order of 10k and many scientists finding these values outrageous. Given that journals don't do the work of producing the research articles and get academics to do the evaluation, how can these journals claim the costs of publishing a paper to be anywhere close to 10k? While I agree that these are outrageous values, I don't really believe that the price is mostly profit. A good source of information for the costs associated with running a publisher are the numbers that have been disclosed by EMBO Press. Before we go into these, I need to disclose that I serve on the Publications Advisory Board of EMBO Press. I don't receive anything from EMBO and this is merely an advisory committee, but it has given me some insight into what is a very real attempt by a non-profit publisher to come up with a low APC and into what they could compromise on in their current set-up to achieve it.

With that out of the way, let's just look at the most recent numbers that EMBO has disclosed, which were for 2019 (see here). EMBO has (or had in 2019) 17 professional scientific editors and 6 support staff, who handled a total of 5,766 submissions in 2019. That is on the order of 28 submissions handled per month per editor, or 1.3 per working day. I don't know about you, but making a call on 1 paper per day plus finding/chasing reviewers is not easy if you try to do it properly, even if you can make some rejections fairly quickly. From these they ended up publishing 472 papers (8%). This part is not totally transparent; for example, maybe some of the submissions included the reviews and news & views articles that were ultimately also published. If that is the case then the total number published would be 681 (12%). It is also not totally clear if the submissions include revision submissions. Regardless, this shows that EMBO Press overall ends up having quite low acceptance rates (10-20%). I should stress that I truly don't know the actual number. As we will see, this rejection rate is really the key driver of the high estimated cost per paper.

The costs that they have disclosed include ~2.5 million euros for the EMBO Press office, of which around 2 million is listed as salaries and benefits. The number of staff is there as well, so you can guesstimate the average salary for the 23 staff, and you can also look up EMBO editor salaries on Glassdoor to get an idea. I truly don't know what the salary is but I guess on average it could be on the order of 4-6k net per month. The other costs include 1,723,639 euros that EMBO Press pays to Wiley, which in fact does the actual publishing. The majority of this cost is listed as "Wiley publishing services (incl. production, sales and marketing)" (1,281,552 euros). This is certainly a place where costs are not very transparent, at least to me, and where profit to Wiley is included, likely with a decent margin. I certainly don't know enough about finances to figure this out, but Wiley is said to have around a 30% operating profit margin; for the purposes of some later calculations, let's assume that maybe 50% of these costs are profit that could be magically removed (e.g. if EMBO set up its own publishing infrastructure). Finally, EMBO also lists 1,342,374 euros in "surplus" which is re-invested into publishing-related activities like the EMBO Source Data project, other pilots trying to innovate on the publishing side, and back into EMBO itself, which further supports EMBO program activities (fellowships, etc).

With these numbers, the total cost includes the 4,225,920 euros of actual cost and the 1,342,374 euros for EMBO activities (5,568,294 euros total). So if you don't take anything out of this, you would need a price of 11,797 euros for each of the 472 papers published in 2019 to finance it. If you exclude the EMBO surplus that would be 8,953 per paper, and excluding 50% of the Wiley costs it would get down to 7,127 per paper. Even without anything going to Wiley you would only get to 5,301 per paper. Of course, you can also argue that the salary costs could be lower, but what can't really be argued is that academic editors could do this for "free", since their time is most likely even more expensive and less efficiently used.
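
Since it is easy to get lost in the arithmetic, here is the same back-of-the-envelope calculation written out, using only the figures quoted above (my rough accounting, not EMBO's):

```python
# Back-of-the-envelope check of the per-paper figures quoted above (2019 numbers).
actual_cost = 4_225_920   # EMBO Press office + Wiley costs (euros)
wiley = 1_723_639         # portion of that paid to Wiley
surplus = 1_342_374       # EMBO "surplus" reinvested elsewhere
papers = 472              # research papers published in 2019

total = actual_cost + surplus                       # 5,568,294 euros
print(round(total / papers))                        # ~11,797 euros per paper
print(round(actual_cost / papers))                  # ~8,953 excluding the surplus
print(round((actual_cost - 0.5 * wiley) / papers))  # ~7,127 also removing half the Wiley costs
print(round((actual_cost - wiley) / papers))        # ~5,301 with no Wiley costs at all
```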

So the 10k APC number certainly contains parts that can be reduced, but we are not talking about a 1k per-paper cost. For that you would need to change the rejection rates, and this is what really starts mattering in the end. If you go to maybe something like 50% acceptance rates, which could correspond to something like 2,000 papers published in this case, then the APC could be somewhere on the order of 1,500-2,500 euros. Keep in mind also that submission numbers would tend to decrease over time if the impact factors go down with higher acceptance rates (yes, some people still care about those). Of course, this scales across multiple journals and this is where the big publishers are just taking advantage, since the overall acceptance rate across a large portfolio of journals is much higher than 10% and high-acceptance-rate journals (e.g. Scientific Reports) can cross-subsidise low-acceptance-rate journals (e.g. Nature).
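
A rough version of that higher-acceptance scenario, keeping the 2019 cost base fixed (an oversimplification, since editorial costs scale with submissions rather than with accepted papers), would look like this:

```python
# Rough scenario: the same 2019 cost base spread over ~2,000 published papers.
total = 5_568_294          # full 2019 costs including the surplus (euros)
actual_cost = 4_225_920    # costs excluding the surplus
wiley = 1_723_639
papers = 2000              # hypothetical output at a much higher acceptance rate

print(round(total / papers))                        # ~2,784 euros per paper
print(round(actual_cost / papers))                  # ~2,113 excluding the surplus
print(round((actual_cost - 0.5 * wiley) / papers))  # ~1,682, within the 1,500-2,500 range
```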

It is important again to keep in mind that all of these prices per paper have been there for decades, but they were paid via journal subscription charges instead of APCs and therefore they were not transparent and people were not really paying attention. In the end, the discussion for me is not really about the 30% savings we could have by pushing the publishers to lower their prices, but more about how we go about doing the filtering (i.e. target audience) and the subjective evaluation of value to science (i.e. impact). Revolutions are not real solutions in academic publishing. If you propose a solution that requires a majority of people to change their habits in the span of 3 years, it is dead on arrival.


Wednesday, January 19, 2022

State of the lab 9 - an informal report on the 9 years of EMBL-EBI

This blog post is part of a yearly series (or close to yearly) on running a research lab in academia. 2021 was the last of 9 years as a group leader at EMBL-EBI, which is the standard time given to group leaders to establish and run their labs at EMBL. For this year's blog post I thought it was a good time to look back at the full 9 years, and I am going to (briefly) cover the time at EMBL with some numbers, including an approximate account of the finances. This is something that I do with the group at the start of every year but it still feels strange to make financial numbers public.

The scientists

A lot has happened during 9 years. Starting with the people, we have had 7 PhD students, 1 of whom was co-supervised, 13 postdocs and 10 interns/visiting lab members. The total group size was around 10 for the majority of the time, which feels about right for what I can do as a direct line manager. It is fair to say that science is a very social activity and working with different people with different personalities, through the good and the bad, is really enriching. Not to get all corny but the personal interactions are some of the things that stick with me the most over time. It is always the extremes - the "unfairly" rejected paper or the unexpected positive response, individual personal and work difficulties that are overcome or sometimes not. Mental well-being is an example of such difficulties, one that we as a broader society are not good at dealing with and that has also not always been easy to handle as a manager.

From these 30 lab members there are 7 that will continue with the group over the next few years: Cristina (senior scientist), Jurgen (postdoc) and Miguel (postdoc) have joined me at ETH, and Eirini (PhD student), David (postdoc), Inigo (postdoc) and Danish (postdoc) will remain at EMBL-EBI with funding that cannot be moved. Of the PhD students and postdocs that have left, all but 2 left with published papers as first or co-first authors. One PhD student decided not to continue the PhD and one postdoc left after several years without a first-author paper. In both cases I take some of the blame, as the projects ended up being difficult and the results were just not very positive.

The publications and science

In total we published 45 original research papers, 3 review articles and 2 news & views over the course of 9 years. This includes only research that was really done after starting the group and also includes 8 preprints that have not yet been published in a journal after peer review. This is split into 27 papers where I am listed as co-corresponding author and where I think our group played an important role in the final outcome, plus 18 to which our group had some input. I am showing in the figure the distribution of these papers across the 9 years. The first paper from our group only came at year 3, with the first really significant set of publications coming at years 4 and 5. With regard to the non-tenure-track system, even by this crude metric it is easy to see how different it would be if I had to apply to the job market at year 6-7 versus year 8-9. Of course, note that the numbers for 2021 in particular are inflated by preprints that will most likely be published in a journal in 2022. Another clear trend that feels true to me is the increase in small collaborative efforts where our group just helped out in some modest way. I think this is a reflection of being more integrated into the local and broader academic networks.

I am not going to go into the scientific outcomes of the 9 years in any detail. I think some of the strongest work we did was on the evolution and functional importance of protein phosphorylation, with multiple publications that have built on each other and where I think our contributions moved this field forward. There was also a smaller line of research on the genetics of trait variation that I wouldn't consider to be at the cutting edge, but it has been fun to work on. In particular, it has been interesting to step closer to the fields of human genetics and the genetics of human disease, where making advances requires interactions between people with very different ways of viewing science. Just the language barriers between human genetics, cell biology, biochemistry and chemical biology have been fascinating to get into.

The funding

So now something that feels less comfortable, or at least less common, to discuss - the funding. Before going into any numbers, I should caveat this by saying that these are very rough approximations that of course should not be considered an actual financial statement. These numbers also don't take into account the money spent on the whole infrastructure (administration, grants, IT, etc) but are just the funding spent on research lab members, including my salary, and consumables. With that out of the way, over 9 years we spent approximately 5.7 million euros, as broken down per year in the figure. Although we have had a small wet lab running for the last 6 years, I would say that 90% of this was on salaries. Of these, around 2.7 million came from external grant funding, plus ~450k from competitive internal postdoc fellowships. This of course just shows how amazing it is to work in a place with core funding. I ended up being very successful early on, with 2 million funded in years 2 and 3, and this made me too careless about applying for grants later on, which I now consider a real error on my part. I applied in total to 13 external grants, with 6 being successful.

A number that is immediately easy to get, but that is probably quite meaningless, is the money spent per research paper. We spent a total of ~127k euros per paper, or ~210k if we only count those where I am listed as co-corresponding. Of course this varies a lot per paper, with my very rough estimates of the bounds being something like 25k to 1 million. Given that we mostly spend the budget on salaries, this simply reflects the amount of people-time spent on a project.
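
For completeness, those per-paper numbers are just the approximate total budget divided by the paper counts from the previous section (a very rough calculation, as noted above):

```python
# Rough per-paper spending over the 9 EMBL-EBI years.
total_spend = 5_700_000    # approximate total spend in euros
all_papers = 45            # original research papers
corresponding = 27         # papers with me as co-corresponding author

print(round(total_spend / all_papers))      # ~127,000 euros per paper
print(round(total_spend / corresponding))   # ~211,000 euros per co-corresponding paper
```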

To new beginnings 

This is a somewhat dry recap of the 9 years at EMBL, but I thought it would be interesting, at least to me, to have these things written down. Even if these are just numbers, I am curious to see what the next 9-10 years will look like. I am sitting in my new office at ETH, close to two weeks after arriving in Zurich. There is a lot to adapt to, including teaching material that I should be preparing right now. I am curious to see how long it will take me to get into the local academic network and how much the move will impact our capacity to do work. The lab work is really the part that will take the longest, as I don't think we will run any experiments before the middle of the year, and although we have the budget for an MS instrument, that will take even longer to get going. In any case, I am excited about the new beginning here.

Thursday, June 10, 2021

A not so bold proposal for the future of scientific publishing

Around 15 years ago I wrote a blog post about how we could open up more of the scientific process. The particular emphasis that I had in mind was to increase the modularity of the process in order to make it easier to change parts of it without needing a revolution. The idea would be that manuscripts would be posted to preprint servers where they could accumulate comments and be revised until they are considered suitable for accreditation as a peer-reviewed publication. At the time I also thought we could be even more extreme and have all lab notebooks open to anyone, which I no longer consider to be necessarily useful.

Around 15 years have passed and, while I was on point with the direction of travel, I was very off the mark in terms of how long it would take us to get there. Quite a lot has happened in the last 15 years, with the biggest changes being the rise of open access, preprint servers and social media. PLoS One started as a journal that wanted us to do post-publication peer review. It started with peer review focused on accuracy, wanting then to leverage the magic of Web 2.0 to rank articles by how important they were through likes and active commenting by other scientists. The post-publication peer review aspect was a total failure, but the journal was an economic success that led to the great PLoS One Clone Wars, with consequences that are still being felt today - just go and see how many new journals your favourite publisher opened this year.

The rise of preprint servers has been the real magic for me. We live in each other's scientific past by at least 2 years or so. If you sit down and have a science chat with me, I can tell you about all of the work that we are doing which won't be public for some 2 years. If I didn't put our group's papers out as preprints you would be waiting at least 6-12 months to know about them. Preprint servers are a time machine: they move everyone forward in time by 12 months and speed up the exchange of ideas as they are being generated around the globe. If you don't post your manuscripts as preprints you are letting others live in the past and you are missing out on increased visibility for your own research.

Preprint servers also serve the crucial need to dissociate the act of making a manuscript public from the process of peer review, certification as a peer-reviewed paper and dissemination. This is important because it allows the whole scientific publishing system to innovate, and innovation is needed because we waste too much money and time on a system that is currently not serving authors or readers efficiently.

So after nearly 15 years the updated version of the proposal is almost unchanged:

I no longer think it would be that useful to have lab notebooks freely available for anyone to read. There are parts of research that are too unclear and I suspect that the noise-to-information ratio would be too high for this to be of value. However, useful datasets that are not yet published could be more readily made available prior to publication. Along these lines, the ideas in the form of funded grant proposals should be disclosed after the funding period has lapsed. As for the flow from manuscript to publication, the main ideas remain and the systems already exist to make these more than just ideas. There are already independent peer review systems like Review Commons. Such systems could eventually be paid and could lead to the establishment of professional paid peer reviewers. Such costs would then be deducted from other publishing costs depending on how the accreditation was done. Eventually "traditional" publishing could be replaced by overlay journals, like preLights, whose job would be to identify peer-reviewed preprints that are of interest to a certain community.

Social media has been, for me, the most surprising change in scientific communication. I didn't expect so many scientists to join online discussions via social media. Then again, I didn't foresee the geekification of society. In many ways social media is already acting as a "publishing" system in the sense of distribution. Most of the articles I read today I find through Twitter or Google Scholar recommendations. As we are all limited by the attention we can give, I think one day, instead of complaining about how impact factors distort hiring decisions, we will be complaining about how social media biases distort what we think is high-value science.

So finally, what can you do to move things along if you feel this is important? If you think we have too many wasteful rounds of peer review across different journals, that the cost of open access publishing is too high, or simply that publicly funded research should be free to read and openly available to mine, then the single best thing you can do today is make your manuscripts available via preprint servers.

Friday, May 21, 2021

Lab move to ETH Zurich, the job search and fixed term PI positions

ETH Zurich

Next January, after 9 years at the EMBL, I will be joining ETH Zurich as a tenured faculty member in the Department of Biology, with my research group hosted at the Institute of Molecular Systems Biology (IMSB). I am really excited about this move and I think the IMSB is a perfect fit for the type of research that we do. We primarily use computational approaches to study the relation between genotype and phenotype, with a specific focus on post-translational regulatory systems (more on the EBI website or my GScholar page). IMSB has a long tradition of method development in large-scale measurements of biological systems, with a current interest in mechanistically explaining trait variation. The smaller experimental component of our group uses yeast genetics, which is also a great fit for the groups around us, including our future neighbours in the Institute of Biochemistry. Research-wise, the group will remain focused on studying the evolution and functional importance of post-translational regulation and on determining the regulatory networks of a cell and how they change under different conditions, including disease. More broadly, we also study the mechanisms that underlie trait variation across individuals of the same species. In terms of methods, it will remain primarily computational, with around 30% of the group devoted to lab work. The lab will be fully equipped for large-scale yeast genetics, with the exciting addition of having funding for an MS instrument for proteomics. 


Teaching, scientific integration and group structure

With any move there are always some thoughts about the challenges ahead. Professionally, the types of things on my mind are that I will need to set up the group, integrate myself scientifically and prepare myself for teaching. Setting up the group and integrating myself within the local environment won't be new experiences. I feel I was too slow with both of these things when I first joined EMBL-EBI, so I am curious whether I will be able to move things along faster this time. Coming from EMBL and the local EBI/Sanger campus, I have the impression that ETH is less collaborative, but there were clearly many people interested in collaborating just from the small sample I got during interviews. There is an interesting difference in group structure between EMBL and ETH: at ETH a group can have sub-groups with junior PIs that have varying degrees of independence, as per the decision of the more senior PI. Organising a lab in this way will be something new. Finally, I will have to teach at the undergraduate level for the first time. I have always said that students coming out of biology or related topics need better training in bioinformatics. While daunting, this will be my chance to contribute to this training directly.  

The interview process and decisions

For those less familiar with the EMBL, group leaders are hired for a maximum period of 9 years, with only a few exceptions (around 10%) that end up having an open-ended contract. We get generous core funding and get to tap into a great scientific network, which more than compensates for the lack of tenure. This means that around year 7 your thoughts start moving to the future. At faculty presentations I would often write how many years I had left on the title slide as a personal reminder. Towards the end of year 7 I started applying, and I spent most of year 8 applying and interviewing. The first time I applied for PI positions it was all very unidirectional, with me looking broadly for possible places. This time it felt more like dating a potential future university/institute, with expressions of interest on both sides. One of the issues going into this was that I didn't really know what my value would be on the market. I knew I had a good CV and would certainly find a job, I just didn't know where I could aim in terms of seniority and resources. That became clearer only after the first interviews and the expressions of interest from places I felt were really fantastic. 

The second half of 2020 then became about trying to find the best place professionally and personally. I ended up applying to 10 places, interviewed at 8 and received 5 offers. I tried to find a job in my home country (Portugal), but of the two places I was interested in, one picked another candidate and the other could not make an offer that was not fixed-term. The decision ended up being among 3 places, with the major differentiating factor being between 2 offers with less core funding but higher management responsibilities and ETH with incredibly generous core funding and the best scientific fit (but less seniority). Personally, the decision was about staying in the UK or moving to France or Switzerland. There is quite a lot to be said about this choice (safety, adventure, integration, kid-friendliness, jobs for my partner, etc.) and in the end we went with Switzerland. While excited, I am also anxious about yet another move to what will be my 5th home country, and the now almost familiar sense of uprooting and new beginnings. But this is not yet the time for goodbyes.

Non-tenure group leader positions (in Europe)

I don't know who invented the fixed-term, non-tenure-track group leader position in academia. It may have been EMBL, and the model has clearly spread across Europe, with many research institutes having some form of junior position that gives a variable number of years (5 to 12) to set up a group before the group leader necessarily needs to move on to a different place. EMBL does this because it is funded by many member state countries to train the next generation of "academic leaders" that will lead research groups across the member states. The obvious advantage of hosting these positions is that it keeps the institute forever young if you manage the turnover well. I think these positions can work well if they remain a relatively small proportion of the total PI/faculty positions; if there is some level of support to at least kick-start the group; and if the positions last a sufficient number of years. Having gone through this at EMBL, my impression is that 7 years would be the bare minimum and 9-10 years would be ideal. This also depends on the level of support beyond the PI salary. If these conditions are not met, then it is not worth setting people up for failure with the selfish goal of using the higher turnover to bring in new ideas/methods. Don't give people super-postdoc positions for 3-5 years with no funding and no chance of tenure just because you want fresher ideas around. If there is some mechanism for tenure or an open-ended contract, then it should be crystal clear from the start how (un)likely this is and what the transparent criteria for achieving it are.

Friday, January 29, 2021

State of the lab 7 & 8 - The last years at EMBL

This is part of a yearly series of posts where I note down thoughts related to managing a research group in academia over the years. This post covers years 7 and 8 and brings me to the start of year 9, my last at EMBL. While I usually do one of these posts every year, with all of the craziness of 2020 I ended up skipping one. 

Year 7, group turnover 

2019 was the year when the group fully turned over the lab members that had been with us since the earlier years, with 2 postdocs (Haruna Imamura and David Ochoa) and 3 PhD students (David Bradley, Claudia Hernandez-Armenta and Marta Strumillo) leaving. Haruna is now a Research Scientist at the Systems Biology Institute in Japan, David O is the platform coordinator at Open Targets, and Claudia and David B are now doing postdocs. Marta is finding her way through consulting. We were joined by 2 postdocs (David Burke and Miguel Correa) and 2 PhD students (Eirini Petsalaki and Rosana Garrido). This constant turnover of group members is quite difficult to manage, both personally and professionally. Year 7 was really the year with the largest amount of change in the group, and there is something to be said for trying to make sure that changes remain gradual. However, it is not always possible to plan for this. While I think that this turnover in academia is generally positive for science, I do wonder what could be achieved if it was not a requirement (see earlier post).   

Managing research focus over the years

Over the last few years, the research in the group became somewhat dispersed in terms of topics. At the start, the group was named "Evolution of cellular interactions", with a primary focus on the evolution and functional relevance of protein phosphorylation. While this remained the central focus, there were other areas we worked on, including cancer genomics, the genetics of human disease and microbial trait diversity. We also have work that is not yet visible on drug mode-of-action predictions. This led me to change the group name to "Cellular consequences of genetic variation", which could better serve as an umbrella for the different topics. This is, at least in part, a simple reflection of funding opportunities but also a reflection of true movement in my research interests and the environment I have been working in (Genome Campus). On one hand I feel this dispersion is detrimental, in that we could do more with a single-minded focus, but on the other hand these extensions have not really been the majority of our work and also act as a way for the group to explore new directions. My visual reference for this is a cell sending out protrusions in some directions to feel out the environment around it. In some of these new areas (e.g. microbial trait diversity) I feel we have done enough, even with a small total investment, to make the work stand on its own. 

I have to say that, without explicitly planning for it, the dispersion worked to my advantage when applying for positions last year, as it allowed me to present the group through slightly different lenses depending on where I was interviewing. Of course, this is only beneficial if there is sufficient research progress made by the group not to appear superficial or unfocused. I suspect that this movement in research topics is normal, but I haven't had many deep conversations with others about how this has happened in their research groups. In some cases, the changes in topics for some groups seem more abrupt from the outside, but that could just be a perception. I will soon have an opportunity to rethink where we put most of our research efforts and likely cut back on some of these extensions. 

Year 8 - A new group, the pandemic and the job market

At the start of last year, I was finally getting comfortable with the idea that the group had changed so much and I was truly excited about the new beginning. Just as the year was starting and I was enjoying this excitement, the pandemic hit. As I have described before, we ended up devoting some effort in the group to SARS-CoV-2 projects, which I think was also good for group morale. However, the changes in working conditions, the effort on the SARS projects and my need to go back to the job market made me less capable of keeping up with some of the projects in the group. While most of the work has kept going, there are at least 3 projects/manuscripts that have been neglected simply for my own lack of time/effort. We all know these stories of PIs who let work pile up on their desks, and I feel it as a failure, although I can rationalise why I really didn't have the time to fully keep up. 

Finally, over the last year I was fully back on the job market and I am so relieved that this is now over. Since there is nothing official that I can announce, I will wait to write up in detail what the process was like and compare it to my first attempt to secure a PI position. I can at least say that I will leave EMBL-EBI at the end of this year, and I will certainly write more about the 9 years at EMBL. I do want to look back at all that has been good (mostly) and bad, make a summary of what I feel were the biggest advances we made, perhaps discuss the finances, and more broadly go over the issues of this lack of tenure for junior PIs now implemented in so many European research institutions. 



Friday, December 04, 2020

A year of SARS-CoV-2 research

This post may be premature but I feel like writing down some thoughts about the roller coaster that this year has been. At the start of the year, with the number of reported cases rising in Europe, the EMBL and our institute (EMBL-EBI) decided to send everyone home as a precautionary measure. As most of our group is computational, this has meant we have been working from home for most of this year. Early on, somewhat frustrated by not being able to help, I emailed a few people who might be working on the virus. Nevan Krogan replied saying our help would be useful, and we joined the global effort to contribute to solving this crisis. 

Science at science fiction speed

Over the course of 9 months we took part in 4 projects, some of these being the most thrilling science I have ever taken part in. We condensed what would easily be a 3-to-5-year research project into something done in 3-4 months, typically involving 10-20 research groups with a few key people helping to direct the research. We were collecting data, analysing it and suggesting new experiments in the span of days with some of the best scientists in the world. Contributing to the direction of this level of resources has been an amazing experience that I wish every scientist could try at least once in their life. These projects were all geared towards studying how SARS-CoV-2 takes control of its target cells in order to suggest human-targeting drugs that could counter the infection. Several of the compounds identified in these studies are in clinical trials for COVID-19, so I feel the projects met their main objective. 

While this has been my perspective from working on these specific projects, we are all aware of the amazing scientific progress that has been made over the course of this year. I remember seeing the movie Contagion and almost laughing at its unrealistically fast pace of research. However, SARS-CoV-2 research has in fact happened at an incredibly fast pace that probably matches the movie.

Why don't we do this for disease X?

One discussion point that has come up often is whether we can learn from this period and apply it to research into other diseases. Science is an international endeavour, but the degree of collaboration for SARS-CoV-2 research has been higher than usual. The effort put into this was also high among the projects I have seen personally, but this eventually results in some exhaustion and it is not sustainable. I don't think this is easy to repeat for other diseases without the same external sense of urgency. Most scientists won't just drop what they are working on to fully focus on some other research question. Maybe it is an argument for an even higher degree of collaboration, in particular between academia and biotech/pharma. There may be some small increase in the productivity of collaborations through the use of online tools like Slack and Zoom, but overall I don't see the way we do science being dramatically changed going forward. 

The case for higher spending in research

I'm gonna have to science the s**t out of this
Jeremy Farrar has often said that science is our exit strategy for this crisis: from testing and tracking the spread to treatments and vaccines. It is this single-minded effort of so much of the world's research capacity that will lead to a long-lasting solution. This already looks to be within reach with some treatment options, new ways of testing and, critically, what appear to be effective vaccines. Soon enough we will be looking back and asking ourselves if there is something we could have done better. As trained scientists, our reflex is to pause and think carefully about all the things that could have worked better. Were we efficient? Did we deal well with the deluge of studies? Was the peer review too shallow and quick? It is our instinct to be critical, but maybe we should be more vocal about how amazing the response of the scientific community has been. More importantly, this is the time to demand higher funding rates. If society can't see how important science is during a pandemic, when are we going to make our case? This is the capacity of a research infrastructure funded by 1-2% of national budgets; what could humanity achieve if we were to double it? 

Over the last 10 years academic science budgets have been squeezed, and a lot has been said about how academic science needs to be more applied and how much we should justify the investment being made. This week, DeepMind, a private research institute funded by what is essentially an advertising company (Alphabet/Google), made headlines with its impressive research into predicting the structure of a protein from its sequence. An advertising company finds the money to invest in fundamental biological problems, and yet, in the middle of a pandemic that is being solved by a global scientific infrastructure, we can't get the EU science budget to increase. We should be ready to make our case over the course of the next months. 


Thursday, May 30, 2019

PlanS, the cost of publishing, diversity in publishing and unbundling of services


A few days ago I had another conversation about PlanS with someone involved in a non-profit scientific publisher. I am still sometimes surprised that these publishers have very much been reacting to the changes in the landscape. In hindsight I can understand this: the flipping of the revenue model to author fees has been threatened for a long time but always seemed to be moving along slowly. Without going into PlanS at all, the issue for many of the smaller publishers is that they simply cannot survive under an author-fee model, because their subscription revenue would translate to an unacceptable cost per article (given that they reject most articles, the cost of running the journal has to be recovered from the few articles they do publish). These smaller publishers typically use their profit to then fund community activities (e.g. EMBO Press). The big publishers will do just fine because they have a structure that captures most articles in *some* journal, so their average cost per article would end up being acceptable in a world without subscriptions.
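
To make that arithmetic concrete, here is a minimal sketch with purely hypothetical numbers (the budget, submission counts and acceptance rates below are assumptions for illustration only, not figures from any publisher): if the editorial operation costs roughly the same regardless of how many submissions are rejected, then the lower the acceptance rate, the more of that cost each published article has to carry.

# Illustrative sketch in Python; all numbers are hypothetical.
def cost_per_published_article(annual_operating_cost, submissions, acceptance_rate):
    # Spread the journal's total operating cost over the articles it actually publishes.
    published_articles = submissions * acceptance_rate
    return annual_operating_cost / published_articles

# Same assumed budget and submission volume, different selectivity.
selective = cost_per_published_article(2_000_000, 4_000, 0.10)    # ~5,000 per article
high_volume = cost_per_published_article(2_000_000, 4_000, 0.70)  # ~714 per article
print(round(selective), round(high_volume))

Under these assumptions the selective journal would need an author fee several times higher than the high-volume one just to break even, which is the bind described above.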

I don't want to go into the specifics of PlanS at all, but I clearly see the perspective of the funders and wider society in wanting open access and even a reduction in the costs of publishing. The publishers have been given quite a lot of time to adapt and maybe some amount of disruption is now needed. One potential outcome of fully flipping the payment model might be that we simply lose the smaller publishers, and consequently also lose their community activities if they can't find alternative ways to fund them. There are enough journals in scientific publishing that, to be honest, I think the disruption will not be large.

Fewer publishers means less innovation in publishing


What I fear we will lose with the reduction in the number of publishers is the potential to generate new ideas in scientific publishing. Publishers like EMBO Press, eLife and others have been a great engine for positive change. Examples include more transparent peer review, protection from scooping, cross-commenting among peer reviewers, checks on image manipulation, and surfacing the data underlying the figures (see SourceData). While this innovation tends to spread across all publishers, it is not rewarded by the market. Scientific publishing does not work within a well-functioning economic market. We submit to the journals that have the highest perceived "impact" and such perceived impact is then self-sustaining. It would take an extraordinary amount of innovation to disrupt the leaders in the market. For me, this is a core problem of publishing: the market is not sensitive to innovation.

To resolve this problem we would have to continue the work to reduce the evaluation of scientists by the journals they publish in. Ideas around alt-metrics have not really moved the needle much. Without any data to support this, my intuition is that the culture has changed somewhat due to people discussing the issue, but the change is very slow. I still feel that working on article recommendation engines would be a key part of reducing the "power" of journal brands (see previous post). Surprisingly, preprints and Twitter are already working for me in terms of getting reasonable recommendations, but peer review is still a critically important aspect of science.

Potential solutions for small publishers


Going back to the small publishers, one thing that has been on my mind is how they can survive the coming change in revenue model. Several years ago I think the recommendation could have been to just grow and find a way to capture more articles across a scale of perceived impact (previous post). However, there might not be space for other PLoS One clones. An alternative to growing in scale would be to merge with other like-minded publishers. This is probably not achievable in practice, but some cooperation is being tested, as for example in the Life Science Alliance journal. Another thought I had was to try to get the market to appreciate the costs of some of the added value of publishing. This is essentially the often discussed idea of unbundling the services provided by publishers (the Ryanair model?). 

Maybe the most concrete example of unbundling a valuable service could be the checks on unethical behaviour such as image manipulation or plagiarism. These checks are extremely valuable, but right now their costs are not really considered as part of the cost of publishing. Publishers could consider developing the package of such checks that they already use internally into a service that could be sold to institutions that would like to have their outgoing publications checked. Going forward, some journals could start demanding some certification of ethical checks, or funding agencies could demand that such checks be made on articles resulting from their funded research. Other services could be considered for unbundling in the same way (e.g. peer review), but these checks on unethical practices seem the most promising. 

(Disclosures: I currently serve on the editorial board of Life Science Alliance and the Publications Advisory Board for EMBO Press.)