Thursday, April 09, 2026

State of the lab 13 - Our slow adoption of deep learning methods and the future of AI for research

This blog post is part of a (nearly) yearly series on running a research group in academia. This post summarizes year 13, the 4th year after moving to ETH Zurich. I will leave it to the end of the 5th year to write a scientific report about our work in the first 5 years, together with a revision of our plans for the following 5 years. This year I wanted to look back at the impact of deep learning on the work of my group: why I was so slow to even acknowledge the value of deep learning models; how we struggled to integrate some of these modelling approaches; and a more general reflection on fear of missing out (FOMO) and adapting to technological change. This is quite a long post, so skip to the last section if you just want some thoughts on the current state of general AI for research.

A slow realization of the importance of deep learning methods

I was quite slow to realise the value of deep learning methods. Neural networks (NNs) have been around ever since I started doing research. I am a biochemist by training and only learned about ML during my PhD. I stumbled onto NNs when working on domain-peptide interactions in the early 2000s, when there were already some NN models to predict specificity, including the work of Søren Brunak's lab on things like SignalP and NetPhorest. What I missed was the transition from user-defined features to the idea that large NNs can create their own features during training. That realization only came around 2016, in part influenced by this review article published in MSB. However, my research group is not a method development group; we do a lot of bioinformatics, but mostly as applications. In this context, earlier deep learning models were often no better than approaches with fewer parameters. I was curious about how large NNs could learn features matching the kinds of features we would engineer but, for applications, this black box in the feature space is a hindrance. Our adoption would likely have been faster if we worked in image analysis, where deep learning made early, significant advances.

AlphaFold and FOMO on deep learning method development

Deep learning models kept making progress, and the publication of AlphaFold2 in 2021 was a critical turning point for many scientists. We have been using protein structures as an *omics resource for many years. The idea that we could cover a large fraction of the proteome and some protein interactions with predicted structural models predates AlphaFold, including work done at EMBL by Patrick Aloy and Rob Russell, among many others. AlphaFold was much closer to our work and it was a clear example of NNs strongly outperforming other methods. For a group that is more focused on applications, using deep learning methods is similar to what we have been doing anyway, except that verifying the method is harder. It requires more effort as a user to check that DL methods can be applied in the domain of interest: we need to consider carefully what the model was trained on and test for generalization. As an example, we have found many issues with protein language model based protein interaction predictors that perform poorly when we test them.

We have been having a lot of fun applying AlphaFold2 and 3 to all sorts of different problems, but after the AF2 release I had a strong period of fear of missing out, seeing groups develop deep learning methods for protein sequence and structure. There are always these periods when new technologies come around and we have to decide whether to adjust the group's capabilities to them. I made no effort to adopt single-cell approaches and I am generally happy to have made that decision so far. Deep learning is not as easily ignored, but I have resisted hiring someone with a deep learning method development background, mostly because we are not really a method development group at our core and we would have a hard time competing in this space.

Uptake of deep learning in the group and its general issues

Despite some reluctance on my part, we have been gaining deep learning expertise, partly through hires and partly through existing lab members trying out some approaches. At ETH Zurich, we created a block course on deep learning applications in biology, where we teach groups of 15 (mostly biology) bachelor students how to train their own deep learning models. This has forced me and some lab members to learn enough about the basic principles of deep learning to teach them at this level. Over the past few years, we started implementing our own fairly simple models. I still have issues with the use of deep learning, given the lack of interpretability. In fact, one of the most interesting projects we have been working on is about issues with so-called biologically inspired or “visible” neural networks, which we hope to preprint soon. My biggest concern is the necessity of evaluating models primarily through verification, which often comes at the expense of understanding. Even in a simple case of supervising the latent space of an autoencoder, we can measure performance improvements and be careful about generalization, but I wonder if most researchers try to investigate the latent space transformations to understand them.
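To make the supervised-latent-space idea concrete, here is a minimal sketch of what "supervising the latent space of an autoencoder" can mean: a shared encoder is trained with both a reconstruction loss and a prediction loss on the latent code, so the latent axes are pulled toward the label. Everything below is illustrative (toy data, linear layers, hand-written gradients), not code from our group:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples, 10 features, and a label derived from the data.
# All names, sizes, and hyperparameters here are hypothetical.
n, d, k = 200, 10, 2                      # samples, input dim, latent dim
X = rng.normal(size=(n, d))
y = (X[:, :3].sum(axis=1, keepdims=True) > 0).astype(float)

# Linear autoencoder plus a linear supervision head on the latent code.
W_enc = rng.normal(scale=0.1, size=(d, k))
W_dec = rng.normal(scale=0.1, size=(k, d))
W_sup = rng.normal(scale=0.1, size=(k, 1))
lam, lr = 1.0, 0.05                       # supervision weight, step size

def losses():
    Z = X @ W_enc
    rec = ((Z @ W_dec - X) ** 2).mean()   # reconstruction term
    sup = ((Z @ W_sup - y) ** 2).mean()   # supervised term on latent space
    return rec, sup

rec0, sup0 = losses()
for _ in range(1000):
    Z = X @ W_enc
    X_hat, y_hat = Z @ W_dec, Z @ W_sup
    # Gradients of L = mean((X_hat-X)^2) + lam * mean((y_hat-y)^2)
    dX_hat = 2 * (X_hat - X) / X.size
    dy_hat = 2 * lam * (y_hat - y) / y.size
    dZ = dX_hat @ W_dec.T + dy_hat @ W_sup.T
    W_dec -= lr * (Z.T @ dX_hat)
    W_sup -= lr * (Z.T @ dy_hat)
    W_enc -= lr * (X.T @ dZ)

rec1, sup1 = losses()
```

The point of the example is the tension discussed above: after training, one can verify that `rec1` and `sup1` went down and that the model generalizes, without ever inspecting what transformation `W_enc` actually learned.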

The wider question of using general AI models for research

Beyond the application and development of deep learning models in biology, we now have the exciting and frightening developments in general AI models, around the notion that general AI models (Claude, Gemini, ChatGPT) can be orchestrated to do complex tasks over longer time horizons. There have been a few examples of such “AI scientist” methods that, so far, I feel are mostly hype and concept. However, in the tech world, something did change at the end of last year. The most recent models have become good enough at programming that it does seem like it might just be a matter of skill and the right set-up. For research in bioinformatics, I can imagine two very different settings. One is closer to software development, where we have some input data and a quantifiable/verifiable outcome. The second is more open-ended research, where we may combine a few datasets and need to explore the data to test a hypothesis or simply find patterns. The tech world is clearly heavily invested in the first, and there is a proliferation of agent orchestration tools (e.g. AutoResearch, Gas Town). The second is more about giving agents information about tools (e.g. differential expression analysis) and datasets (e.g. TCGA data). Gummi, a PhD student in our lab, tried out a few tools in this space; the current best example is biomni. As PI, I am anxious about these developments. I think we are past the point of needing new base model improvements for these approaches to be useful (though they are still coming anyway). I wonder if we should already be adopting these practices into our research, even if it requires an upfront investment of time and training to change how we do research. FOMO again, basically. Lab members use AI-assisted programming, but we have not yet tried to implement any of the agent orchestration methods.
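The "tools plus datasets" setup can be sketched in a few lines: the agent is shown natural-language descriptions of the available tools, picks one, and an orchestrator dispatches the call. This sketch is entirely hypothetical; the tool names, registry shape, and placeholder functions are illustrative and are not taken from biomni or any specific framework:

```python
# Hypothetical tool registry an agent could be given. Each entry pairs a
# description (what the language model reads) with a callable (what runs).

def load_dataset(name):
    """Placeholder: would fetch a registered dataset, e.g. TCGA tables."""
    return {"dataset": name, "rows": 0}

def differential_expression(dataset, group_a, group_b):
    """Placeholder: would compare expression between two sample groups."""
    return {"dataset": dataset, "comparison": (group_a, group_b), "top_genes": []}

TOOLS = {
    "load_dataset": {
        "description": "Load a registered dataset by name.",
        "fn": load_dataset,
    },
    "differential_expression": {
        "description": "Run a differential expression analysis between two groups.",
        "fn": differential_expression,
    },
}

def dispatch(tool_name, **kwargs):
    """What an orchestrator does after the model chooses a tool and arguments."""
    return TOOLS[tool_name]["fn"](**kwargs)

result = dispatch("load_dataset", name="TCGA-BRCA")
```

In a real system the model would choose `tool_name` and the arguments itself from the descriptions; here the choice is hard-coded to show the plumbing.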
Discussing these topics also seems to raise a lot of passionate and polarizing opinions, and with so much money being bet on this it is not easy to know which opinions are biased. The next few years will certainly be very interesting, and I only wish we could hit the pause button on general AI method development to let society adapt to the changes.