Monday, June 08, 2026

Looking back at the rise of the internet to gauge the impact of AI-assisted scientific research


There is a lot of debate, and some hyperbole, around the impact of AI-assisted scientific research. When considering the future impact of a general-purpose technology, I thought it made sense to look back at the rise of the internet, the last clearly transformative general technology to sweep through biomedical research. Is there any strong record of how the internet actually changed scientific productivity, or how we do science? This comparison could be a useful reference point for thinking about AI-assisted research.


To be clear, when I say AI-assisted research, I have in mind AI models autonomously doing the work of a bioinformatician: compiling and harmonising public datasets, devising and running analyses, making figures, and in some cases developing novel computational methods. I recently wrote about trying exactly this with Claude Code on a small discovery project. This is different from AI as a scientific instrument, in the way that AlphaFold predicts structures. These specialized models are of course also important, but they are not the focus here. Here I mean an AI model assisting a lab member with their bioinformatics work.

Looking back: what the numbers say

The crudest possible metric of scientific productivity is the number of papers published per year, which has been growing quickly. Global output has been expanding by somewhere between 4% and 9% per year for decades, doubling every nine to seventeen years depending on the estimate. On its own this number says very little. The population of scientists has also grown considerably, and while good statistics on this are harder to find, UNESCO estimates a 9.9% increase in the number of researchers per million inhabitants between 2014 and 2018.
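To put these growth rates side by side, here is a small sketch of the compound-growth arithmetic. The helper names `doubling_time` and `annualized_rate` are my own, and the sketch assumes constant compound growth, which real publication and workforce series only approximate.

```python
import math


def doubling_time(annual_growth: float) -> float:
    """Years for a quantity growing at a constant annual rate to double.

    Assumes constant compound growth (an illustrative simplification).
    """
    return math.log(2) / math.log(1 + annual_growth)


def annualized_rate(total_growth: float, years: float) -> float:
    """Constant yearly rate that compounds to total_growth over `years`."""
    return (1 + total_growth) ** (1 / years) - 1


# Lower and upper bounds on annual growth in global publication output
for rate in (0.04, 0.09):
    print(f"{rate:.0%} per year -> doubles in {doubling_time(rate):.1f} years")
# 4% per year -> doubles in 17.7 years
# 9% per year -> doubles in 8.0 years

# UNESCO figure: +9.9% researchers per million inhabitants, 2014 to 2018
print(f"{annualized_rate(0.099, 4):.1%} per year")
# 2.4% per year
```

Under constant growth, the 4% and 9% bounds correspond to doubling times of roughly 18 and 8 years, close to the nine-to-seventeen-year range quoted above, and the UNESCO figure works out to about 2.4% per year. Because that figure is normalized by population rather than a headcount, it is not directly comparable with growth in total publications.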


One study that I looked at quantified the number of articles per author, tracking authors with less ambiguous names. Once you adjust for the growing number of co-authors on each paper, the publication rate of an individual scientist has not increased over the last century. Papers per author is still quite crude and likely misleading, because the publishing unit has also changed over time. One study by Ron Vale compared papers published in a group of journals in 1984 and 2014 and found that the amount of data per paper, measured as distinct experimental panels, rose two- to fourfold. Supplementary material went from non-existent to matching or exceeding the main text. The number of authors per paper also rose two- to fourfold, and the time for a PhD student to publish a first paper went up by more than a year. Unfortunately, the study did not break the data down by decade, so we can't really see how continuous this trend has been.


Some of this increase in paper “complexity” may reflect the fact that "data not shown" is now shown, and the general increase in what reviewers and editors demand. Nevertheless, the amount of data per paper has clearly increased over time. This question of paper content or complexity relates to another debated topic: how disruptive papers are. There is a well-known and often quoted paper reporting that papers have become less disruptive over the decades. Reading further, I found that this work has been strongly criticized for its methodology, including issues with citation inflation, which has lengthened reference lists over time. Other studies have reported the opposite trend, an increase in the disruption index over time.


In summary, the clearest signal over time is not a jump in the number of papers per person, but a rise in the complexity of projects: more data, more methods, more co-authors, more interdisciplinarity, and more time to assemble a publishable unit. Whatever productivity gains the last few decades brought seem to have been spent largely on making scientific papers bigger and more involved.

Bioinformatics as a possible discontinuous effect

The most visible discontinuity I have come across that coincides with the rise of the internet is the rise of bioinformatics. A 2023 study in Advances in Complex Systems describes a discontinuous increase in research combining biology and informatics around the time the internet was introduced. That matches my intuition, but while the internet certainly enabled it, it is unclear whether it was the main driver, since measurement throughput and genomics were growing in parallel. Networked biological databases, public repositories, and tools like BLAST only make sense once you have a network and a shared dataset to search against. The internet did not create bioinformatics, but it made this way of doing science possible and made purely computational groups feasible. So one way to frame the internet's impact on scientific output is less as a discontinuous jump in productivity and more as the enabler of a specific way of doing science (i.e. bioinformatics), which grew quickly on top of a gradual increase in the complexity of scientific projects.

Why is the productivity gain not more obvious?

If the internet was so transformative, why is there no clean step in the productivity numbers? This is not specific to science; the same was true of the broader economy. There is a deep rabbit hole on this in economics, and I am no expert. One argument I see recurring is that technological diffusion takes time and requires complementary investments, often discussed in connection with the Solow paradox. Another common idea is that many improvements get smoothed out. Aggregate productivity is the net result of many overlapping advances arriving continuously, so no single one appears as a clean jump in productivity.


For scientific research, the gain in capabilities might be quickly offset by an increase in problem complexity or, more simply, by a rising bar for what counts as a significant scientific advance. Neither the internet's impact on scientific knowledge work nor the drop in DNA sequencing costs has changed the fact that bringing new drugs to market keeps costing more money. Looking back, the internet might have contributed to a very clear rise in bioinformatics and to a more continuous increase in overall productivity, which may be partly hidden by diffusion lags, smoothing across multiple technologies, and a rising bar for what constitutes a publishable unit.

What this suggests for AI-assisted research

We only have early assessments of AI's effect on research, and the most solid of them point to clear but not overwhelming productivity gains on coding and writing tasks. This matches my own experience. The speed-up in writing code, dealing with IT, plotting, and drafting is real, even though the outputs still need careful expert verification.


Looking back at the rise of the internet suggests that the gains in coding and writing alone, at the current state of the models, are unlikely to show up as a clear discontinuous jump in scientific productivity. Coding and writing are only part of the research process, their outputs need checking, and the same diffusion lags, smoothing, and rising bar for what constitutes an advance still apply. If the internet and bioinformatics are any guide, we should expect a gradual effect, possibly accompanied by the emergence of new modes of doing science, rather than a sudden step in the numbers.


Compared to the internet, the technology is likely to diffuse far faster this time. End-user adoption is essentially unrestricted: anyone can use these tools today without building any infrastructure on their side (we all have computers already). It is hard to predict whether further increases in model capability will change the picture. If outputs become reliable enough to trust without expert validation, or models develop better research taste, the outlook could be different.


The more interesting question, to me, is whether agentic AI changes how research is done rather than only how fast. The one easy prediction is a greater capacity to explore research directions quickly and to run several projects in parallel: rapid prototyping of ideas, de-risking exploratory work, and one person keeping several independent lines of research going at once. That is a change in mode, much as bioinformatics was, and it is exactly the kind of change that crude productivity metrics won’t easily register.