Tuesday, August 30, 2005

A reflection on scientific journal publication by Philip Bourne

Philip Bourne wrote a very interesting perspective in the last issue of PLoS Computational Biology. He starts by comparing the somewhat convergent evolution of journal publishing and database submission: "The daily work of any high-throughput scientific journal or biological database consists of information input, information processing, and information output." He also points out how much harder it is to retrieve information from scientific publications than from databases, and raises some possibilities for improving this. One would be to assign digital object identifiers (DOIs) to individual items of content within biological databases. This would make it possible to track the different publications that refer to the same item, and to retrieve information from a paper automatically. Barend Mons recently wrote a commentary on exactly the same subject in BMC Bioinformatics, so there may be some grounds for agreement between journals.
Phil Bourne also proposes that data in a publication should be more "alive". This reminded me of a recent discussion we had on Nodalpoint about reproducible research. Lastly, Bourne suggests that more data should be attached to the paper as metadata. As an example, he suggests that gene names could be retrieved automatically, reviewed by the author with minimal effort, and integrated with the paper as metadata. To me this point seems quite similar to the first one: in essence, it would make it easier to retrieve information from a scientific publication.
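The gene-name idea could work roughly like this: scan the text for known gene symbols, let the author strike out false hits, and keep the rest as metadata. What follows is only a minimal sketch under simple assumptions (exact, case-sensitive dictionary matching), not Bourne's actual proposal; the symbol list and the names `candidate_gene_names` and `build_metadata` are hypothetical. A real tool would need a curated nomenclature source and would have to handle synonyms and ambiguous symbols.

```python
import re

# Hypothetical mini-dictionary of gene symbols; a real tool would load
# a curated nomenclature list instead.
GENE_SYMBOLS = {"TP53", "BRCA1", "EGFR", "MYC"}


def candidate_gene_names(text, symbols=GENE_SYMBOLS):
    """Return known gene symbols found in `text`, in order of first appearance.

    Matching is whole-token and case-sensitive, so the author only has to
    review a short list of candidates rather than every word in the paper.
    """
    found = []
    for token in re.findall(r"\b[A-Za-z0-9-]+\b", text):
        if token in symbols and token not in found:
            found.append(token)
    return found


def build_metadata(text, rejected=()):
    """Keep the candidates the author did not reject during review."""
    return {"genes": [g for g in candidate_gene_names(text) if g not in rejected]}


abstract = "Loss of TP53 and amplification of MYC were observed; TP53 status..."
print(candidate_gene_names(abstract))  # ['TP53', 'MYC']
print(build_metadata(abstract, rejected={"MYC"}))  # {'genes': ['TP53']}
```

The review step is the important part: automatic matching proposes, the author disposes, and only the confirmed names end up attached to the paper.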

It is nice to see these discussions coming up; now someone just needs to find a way to get some journals together to agree on formats and implementations. Make them opt-out: authors should do it themselves or pay extra to have the publishing houses do it. Start making papers on the web more connected to the databases, and vice versa. I want to click on a protein name and get a list of databases to visit :). Add some kind of "trackback" to the databases: every paper mentioning a protein sends a trackback ping to the protein database, which is then updated automatically.
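The trackback idea borrows from the TrackBack protocol used by blogs. There, a ping is a form-encoded HTTP POST carrying fields such as `url`, `title`, `excerpt` and `blog_name`, and the receiver answers with a small XML document whose `<error>` element is 0 on success. The sketch below simulates both ends in memory, without any networking. Applying this to papers and protein databases is an assumption, and `build_ping`, `handle_ping`, `ping_succeeded` and the `cited_by` field are hypothetical names.

```python
from urllib.parse import urlencode, parse_qs
import xml.etree.ElementTree as ET

OK = '<?xml version="1.0" encoding="utf-8"?><response><error>0</error></response>'


def build_ping(paper_url, title, excerpt, source_name):
    """Form-encoded body for a TrackBack-style ping sent by a journal."""
    return urlencode({"url": paper_url, "title": title,
                      "excerpt": excerpt, "blog_name": source_name})


def handle_ping(body, record):
    """Hypothetical database side: record the citing paper on the protein entry."""
    fields = parse_qs(body)
    if "url" not in fields:  # url is the one required field
        return ('<?xml version="1.0" encoding="utf-8"?><response>'
                '<error>1</error><message>missing url</message></response>')
    record.setdefault("cited_by", []).append(fields["url"][0])
    return OK


def ping_succeeded(response_xml):
    """True if the receiver's XML response reports <error>0</error>."""
    root = ET.fromstring(response_xml.encode("utf-8"))
    return root.findtext("error") == "0"


# Usage: a journal pings the (simulated) protein entry when a paper mentions it.
protein_entry = {}
body = build_ping("http://example.org/paper", "A paper", "mentions p53", "Example Journal")
print(ping_succeeded(handle_ping(body, protein_entry)), protein_entry)
```

Keeping the payload to a handful of plain form fields is what made TrackBack easy to adopt on blogs; a database version would presumably also need a way to say which protein entry is being pinged, for example by giving each entry its own ping URL.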