Sunday, August 10, 2008

Post-publication journals

With the number of journals and articles published growing every year, and an even larger body of "gray literature" potentially available online, we face the challenge of picking out the bits of information that are relevant to us.

Let us define "perceived impact" as the subjective importance that a given piece of information holds for us as scientists. This information is typically an article, but the idea could later be extended to pre-prints and to database entries in general.

Each of us creates rules for selecting, from the constant stream of scientific output, what to pay attention to. We could picture this sorting process as a triangle, with a broad base of very specific knowledge that is somewhat important to us and, at the top, a small amount of more general but highly important content. For the majority of scientists today, these sorting rules are based on journal topic (cell biology, physics, evolution, etc.) and journal impact factor. Below the base we could place the gray literature, which today is mostly out of sight and is not peer-reviewed.

With the advent of the web, and in particular the social aspects of this new medium, we should expect something better than evaluating articles by the quality of the journal they were published in. In the words of Eugene Garfield, the inventor of the impact factor:
“In order to shortcut the work of looking up actual (real) citation counts for investigators the journal impact factor is used as a surrogate to estimate the count. I have always warned against this use”. Eugene Garfield (1998)
Scientific publishing is now digital, with every article having a unique Digital Object Identifier (DOI). However, as an author I can get (for free) much more information about how people use the content of this blog than about the articles I have published. Information such as the number of downloads, citations in other articles, and mentions in scientific blogs or bookmarking services could help us sort through information better than relying solely on journal editors (impact factors). We should be using the social web to re-sort articles after peer review to reflect our preferences.
How would we build such a personalized sorting system? As the editor-in-chief of Nature put it:
(…) nobody wants to have to wade through a morass of papers of hugely mixed quality, so how will the more interesting papers in such an archive get noticed as such? Philip Campbell

It is obviously challenging to use some of the metrics mentioned above as signals to rank the importance of individual articles when they are so easy to game. On the other hand, some of them are already useful today. I already subscribe to RSS feeds from some Connotea users who consistently bookmark articles that I find useful. Similarly, through FriendFeed I get recommendations of articles to read from people I trust. So, although I do not have a clear solution for how to build such a system, I think there is a need for it and there are clear ideas to try.
Here is something like a mind-map of what I think would work best: a mixture of the social recommendations of FriendFeed and the purely algorithmic ideas of Google News.
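To make the idea more concrete, here is a minimal Python sketch of such a mixture, under several assumptions of my own: each article carries a few usage counts (downloads, citations, bookmarks, blog mentions) plus the set of users who recommended it; the metric weights and the blending parameter `alpha` are hypothetical and would need tuning; and the names `Article`, `usage_score`, `social_score` and `personal_rank` are illustrative only, not the API of any existing service. Usage counts are log-damped so that inflating a single metric (the gaming problem mentioned above) yields diminishing returns.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Article:
    doi: str
    downloads: int = 0
    citations: int = 0
    bookmarks: int = 0
    blog_mentions: int = 0
    # Usernames of people who recommended or bookmarked this article.
    recommended_by: set[str] = field(default_factory=set)


# Hypothetical weights: a citation is assumed to say more than a download.
WEIGHTS = {"downloads": 0.5, "citations": 3.0, "bookmarks": 1.5, "blog_mentions": 1.0}


def usage_score(a: Article) -> float:
    """Algorithmic (Google News-like) part: log-damped weighted sum of usage counts."""
    return sum(w * math.log1p(getattr(a, name)) for name, w in WEIGHTS.items())


def social_score(a: Article, trusted: set[str]) -> float:
    """Social (FriendFeed-like) part: how many people I trust recommended it."""
    return float(len(a.recommended_by & trusted))


def personal_rank(articles, trusted, alpha=0.5):
    """Blend both signals; alpha=1.0 is purely social, alpha=0.0 purely algorithmic.
    Note: the two scores are not normalized to a common scale here."""
    def score(a):
        return alpha * social_score(a, trusted) + (1 - alpha) * usage_score(a)
    return sorted(articles, key=score, reverse=True)
```

With `alpha=1.0` only recommendations from trusted contacts count; with `alpha=0.0` only aggregate usage counts. Because the two scores live on different scales, a real system would probably normalize them before blending.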

The idea of sorting by measures of usage is already being tested by the new Frontiers journals, a series of open access journals published by an international not-for-profit foundation based in Switzerland. Like PLoS ONE, these journals aim to separate peer review of quality and scientific soundness from the more subjective evaluation of impact. In practice they do this through a tiered system: articles are submitted to a set of specialty journals, evaluated on the reading activity of users, and the top 10% advance to the next-tier journal.
So far Frontiers has started with neuroscience specialty journals and a single top-tier journal (Frontiers in Neuroscience), but if this is successful they could easily add other disciplines and a third tier of very general content on top. To contribute to the evaluation, readers must fill out their profile. This information matters because usage metrics from different readers are weighted according to their expertise.
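The post does not describe how Frontiers actually weights its readers, so the following is only a hypothetical Python sketch of expertise-weighted tiered promotion. It assumes reading events arrive as `(article_id, reader_id)` pairs, that each profile records an expertise level (the levels and weights in `EXPERTISE_WEIGHT` are made up), that readers without a profile contribute nothing, and that the top 10% of articles by weighted readership are promoted.

```python
from collections import defaultdict

# Hypothetical expertise levels and weights; not Frontiers' actual scheme.
EXPERTISE_WEIGHT = {"expert": 3.0, "researcher": 2.0, "student": 1.0}


def weighted_readership(reads, profiles):
    """Sum reading events per article, weighting each by the reader's expertise.
    Readers without a profile get weight 0, i.e. they cannot influence promotion."""
    totals = defaultdict(float)
    for article_id, reader_id in reads:
        level = profiles.get(reader_id)
        totals[article_id] += EXPERTISE_WEIGHT.get(level, 0.0)
    return totals


def promote_top_fraction(article_ids, reads, profiles, fraction=0.10):
    """Return the articles whose weighted readership places them in the top `fraction`."""
    scores = weighted_readership(reads, profiles)
    ranked = sorted(article_ids, key=lambda a: scores.get(a, 0.0), reverse=True)
    if not ranked:
        return []
    n = max(1, round(len(ranked) * fraction))  # promote at least one article
    return ranked[:n]
```

Excluding anonymous reads is one simple way to make mass downloading less useful for gaming; the weights themselves would be the main thing to calibrate.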

No single individual wants to go through all of the published literature to find the useful information, but together we effectively do this. The challenge is to evaluate individual articles by a combination of metrics, and promote them to wider audiences, in a way that is not easy to exploit. Kevin Kelly said recently in a TED talk that "The price of total personalization is total transparency". Would this bother scientists? Let's say that a few science publishers get together with some of these scientific social sites (social networks, bookmarking sites) to mimic the Frontiers model on a larger scale. Users would install a browser plugin that links their scientific profile and social contacts with their reading activity. The publishers could then use this information to create personal reading hubs for users.