Wednesday, May 30, 2007

Presenting Blog Citations

Recently Postgenomic hit the 10k mark: ten thousand citations to papers and books have been tracked in science-related blogs. In the post announcing the milestone, Euan asked whether blog buzz could be an indication of the impact of a paper. Can science bloggers help to highlight potentially interesting research?

I decided to have a look at this and asked him to send me a list of papers published in 2003-2004 and mentioned in blog posts. For each of these I retrieved from ISI Web of Science the number of citations from papers tracked by ISI (all years). There are 519 papers, published in 170 journals in 2003-2004, that were mentioned in blogs tracked by Postgenomic. Of these, 79 could not be found in ISI, many of them because they were published on arXiv. These 79 were excluded from further analysis, which leaves 440 papers.

Top cited journals in blog posts

I ranked the journals according to their incoming blog citations. The top 5 are highlighted below. Apart from arXiv, which is not usually tracked as a journal (maybe it should be), the other 4 are all well-known journals publishing general science and biology. Compared with a ranking by impact factor, there is a notable absence of review and medical journals. Counting total blog citations (instead of blog citations per article) penalizes low-volume journals like the Annual Review series. As for the low blog impact of medical journals, the current ranking may simply reflect a higher proportion of biology and physics blogs among those currently tracked by Postgenomic.

Relation between blog citations and average literature citations

The fact that bloggers tend to cite research published in high-impact journals could simply be due to the higher visibility of these journals. To test this, I computed the average number of literature citations per article for papers published in 2003-2004, in any journal, with more than 0, 1, 2 and 3 blog citations (see table below). I compared these to papers published in Science and Nature in the same period. Two conclusions follow: 1) papers mentioned in blogs have a higher average citation count than papers published in these high-impact journals; 2) papers with more blog citations have, on average, more literature citations.

Journal / group        Papers in 2003-2004    Citations    Average citations per paper
Science                5306                   148912       28.06
Nature                 5193                   145478       28.01
>0 blog citations      440                    21306        48.42
>1 blog citations      71                     3679         51.81
>2 blog citations      24                     1835         76.45
>3 blog citations      15                     1557         103.8
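
The averages in the table are simply total citations divided by number of papers. As a minimal sketch (not the script used for the original analysis), the following Python recomputes them from the counts in the table and checks the two conclusions stated above. The names `groups`, `average_citations`, `exceeds_baseline` and `increasing` are illustrative assumptions, not part of any Postgenomic or ISI tool.

```python
# Counts copied from the table above: (papers in 2003-2004, total citations).
groups = {
    "Science": (5306, 148912),
    "Nature": (5193, 145478),
    ">0 blog citations": (440, 21306),
    ">1 blog citations": (71, 3679),
    ">2 blog citations": (24, 1835),
    ">3 blog citations": (15, 1557),
}


def average_citations(papers, citations):
    """Average number of literature citations per paper."""
    return citations / papers


averages = {name: average_citations(*counts) for name, counts in groups.items()}

for name, avg in averages.items():
    print(f"{name:<20} {avg:8.2f}")

# Conclusion 1: every blog-cited group beats both journal baselines.
baseline = max(averages["Science"], averages["Nature"])
blog_rows = [name for name in groups if name.endswith("blog citations")]
blog_averages = [averages[name] for name in blog_rows]
exceeds_baseline = all(avg > baseline for avg in blog_averages)

# Conclusion 2: the average rises as the blog-citation threshold rises.
increasing = all(a < b for a, b in zip(blog_averages, blog_averages[1:]))

print("All blog-cited groups above Science/Nature:", exceeds_baseline)
print("Average increases with blog citations:", increasing)
```

Note that the groups are nested (every paper with >1 blog citations is also counted in the >0 row), so the rows are not independent samples, and the higher thresholds rest on very few papers.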

I did not remove non-citable items (editorials, News and Views, letters, etc.) from the analysis. It would be hard to come up with criteria for removing these from both the journals and the papers tracked by Postgenomic. In any case, I suspect that bloggers tend to blog a lot about non-citable items because these are usually more engaging for discussion than research papers. If anything, then, the real measure of impact for blog-cited items should be even higher.

Our global distributed journal club

In recent years science publishers have worked to adjust to publishing online. Most of them now offer RSS feeds for their content and some have timidly started allowing readers to comment on their sites. With the exception of BioMed Central, none of the publishers make a point of prominently showing these comments, making it harder to find out about interesting ongoing discussions. This has not stopped researchers from participating in what can be called a global distributed journal club. As Euan and others have nicely noted, scientists are using blogs to discuss research. It is a very diffuse discussion, but it can be aggregated in a way that would never be possible if we kept to ourselves, in the usual conferences or in our institutes and universities.

I tried to show here that this aggregated discussion conveys information regarding the potential impact of published research. This is only the tip of the iceberg of the potential benefits of aggregating and analyzing science blogs. For example, it should be possible to look for related papers from the linking patterns of science bloggers; the dynamics of communication between different science disciplines; the trends in technology development, etc.

Some publishers might be thinking of ways to reproduce these discussions on their own sites. An alternative would be for science publishers to get together to develop the aggregation technology. There should be an independent site gathering all the ongoing comments from blog posts and from the publishers' websites. This could then be used by anyone interested in the information. It could be shown next to a PubMed abstract or directly on the publisher's website. Right now this would likely be the single biggest incentive for online science discussion that science publishers could offer.