Friday, July 27, 2007

Trade books vs Nature publishing

(via Richard Charkin's blog) Richard Charkin is the Chief Executive of Macmillan (Nature Publishing is a subsidiary of Macmillan). He posted his thoughts on why digital books are not as successful as the digital publishing going on at Nature.

I can't help noticing the second reason (my emphasis):
2. Scientific publishing has been intrinsically more profitable than trade book publishing. This allowed the major publishers and societies to invest the significant sums needed to create electronic delivery and storage platforms for scientific information. These platforms are a cornerstone for the creation of a new business and communication model.

and I read it as "higher profit margins".
Google code for educators

(via the Google Blog) Google has started a website to gather teaching materials for CS educators, covering some of the most recent technologies. Right now it has some material on AJAX Programming, Distributed Systems and Web Security. There are some video lectures and presentations. There is already some material on parallel programming (mostly related to their MapReduce) that should be useful for bioinformatics.

On a related topic, Tiago has started a multipart series on his blog about "Bioinformatics, multi-core CPUs and grid computing". The first and second parts are already available.

Tuesday, July 24, 2007

Slideshare adds voice

(via TechCrunch) Slideshare, a site for sharing presentations online, has added voice synchronization. We can now provide a link to an mp3 file, and Slideshare provides some tools to sync the audio to the slides so that each slide is linked to part of the audio track. More information and examples can be found on this FAQ page.
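One way to picture the syncing is as a sorted list of the times at which each slide starts; finding the slide for a given moment of the mp3 is then a binary search. This is only a minimal sketch under that assumption, not how Slideshare actually stores its sync data, and the timings are made up.

```python
from bisect import bisect_right

def slide_at(start_times, t):
    """Return the 1-based number of the slide showing at playback time t (seconds).

    start_times[i] is the second at which slide i + 1 starts; the list is
    assumed to be sorted and to begin at 0.
    """
    if t < 0:
        raise ValueError("playback time must be non-negative")
    # bisect_right counts how many slides have started at or before t.
    return bisect_right(start_times, t)

# Hypothetical sync points for a four-slide talk.
sync_points = [0, 35, 80, 150]
print(slide_at(sync_points, 10))   # 1
print(slide_at(sync_points, 35))   # 2
print(slide_at(sync_points, 200))  # 4
```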

In related news, Bioscreencast now has a group on Facebook.

Saturday, July 14, 2007

Another Open lab book

(Via Open Reading Frame) Jeremiah Faith is giving open notebook science a try and compiling some tips. He joins Rosie Redfield (microbiology) and Jean-Claude Bradley (chemistry) in putting most of their research online and leading the way in changing the mindset towards open science.

Jeremiah Faith also has an interesting idea about using conference money to pay for advertising. He figures that well-targeted ads can get you more attention than a talk. I like the idea because it is thinking outside the box, but I think that the type of connection one can make with other people at a conference is not so easy to recreate online. Also, there might be no need to spend money on advertising if the blog stays on topic and is interesting enough to attract incoming links. A blog can be a good personal marketing tool.

Friday, July 13, 2007

Early response to PLoS ratings

PLoS ONE pushed out a rating system in the latest update of their website. I thought it was another quiet update, but several announcements are now up.

The technical details were described by Richard Cave, and Chris Surridge invites everyone to "Rate Early, Rate Often". Bora (who now works for PLoS) summed it up in a blog post as well.

And just because they make it so easy to query the data, here are the stats 3 days after the announcement:
Number of papers queried: 611
Number of papers rated: 47
Number of ratings: 50
Ratings: Average - 75%; Max - 100%; Min - 40%

Top rated papers (all with 100%)
10.1371/journal.pone.0000288 (rated by: brembs)
10.1371/journal.pone.0000354 (rated by: brembs)
10.1371/journal.pone.0000439 (rated by: brembs)
10.1371/journal.pone.0000349 (rated by: Complexity_Group)
10.1371/journal.pone.0000351 (rated by: crusio)
10.1371/journal.pone.0000455 (rated by: crusio)
10.1371/journal.pone.0000123 (rated by: Damien)
10.1371/journal.pone.0000224 (rated by: Damien)

Maybe in the long run it would be nice to know if the user who rated a paper is also one of its authors :), or to add a note to the ratings suggesting that authors are not very good at evaluating their own work.

Number of users that have rated: 24
Top 3 users:
Chris_Surridge
Complexity_Group
jstajich

The lowest rating so far:
10.1371/journal.pone.0000257 (rated by: godzikc)
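Because the rating data can be queried per DOI, the numbers above are just a handful of aggregations. A minimal sketch, assuming the ratings have already been collected into (DOI, user, rating) tuples with the rating as a percentage; the scraping step is not shown, and the records below are placeholders, not the real PLoS ONE data.

```python
from collections import Counter
from statistics import mean

def summarize(records, top_n=3):
    """Summarize (doi, user, rating_percent) tuples into the stats listed above."""
    values = [rating for _, _, rating in records]
    best = max(values)
    return {
        "papers_rated": len({doi for doi, _, _ in records}),
        "ratings": len(records),
        "average": mean(values),
        "max": best,
        "min": min(values),
        "top_rated": sorted({doi for doi, _, r in records if r == best}),
        "users": len({user for _, user, _ in records}),
        "top_users": [u for u, _ in Counter(u for _, u, _ in records).most_common(top_n)],
    }

# Placeholder records, not the actual PLoS ONE data.
sample = [
    ("doi-A", "user1", 100),
    ("doi-A", "user2", 80),
    ("doi-B", "user1", 60),
    ("doi-C", "user3", 40),
]
stats = summarize(sample)
print(stats["average"], stats["max"], stats["min"])  # 70 100 40
```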

There is no point in trying to conclude anything from this; it was just for the fun of it. If I could make a small wish, it would be to have a similar way to query the accumulated number of page views or visitors for a given DOI.

Wednesday, July 11, 2007

Open-source architecture to house the world

Here is a very energetic talk (filmed in February 2006) by Cameron Sinclair, hosted at TED Talks. He is part of Architecture for Humanity, an organization that promotes architectural and design solutions to global, social and humanitarian crises. It is a very inspiring example of how the internet really makes the world small and how ideas like crowdsourcing and open access to innovation can make a difference. It was the first time I had heard of a Creative Commons house design.

They have started a project called Open Architecture Network to serve as a hub for collaborative efforts.

Tuesday, July 10, 2007

What is the $value$ of an editorial decision ?

(warning: random thoughts ahead)

From my viewpoint, open access is doing great. PLoS has demonstrated that authors want to publish in open access journals and that these journals can quickly establish themselves as high-impact forums for their respective audiences. BMC is set to show that open access can be profitable, and within BMC some journals are also trying to position themselves in the top tier of perceived impact.

How will BMC manage this, and will PLoS and others find a way to serve the authors' interests while keeping the direct costs to authors within reasonable ranges (even if they are paid by the funding bodies)? I can't really answer this :) but I do note a trend. Open access publishers like PLoS and BMC are publishing more and more while decreasing their rejection rates (when considering everything that is published under the brand).

BMC has primarily focused on publishing a high volume of (peer-reviewed) articles without paying too much attention to perceived impact in the field. I might be wrong, but more recently they have been trying to highlight a group of flagship journals (BMC Biology, Genome Biology and Journal of Biology) where they filter on perceived impact. They have even said that papers submitted to other BMC journals can be suggested "up" if they are found to be of high impact.

PLoS, on the other hand, went in exactly the opposite direction. PLoS started with their flagship journals (PLoS Biology and later PLoS Medicine), then created the community journals (PLoS Genetics, Computational Biology and Pathogens) and has now opened PLoS ONE, which will not filter on perceived impact.

In an author-pays model, the most obvious way to limit the cost per paper and still provide a solid evaluation of perceived impact is to have journals that cover the broad spectrum of perceived impact. In this way the publisher's overall rejection rate decreases, while papers are still evaluated and directed to the appropriate "level" of perceived impact.
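The argument can be made concrete with a back-of-the-envelope calculation: in an author-pays model, the fees on published papers have to cover the evaluation of every submission, including the rejected ones. The sketch below assumes, purely for illustration, that an evaluation costs the same at every tier and that rejected papers simply move down a ladder of journals from the same publisher; all numbers are invented.

```python
def cascade(acceptance_rates, submissions=1000, cost_per_evaluation=100.0):
    """Model papers cascading down a ladder of journals from one publisher.

    acceptance_rates[i] is the fraction of papers reaching tier i that tier i
    accepts; rejected papers move on to the next tier. Each tier a paper
    reaches costs one evaluation. Returns (fraction of submissions published,
    evaluation cost per published paper).
    """
    remaining = float(submissions)
    published = 0.0
    evaluations = 0.0
    for rate in acceptance_rates:
        evaluations += remaining          # every paper at this tier is evaluated
        accepted = remaining * rate
        published += accepted
        remaining -= accepted             # the rest drop to the next tier
    return published / submissions, evaluations * cost_per_evaluation / published

# Made-up numbers: a single selective journal vs. a three-tier family.
print(cascade([0.1]))             # (0.1, 1000.0)
print(cascade([0.1, 0.3, 0.7]))   # roughly 81% published at about 312 per paper
```

With these toy numbers, the single journal needs ten evaluations per published paper, while the three-tier family needs about three; reusing referee reports between tiers would lower the later costs further.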

Also, among closed-access publishers it is customary to be able to transfer a manuscript, along with the peer-review comments, from one journal to another journal of the same publisher. This practice can be advantageous to everyone, saving the time of another peer-review process.

Setting aside the costs of editing and printing (which can be very small online), most of the cost of sustaining a science journal should come from the editorial staff. So, what is the value of an editorial decision? In other words, could there be freelance editors? Could the editors be separated from the publisher? Imagine I read a paper on a pre-print server, ask some people to peer-review it (why would they?) and sell our evaluation to a journal.

Also, can a publisher sell an editorial decision to another publisher? Let's imagine a journal that has a very high rejection rate: the editor asks referees for comments but ultimately the manuscript is rejected. The editor could then ask the authors where they want to send it next and offer to provide the referee reports and editorial comments directly to the next journal to expedite the process. Could this journal get paid for this?

Monday, July 09, 2007

User ratings in PLoS ONE

Another quiet update to the PLoS ONE interface: they have introduced user ratings. The overall rating can be seen in the right-hand bar (when reading a paper on the site) and can be expanded to show a breakdown into 3 categories: insight, reliability and style.

A click pops up a voting screen:



The nice detail is that we can query rating data by DOI (example). It is not really an API, but the info is there and it is easily parsable. The PLoS ONE managing director, Chris Surridge, mentioned on the PLoS Facebook page a couple of days ago that this change would be up soon.
Filtering papers on number of downloads

I was having a look at the highly accessed papers in BMC Bioinformatics. At BMC, every journal has a page with statistics on the most highly accessed papers of the last month. Several other journals now provide a similar service. The cool thing about BMC is that they even tell you how many views each paper had (the sum of abstract, full-text and PDF accesses on BioMed Central in the last 30 days). Not only that, the information is in the RSS feed they provide. That makes it very easy to feed into a pipe and set a threshold for the number of views above which a paper will show up in the filtered feed.

Here is a pipe example that filters out BMC Bioinformatics papers with fewer than 1000 views. The only problem is that the information is not stored as a number (example: "Number of accesses: 1226"). That is why I used the regular expression [1-9][0-9][0-9][0-9]$ instead of number filtering. I also don't know if the numbers are updated every day... but I hope so.
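Outside of Yahoo Pipes, the same filter can be written by pulling the number out of the text and comparing it as an integer. A minimal Python sketch, assuming each feed item has already been reduced to a title and a description containing the "Number of accesses: N" string; the item titles are made up. Comparing numbers also sidesteps a corner case of [1-9][0-9][0-9][0-9]$: it only inspects the last four characters, so a count such as 10260 would be dropped.

```python
import re

# Matches the text form used in the feed, e.g. "Number of accesses: 1226".
ACCESSES = re.compile(r"Number of accesses:\s*(\d+)")

def filter_by_accesses(items, threshold=1000):
    """Return titles of (title, description) items with at least `threshold` accesses."""
    kept = []
    for title, description in items:
        match = ACCESSES.search(description)
        # Compare as an integer instead of pattern-matching the digits.
        if match and int(match.group(1)) >= threshold:
            kept.append(title)
    return kept

items = [
    ("Paper A", "Number of accesses: 1226"),
    ("Paper B", "Number of accesses: 734"),
    ("Paper C", "Number of accesses: 10260"),
]
print(filter_by_accesses(items))  # ['Paper A', 'Paper C']
```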

Even better would be some kind of service that, given a DOI, would return exactly this information in a structured form. If other repositories provided a similar service, there would be no point in worrying about page views being diluted across sites because of open access, since we could just sum the views on the publisher's site, PubMed Central, etc.
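If each site exposed per-DOI view counts, merging them would be trivial. A sketch assuming a hypothetical list of (source, DOI, views) reports, since no such cross-site service exists in the post; the source names and counts are made up.

```python
from collections import defaultdict

def total_views(reports):
    """Sum (source, doi, views) reports into one total per DOI."""
    totals = defaultdict(int)
    for _source, doi, views in reports:
        totals[doi] += views
    return dict(totals)

# Hypothetical per-site counts for two papers.
reports = [
    ("publisher", "doi-A", 1226),
    ("pubmed-central", "doi-A", 300),
    ("publisher", "doi-B", 734),
]
print(total_views(reports))  # {'doi-A': 1526, 'doi-B': 734}
```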
Metadata infrastructure

Deepak and Neil blogged today about tagging and adding more structured metadata to the science web. I started by commenting on Deepak's post, but it grew a bit so I turned it into a blog post.

The most obvious starting point for me would be to find a standard to communicate information on the perceived impact of a paper (extending hReview, for example). A paper has a unique digital identifier and ways to resolve it, but no way to communicate the number of downloads at publisher site X, the number of incoming citations in other papers or in blog posts, or simple user ratings.

On the user side, the blogging platforms, social network sites and wikis would need some way to add microformat support. See for example this plugin for WordPress (via F&L). If someone knows how to do the same for Blogger, please tell me in the comments. It needs to be something like clicking a button to link to a paper, and out comes a formatted hReview.
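The button could be little more than a template. A minimal sketch of what it might emit, using the class names of the hReview microformat (hreview, item, fn, url, rating, reviewer, description); the function name and the example DOI 10.1000/example are hypothetical, and any extra fields for download or citation counts would be a non-standard extension.

```python
from html import escape

def hreview_for_paper(doi, title, reviewer, rating, summary):
    """Return an hReview-style HTML snippet for a paper identified by its DOI.

    hReview ratings default to a 1-5 scale, so other values are rejected.
    """
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    url = "https://doi.org/" + doi
    return (
        '<div class="hreview">\n'
        f'  <span class="item"><a class="fn url" href="{escape(url)}">{escape(title)}</a></span>\n'
        f'  rated <span class="rating">{rating}</span> out of 5 by\n'
        f'  <span class="reviewer vcard"><span class="fn">{escape(reviewer)}</span></span>\n'
        f'  <div class="description">{escape(summary)}</div>\n'
        '</div>'
    )

print(hreview_for_paper("10.1000/example", "An example paper", "Jane Doe", 4,
                        "Clear methods, convincing benchmark."))
```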

I think finding standards for manuscripts is a good start because a lot of people already tag and blog about papers. There is a lot of information to aggregate and a lot of interest in having a good measure of impact for individual papers. What we learn from putting this in place can later be used for other types of data communication (e-lab books). Another good starting point might be conferences and conference reports (related to hCalendar?).

Of course, this would require the participation of science publishers. They are best placed to set up the tools and expose some of the information in a structured way to help enforce a standard.

Saturday, July 07, 2007

Referee reports in Nature Precedings ?

I was having a look at some of the bioinformatics manuscripts available in Precedings and I came upon this paper on "The Reproducibility of Lists of Differentially Expressed Genes in Microarray Studies". After the figures there is a letter to the editor with the response to the questions from the referees.

I could not find the paper published in a peer-reviewed journal, and I wonder if this was intentional or maybe part of an (opt-in and maybe buggy) automatic procedure from Nature to have submitted papers appear in Precedings. If I were an editor of a bioinformatics/genomics journal, I could now consider whether this paper with these referee reports would be interesting to my journal and email the authors suggesting that, if by some chance their paper gets rejected, my journal would be willing to publish it.

Deepak was recently saying that it would be good to have access to this type of information. Why was a paper rejected by one journal and published in another? Most manuscripts go through several editorial and referee evaluations before getting published. Biology Direct and now PLoS ONE (to some extent) capture this information. I have often found it useful to read the referee comments in Biology Direct because they provide several independent criticisms that make it easier to home in on the good and bad parts of the work.

I am sure this has come up before in the context of arXiv, but wouldn't it be more efficient to have journal editors fish out from a common pool what is most interesting and high-impact for their community, instead of the current submission ladder that I assume a lot of people go through? We would submit to a preprint server and tag the paper according to its perceived audience (e.g. cell biology, bioinformatics, etc). Editors would flag their interest in the paper and the authors would select one of the journals. You can imagine some of the dynamics this could create, with some journals only looking at manuscripts that have already been flagged by other journals, etc.

The referee reports would be attached to the paper and the editor would make a decision. If rejected, the paper would be up again for editorial selection but with the previous information attached. Other journals could just decide to publish with those referee comments.
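The workflow described in the last two paragraphs can be sketched as a small data model: a tagged pool, editors flagging interest, authors choosing, and rejected papers returning to the pool with their reports. Everything here (class, function and journal names) is hypothetical and only meant to make the proposed flow concrete.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Preprint:
    doi: str
    audience_tags: Set[str]                  # e.g. {"bioinformatics"}
    interested_journals: Set[str] = field(default_factory=set)
    referee_reports: List[str] = field(default_factory=list)
    under_review_at: Optional[str] = None
    published_in: Optional[str] = None

def browse(pool, tag):
    """Preprints an editor for the `tag` audience can still pick from."""
    return [p for p in pool
            if tag in p.audience_tags
            and p.under_review_at is None and p.published_in is None]

def flag_interest(preprint, journal):
    preprint.interested_journals.add(journal)

def choose_journal(preprint, journal):
    """Authors pick one of the journals that flagged interest."""
    if journal not in preprint.interested_journals:
        raise ValueError(f"{journal} has not flagged interest")
    preprint.under_review_at = journal

def decide(preprint, accepted, reports):
    """Attach the reports; a rejected preprint returns to the pool with them."""
    journal = preprint.under_review_at
    if journal is None:
        raise ValueError("preprint is not under review")
    preprint.referee_reports.extend(reports)
    preprint.under_review_at = None
    if accepted:
        preprint.published_in = journal
    else:
        preprint.interested_journals.discard(journal)

# Hypothetical run: rejected at Journal X, then accepted at Journal Y
# on the strength of the reports already attached.
pool = [Preprint("doi-A", {"bioinformatics"}), Preprint("doi-B", {"cell biology"})]
paper = browse(pool, "bioinformatics")[0]
flag_interest(paper, "Journal X")
flag_interest(paper, "Journal Y")
choose_journal(paper, "Journal X")
decide(paper, accepted=False, reports=["Referee 1: needs a larger benchmark"])
choose_journal(paper, "Journal Y")
decide(paper, accepted=True, reports=[])
print(paper.published_in, paper.referee_reports)
```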

I think this is not far from what already happens within publishing houses. Referee reports can be passed around to other journals of the same publisher. This would make it more general. Although there are clear advantages to authors (fewer rounds of refereeing and quicker publishing), it would be hard to convince most publishers to adopt such a scheme. For those publishing mostly journals with low rejection rates it would be beneficial, since most likely the papers would already have been refereed, but for those with high rejection rates it could feel like giving away their work for free. Since it is really the work of the referees, maybe it should be up to the referees to decide whether the reports can be made public or not, period.

Wednesday, July 04, 2007

RSS feed for BiomedCentral comments

As if there were not enough things to read these days, I thought it would be interesting to provide an RSS feed for BioMed Central comments. I tried to use openkapow to scrape the information from the webpage, but for some reason the feed only worked a couple of times after being published. Instead I used Dapper, which, amazingly enough, produced a more stable feed. The full, unfiltered feed can be found here.

The feed includes the title (with a URL to the comment page, where there is a DOI for the cited paper), the short description provided on the main webpage and the journal name (stored in the date field). The feed can be filtered for particular journals using this simple pipe from Yahoo Pipes, currently set for BMC Genomics or BMC Bioinformatics.
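The same journal filter can be sketched in Python for anyone not using Yahoo Pipes. This assumes the journal name sits in each item's <pubDate> element; the post only says it is "saved in the date", so the exact element is an assumption, and the sample items are invented.

```python
import xml.etree.ElementTree as ET

def filter_feed(rss_xml, journals):
    """Titles of <item>s whose date field holds one of the wanted journal names."""
    root = ET.fromstring(rss_xml)
    titles = []
    for item in root.iter("item"):
        journal = (item.findtext("pubDate") or "").strip()
        if journal in journals:
            titles.append(item.findtext("title"))
    return titles

SAMPLE = """<rss version="2.0"><channel><title>BMC comments</title>
<item><title>Comment on paper A</title><pubDate>BMC Genomics</pubDate></item>
<item><title>Comment on paper B</title><pubDate>BMC Cancer</pubDate></item>
<item><title>Comment on paper C</title><pubDate>BMC Bioinformatics</pubDate></item>
</channel></rss>"""

print(filter_feed(SAMPLE, {"BMC Genomics", "BMC Bioinformatics"}))
# ['Comment on paper A', 'Comment on paper C']
```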

Sunday, July 01, 2007

BioBlogs #12 and a blogroll update

The 12th edition of Bio::Blogs is out at Nodalpoint. It has been one year of monthly posts (mostly) about bioinformatics. Is anyone interested in hosting the next edition?

Also, I updated my blogroll to better reflect what I am currently reading. Most updates are in the bioinformatics section, but there are a couple of additions in all of them.
Bioscreencast and Multimedia@Harvard

Deepak and Harijay have posted about Bioscreencast, a project they were involved with. It is a repository for science-related screencasts. A screencast is a video capture of your screen output, usually with some audio narration explaining to the viewer what is being shown. They hope it will be used by scientists to share knowledge on how to use science-related computer tools.

On a related note, Ricardo posted a nice review of multimedia sites for science. He linked to this amazing video of the inner workings of the cell:


You can see the full (narrated) version of this movie along with other science media files in this Multimedia Production Site at Harvard.

Wednesday, June 27, 2007

Call for Bio::Blogs #12

I am collecting submissions for the 12th edition of Bio::Blogs. Send in links to blog posts you want to share from your blog, or that you enjoyed reading in other blogs, to bioblogs at gmail until the end of the month. The next edition will be up at Nodalpoint on the 1st of July.

Maybe it could be cool to try out a section on papers of the month as voted by everyone (Neil used to do this once in a while). Anyone interested in participating just has to send a link to a paper, published last month and related to bioinformatics, with a short paragraph explaining what is nice about the paper.

Mike over at Bioinformatics Zen is asking how to continue the Tips and Tricks section of Bio::Blogs. He has put up a wiki page on open science in Nodalpoint to collect information for a possible future edition of the special section.

Monday, June 25, 2007

Synthetic Biology 3.0

I am not attending the 3rd edition of the Synthetic Biology conference but there are several bloggers attending and reporting.

The Seven Stones

Nature Newsblog (part I and part II)
The ETC blog (intro and part I)

Thursday, June 21, 2007

Structures in Systems Biology (a double bill)

Once in a while I get to write about what I have been working on. The last time it was about the evolution of protein interaction networks. This time it is about two papers that I contributed to: a review about the use of structures in systems biology and an article about structure-based prediction of Ras/RBD interactions. I am sorry to say that both require a subscription (pedrobeltrao *at* gmail).

Main conclusions
Structural data can be used to predict Ras/RBD interactions with approximately 80% accuracy
We can and should use structural information to understand the main molecular properties before abstracting away the atomic details. Structural genomics can serve as a bridge between the abstract network view and the atomic detail.

The Making of
Although I am not the first author of the article, I think it is safe to say that the main inspiration for the line of work done by Kiel (see also previous publication) is the work by Aloy and Russell, where they first showed that it was possible to use the structure of a protein complex to predict whether similar proteins would be able to interact in a similar way. What Kiel showed is that more accurate predictions can be made by modeling the protein domains under test onto the complex and evaluating the binding energy using a protein design program under development in the lab (FoldX). She used pull-down experiments and available information on Ras/RBD interactions to benchmark the predictions.

The predicted binding energies inform us about the probability that the two protein domains would bind in vitro. Inside the cell there are many other factors contributing to the likelihood of binding (gene expression, localization, complex formation, post-translational modifications, etc). To try to add some of this knowledge to the predictions, I contributed a Naive Bayes predictor that combines information on gene expression, GO functions, conserved physical/genetic interactions in other species and shared binding partners. The likelihood score obtained can be used to further rank the predicted interactions according to the likelihood that they occur inside the cell. The supplementary information contains the methods and the tables with individual likelihood scores that can be used to reproduce the Naive Bayes predictor.
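For readers unfamiliar with the approach, here is the generic form of a Naive Bayes combination of the kinds of evidence listed above: each feature contributes a likelihood ratio, and the ratios multiply the prior odds. This is a sketch of the technique, not the code or values used in the paper; the prior of 0.01, the feature names and the ratios are invented for illustration.

```python
import math

def posterior(prior, likelihood_ratios):
    """Combine independent evidence with Naive Bayes.

    likelihood_ratios maps each feature to
    P(evidence | interacting) / P(evidence | not interacting).
    Posterior odds = prior odds * product of the ratios ("naive" = features
    treated as conditionally independent given the class).
    """
    log_odds = math.log(prior / (1 - prior))
    for ratio in likelihood_ratios.values():
        log_odds += math.log(ratio)   # summing logs avoids underflow
    odds = math.exp(log_odds)
    return odds / (1 + odds)

# Invented ratios for two candidate pairs; the real per-feature values are
# in the paper's supplementary tables and are not reproduced here.
candidates = {
    "pair-1": {"coexpression": 3.0, "shared_go": 4.0,
               "conserved_interaction": 10.0, "shared_partners": 2.0},
    "pair-2": {"coexpression": 0.5, "shared_go": 1.5,
               "conserved_interaction": 1.0, "shared_partners": 0.8},
}
ranking = sorted(candidates, key=lambda p: posterior(0.01, candidates[p]), reverse=True)
print(ranking)  # ['pair-1', 'pair-2']
```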

From atoms to nodes and edges
I think one of the main goals of the review was to show the progress that has been made in using structural information to obtain the fundamental properties (binding sites, catalytic sites, protein dynamics, etc) of cellular components that may allow us to create models of cellular functions. There has been some work on bringing the very abstract "nodes and edges" view of cellular interactions closer to a more traditional pathway model. This has typically been done by searching for modules and for particular node roles that depend on the patterns of intra- or inter-module interactions (see Guimera et al). We should be able to automatically decorate interaction networks (and the pathway modules) with structural data that can further help to computationally generate meaningful models of cellular functions.
The picture was obtained from Beltrao et al.; it is Copyright © 2007 Elsevier Ltd and is used here, hopefully, under fair use.
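In Guimerà and Amaral's approach, node roles come from two numbers: the within-module degree z-score (how well connected a node is inside its own module) and the participation coefficient P_i = 1 - sum_s (k_is / k_i)^2 (how evenly its links spread across modules). A minimal sketch of both, assuming the module assignment is already known; the module detection step itself is not shown, and the toy network is made up.

```python
from collections import defaultdict
from statistics import mean, pstdev

def participation_coefficients(edges, module_of):
    """P_i = 1 - sum_s (k_is / k_i)^2, where k_is counts i's links into module s."""
    links = defaultdict(lambda: defaultdict(int))  # node -> module -> link count
    for a, b in edges:
        links[a][module_of[b]] += 1
        links[b][module_of[a]] += 1
    coeffs = {}
    for node, per_module in links.items():
        k = sum(per_module.values())
        coeffs[node] = 1 - sum((k_s / k) ** 2 for k_s in per_module.values())
    return coeffs

def within_module_z(edges, module_of):
    """z_i = (kappa_i - mean kappa of i's module) / std of kappa in i's module."""
    kappa = defaultdict(int)  # links to nodes in the same module
    for a, b in edges:
        if module_of[a] == module_of[b]:
            kappa[a] += 1
            kappa[b] += 1
    members = defaultdict(list)
    for node, module in module_of.items():
        members[module].append(kappa[node])
    z = {}
    for node, module in module_of.items():
        values = members[module]
        sd = pstdev(values)
        z[node] = 0.0 if sd == 0 else (kappa[node] - mean(values)) / sd
    return z

# Toy network: two modules joined by a single c-d link.
modules = {"a": 1, "b": 1, "c": 1, "g": 1, "d": 2, "e": 2, "f": 2}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "g"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
P = participation_coefficients(edges, modules)
z = within_module_z(edges, modules)
```

In this toy example the connector nodes c and d get the highest participation coefficient, while a, the best-connected node inside its module, gets the highest z-score.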

In the pipeline
There are several important details to iron out before we can apply this structure-based prediction of protein interactions to any protein that we can model onto complexes. We are in the process of testing the approach with other domain types. I have been more directly involved in some of this work, and we have now started the submission process. I tried to get everyone to agree to submit it to a preprint server, but not everyone was comfortable with the idea.

Thursday, June 07, 2007

Tangled Bank #81 is now available

I participated with a submission to the latest edition of Tangled Bank (the first science carnival blog journal around) that is available at the Behavioral Ecology Blog. Thanks to RPM at Evolgen for "peer-reviewing" my post on protein evolution :).
Nature Precedings, a pre-print server for biomedical research

It was hard to hold off from blogging about this but I can finally write about Nature Precedings, a new free service provided by the Nature Publishing Group. The official announcement is in this editorial:
"... this site will enable researchers to share, discuss and cite their early findings. It provides a lightly moderated and relatively informal channel for scientists to disseminate information, especially recent experimental results and emerging conclusions."
"...the site will host a wide range of research documents, including preprints, unpublished manuscripts, white papers, technical papers, supplementary findings, posters and presentations."


I have been participating in the beta for some months now and, as mentioned in the editorial, it will be openly available starting next week. All documents are citable (they have DOIs), are not peer-reviewed (in the formal sense) and are archived under a Creative Commons license (derivatives allowed). The site has the community features (tagging/commenting/rating/RSS feeds) that you would expect, which will hopefully make it possible to request and provide comments on early findings. In summary, a nicer version of arXiv for biomedical research.

I think this is great news: it improves access to research (open access by pre-print archiving) and increases the openness of research. It can provide a place for independent time-stamping of early findings, which could then be improved (hopefully with community feedback) until they are appropriate for formal submission to a peer-reviewed journal.

A framework for open science (in biology) can now go from blogs/wikis to pre-print server to peer-reviewed journals. Many ideas might die along the way and many collaborations might form by connecting early findings in an unexpected way.

Of course, if you are in maths/physics you have arXiv, and you are probably wondering what has taken us biomedical researchers so long to get into this.

Friday, June 01, 2007

Bio::Blogs# 11

The 11th edition of Bio::Blogs is online at Nodalpoint. We tried to do something different this time. Michael Barton volunteered to host a special section dedicated to tips and tricks for bioinformatics, which is hosted separately at Bioinformatics Zen. Because there were so many posts this month about personalized medicine, there is also a special section on that.

There are three separate PDFs for this edition: 1) the main PDF can be found here; 2) the one on personalized medicine can be downloaded here; and 3) the one for tips and tricks is available from Bioinformatics Zen. Michael did a great job with this special section, with a very cool design.