The ephemeral journal II
(via Deepak) Earlier this month I posted about how re-grouping of content after publication could be used to foster the creation of more focused online scientific communities. My impression is that these "places" could more easily attract a group of people with similar interests, who would be more likely to engage in discussions, in contrast to a place like PLoS ONE that covers far too many topics.
There are several names for these groupings (Nature/BMC gateways, Nature Reports, a topics page) and PLoS has come out with another one - the Hub. They have launched a re-grouping of content focused on clinical trials that they call the PLoS Hub for Clinical Trials. It is built on Topaz, so it has everything that PLoS ONE has (comments, ratings, trackbacks, etc.).
They mention on the home page of this Hub that they plan, in the future, to also "feature open-access articles from other journals plus user-generated content". I suspect that they could go even a bit further and give users more control over the creation of content for the Hubs, and even over the creation of new Hubs. One thing that I like in traditional journals, and that also creates a feeling of identity and community, is the more personal news-and-views pieces and editorials. PLoS could commission or invite scientists and bloggers to help create this type of content for their Hubs. This would be something like a community blog centered on a Hub's research.
Once upon a time (before Digg, if I remember right), we tried to do this at Nodalpoint. For a while we had a queue of papers from bioinformatics-related journals that we could vote on to promote to the front page of the blog. At the time it did not work very well for lack of users and participation, but in essence it was not very different from what the publishers are trying to do now. Maybe we could try it again :).
Wednesday, September 19, 2007
Vote for your favorite life science blogs
(via Science Hacker and Postgenomic) The Scientist wants to compile a list of life science blogs that people enjoy reading, as a reference. It is not really a good question to ask, since there are so many different fields and styles of writing.
Tuesday, September 18, 2007
More on open science
I am still catching up with a backlog of feeds and e-tocs, but I just noticed that Benjamin Good posted his manuscript on E.D. in Nature Precedings. I went back to his post where he first presented the manuscript to have a look at the comments, and there is a nice discussion going on there. It is a good example of the usefulness of posting our work online. There may still be too few people knowledgeable about particular topics to get very good feedback in all areas, but this will tend to grow with time.
Michael Barton from Bioinformatics Zen started a new blog to use as an open science notebook about his own research.
I have a mini project in mind about the evolution of domain families that I will start describing and working on here in the blog soon.
Sunday, September 09, 2007
The biology of modular domains (day1 and morning of day2)
I am attending the 3rd (I believe it is only the third) conference on modular protein domains. It is a small conference of just 80 people, with a very nice environment for discussions. Given the nature of the conference I suspect that a lot of the talks will be about unpublished material, so I will be light on the details, since I have not personally asked people whether I may post about their work.
On the first day of the conference on modular protein domains we had the opening lecture by Wendell Lim. It was a very light and interesting discussion of the evolution and engineering of signaling pathways. Lim started by discussing some interesting results coming from the sequencing of M. brevicollis, a unicellular choanoflagellate that is related to the Metazoa and might provide some information about their evolution. It is a continuation of an analysis by Nicole King and Sean B. Carroll that first identified a receptor tyrosine kinase in M. brevicollis, the first time one was identified outside the Metazoa. The discussion was generally about the evolution of kinase signaling and how such a system of what Lim called "readers" (phospho-binding domains), "writers" (kinases) and "erasers" (phosphatases) can arise in evolution.
The second part of his talk was about the efforts to understand the evolutionary capacity of signaling networks by trying to engineer new or altered pathways. In this case the focus was on how with few components and small changes in these components it is possible to shape the dynamic responses of signaling networks.
Morning session of the second day
Synthetic biology
The synthetic biology sessions started off with a talk by David Searls on "A linguistic view of modularity in macromolecules and networks" (which was not very related to synthetic biology, but interesting nevertheless). Searls detailed his views on the analogies between linguistics and biology. Here is a recent review by Mario Gimona on this analogy. At the protein level we could think of sequence, structure, function and protein role as similar to the lexical, syntactic, semantic and pragmatic levels of linguistic analysis:
The general idea of building these bridges over topics is to be able to take existing methods and discussions from one side to the other (see review).
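Purely as an illustration of the four-level pairing described above (the mapping is my reading of the analogy, not anything from the talk itself), it can be written down as a simple lookup:

```python
# The four levels of linguistic analysis and their (approximate)
# protein-level counterparts, following Searls' analogy.
LINGUISTIC_ANALOGY = {
    "lexical":   "sequence",   # the raw symbols: residues
    "syntactic": "structure",  # how the symbols are arranged
    "semantic":  "function",   # what the arrangement means
    "pragmatic": "role",       # what it is used for in its context
}

for linguistic_level, protein_level in LINGUISTIC_ANALOGY.items():
    print(f"{linguistic_level:>10} ~ {protein_level}")
```

The point of the table is only to make the correspondence explicit; the review discusses where the analogy holds and where it breaks down.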
The second talk was by Kalle Saksela, and again it had little to do with synthetic biology. Saksela's group is working on high-throughput interaction mapping for human SH3 domains against full proteins (human and viral). They mentioned their progress in expressing and analyzing a subset of these interactions. He mentioned an interesting example where the Nck and Eps8L1 SH3 domain binding site in CD3epsilon overlapped with an ITAM motif, such that phosphorylation of the ITAM motif abolished binding by the SH3 domains. It is a nice example of signaling mediated by different types of peptide-binding domains (see paper for details).
The third talk was by Rudolf Volkmer. He gave a short talk on a library of coiled-coil proteins. The library contains many single-mutant variants of the GCN4 leucine-zipper sequence. They then tested pairs of mutants for heterodimerization by SPOT assays. Aside from extending our knowledge of this domain family, the library can now also be used as a toolkit of binding domains for synthetic biology (the work is already published).
The final talk on this panel was from Samantha Sutton of the Drew Endy lab. This was more like what one would expect from a synthetic biology talk. Samantha Sutton is interested in developing what she calls Post Translational Devices: general, abstract devices that can regulate the post-translational state of proteins in a predictable fashion. She has a page on OpenWetWare detailing her thoughts on this.
The second panel of the morning was about in silico computational methods.
Cesareni presented their ongoing efforts to experimentally determine human SH3 and SH2 interactions with spotted peptides. He then showed how these data can be used to search for examples where there is overlapping recognition by different domain types. The work is similar in methodology to the paper published by Christiane Landgraf and colleagues in PLoS Biology, but now using two domain families and the human proteome.
Vernon Alvarez from AxCell Biosciences, gave a talk about a proprietary database called ProChart (that I cannot find online) containing many domain-peptide interactions tested by the company. He was basically promoting the database for anyone interested in collaborations.
The third talk was by Norman Davey, author of SLiMDisc, a linear motif discovery method. He is trying to improve the method, mostly by improving the statistics.
I gave the second short talk of the session, on predicting the binding specificity of peptide-binding domains using structural information. It is basically a continuation of some of the work I mentioned before here on the blog about the use of structures in systems biology, but now applied to domain-peptide interactions.
Saturday, September 08, 2007
The Biology of Modular Protein Domains
From tomorrow on I will be in Austria for a small conference on the biology of protein domains. I might post some short notes about the meeting in the next few days. I'll get a chance to present some of the things I have been working on about the prediction of domain-peptide interactions from structural data.
Here is one of these modular protein domains, an SH3 domain, in complex with a peptide:
The very short summary is that it is possible to take the structure of one of these domains in complex with a peptide (e.g. SH3 domains, phospho-binding domains, kinases, etc.) and predict its binding specificity. To some extent it is also possible to take a sequence, obtain a model (depending on structural coverage) and determine its specificity. I'll talk more about the details (hopefully) soon.
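To give a flavour of what "binding specificity" means in practice, a common way to represent it is a position weight matrix: per-position residue preferences against which candidate peptides are scored. The sketch below is a generic, toy illustration with made-up numbers, not the actual method or data from my talk:

```python
import math

# Toy position weight matrix (PWM) for a hypothetical 3-residue
# peptide motif. Values are invented probabilities of observing each
# residue at each position; a real matrix would cover all 20 amino
# acids and be derived from structural or experimental data.
pwm = [
    {"P": 0.7, "A": 0.2, "G": 0.1},  # position 1
    {"P": 0.6, "L": 0.3, "A": 0.1},  # position 2
    {"R": 0.8, "K": 0.1, "A": 0.1},  # position 3
]

BACKGROUND = 0.05  # uniform background frequency (1/20)


def score_peptide(peptide):
    """Sum of per-position log-odds scores; higher means a better match.

    Residues absent from a column get a small pseudo-probability so the
    logarithm stays defined.
    """
    total = 0.0
    for column, residue in zip(pwm, peptide):
        total += math.log(column.get(residue, 0.01) / BACKGROUND)
    return total


print(score_peptide("PPR"))  # strong match to the toy motif
print(score_peptide("AAA"))  # much weaker match
```

Predicting specificity from structure amounts to estimating such a matrix from the energetics of the domain-peptide interface rather than from experimental binding data.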
Tuesday, September 04, 2007
Scifoo Lives On: Definitions in Open Science
I am having a quick look at the session Definition in Open Science, going on in Second Nature (I'm Duriel Akula in Second Life). The place looks very different from the first time I had a look around the island. It is full of posters and other interesting material. Here is a picture as some of the first people started gathering:

Live coverage of the event by Berci (also in the picture).
Wednesday, August 29, 2007
Bio::Blogs #14 - Update
The 14th edition of Bio::Blogs will be hosted by Ricardo at My Biotech Life. It will be made available on the 1st of September and submissions can be sent by email as mentioned in his blog post.
Update: The 14th edition is now posted at My Biotech Life. With all the deadlines I had this past month I left it almost until the end to organize a host. Thanks again to everyone that contributed on such short notice.
Is anyone interested in serving as host for the October edition?
SciVee.tv background info
A while ago SciVee was announced via several blog posts. Here is a link to the first one I read by Deepak and a link to the cluster in Postgenomic.
I thought at first glance that this was a partnership between some small start-up and a content provider (PLoS). After browsing a couple of the videos I noticed that most are from papers authored by Philip E. Bourne. Given the connections to both PLoS and SDSC (two of the site's partners) I thought that this might be an academic effort after all.
A couple of searches tell us that abailey was responsible for a SciVee mailing list at SDSC that no longer exists. abailey apparently stands for Apryl Bailey, someone involved in the SDSC CI Channel, a "webcast video service and resource for the scientific communities" (from their about page).
Apryl Bailey also appears listed on the SciVee team in one of the slides of a talk (PDF) that Philip Bourne gave in June this year. According to this recent news story, it looks like the launch was actually premature, triggered by this talk:
"According to one founder, Philip Bourne of the University of California–San Diego (UCSD) and founding editor in chief of PLoS Computational Biology, he talked about the project at a scientific meeting and the buzz began prematurely."
It is an academic effort, probably related to this CI Channel mentioned above:
"The project began with some pilot pubcasts done at UCSD to test video formats and has involved the other PLoS editors. There are currently eight people on the SciVee team. The SDSC is providing the site hosting."
From one of the slides of the talk:
Developmental Phases
• Phase I (One Year) – Invite authors of papers published in PLoS journals to upload a video or podcast to SciVee.tv describing the motivation, key results and major conclusions of the published study. Establish linkage between literature and video – source of metadata etc. – September 2007
• Phase II (Years 2-3) - Scrape PubMed on a daily basis and extend the invitation to authors of all papers in the life sciences; develop video authoring server; provide ratings and virtual community comment
• Phase III (Year 4- ) - Extend to other scientific disciplines
Saturday, August 11, 2007
The ephemeral journal
Recently I mentioned the start of yet another journal covering one of the topics I would place at the top of a hype cycle curve. This, together with the apparently ever-increasing number of journals everywhere, got me thinking about the birth and death of science journals. The cost of starting up a new journal is so low that the turnover can only get higher. Still, we don't typically see a lot of "journal death". Journals are meant to be respected and to build up a reputation among the audience they serve.
It looks inevitable, however, that with a limited attention capacity and an ever-increasing number of journals, science hype cycles might have a strong influence on a journal's activities. If hyped-up subjects sprout new journals quickly (e.g. stem cells, systems biology, synthetic biology), underperforming science memes will suffer from lack of attention. If I ran a biomedical science publishing house, I would probably be thinking of launching a journal to cover metagenomics and another to cover personalized medicine.
Creating and destroying journals based on hype cycles sounds a bit exaggerated, but at least there is no reason to think that a journal is here to stay. This can also happen in a more subtle way, through re-grouping of content after publication. Call it a gateway, a report, a topics page, a portal (harder to find); the idea is that there are several ways one can group published papers to serve a target audience. Digital works are not things: they can be in several places, and we can slice and dice the views as we wish. One great thing about these views is that they are more likely to attract discussion, since there is more likely to be a group of people around with similar interests. This would be even more so if the users had some power to control the content. Nature Reports allows users to submit papers and to vote on them, but it is still too soon to tell if discussions in topic pages are more frequent than on a site like PLoS ONE.
Instead of subscribing to the high-impact journals plus the lower-impact journals of our topics of interest, we would state our interests in the views/portals/gateways we select to participate in, and hopefully the works would be distributed to the fitting target audiences. Works of very high perceived impact would be cross-posted to many more views than more specific works. The value could still be assessed either pre- or post-publication.
The main advantage for the publisher is many more pages with well-targeted audiences. Some of these views could even be of interest to a very wide non-scientific audience. All of this should improve advertisement revenue.
Quotes
Another interesting SciView interview is available at Blind.Scientist. Here is one quote from Alexei Drummond (Chief Scientist of Biomatters) that I liked:
"I think that bioinformatics has to become a field where people without programming skills can contribute substantially. I would argue that all of the programmers in bioinformatics should be working very hard to program themselves out of their jobs (and into more satisfying jobs)."
Science advances quickly, and so do the computational needs. Can we ever do away with these one-off scripts if there are always new data types and innovative ways of analyzing them? I guess the ideas around workflows and the like could lead to very visually oriented programming that anyone can do.
Thursday, August 09, 2007
First issue of IET Synthetic Biology
The first issue of (yet) another journal related to systems & synthetic biology is now online. IET Synthetic Biology will be freely available during this year. This issue covers several works from iGEM, and the editorial is worth a read for a look at the future direction of the journal.
In addition to conventional research and review articles, we see an important need for practical articles describing technical advances and innovative methods useful in synthetic biology. We will encourage submission of technical articles that might describe novel BioBrick components, construction techniques, characterisation of a new biological circuit, new software or a practical ‘hands-on’ guide to the construction of new instrumentation or a biological device.
In addition to the print journal, we are developing associated web resources. These will include a repository of online video resources, specialised review material and research tools for synthetic biology.
Some journals tracking similar fields:
Molecular Systems Biology
BMC Systems Biology
Systems and Synthetic Biology
HFSP Journal
IET Systems Biology
Tuesday, August 07, 2007
A quick post to link out to two new bioinformatics-related blogs:
Freelancing science (by Paweł Szczęsny)
Open.nfo (by Keith)
I will be happy the day there are too many to track :).
Update: It could be the official month of "start your own bioinformatics blog". The bio.struct blog is the third one so far.
Saturday, August 04, 2007
SciFoo starts ...
and I am not there :). No fun! The Science Foo Camp 2007 has started at the Googleplex and there is already some blog coverage. To have a look at what is going on at camp, here is a tip from Andrew Walkingshaw:
* http://www.lexical.org.uk/planetscifoo/ - participants’ blogs
* http://flickr.com/photos/tags/scifoo/ - photos
* http://www.technorati.com/tags/scifoo/ - general blogosphere commentary
There are also live Twitter feeds from Deepak and Nat Torkington.
To start off go have a look at pictures posted by Bora, you might recognize one or two of these bloggers.
Maybe next year we can try to organize a Science Barcamp :) Why should they have all the fun.
Friday, August 03, 2007
Bio::Blogs#13
A great edition of the monthly Bio::Blogs is up at Neil's blog. This month there are plenty of tutorials and a round-up of blog coverage of the ISMB/ECCB 2007 conference.
PDF version for offline reading of the editorial and highlighted posts is here and here (Box.net copy).
If someone wants to give it a try at editing future editions of Bio::Blogs let me know.
Speaking of community projects, the list of web servers published in the latest NAR web server issue is on this Nodalpoint wiki page. If you try one of these services, spend a minute noting down whether it was even available, whether it worked well, etc.
Wednesday, August 01, 2007
Microattribution
(via Peter Suber) An editorial in Nature Genetics discusses the need to establish microattribution systems:
"When requiring authors to deposit data in public databases, journals, databases and funders should ensure that quantitative credit for the use of every data entry will accrue to the relevant members of the data-producing and annotating teams. In an era in which consortia are producing more (and more useful) papers than individuals and small groups, the careers of individuals are as much in need of specific credit as those of the scientific visionaries and wranglers who hold the consortia together."
This sounds great. From the journals' point of view this would mean "encouraging" authors to link to all resources used. This information would then need to be aggregated and made available to everyone. This and other measures would help to change the current credit system, which tends to reward researchers for producing papers in high-impact-factor journals (something that does not correlate well with individual paper citations) instead of rewarding scientists for the usefulness of their research.
Sunday, July 29, 2007
Bio::Blogs #13 call for submissions
Neil has kindly agreed to host the next edition of Bio::Blogs, due out on the first of August. Send in links to blog posts of bioinformatics/chemioinformatics/omics/open science related content to bioblogs at gmail and they will be re-directed to him.
Friday, July 27, 2007
Trade books vs Nature publishing
(via Richard Charkin's blog) Richard Charkin is the Chief Executive of Macmillan (Nature Publishing is a subsidiary of Macmillan). He posted his thoughts on why digital books are not as successful as the digital publishing going on at Nature.
I can't help noticing the second reason (my emphasis):
2. Scientific publishing has been intrinsically more profitable than trade book publishing. This allowed the major publishers and societies to invest the significant sums needed to create electronic delivery and storage platforms for scientific information. These platforms are a cornerstone for the creation of a new business and communication model.
and read it as "higher profit margins".
Google code for educators
(via the Google Blog) Google started a website to gather teaching materials for CS educators, covering some of the most recent technologies. Right now it has some material for AJAX Programming, Distributed Systems and Web Security. There are some video lectures and presentations. There is already some material on parallel programming (mostly related to their MapReduce) that should be of use in bioinformatics.
On a related topic, Tiago has started a multipart series on his blog about "Bioinformatics, multi-core CPUs and grid computing". The first and second parts are already available.
Tuesday, July 24, 2007
Slideshare adds voice
(via TechCrunch) Slideshare, a site to share presentations online, has added voice synchronization. We can now provide a link to an mp3 file and Slideshare provides tools to sync the audio to the slides, such that each slide is linked to part of the audio track. More information and examples can be found on this FAQ page.
In related news, Bioscreencast now has a group on Facebook.
Saturday, July 14, 2007
Another Open lab book
(Via Open Reading Frame) Jeremiah Faith is giving open notebook science a try and compiling some tips. He joins Rosie Redfield (microbiology) and Jean-Claude Bradley (chemistry) in exposing most of their research online and leading the way in changing the mindset towards open science.
Jeremiah Faith also has an interesting idea about using conference money to pay for advertisement. He figures that well-targeted ads can get you more attention than a talk. I like the idea because it is thinking out of the box, but I think that the type of connection one can create with other people at a conference is not so easy to recreate online. Also, there might be no need to spend money on advertisement if the blog keeps on topic and is interesting enough to attract incoming links. The blog can be a good personal marketing tool.
Friday, July 13, 2007
Early response to PLoS ratings
PLoS ONE pushed out a rating system in the latest update of their website. I thought it was another quiet update but several announcements are now up.
The technical details were described by Richard Cave, and Chris Surridge invites everyone to "Rate Early, Rate Often". Bora (who now works for PLoS) summed it up in a blog post as well.
And just because they make it so easy to query the data, here are the stats 3 days after the announcement:
Number of papers queried: 611
Number of papers rated: 47
Number of ratings: 50
Ratings: Average - 75%; Max - 100%; Min - 40%
Top rated papers (all with 100%)
10.1371/journal.pone.0000288 (rated by: brembs)
10.1371/journal.pone.0000354 (rated by: brembs)
10.1371/journal.pone.0000439 (rated by: brembs)
10.1371/journal.pone.0000349 (rated by: Complexity_Group)
10.1371/journal.pone.0000351 (rated by: crusio)
10.1371/journal.pone.0000455 (rated by: crusio)
10.1371/journal.pone.0000123 (rated by: Damien)
10.1371/journal.pone.0000224 (rated by: Damien)
Maybe in the long run it would be nice to know if the user who rated a paper is also one of its authors :), or to put a note in the ratings suggesting that authors are not very good at evaluating their own work.
Number of users that have rated: 24
Top 3 users:
Chris_Surridge
Complexity_Group
jstajich
The lowest rating so far:
10.1371/journal.pone.0000257 (rated by:godzikc)
There is no point in trying to conclude anything from this. It was just for the fun of it. If I could make a small wish it would be to have a similar way to query the accumulated number of page views or visitors for a given DOI.
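For the curious, the aggregation behind the numbers above is simple once the ratings have been collected per DOI. Here is a minimal Python sketch using hard-coded sample records; the actual rating pages' URL and response format are not shown in this post, so fetching and parsing them is left out:

```python
def summarise_ratings(records):
    """records: list of (doi, user, rating-in-percent) tuples."""
    papers = {}
    for doi, _user, rating in records:
        # group ratings by paper so we can count distinct rated papers
        papers.setdefault(doi, []).append(rating)
    ratings = [r for _, _, r in records]
    return {
        "papers_rated": len(papers),
        "n_ratings": len(ratings),
        "average": sum(ratings) / len(ratings),
        "max": max(ratings),
        "min": min(ratings),
    }

# a few of the ratings mentioned above, as sample input
sample = [
    ("10.1371/journal.pone.0000288", "brembs", 100),
    ("10.1371/journal.pone.0000351", "crusio", 100),
    ("10.1371/journal.pone.0000257", "godzikc", 40),
]
stats = summarise_ratings(sample)
```

On this sample the average comes out at 80%; run it over the full scraped set to reproduce the figures above.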
Wednesday, July 11, 2007
Open-source architecture to house the world
Here is a very energetic talk (filmed in February 2006) by Cameron Sinclair hosted at TED Talks. He is part of the Architecture for Humanity organization, which promotes architectural and design solutions to global, social and humanitarian crises. A very inspiring example of how the internet really makes the world smaller and how ideas like crowdsourcing and open access to innovation can make a difference. It was the first time I heard about a Creative Commons house design.
They have started a project called the Open Architecture Network to serve as a hub for collaborative efforts.
Tuesday, July 10, 2007
What is the $value$ of an editorial decision ?
(warning: random thoughts ahead)
From my viewpoint open access is doing great. PLoS has demonstrated that authors want to publish in open access journals and that these journals can quickly establish themselves as high impact forums for their respective audiences. BMC is set to show that open access can be profitable, and within BMC some journals are also trying to position themselves in the top tier of perceived impact.
How will BMC manage this, and will PLoS and others find a way to serve the authors' interests while keeping the direct costs to authors within reasonable ranges (even if they are paid by the funding bodies)? I can't really answer this :) but I do note a trend. Open access publishers like PLoS and BMC are publishing more and more while decreasing their rejection rates (when considering all that is published within the brand).
BMC has primarily focused on publishing a high volume of (peer-reviewed) articles without much regard for perceived impact in the field. I might be incorrect, but more recently they have been trying to highlight a group of flagship journals (BMC Biology, Genome Biology and Journal of Biology) where they do filter on perceived impact. They have even said that papers submitted to other BMC journals can be suggested "up" if they are found to be of high impact.
PLoS, on the other hand, took the exact opposite direction. PLoS started with their flagship journals (PLoS Biology and later PLoS Medicine), then created the community journals (PLoS Genetics, Computational Biology and Pathogens) and now opened PLoS ONE, which does not filter on perceived impact.
In an author-pays model, the most obvious way to limit the cost per paper and still provide a solid evaluation of perceived impact is to have journals that cover the broad spectrum of perceived impact. In this way the publisher's overall rejection rates decrease, and papers are evaluated and directed to the appropriate "level" of perceived impact.
Also, at closed publishers it is customary to be able to transfer a manuscript, along with the peer-review comments, from one journal to another of the same publisher. This practice can be advantageous to everyone, saving the time of another round of peer review.
Taking away the costs of editing and printing (online these can be very small), most of the cost of sustaining a science journal should come mainly from the editorial staff. So, what is the value of an editorial decision? In other words, could there be freelance editors? Could the editors be separated from the publisher? Imagine I read a paper from a pre-print server, ask some people to peer-review it (why would they?) and sell our evaluation to a journal.
Also, can a publisher sell an editorial decision to another publisher? Let's imagine a journal with a very high rejection rate: the editor asks referees for comments, but ultimately the manuscript is rejected. The editor could then ask the authors where they want to send it next and offer to provide the referee reports and editorial comments directly to the next journal to expedite the process. Could this journal get paid for this?
Monday, July 09, 2007
User ratings in PLoS ONE
Another quiet update to the PLoS ONE interface. They have introduced an interface for user ratings. The overall rating can be seen in the right bar (when reading a paper on the site) and expanded to show a dissection into 3 categories: insight, reliability and style.

A click pops up a voting screen:

The nice detail is that we can query rating data by DOI (example). It is not really an API, but the info is there and it is easily parsable. The PLoS ONE managing director, Chris Surridge, mentioned on the PLoS Facebook page a couple of days ago that this change would be up soon.
Filtering papers on number of downloads
I was having a look at the highly accessed papers for BMC Bioinformatics. In BMC, all journals have a page with statistics on the most highly accessed papers of the last month. Several other journals now provide a similar service. The cool thing about BMC is that they even tell you how many views each paper got (the sum of abstract, full text and PDF accesses on BioMed Central in the last 30 days). Not only that, the information is in the RSS feed they provide. That makes it very easy to feed into a pipe and set a threshold on the number of views above which a paper will show up in the filtered feed.
Here is a pipe example that filters out BMC Bioinformatics papers below 1000 views. The only problem is that the information is not stored as a number (example: "Number of accesses: 1226"). That is why I used the regular expression [1-9][0-9][0-9][0-9]$ instead of number filtering. I also don't know if the numbers are updated every day .. but I hope so.
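The same filtering logic can be sketched outside Yahoo Pipes. Below is a small Python stand-in; the feed-item layout is an assumption for illustration (not BMC's actual feed schema), and it extracts the count with a regular expression and then compares numerically, rather than matching on four digits:

```python
import re

# "Number of accesses: 1226" arrives as text inside each feed item,
# so the count has to be pulled out with a regular expression.
ACCESSES = re.compile(r"Number of accesses: (\d+)")

def over_threshold(items, threshold=1000):
    """Return titles of feed items with at least `threshold` accesses.

    `items` mimics parsed RSS entries as simple dicts; this layout is
    an illustrative assumption, not the real feed structure.
    """
    kept = []
    for item in items:
        match = ACCESSES.search(item["description"])
        if match and int(match.group(1)) >= threshold:
            kept.append(item["title"])
    return kept

sample = [
    {"title": "Paper A", "description": "Number of accesses: 1226"},
    {"title": "Paper B", "description": "Number of accesses: 314"},
]
```

With the default threshold only "Paper A" survives; lowering the threshold to 100 keeps both.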
Even better would be some kind of service where, given a DOI, BMC would provide exactly this information in a structured form. If other repositories provided a similar service there would be no point in worrying about open access diluting the number of page views, because we could just sum the views on the publisher's site with those on PubMed Central, etc.
Metadata infrastructure
Deepak and Neil blogged today about tagging and adding more structured metadata to the science web. I started by commenting on Deepak's post but it grew a bit, so I turned it into a blog post.
The most obvious start for me would be to find a standard to communicate information on the perceived impact of a paper (extending hReview, for example). A paper has a unique digital identifier and ways to resolve it, but no way to communicate the number of downloads at publisher site X, the number of incoming citations in other papers and in blog posts, or a simple rating by users.
On the user side, the blogging platforms, social network sites and wikis would need some way to add microformat support. See for example this plugin for WordPress (via F&L). If someone knows how to do the same for Blogger please tell me in the comments. It needs to be something like clicking a button to link to a paper, and out comes a formatted hReview.
I think finding standards for manuscripts is a good start because a lot of people already tag and blog about papers. There is a lot of information to aggregate and a lot of interest in having a good measure of impact for individual papers. What we learn from putting this in place can later be used for other types of data communication (e-lab books). Another possible good start would be conferences and conference reports (related to hCalendar ?).
Of course, this would require the participation of science publishers. They are the ones best placed to set up the tools and expose some of the information in a structured way to help enforce a standard.
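To make the idea concrete, here is a rough sketch of what such a snippet could look like. The class names loosely follow hReview (item, rating, reviewer); treating a DOI link as the reviewed item is my own illustrative assumption, not an agreed standard:

```python
def paper_hreview(doi, rating, reviewer):
    """Build a hypothetical hReview-style HTML fragment for a paper rating.

    The field layout is loosely based on hReview; the DOI-as-item
    convention is invented here for illustration.
    """
    return (
        '<div class="hreview">'
        f'<span class="item"><a class="url fn" '
        f'href="http://dx.doi.org/{doi}">{doi}</a></span> '
        f'<span class="rating">{rating}</span> '
        f'<span class="reviewer vcard"><span class="fn">{reviewer}</span></span>'
        '</div>'
    )

snippet = paper_hreview("10.1371/journal.pone.0000123", 80, "Damien")
```

A blogging-platform button could emit something like this fragment, and an aggregator could then parse ratings back out by DOI.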
Saturday, July 07, 2007
Referee reports in Nature Precedings ?
I was having a look at some of the bioinformatics manuscripts available in Precedings and I came upon this paper on "The Reproducibility of Lists of Differentially Expressed Genes in Microarray Studies". After the figures there is a letter to the editor with the responses to the referees' questions.
I could not find the paper published in a peer-reviewed journal and I wonder if this was intentional or maybe part of an (opt-in and maybe buggy) automatic procedure from Nature to have submitted papers appear in Precedings. If I were an editor of a bioinformatics/genomics journal I could now consider whether this paper, with these referee reports, would be interesting to the journal and send an email to the authors suggesting that, if by some chance their paper gets rejected, my journal would be willing to publish it.
Deepak was recently saying that it would be good to have access to this type of information. Why was a paper rejected from one journal and published in another? Most manuscripts go through several editorial and referee evaluations before getting published. Biology Direct and now PLoS ONE (to some extent) capture this information. I have found that it is often useful to read the referee comments in Biology Direct because they provide several independent criticisms that make it easier to home in on the good and bad parts of the work.
I am sure that this has come up before in the context of arXiv, but wouldn't it be more efficient to have journal editors somehow fish out from a common pool what is most interesting and high impact for their community, instead of the current submission ladder that I assume a lot of people go through? We would submit to a preprint server and tag the paper according to its perceived audience (i.e. cell biology, bioinformatics, etc.). Editors would flag their interest in the paper and the authors would select one of the journals. You can imagine some of the dynamics this could create, with some journals only looking at manuscripts that have already been flagged by other journals, etc.

The referee reports would be attached to the paper and the editor would make a decision. If rejected, the paper would be up again for editorial selection but with the previous information attached. Other journals could just decide to publish with those referee comments.
I think this is not far from what already happens within publishing houses: referee reports can be passed around to other journals of the same publisher. This would just make it more general. Although there are clear advantages for authors (fewer rounds of refereeing and quicker publishing), it would be hard to convince most publishers of such a scheme. For those publishing mostly journals with low rejection rates it would be beneficial, since most likely the papers have already been refereed, but for those with high rejection rates it could feel like giving away their work for free. Since it is really the work of the referees, maybe it should be up to the referees to decide whether the reports can be made public or not, period.
I was having a look at some of the bioinformatics manuscripts available in Precedings and I come upon this paper on "The Reproducibility of Lists of Differentially Expressed Genes in Microarray Studies". After the figures there is a letter to the editor with the response to the questions from the referees.
I could not find the paper published in a peer-reviewed journal and I wonder if this was intentional of maybe part of an (opt-in and maybe buggy) automatic procedure from Nature to have submitted papers appear in Precedings. If I was an editor of a bioinformatics/genomics journal I could now consider if this paper with these referee reports would be interesting to the journal and send an email to authors suggesting that if by some chance their paper gets reject my journal would be willing to publish it.
Deepak was recently saying that it would be good to have access to this type of information. Why was a paper rejected from some journal and published in another? Most manuscripts go through several editorial and referee evaluations before getting published. Biology Direct and now PLoS ONE (too some extent) capture this information. I have found that many times it is useful to read the referee comments in Biology Direct because it provides with several independent criticism that makes it easier to home in on good and bad parts of the work.
I am sure that this has come up before in the context of ArXive but wouldn't it be more efficient to have journal editors somehow fish out from a common pool what is more interesting and hight impact to their community instead of the current submission ladder that i assume a lot of people go through ? We would submit to a preprint server and tag the paper according to perceived audience (i.e cell biology, bioinformatics, etc). Editors would flag their interest for the paper and the authors would select one of the journals. You can imagine some of the dynamics that this could create with some journals only looking at manuscripts that have already been flagged by some other journals, etc.

The referee reports would be attached to the paper and the editor would make a decision. If rejected, the paper would be up again for editorial selection but with the previous information attached. Other journals could simply decide to publish with those referee comments.
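The flow described above is easy to make concrete. Here is a minimal sketch of the proposed pool; all class and field names are invented purely for illustration:

```python
# A toy model of the proposed manuscript pool. Everything here is
# hypothetical: it just makes the proposed dynamics concrete.
from dataclasses import dataclass, field

@dataclass
class Manuscript:
    title: str
    tags: set  # perceived audience, e.g. {"bioinformatics"}
    referee_reports: list = field(default_factory=list)
    interested_journals: set = field(default_factory=set)

class PreprintPool:
    def __init__(self):
        self.manuscripts = []

    def submit(self, manuscript):
        self.manuscripts.append(manuscript)

    def browse(self, tag):
        # Editors fish out manuscripts matching their community.
        return [m for m in self.manuscripts if tag in m.tags]

    def flag_interest(self, manuscript, journal):
        manuscript.interested_journals.add(journal)

    def reject(self, manuscript, report):
        # A rejection keeps the report attached; the manuscript stays
        # in the pool for other editors to consider, report included.
        manuscript.referee_reports.append(report)

pool = PreprintPool()
paper = Manuscript("Reproducibility of DE gene lists", {"bioinformatics"})
pool.submit(paper)
for m in pool.browse("bioinformatics"):
    pool.flag_interest(m, "Journal A")
pool.reject(paper, "Referee 1: methods need clarification")
print(paper.interested_journals, len(paper.referee_reports))
```

The key design point is that reports accumulate on the manuscript rather than on any one journal's desk, so a later editor sees the full evaluation history.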
I think this is not far from what already happens within publishing houses, where referee reports can be passed around between journals of the same publisher; this would simply make it more general. Although there are clear advantages to authors (fewer rounds of refereeing and quicker publishing), it would be hard to convince most publishers to adopt such a scheme. For those publishing mostly journals with low rejection rates it would be beneficial, since their submissions have most likely already been refereed, but for those with high rejection rates it could feel like giving away their work for free. Since it is really the work of the referees, maybe it should be up to the referees to decide whether the reports can be made public.
Wednesday, July 04, 2007
RSS feed for BiomedCentral comments
As if there weren't enough things to read these days, I thought it would be interesting to provide an RSS feed for BioMed Central comments. I tried to use openkapow to scrape the information from the webpage, but for some reason the feed only worked a couple of times after being published. Instead I used Dapper, which amazingly enough produced a more stable feed. The full, unfiltered feed can be found here.
The feed includes the title (with a URL to the comment page, where there is a DOI for the cited paper), the short description provided on the main webpage, and the journal (stored in the date field). The feed can be filtered for particular journals using this simple pipe from Yahoo Pipes, currently set for BMC Genomics or BMC Bioinformatics.
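For anyone who prefers code over a GUI tool, the journal filter that the pipe performs can be sketched in a few lines of Python. The sample feed below is a made-up stand-in for the real Dapper output, which stores the journal name in the date field:

```python
# Filter an RSS feed down to items from particular journals,
# mimicking what the Yahoo pipe does. The XML here is invented.
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Comment on paper X</title><pubDate>BMC Genomics</pubDate></item>
<item><title>Comment on paper Y</title><pubDate>BMC Cancer</pubDate></item>
<item><title>Comment on paper Z</title><pubDate>BMC Bioinformatics</pubDate></item>
</channel></rss>"""

def filter_feed(feed_xml, journals):
    # Keep only items whose journal (stored in the date field, as in
    # the Dapper feed) is in the wanted set; drop everything else.
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    for item in list(channel.findall("item")):
        if item.findtext("pubDate") not in journals:
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")

filtered = filter_feed(SAMPLE_FEED, {"BMC Genomics", "BMC Bioinformatics"})
print(filtered)
```

The resulting XML keeps only the BMC Genomics and BMC Bioinformatics items and can be served as a feed of its own.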
Sunday, July 01, 2007
BioBlogs #12 and a blogroll update
The 12th edition of Bio::Blogs is out at Nodalpoint. It has been one year of monthly posts (mostly) about bioinformatics. Is anyone interested in hosting the next edition?
Also, I updated my blogroll to reflect more what I am currently reading. Most updates are in the bioinformatics part but there are a couple of additions in all of them.
Bioscreencast and Multimedia@Harvard
Deepak and Harijay have posted about Bioscreencast, a project they were involved with. It is a repository for science-related screencasts. A screencast is a video capture of your screen, usually with some audio narration to explain to the viewer what is being shown. They hope it will be used by scientists to share knowledge on how to use science-related computer tools.
On a related note, Ricardo posted a nice review of multimedia sites for science. He linked to this amazing video of the inner workings of the cell:
You can see the full (narrated) version of this movie along with other science media files in this Multimedia Production Site at Harvard.
Wednesday, June 27, 2007
Call for Bio::Blogs #12
I am collecting submissions for the 12th edition of Bio::Blogs. Send in links to blog posts you want to share from your blog, or that you enjoyed reading in other blogs, to bioblogs at gmail until the end of the month. The next edition will be up at Nodalpoint on the 1st of July.
Maybe it could be cool to try out a section on papers of the month as voted by everyone (Neil used to do this once in a while). Anyone interested in participating just has to send a link to a paper, published last month and related to bioinformatics, with a short paragraph explaining what is nice about the paper.
Mike over at Bioinformatics Zen is asking how to continue the Tips and Tricks section of Bio::Blogs. He has put up a wiki page on open science in Nodalpoint to collect information for a possible future edition of the special section.
Monday, June 25, 2007
Synthetic Biology 3.0
I am not attending the 3rd edition of the Synthetic Biology conference but there are several bloggers attending and reporting.
The Seven Stones
Nature Newsblog (part I and part II)
The ETC blog (intro and part I)
Thursday, June 21, 2007
Structures in Systems Biology (a double bill)
Once in a while I get to write about what I have been working on. The last time it was about the evolution of protein interaction networks. This time it is about two papers that I contributed to: a review about the use of structures in systems biology and an article about structure-based prediction of Ras/RBD interactions. I am sorry to say that both require a subscription (pedrobeltrao *at* gmail).
Main conclusions
Structural data can be used to predict Ras/RBD interactions with approximately 80% accuracy
We can and should use structural information to understand the main molecular properties before abstracting away the atomic details. Structural genomics can serve as a bridge between the abstract network view and the atomic detail.
The Making Of
Although I am not the first author of the article I think it is safe to say that the main inspiration for the line of work done by Kiel (see also previous publication) is the work by Aloy and Russell where they first showed that it was possible to use a protein complex to predict if similar proteins would be able to interact in a similar way. What Kiel showed is that more accurate predictions can be made by modeling the protein domains under test onto the complex and evaluating the binding energy using a protein design program under development in the lab (FoldX). She used pull-down experiments and available information on Ras/RBD interactions to benchmark the predictions.
The predicted binding energies inform us about the probability that the two protein domains would bind in vitro. Inside the cell there are many other factors contributing to the likelihood of binding (gene expression, localization, complex formation, post-translational modifications, etc). To try to add some of this knowledge to the predictions I contributed with a Naive Bayes predictor that combines information on gene expression, GO functions, conserved physical/genetic interactions in other species and shared binding partners. The likelihood score obtained can be used to further rank the predicted interactions according to the likelihood that these are occurring inside the cell. In supplementary information there are the methods and tables with individual likelihood scores that can be used to reproduce the Naive Bayes predictor.
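As a back-of-the-envelope illustration of how such a naive Bayes combination works (the likelihood ratios and prior below are invented; the real values are in the paper's supplementary tables):

```python
# Naive Bayes combination of independent evidence sources for a
# protein interaction. The numbers are purely illustrative.

# likelihood ratios P(feature | interacting) / P(feature | not interacting)
EVIDENCE = {
    "coexpressed":           3.0,
    "shared_GO_function":    2.5,
    "conserved_interaction": 5.0,
    "shared_partners":       2.0,
}

def interaction_score(observed, prior_odds=0.001):
    # Naive Bayes assumes the evidence sources are independent given
    # the class, so the posterior odds are the prior odds times the
    # product of the likelihood ratios of the observed features.
    odds = prior_odds
    for feature in observed:
        odds *= EVIDENCE[feature]
    return odds

score = interaction_score(["coexpressed", "conserved_interaction"])
print(score)  # prior odds scaled up 15-fold by the combined evidence
```

Ranking candidate pairs by this score orders them by the likelihood that the in vitro binding prediction also holds inside the cell.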
From atoms to nodes and edges
I think one of the main goals of the review was to show the progress that has been made in using structural information to obtain the fundamental properties (binding sites, catalytic sites, protein dynamics, etc) of cellular components that may allow us to create models of cellular functions. There has been some work on bringing the very abstract "nodes and edges" view of cellular interactions closer to a more traditional pathway model. This has typically been done by searching for modules and for particular node roles that depend on the patterns of intra- or inter-module interactions (see Guimera et al). We should be able to automatically decorate interaction networks (and the pathway modules) with structural data that can further help to computationally generate meaningful models of cellular functions.
The picture was obtained from Beltrao et al.; it is Copyright © 2007 Elsevier Ltd and is used here hopefully under fair use.
In the pipeline
There are several important details to iron out before we can simply apply this structure-based prediction of protein interactions to any protein that we can model onto complexes. We are in the process of testing the approach with other domain types. In some of this work I have been more directly involved, and we have now started the submission process. I tried to get everyone to agree to submit it to a preprint server, but not everyone was comfortable with the idea.
Thursday, June 07, 2007
Tangled Bank #81 is now available
I participated with a submission to the latest edition of Tangled Bank (the first science carnival blog journal around), which is available at the Behavioral Ecology Blog. Thanks to RPM at Evolgen for "peer-reviewing" my post on protein evolution :).
Nature Precedings, a pre-print server for biomedical research
It was hard to hold off from blogging about this but I can finally write about Nature Precedings, a new free service provided by the Nature Publishing Group. The official announcement is in this editorial:
"... this site will enable researchers to share, discuss and cite their early findings. It provides a lightly moderated and relatively informal channel for scientists to disseminate information, especially recent experimental results and emerging conclusions."
"...the site will host a wide range of research documents, including preprints, unpublished manuscripts, white papers, technical papers, supplementary findings, posters and presentations."

I have been participating in the beta for some months now and, as mentioned in the editorial, it will be openly available starting next week. All documents are citable (they have DOIs), are not peer-reviewed (in the formal sense) and are archived under a Creative Commons license (derivatives allowed). The site has the community features (tagging/commenting/rating/RSS feeds) that you would expect, which will hopefully allow for requesting and providing comments on early findings. In summary, a nicer version of arXiv for biomedical research.
I think this is great news: it serves to improve access to research (open access by preprint archiving) and to increase the openness of research. It can provide a place for independent time-stamping of early findings, which could be improved (hopefully with community feedback) until they are appropriate for formal submission to a peer-reviewed journal.
A framework for open science (in biology) can now go from blogs/wikis to pre-print server to peer-reviewed journals. Many ideas might die along the way and many collaborations might form by connecting early findings in an unexpected way.
Of course, if you are in maths/physics you have arXiv, and you are probably wondering what is taking us biomedical researchers so long to get into this.
Friday, June 01, 2007
Bio::Blogs #11
The 11th edition of Bio::Blogs is online at Nodalpoint. We tried to do something different this time. Michael Barton volunteered to host a special section dedicated to tips and tricks for bioinformatics, hosted separately at Bioinformatics Zen. Because there were so many posts this month about personalized medicine, there is also a special section on that.
There are three separate PDFs for this edition: 1) the main PDF can be found here; 2) the one on personalized medicine can be downloaded here; 3) the one for tips and tricks is available from Bioinformatics Zen. Michael did a great job with this special section, with a very cool design.
Wednesday, May 30, 2007
Presenting Blog Citations
Recently Postgenomic hit the 10k mark: ten thousand citations to papers and books have been tracked in science-related blogs. In the post announcing the milestone, Euan asked if blog buzz could be an indication of the impact of a paper. Can science bloggers help to highlight potentially interesting research?
I decided to have a look at this and asked him to send me a list of papers published in 2003-2004 and mentioned in blog posts. For these I took from ISI Web of Science the number of citations in papers tracked by ISI (all years). There are 519 papers published in 170 journals in the period 2003-2004 that were mentioned in blogs tracked by Postgenomic. Of these, 79 papers could not be found in ISI; many of them were published in arXiv. These 79 were excluded from further analysis.
Top cited journals in blog posts
I ranked the journals according to incoming blog citations. The top 5 are highlighted below, and apart from arXiv, which is not usually tracked as a journal (though maybe it should be), the other 4 are all well-known journals publishing general science/biology. Compared to impact factor rankings, there is a notable absence of review and medical journals. This measure of total blog citations (instead of blog citations per article) penalizes low-volume journals like the Annual Review series. As for the low blog impact of medical journals, the current ranking may simply reflect a higher proportion of biology and physics blogs among those tracked by Postgenomic.

Relation between blog citations and average literature citations
The fact that bloggers tend to cite research published in high-impact journals could simply be due to the higher visibility of these journals. To test this, I computed the average citations per article for papers published in 2003-2004 in any journal with more than 1, 2, and 3 blog citations (see table below), and compared them to papers published in Science and Nature in the same period. Two conclusions are possible: 1) papers mentioned in blogs have a higher average citation count than those published in these high-impact journals; 2) papers with more blog citations have, on average, a higher number of literature citations.
Journal | Papers in 2003-2004 | Citations | Average citations per paper |
Science | 5306 | 148912 | 28.06 |
Nature | 5193 | 145478 | 28.01 |
>0 blog citations | 440 | 21306 | 48.42 |
>1 blog citations | 71 | 3679 | 51.81 |
>2 blog citations | 24 | 1835 | 76.45 |
>3 blog citations | 15 | 1557 | 103.8 |
I did not remove non-citable items (editorials, news and views, letters, etc.) from the analysis. It would be hard to come up with criteria for removing these both from the journals and from the papers tracked by Postgenomic. In any case, I suspect that bloggers tend to blog a lot about non-citable items because these are usually more engaging for discussion than research papers. If anything, then, the real measure of impact for blog-cited items should be even higher.
Our global distributed journal club
In recent years science publishers have worked to adjust to publishing online. Most of them now offer RSS feeds for their content and some have timidly started allowing readers to comment on their sites. With the exception of BioMed Central, none of the publishers make a point of prominently showing these comments, making it harder to find out about interesting ongoing discussions. This has not stopped researchers from participating in what can be called a global distributed journal club. As Euan and others have nicely noted, scientists are using blogs to discuss research. It is a very diffuse discussion, but it can be aggregated in a way that would never be possible if we kept to ourselves, in the usual conferences or in our institutes and universities.
I tried to show here that this aggregated discussion conveys information about the potential impact of published research. This is only the tip of the iceberg of the potential benefits of aggregating and analyzing science blogs. For example, it should be possible to find related papers from the linking patterns of science bloggers, to follow the dynamics of communication between different science disciplines, to spot trends in technology development, etc.
Some publishers might be thinking of ways to reproduce these discussions on their own sites. A better alternative would be for science publishers to get together to develop the aggregation technology: an independent site gathering all the ongoing comments from blog posts and from the publishers' websites. This could then be used by anyone interested in the information; it could be shown next to a PubMed abstract or directly on the publishers' websites. Right now this would likely be the single biggest incentive to online science discussion that science publishers could provide.
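The threshold analysis behind the table above can be reproduced with a few lines of code; the per-paper counts here are invented, just to show the shape of the computation:

```python
# For each blog-citation threshold, average the literature citations
# of papers above that threshold. The (blog, literature) citation
# pairs below are made up for illustration.
papers = [
    (1, 40), (1, 25), (2, 60), (2, 55), (3, 80), (4, 120),
]

def average_above(papers, min_blog_citations):
    # Mean literature citations over papers with strictly more blog
    # citations than the threshold (matching the ">N" rows above).
    cites = [lit for blog, lit in papers if blog > min_blog_citations]
    return sum(cites) / len(cites) if cites else 0.0

for threshold in (0, 1, 2, 3):
    print(threshold, average_above(papers, threshold))
```

With real data, a rising average as the threshold increases is exactly the pattern reported in the table.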
Next stop: San Francisco
I finally know for sure where I will be going for my first postdoc. I will be taking a joint postdoc position with Wendell Lim and Nevan Krogan at UCSF - Mission Bay. I will be moving to San Francisco around the end of the year or the beginning of next year.
Tuesday, May 29, 2007
Reminder for Bio::Blogs#11
I will start collecting posts for the 11th edition of Bio::Blogs (monthly bioinformatics blog journal) to be hosted at Nodalpoint on the 1st of June. Anyone can participate by sending in submissions to bioblogs at gmail. This month there is going to be a special section dedicated to tips for computational biologists, that will be hosted at Bioinformatics Zen. Something like a separate insight issue :). To participate in the special section email Mike (see post) with your tips or write a post and submit the link to him. It can even be just a couple of sentences. Just think of things that you consider to be important for people working in computational biology and send it in.
Friday, May 25, 2007
The Human Microbiome Project approved
(via Jacques Ravel's blog) It looks like NIH approved a pilot study to have a look at human microbial populations. From the NIH roadmap :
On May 18, 2007, the IC Directors met to review and prioritize specific proposals developed by Working Groups of trans-NIH staff, led by IC Directors. Four topics were chosen to move forward as Major Roadmap Initiatives. Two of these, the Microbiome and Epigenetics Programs, were approved for immediate implementation as five year programs. (...)
* Microbiome – The goal of the proposed Human Microbiome Project is to characterize the microbial content of sites in the human body and examine whether changes in the microbiome can be related to disease.
Related posts from other blogs:
A human microbiome program? (Jonathan A. Eisen)
More on the Human Microbiome Program Workshop - Day1 (Jonathan A. Eisen)
A Human Microbiome Project? (MSB blog)
Thursday, May 24, 2007
Nature vs. Nurture in personalized medicine
Personalized medicine aims to determine the best therapy for an individual based on personal characteristics. Given that family history is a risk factor for many diseases, there is a strong motivation to search for heritable genetic variation that might provide molecular explanations for diseases. In the last couple of years, improvements in sequencing technology have helped to scale up these efforts. The HapMap project is an example of these attempts at genome-wide characterization of human genetic variation. The project aims to create a haplotype map of the human genome. This map is important because correlating a disease with a haplotype can be used to pinpoint the cause of a disease to a genome region. This map-based approach is done by first sequencing known sites of polymorphisms, spaced across the genome, in a large population and then associating disease with haplotypes (see a recent example).
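The association step of this map-based approach can be illustrated with a toy example. Given counts of how often a haplotype appears in disease cases versus healthy controls, a chi-square test flags genome regions where a haplotype is over-represented in disease. The counts below are invented purely for illustration:

```python
from scipy.stats import chi2_contingency

# Invented counts: rows = haplotype A / haplotype B,
# columns = number of disease cases / healthy controls
table = [[60, 40],   # haplotype A: 60 cases, 40 controls
         [30, 70]]   # haplotype B: 30 cases, 70 controls

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
# A small p-value suggests the haplotype is associated with the
# disease, pointing to a candidate genome region for follow-up.
```

Real association studies of course involve many haplotypes tested genome-wide, which requires correcting for multiple testing.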
Eventually sequencing costs will drop to a point where these map-based approaches are replaced by full-genome re-sequencing. There seems to be a consensus that this is just a matter of time. Also, the main sequencing centers seem to be directing more of their efforts to studying variation. If sequencing full genomes is currently too expensive, sequencing coding regions is much more affordable. In two recent papers (Greenman et al. and Sjoblom et al.), researchers have tried to identify somatic mutations in human cancer genomes by sequencing. Greenman and colleagues focused on 518 kinases and searched for mutations in these genes in 210 different human cancers (see post by Keith Robison). Sjoblom and colleagues, on the other hand, sequenced fewer cancer types (11 breast and 11 colorectal cancers) but did so for 13,023 genes. The challenge going forward is to understand the impact of these mutations on cellular function.
Instead of sequencing to find new polymorphisms, it is also possible to test the association of previously identified variation with disease by high-throughput profiling. Two recent papers focused on profiling known polymorphisms in cancer tissues using either microarrays or PCR plus mass spec.
Underlying all of these efforts is the idea of genetic determinism: that if I sequence my genome, I should know how each variant impacts my health and what treatment I should use to correct it. This raises the question, however, of how much really depends on inherited genetic variation: the often-revisited Nature vs. Nurture debate. The latest MSB paper highlights the impact of the environment on mammalian metabolic functions. François-Pierre J. Martin and colleagues have studied how the gut microbial population affects mouse metabolism. They used NMR metabolic profiling of conventional mice and of germ-free mice colonized by human baby flora to study this question.

Metabolic analysis of liver, plasma, urine and ileal tissue of both types of mice showed significant changes in metabolites in the different compartments, associated with the two microbial populations. This is a very clear example of how the environment must be taken into consideration in future efforts toward personalized medical care.
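The kind of comparison behind such a result can be sketched with a minimal example: for one metabolite in one compartment, test whether its measured level differs between the two mouse groups. The intensities below are synthetic stand-ins, not the paper's data:

```python
import numpy as np
from scipy import stats

# Synthetic NMR peak intensities for one metabolite in one
# compartment (e.g. plasma), one value per mouse, two groups
rng = np.random.default_rng(2)
conventional = rng.normal(loc=10.0, scale=1.0, size=10)
humanized = rng.normal(loc=13.0, scale=1.0, size=10)

# Welch's t-test: does the gut flora shift this metabolite?
t, p = stats.ttest_ind(conventional, humanized, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3g}")
```

A full metabolomic analysis repeats this (or a multivariate equivalent) across hundreds of spectral features, again with multiple-testing correction.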
This example also underscores the importance of studying human microbial associations. As Jonathan Eisen discussed on his blog, maybe we should aim at a human microbiome program.
Nature or Nurture? In either case, abundant streams of data are forthcoming as the sequencing centers crunch away and new omics tools are directed at studying disease. There will be a lot of work to do to understand causal relationships and suggest therapeutic strategies. That might be why Google is taking a look at this. They keep saying they want to organize the world's information; why not health-related data?
The picture was taken from a News and Views piece by Ian Wilson:
Top-down versus bottom-up—rediscovering physiology via systems biology? Molecular Systems Biology 3:113
Tuesday, May 15, 2007
Protein evolution
What constrains and determines the rate of protein evolution? This topic has received a great deal of attention in bioinformatics. Many reports have found significant correlations between protein evolutionary rate and expression level, codon adaptation index (CAI), protein interactions (see below), protein length, protein dispensability and centrality in protein interaction networks. To complicate matters further, there are known cross-correlations between some of these factors. For example, it has been observed that the number of protein interactions correlates (weakly) with protein length and with the probability that a protein is essential to the cell.
This highlights the importance of thinking about the amount of variance explained by a correlation and controlling for possible cross-correlations. In fact, it has been shown that, when controlling for gene expression, some of the other factors have a weaker correlation (or none at all) with the rate of protein evolution (Csaba Pál et al 2003). Using principal component regression, Drummond and colleagues showed that a single component dominated by expression, CAI and protein abundance accounted for 43% of the variance in the non-synonymous substitution rate (dN). The other known factors account for only a few percent of the observed variance in dN.
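The logic of principal component regression here can be sketched with synthetic data standing in for the real expression/CAI/abundance measurements: three correlated predictors collapse onto one dominant component, and the fraction of dN variance that component explains can then be read off. This is an illustration of the method, not a reproduction of Drummond and colleagues' analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic, strongly correlated predictors standing in for
# expression level, CAI and protein abundance (shared latent factor)
latent = rng.normal(size=n)
expression = latent + 0.3 * rng.normal(size=n)
cai = latent + 0.3 * rng.normal(size=n)
abundance = latent + 0.3 * rng.normal(size=n)
X = np.column_stack([expression, cai, abundance])

# dN driven mostly by the shared latent factor, plus noise
dN = -0.7 * latent + rng.normal(size=n)

# PCA via SVD on the standardized predictors
Xs = (X - X.mean(0)) / X.std(0)
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
pc1 = Xs @ Vt[0]          # scores on the first component

# Fraction of dN variance explained by the first component
r = np.corrcoef(pc1, dN)[0, 1]
print(f"PC1 explains {r**2:.0%} of the variance in dN")
```

Because the three predictors share one latent factor, the first component soaks up almost all of their joint signal, which is exactly why it dominates the regression on dN.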
Two questions might come to mind when thinking about these observations. One is why expression level, CAI and protein abundance would constrain protein evolution. The other is why the number of protein interactions explains so little (or nothing at all) of the variance in protein evolutionary rates. Intuitively, the number of protein interactions is related to the functional density of a protein, and proteins with high functional density should have a lower dN.
Drummond and colleagues proposed, in a PNAS paper, an explanation for the first question. They first list three possible reasons why expression levels should have such a strong effect on protein evolution: functional loss, translational efficiency and translational robustness. Functional loss, postulated by Rocha and Danchin, hypothesizes that highly expressed proteins have lower dN because they are under strong selection to minimize the impact of mistranslation, which would create a large pool of inefficient proteins and reduce the fitness of the cell. A second hypothesis, proposed by Akashi, links protein evolutionary rates with gene expression through efficiency of translation: highly expressed proteins have optimal codon usage for efficient translation and therefore a lower dN and dS. Drummond and colleagues added a third hypothesis that they called translational robustness. Given the costs of misfolding and aggregation, the higher the number of translation errors that might lead to misfolding and aggregation, the higher the cost for the cell. Therefore there might be strong selection for keeping highly expressed genes robust against mistranslation.
The difference between translational robustness and functional loss is that the first implies that the number of translation events is the important factor, while the second puts the emphasis on protein concentration. Using protein abundance and mRNA expression, the authors showed that translational robustness seems to be the most important factor determining the rate of protein evolution.
In fact, in a recent paper (Tartaglia et al, 2007) a correlation between in vitro aggregation rates and in vivo expression levels was discovered. Highly expressed proteins tend to have a lower aggregation rate measured in vitro (r = 0.97, N = 12). The number of proteins analyzed was small and the aggregation rates were not always obtained under the same conditions, but it does fit with the translational robustness hypothesis.
Even if the number of translation events is such a strong constraint, one would expect that, when accounting for it, one would still see an effect of functional density on protein evolution. Yet the correlation between a proxy for functional density (the number of protein interactions) and dN has been under strong debate (yes there is, no there isn't, yes, no, yes, maybe, ...).
The answer to this dispute might in the end be that the number of protein interactions is not a good proxy for functional density. A protein might have many protein interactions through a single interface. This is why the work of Kim and colleagues from the Gerstein lab is important. Using structural information, they predicted the most likely interface for protein interactions in S. cerevisiae. They could then show that protein evolutionary rate correlates better with adjusted interface surface area than with the number of protein interactions. Also, the relationship between evolutionary rate and interface area appears to be independent of protein expression level.
The overall picture so far seems to be that translational robustness is the main driving force shaping protein evolutionary rates. Functional constraints are also important, but they are much more localized, explaining a smaller fraction of the overall variance across whole proteins.
Where can we go further? As I mentioned above, translational robustness predicts that expression levels should correlate with overall stability, designability (the number of sequences that fit the structure) and avoidance of aggregation-prone sequences. Bloom and colleagues have shown that the density of inter-residue contacts (a proxy for designability) does not correlate with expression, but the study was limited to roughly 200 proteins, so this might not be the final answer.
So, a clear hypothesis is that a computational measure combining a protein's stability, aggregation propensity and designability should correlate with gene expression levels.
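Such a test could be set up along the following lines. Everything here is a placeholder: the scoring function, the equal weighting, and the synthetic per-protein values would all be replaced by real predictors of stability, aggregation propensity and designability run over a proteome:

```python
import numpy as np
from scipy.stats import spearmanr

def robustness_score(stability, aggregation, designability):
    """Combine per-protein predictors into a single
    translational-robustness score (placeholder weighting:
    equal weights after z-scoring each predictor)."""
    z = lambda v: (v - v.mean()) / v.std()
    return z(stability) - z(aggregation) + z(designability)

# Synthetic data loosely coupled to expression, standing in
# for real proteome-wide predictions
rng = np.random.default_rng(1)
n = 500
expression = rng.lognormal(size=n)
stability = np.log(expression) + rng.normal(scale=2, size=n)
aggregation = -np.log(expression) + rng.normal(scale=2, size=n)
designability = np.log(expression) + rng.normal(scale=2, size=n)

score = robustness_score(stability, aggregation, designability)
rho, p = spearmanr(score, expression)
print(f"Spearman rho = {rho:.2f} (p = {p:.2g})")
```

A rank correlation (Spearman) is the natural choice here because expression levels are heavily skewed and the hypothesis is only about a monotonic relationship.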
Further reading:
An integrated view of protein evolution (Nature Reviews Genetics)
Friday, May 11, 2007
Science Foo Camp 2007 and other links
Nature is organizing another Science Foo Camp. A couple of bloggers have already been invited (Jean-Claude Bradley, Pierre, PZ Myers, Peter MR, Andrew Walkingshaw). There is a "Going to Camp" group in Nature Network, and the scifoo tag in Connotea to explore if you want to dig deeper.
I was there last year and I can only thank Timo again for inviting me and encourage everyone who has been invited to go. It was a chance to get to know fascinating people and hear about new ideas. On the off chance that any of the organizers are reading this ... please try to get together people from Freebase (or a similar company) with the people involved in biological standards (like Nicolas Le Novère).
A quick hello to two new bioinformatics-related blogs: Beta Science by Morgan Langille and Suicyte Notes.
(via Pierre, Neil and Nautilus) In a correspondence letter published in Nature, Mark Gerstein, Michael Seringhaus and Stanley Fields discuss the implementation of structured, machine-readable abstracts. As I mentioned in a comment on Neil's post, this is one of those ideas that has been around and that most people would agree with, but that somehow never gets implemented. In this case it would have to start on the publishers' side. As we have seen with other small technical implementations, like RSS feeds, once a major publisher sets this up, others will follow.
Monday, May 07, 2007
Introducing the Systems Biology department at CRG
I am spending two weeks in Barcelona to help out with a referee report. I can't really say yet what it is about, but if everything goes well, maybe I will in a couple of months (hint: evolvability). What I can do is introduce the environment. I am on the 5th floor of the Barcelona Biomedical Research Park. The building is located in front of the sea and hosts several different institutes. I am staying at the Center for Genomic Regulation (CRG), where my supervisor Luis Serrano heads the Systems Biology program. The program is a partnership between CRG and EMBL and is currently home to four groups (Luis Serrano, James Sharpe, Mark Isalan and Ben Lehner).
The department has a lot of research in developmental and evolutionary systems biology. I have only been here a week, but the environment is great and the beach in the background is a killer plus. Have a look around the webpage for the other programs.
Friday, May 04, 2007
It's official ;), scientists enter the blogosphere
Laura Bonetta wrote an analysis piece in Cell about scientists entering the blogosphere. She (I could not find her blog :) does a good job of introducing science blogging in a short and easy-to-read essay. There is a bit of everything: science education, discussions, carnivals and open science. The only thing that is sorely lacking is a mention of Postgenomic and maybe the publishers' blogs.
Wednesday, May 02, 2007
Bio::Blogs #10
The 10th edition of Bio::Blogs, the bioinformatics blog journal has been posted at Nodalpoint. The PDF can be downloaded from Box.net.