Tuesday, April 25, 2006

Engineering a scientific culture

In a commentary in Cell, Gerald Rubin describes Janelia Farm, the new research campus of the Howard Hughes Medical Institute. If you cannot access the commentary, there is a lot of information available on the website such as this flash presentation (oozing with PR talk).

In summary (as I understood it), the objective is to create a collaborative working environment where scientists can explore risky and long-term projects without having to worry about applying for grants and publishing on a very regular basis.
Group leaders at Janelia Farm will
- have small groups (two to six)
- not be able to apply for outside funding
- still work at the bench

Unless you are really interested in managing resources and all the hassle of applying for grants, this sounds very appealing.

Also, there is no limit on the amount of time the group leader can stay at Janelia Farm, as long as they pass a review process every 5 years. This is unlike for example here at EMBL where most people are forced to move after 9 years (there is a review process after 5 years).

Since the main objective of Janelia Farm is to work on long-term projects that can have significant impact, the review process will not focus on publications but on more subjective criteria like:
"(1) the ability to define and the willingness to tackle difficult and important problems; (2) originality, creativity, and diligence in the pursuit of solutions to those problems; and (3) contributions to the overall intellectual life of the campus by offering constructive criticism, mentoring, technical advice, and in some cases, collaborations with their colleagues and visiting scientists"

Sounds like a researcher's paradise :): do the science, we will do the rest for you.
It will be interesting to see in some years if they manage to create such an environment. The lack of very objective criteria and the absence of a limit on the stay on the campus might lead to some corruption.

Friday, April 21, 2006

Posting data on your blog

From Postgenomic I read this blog post in Science and Politics on science blogs. Bora Zivkovic describes in his post the different types of science blogging with several examples. The most interesting part for me was his discussion of posting hypotheses and unpublished data. I was very happy to see that he already had some posts with his own unpublished data and that the discussion about science communication online is coming up in different communities.

His answer to the scoop problem :
But, putting data on a blog is a fast way of getting the data out with a date/time stamp on it. It is a way to scoop the competition. Once the data are published in a real Journal, you can refer back to your blog post and, by doing that, establish your primacy.

There are some problems with this. For example, people hosting their blogs can try to forge the dates, so it would be best to have a third party time-stamping the data. Postgenomic would be great for this: there could be another section in the aggregator to track posts with data. Some journals will probably complain about prior publication and decline to publish something already seen in a blog.
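As a rough sketch of what third-party time-stamping could look like: the blogger computes a cryptographic digest of the data and hands only that digest to the third party, which records when it first saw it. The `fingerprint` function and the in-memory `TimestampRegistry` class below are hypothetical stand-ins for illustration, not an existing Postgenomic feature.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


class TimestampRegistry:
    """Hypothetical third party that remembers when a digest was first seen."""

    def __init__(self):
        self._first_seen = {}

    def register(self, digest, now=None):
        """Record the digest and return the time it was FIRST registered."""
        if now is None:
            now = datetime.now(timezone.utc)
        # setdefault keeps the earliest registration; later calls cannot
        # overwrite it, so the blogger cannot move the date around.
        return self._first_seen.setdefault(digest, now)

    def first_seen(self, digest):
        """Return the first registration time, or None if never registered."""
        return self._first_seen.get(digest)
```

Anyone who later receives the full data set can recompute the digest and compare it with the registered one, so the data itself never has to be shown to the third party in advance.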

The problems with current publishing systems and the agonizing feeling of seeing your hard work published by other people will probably help drive some change in science communication. Blogging data would make science communication more real-time and transparent, hopefully reducing the number of wasted resources and frustrations with overlapping research.

This is a topic I usually come back to once in a while, so I have mentioned this here before. The stream-like format of the blog makes it hard to keep posting all the relevant links on the topic, so I think from now on I will just link to the last post on the topic to at least form a connected chain.

Tuesday, April 11, 2006

Stable scientific databases

The explosion of scientific data coming from high-throughput experimental methods has led to the creation of several new databases for biological information (protein structures, genomes, metabolic networks and kinetic rates, expression data, protein interactions, etc). Given that funding is generally attributed for a limited time and for defined projects, it is possible to obtain money to start a database project but it is very difficult to obtain a stable source of funding to sustain a useful database. I mentioned this before more than once when talking about the funding problems of BIND.
In this issue of The Scientist there is a short white paper entitled "Save our Data!". It details the recommendations of The Plant Genome Database Working Group for the problems currently faced by the life science databases.

I emphasize here four points they make:
2. Develop a funding mechanism that would support biological databases for longer cycle times than under current mechanisms.
3. Foster curation as a career path.
6. Separate the technical infrastructure from the human infrastructure. Many automated computational tasks do not require specialized species- or clade-specific knowledge.
7. Standardize data formats and user interfaces.


The first and last points were also discussed in a recent editorial in Nature Biotech.

What was a bit of a surprise for me is their 3rd point on fostering curation as a career path. Is it really necessary to have professional curators? I am a bit divided between a more conservative approach to data curation with a team of professional curators and a wisdom-of-the-crowds type of approach where tools are given to the communities and they solve the curation problems. I think it would be more efficient to find ways to have the people producing the data curate it automatically into the databases. For this to happen it has to be really easy and immediate to do. I still think that journals are the only ones capable of enforcing this process.

The 6th point they make is surely important even if the curation efforts are to be pushed back to the people producing the data. It is important to make the process of curating the data as automatic and easy as possible.

Friday, April 07, 2006

Retracted scientific work still gets cited

Science has a news focus on scientific misconduct. A particular study tracked the citation of papers that were already retracted. They found that scientists keep citing retracted papers.
Some editors contacted by Science said that they do not have the resources to look up every citation in every paper to help purge the literature of citations to retracted work. In my opinion this is not such a complicated problem. If journals agreed to submit all retractions to a central repository, then the citations could very easily be checked against the database and removed. Even with such an automatic system, scientists should have the responsibility to keep up with the works being retracted in their fields.
Since retractions are publicly announced by the journals, PubMed already has some of this information available. If you search for retraction in the title in PubMed you can see several of these announcements (not all are retractions). In some cases, when you search for the title of a retracted paper you can see in PubMed a link to the retraction, but this is not always the case. All that is needed is for publishing houses to agree on a single format to publish retractions and for repositories to make sure all retractions are appended to the former entries of the same publication.
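To show how little machinery the check itself would need, here is a minimal sketch. It assumes that both the reference list of a manuscript and the central repository expose some shared identifier (a DOI or a PubMed ID, for instance); the function names and the identifiers in the usage example are made up for illustration.

```python
def normalize(identifier: str) -> str:
    """Make identifiers comparable: trim whitespace and ignore case."""
    return identifier.strip().lower()


def flag_retracted(citations, retracted_ids):
    """Return the cited identifiers that appear in the retraction list.

    citations     -- identifiers found in a manuscript's reference list
    retracted_ids -- identifiers held by the (hypothetical) central repository
    """
    retracted = {normalize(r) for r in retracted_ids}
    # Keep the original order so an editor can walk the reference list.
    return [c for c in citations if normalize(c) in retracted]
```

For example, `flag_retracted(["10.1000/a1", "10.1000/B2"], ["10.1000/b2"])` returns `["10.1000/B2"]`. The hard part is not this lookup but getting publishers to feed a common repository in a single format.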

Tuesday, April 04, 2006

Viral marketing gone wrong

The social internet has emerged as an ideal ground for marketing. People enjoy spreading news, and on the internet meme spreading sometimes resembles a viral infection propagating throughout the network.
Some companies like Google have made their success on this type of word-of-mouth marketing. If you can get a good fraction of the social internet to be attached to your products in such a way that they want to tell their friends all about it, you don't have to spend money on marketing campaigns.
The important point here is that a fraction of people must be engaged in the meme: they must find it so cool and interesting that they just have to go and tell their friends and infect them with the enthusiasm. How do you do this? That's the hard part, I guess.
So, the marketing geniuses of Chevrolet decided that they would try their hands at viral marketing. To get people engaged they decided to have the masses build the ads. We usually like what we build and we want to show it to our friends, so the idea actually does not sound so bad, right?! :) Well, this would have been a fantastic marketing idea if most people actually had good things to say about the product.

Here is an example of the videos coming out from the campaign:


I worried before that this type of marketing could be a negative consequence of science communication online, but these examples just show that directing attention alone is not enough: people will judge what they find and are free to criticize.

Monday, April 03, 2006

The Human interactome project

Marc Vidal has a letter in The Scientist urging scientists and funding agencies to increase efforts to map all human protein interactions. He suggests that different labs work on different parts of the huge search space (around 22000^2 excluding splice variants) and of course that funding agencies give out more money to support the effort. He makes an interesting point when he compares funding for genome projects with interactome mapping. I also think that interactome mapping should be viewed in the same way as genome sequencing and that the money invested would certainly result in significant progress in basic and medical research.
The only thing I would add to my own wish list is that some groups would start comparative projects at the same time. Even if it takes longer to complete the human interactome, it would be much more informative to have a map of the orthologous proteins in a sufficiently close species to compare with (like mouse). Alternatively some funding could go specifically to comparative projects studying for example the interactomes of different yeasts (it is easy to guess that I would really really like to have this data for analysis :).
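To put a number on that search space, here is a small back-of-the-envelope sketch. The figure of 22000 proteins comes from the letter; the function names and the idea of handing each lab a contiguous block of baits to test against all proteins are only assumptions for illustration, not Vidal's actual proposal.

```python
def ordered_pairs(n_proteins: int) -> int:
    """Every bait against every prey, orientation counted (n^2)."""
    return n_proteins * n_proteins


def unordered_pairs(n_proteins: int, include_self: bool = True) -> int:
    """Distinct pairs if A-B and B-A are treated as the same test."""
    if include_self:
        return n_proteins * (n_proteins + 1) // 2
    return n_proteins * (n_proteins - 1) // 2


def bait_ranges(n_proteins: int, n_labs: int):
    """Split the bait list into contiguous, nearly equal blocks, one per lab."""
    base, extra = divmod(n_proteins, n_labs)
    ranges = []
    start = 0
    for lab in range(n_labs):
        size = base + (1 if lab < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


print(ordered_pairs(22000))    # 484000000
print(unordered_pairs(22000))  # 242011000
```

So the space is close to half a billion oriented tests, or roughly 242 million distinct pairs if orientation is ignored. Split between, say, ten labs, each one would still face 2200 baits against all 22000 proteins.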


Friday, March 31, 2006

Get ready for the 1st of April storm

Tomorrow is April Fools' Day and there is a long tradition in the media of putting out jokes on this day. Some years ago this was, for me, almost not noticeable. I knew that the newscasts on the different TV channels would have at least one spoof story. Maybe I would notice the joke in one or two newspapers if I actually read one that day. These days I get almost everything from the internet and it is no longer just from a handful of sources: it comes from tons of media sites, blogs and aggregators. So every year that I am more connected, I notice more the 1st of April as the day where everybody goes nuts on the web. This year it even starts early, as you can see by this gold fish story in The Economist. Maybe Spitshine's post on quitting blogging was also an example of early April fools ;).

Tuesday, March 21, 2006

Wiki-Science

From Postgenomic (now on Seed Media Group servers), I picked up this post with some speculations on the future of science. It is a bit long but interesting. It was written by the former editor of Wired magazine so it is naturally biased to speculations on technology changes.

My favorite prediction is what he called Wiki-Science:

"Wiki-Science - The average number of authors per paper continues to rise. With massive collaborations, the numbers will boom. Experiments involving thousands of investigators collaborating on a "paper" will commonplace. The paper is ongoing, and never finished. It becomes a trail of edits and experiments posted in real time - an ever evolving "document." Contributions are not assigned. Tools for tracking credit and contributions will be vital. Responsibilities for errors will be hard to pin down. Wiki-science will often be the first word on a new area. Some researchers will specialize in refining ideas first proposed by wiki-science."

I am trying to write a paper right now and just last week the thought crossed my mind of just doing it online in Nodalpoint's wiki pages and inviting some people to help/evaluate/review. However, I am not sure that my boss would agree with the idea and honestly I am a bit afraid of losing the chance of publishing this work as a first author. Maybe when I get this off my hands I'll try to start an open project on a particular example of network evolution.

Links on topic:
Nodalpoint - projects ; collaborative research post
Science 2.0

Looking back two years ago - M$ vs GOOG

I was reading a story today about the keynote lecture by Bill Gates at the Mix'06 conference and I remembered posting something on the blog when I first saw a story about Microsoft moving into the search market. One of the funny things about having the blog is that I can go back to what I was reading and thinking some time in the past. So from the previous post I guess Microsoft started reacting to the rise of Google more than two years ago. In retrospect, it was really hard to predict the impact of web2.0 and the free software/ad model. Judging by Gates' speech, only now has Microsoft really completely turned in this direction, so I guess it takes some time to turn such a big boat. They managed before (see browser wars) to turn the company into the internet era and maintain dominance; let's see how they keep up this time with Google, Yahoo, Amazon, etc.

Looking back on some of the posts of that time I realize how I changed my blogging habits. In the beginning I used the blog more like a link repository with short comments, while currently I tend to blog more about my opinion on a topic. I'll check again in some years from now if I don't quit in the meantime :).

Wednesday, March 08, 2006

Comparative Interactomics on the rise

I am sorry for the buzzwords but I just wanted to make the point of the exaggerated trend. Following the post on Notes from the Biomass I picked up the paper from Gandhi et al in Nature Genetics. The authors analyzed the human interactome from the Human Protein Reference Database, comparing it to other protein interaction networks from different species. Honestly I was a bit surprised to see so few new ideas in the paper and I agree with the post in Notes that they should have cited some previous works. For example the paper by Cesareni et al in FEBS Letters includes a similar analysis between S. cerevisiae and D. melanogaster. Also the people working on PathBlast have shown that maybe it is more informative to look for conserved sub-networks instead of the overlap between binary interactions. I am personally very interested in network evolution and I was hoping the authors would elaborate a bit more on the subject. As usual, they just attribute the small overlap to low coverage. Is it so obvious that species that diverged 900My to 1By ago should have such similar networks?
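For readers who have not seen this kind of comparison, the "overlap between binary interactions" can be sketched in a few lines. This is only my own minimal illustration, not the method used in any of the papers above: it assumes a one-to-one ortholog table (real ortholog assignments are often many-to-many) and treats interactions as undirected pairs.

```python
def conserved_fraction(interactions_a, interactions_b, orthologs):
    """Fraction of species A interactions that are also seen in species B.

    interactions_a, interactions_b -- iterables of (protein, protein) pairs
    orthologs -- dict mapping a species A protein to its species B ortholog
                 (assumed one-to-one here, which is a simplification)
    Only interactions where both partners have an ortholog are counted.
    """
    # frozenset makes the pair undirected: (x, y) equals (y, x).
    network_b = {frozenset(pair) for pair in interactions_b}
    mappable = 0
    conserved = 0
    for p, q in interactions_a:
        if p in orthologs and q in orthologs:
            mappable += 1
            if frozenset((orthologs[p], orthologs[q])) in network_b:
                conserved += 1
    return conserved / mappable if mappable else 0.0
```

A low value from such a calculation can mean either that the networks really diverged or that one of the maps is missing most of its interactions, which is exactly why the coverage argument is hard to settle with this measure alone.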

As was the case with comparative genomics, the ability to compare cellular interaction networks of different species should be far more informative than looking at individual maps. Unfortunately it is still not as easy to map a cellular interaction network as it is to get a genome.

Just out of curiosity, I think the first time the buzz words "comparative interactomics" were used in a journal was in a news and views by Uetz and Pankratz in 2004. Since then I think two papers picked up on the term, as you can see in this pubmed search (might change with time).

Monday, March 06, 2006

Marketing and science

I just spent 48 minutes watching this video where Seth Godin spoke to Google about marketing. He talks a lot about how it is important to put out products that have a story, that compel people to go and tell their friends. This type of network marketing is usually referred to as viral marketing (related to memetics). It is a really nice talk (he knows how to sell his ideas :) and it got me thinking of marketing in science.

The number of journals and publications keeps growing at a fast pace. Out of curiosity I took from PubMed the number of publications published every year for the past decade and we can clearly see that the trend for the near future is, if anything, for further acceleration in the rate of publication.

The other important point is that the internet is probably changing the impact that an individual paper might have, irrespective of where it is published. It is easier with the internet for word-of-mouth (meaning emails, blogs, forums, etc.) to raise awareness of good or controversial work than before.
So what I am getting at is that, on one hand, the internet will likely help individual publications to get their deserved attention, but on the other hand it will increase the importance of marketing in science. Before, to have attention, your work needed to be published in journals that were available in the libraries; now, and I suspect increasingly so in the future, you will have to have people talking about your work so that it rises above the thousands of publications published every year. It is too soon to say for sure what I prefer.

Tuesday, February 28, 2006


Track your comments with coComment

I am finally giving the coComment service a try and I will be experimenting with it for a while here on the blog. coComment aims to help us track the conversations we have on other blogs by aggregating comments and checking for replies. You decide whether or not to track a comment before submitting it, and the tracked comments appear on your homepage at coComment. Alternatively you can read them via RSS feed or with a coComment box on your own blog. You can customize the box a lot, so you could do a much better job than I did in trying to integrate it into your blog :).

You can find a lot more about this in these two posts on TechCrunch

Wednesday, February 15, 2006

Postgenomic

I have been wishing someone would come up with a science.memeorandum for some time and now there is one: Postgenomic. The site created by Stew (still in beta) aims to aggregate discussions going on in the life science blogs about papers, conferences and general science news. This adds a needed feedback to the science blogosphere and therefore will, in my opinion, increase the quality of discussion.
This site can, for example, become an excellent repository for comments on papers. Instead of adding a comment on a paper on the journal website, now you can just blog about it and the content gets aggregated on Postgenomic. I am not sure, but I think we could make a Greasemonkey script to check the current web page for a DOI, see if there are reviews about it in Postgenomic and add a little link somewhere.
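The core of such a script is small. An actual Greasemonkey script would be written in JavaScript and run in the browser; the sketch below only shows the logic in Python. The regular expression is a common approximation of DOI syntax rather than a full validator, and `review_index` is a hypothetical dictionary standing in for whatever lookup Postgenomic might offer.

```python
import re

# Approximate DOI shape: "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_dois(page_text: str):
    """Return the distinct DOIs found in the page text, in order of appearance."""
    found = []
    for match in DOI_PATTERN.findall(page_text):
        # Drop punctuation that usually belongs to the sentence, not the DOI,
        # and lowercase because DOIs are case-insensitive.
        doi = match.rstrip(".,;)").lower()
        if doi not in found:
            found.append(doi)
    return found


def review_links(page_text: str, review_index: dict):
    """Map each DOI on the page to its review links, if the index has any."""
    return {doi: review_index[doi]
            for doi in find_dois(page_text)
            if doi in review_index}
```

With a page containing `doi:10.1000/xyz123` and an index holding that key, `review_links` returns the stored links for it, and the script would then only need to add them somewhere on the page.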

Some more links about it:
Nodalpoint
Notes and more Notes

Tuesday, February 14, 2006

The search wars turn ugly

What will convince you to change your search engine? So far it has been all about who gives the best results and who indexes the biggest number of pages. It looks like the number two (Yahoo!) and number three (MSN) search engines are considering paying you to switch. How does MSNSearchAndWin sound? I also thought it was some kind of joke but you can try it yourself.
To be fair, someone mentioned also that Google is thinking of paying Dell to have Google software pre-installed on the new computers.

I would prefer it were about innovation and not just about who has more cash to give to the users. It even sounds a bit ridiculous. It is not only free; they are thinking of paying us to use it. Very competitive market.

Monday, February 13, 2006

BIND in the news

There is another editorial in the last issue of Nature Biotech about database funding. It focuses on BIND, explaining the growth and later decline (due to lack of funding) of this well known interaction database. Last December, BIND and other Blueprint Initiative intellectual property was bought by Unleashed Informatics but, as far as I can understand, this deal merely keeps the database available on the site and there will be no further updating for now. Given that both BIND and Unleashed were created within the Blueprint Initiative led by principal investigator Christopher Hogue (also Chief Scientific Officer of Unleashed Informatics), this deal was probably just symbolic and a way to increase the value of the company.

According to the Nature Biotech article BIND used up "$17.3 million in federal and Ontario government funding and another $7.8 million from the private sector" to create its value. Without the details it looks strange that so much value, mostly built with public money, ends up in a private company. Unleashed had to agree to maintain the access to the existing value free for all and I guess it will use BIND to attract possible buyers to their tools.

Christopher Hogue posted a pessimistic comment here some time ago about the future of databases in general. This editorial in Nature Biotech argues that it would take two important steps to allow for more permanent databases. The first step would be for the major funding agencies to accept and discuss the need for longer-lived databases. The second step would be to create a mechanism to decide what databases should be recognized as matured standards.

I thought that with examples like PubMed, the sequence databases and the PDB, the need for long-lived databases was obvious by now to the funding bodies. The second step is a bit more tricky. Creating a minimal and stable standard for a type of data is a complicated process and it is not obvious when a database supports such a community of researchers that it would make sense to give it maintenance funding.


Some thoughts from Neil, Spitshine

A similar discussion in Nodalpoint

Monday, February 06, 2006

Become a fonero and change the world

Today I read about FON, a global community of people that share wi-fi access. They just made the news because they announced support from several well known companies (Google, Skype, Sequoia Capital, and Index Ventures) that will surely catapult FON into the sky. The basic idea is to turn any wifi router into a hotspot and have people share their internet connection by installing some software on their routers or buying pre-configured wireless routers from the company. You can only use other people's FON hotspots if you are paying for an ISP at home, so this is also good for the internet service providers. You can try to make money with your FON hotspot (they call these users Bills) or you can be more utopian and give away your internet connection for free (and be called a Linus). If you do not have a FON account you are called an alien, but you can still connect to a FON hotspot and you will have to pay just like at any hotspot (and the ISPs get some money from this as well).
At first glance it looks like an all-win scenario but only time will tell. It is certainly one case where the more that join the better the service will become, and if this gets off the ground then once you pay for a connection at home you have it almost everywhere.

This is one of those simple utopian ideas with enough practical sense to make an impact so I think I will give it a try :).

Monday, January 30, 2006

I usually don't do this but ..

This is a really good blonde joke. Gotta love infectious silly memes.

Sunday, January 29, 2006

BioPerl has a new site

If you use BioPerl go have a look at the re-designed site. From the full announcement at OBF:

"I am pleased to announce the release of a new website for BioPerl. The site is based on the mediawiki software that was developed for the wikipedia project. We intend the site to be a place for community input on documentation and design for the BioPerl project. There is also a fair amount of documentation started surrounding bioinformatics tools and techniques applicable to using BioPerl and some of the authors who created these resources."

Friday, January 27, 2006

Meta blogging

I changed a couple of things on the blog template. If anybody reads this with an aggregator and all previous posts appear as updated please let me know.
I added a new section on the right bar where I plan to keep some previous posts that might be interesting to discuss. I had this change in mind after reading this post in Notes from the Biomass about blogging. It is true that blogging platforms don't make it easy to revisit ideas. I'll try to find other ways to do this.

I also updated the blogroll with some links: Neil's blog and Yakafokon on bioinformatics, some tech blogs I particularly like and the blog of a Portuguese friend of mine.

Our Collective Mind II

Some time ago I posted an unusual short text about collective intelligence. I think it was motivated by the web2.0 explosion, all the blogging, social websites and the layer of other services tracking these human activities in real time. The developments in the last 2-3 years were not so much a question of technical innovation since most of the tools were already developed but it was mostly a massification effect. A lot more people started to participate online instead of just browsing. This participation is very easy to track and we have automatic services that can, for example, tell us what people are currently talking about. One can think of these services as a form of self awareness. If you go to tech.memeorandum you can see a computer algorithm tracking the currently most talked about subjects in technology and organizing them into conversations. This does not mean that the web can understand what is being talked about but it is self aware.

I read today a (very long) post by Nova Spivack about this subject of self awareness and how he proposes that we should build this on a large scale. Although I agree that these types of services are very useful, I am not sure that one should try to purposely build some form of collective intelligence on such abstract terms. This idea of having everything collected under the same service feels too restrictive and not very functional. I would prefer a diversity and selection approach: just let the web decide. There is a big market for web services right now and I don't see it fading any time soon. Therefore if collective intelligence is possible and useful, then services will rapidly be built on top of each other to produce it.

If you have any interest in the topic and endorse his opinion, write a post and trackback to him.