Tuesday, April 11, 2006

Stable scientific databases

The explosion of scientific data coming from high-throughput experimental methods has led to the creation of several new databases for biological information (protein structures, genomes, metabolic networks and kinetic rates, expression data, protein interactions, etc). Given that funding is generally awarded for a limited time and for defined projects, it is possible to obtain money to start a database project but it is very difficult to obtain a stable source of funding to sustain a useful database. I have mentioned this more than once before when talking about the funding problems of BIND.
In this issue of The Scientist there is a short white paper entitled "Save our Data!". It details the recommendations of The Plant Genome Database Working Group for the problems currently faced by life science databases.

I emphasize here four of the points they make:
2. Develop a funding mechanism that would support biological databases for longer cycle times than under current mechanisms.
3. Foster curation as a career path.
6. Separate the technical infrastructure from the human infrastructure. Many automated computational tasks do not require specialized species- or clade-specific knowledge.
7. Standardize data formats and user interfaces.


The first and last points were also discussed in a recent editorial in Nature Biotech.

What was a bit of a surprise for me is their 3rd point on fostering curation as a career path. Is it really necessary to have professional curators? I am a bit divided between a more conservative approach to data curation, with a team of professional curators, and a wisdom-of-the-crowds type of approach where tools are given to the communities and they solve the curation problems. I think it would be more efficient to find ways to have the people producing the data curate it automatically into the databases. For this to happen it has to be really easy and immediate to do. I still think that journals are the only ones capable of enforcing this process.

The 6th point they make is surely important even if the curation efforts are to be pushed back to the people producing the data. It is important to make the process of curating the data as automatic and easy as possible.

Friday, April 07, 2006

Retracted scientific work still gets cited

Science has a news focus on scientific misconduct. A particular study tracked the citation of papers that were already retracted. They found that scientists keep citing retracted papers.
Some editors contacted by Science said that they do not have the resources to look up every citation in every paper to help purge the literature of citations to retracted work. In my opinion this is not such a complicated problem. If journals agreed to submit all retractions to a central repository, then the citations could very easily be checked against the database and removed. Even with such an automatic system, scientists should have the responsibility to keep up with the works being retracted in their fields.
Since retractions are publicly announced by the journals, PubMed already has some of this information available. If you search for retraction in the title in PubMed you can see several of these announcements (not all are retractions). In some cases, when you search for the title of a retracted paper, you can see in PubMed a link to the retraction, but this is not always the case. All that is needed is for publishing houses to agree on a single format to publish retractions and for repositories to make sure all retractions are linked to the original entries for the same publication.
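
As a rough sketch of how simple such a check could be, assume the central repository exposes its retractions as a plain list of identifiers (DOIs here) and that a manuscript's reference list is available in the same form. The function names and the identifiers below are hypothetical and for illustration only; no existing repository or journal system is implied.

```python
def normalize(identifier):
    """Strip whitespace and lower-case an identifier so equivalent forms compare equal."""
    return identifier.strip().lower()


def find_retracted_citations(cited_ids, retracted_ids):
    """Return the cited identifiers that appear in the retraction list, in citation order."""
    retracted = {normalize(r) for r in retracted_ids}
    return [c for c in cited_ids if normalize(c) in retracted]


# Hypothetical identifiers standing in for a central retraction repository.
retraction_registry = ["10.1000/example.001", "10.1000/example.007"]

# Hypothetical reference list of a submitted manuscript.
manuscript_references = [
    "10.1000/example.003",
    "10.1000/EXAMPLE.007",
    "10.1000/example.010",
]

flagged = find_retracted_citations(manuscript_references, retraction_registry)
print(flagged)
```

The lookup itself is a set membership test, so the cost is in agreeing on the shared format and identifiers, not in the checking.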

Tuesday, April 04, 2006

Viral marketing gone wrong

The social internet has emerged as an ideal ground for marketing. People enjoy spreading news, and on the internet the spreading of memes sometimes resembles a viral infection propagating throughout the network.
Some companies like Google have built their success on this type of word-of-mouth marketing. If you can get a good fraction of the social internet to be attached to your products in such a way that they want to tell their friends all about them, you don't have to spend money on marketing campaigns.
The important point here is that a fraction of people must be engaged in the meme; they must find it so cool and interesting that they just have to go and tell their friends and infect them with the enthusiasm. How do you do this? That's the hard part I guess.
So, the marketing geniuses of Chevrolet decided that they would try their hand at viral marketing. To get people engaged they decided to have the masses build the ads. We usually like what we build and we want to show it to our friends, so the idea actually does not sound so bad, right?! :) Well, this would have been a fantastic marketing idea if most people actually had good things to say about the product.

Here is an example of the videos coming out from the campaign:


I worried before that this type of marketing could be a negative consequence of science communication online, but these examples just show that directing attention alone is not enough: people will judge what they find and are free to criticize.

Monday, April 03, 2006

The Human interactome project

Marc Vidal has a letter in The Scientist urging scientists and funding agencies to increase efforts to map all human protein interactions. He suggests that different labs work on different parts of the huge search space (around 22000^2 excluding splice variants) and of course that funding agencies give out more money to support the effort. He makes an interesting point when he compares funding for genome projects with interactome mapping. I also think that interactome mapping should be viewed in the same way as genome sequencing and that the money invested would certainly result in significant progress in basic and medical research.
The only thing I would add to my own wish list is that some groups would start comparative projects at the same time. Even if it takes longer to complete the human interactome, it would be much more informative to have a map of the orthologous proteins in a sufficiently close species to compare with (like mouse). Alternatively some funding could go specifically to comparative projects studying, for example, the interactomes of different yeasts (it is easy to guess that I would really really like to have this data for analysis :).
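
To give a feeling for the size of that search space, here is a small back-of-the-envelope calculation. The figure of 22000 proteins is the approximate number quoted above; the distinction between ordered and unordered pairs is an added assumption about how one might count the tests, not something taken from the letter.

```python
n_proteins = 22_000  # approximate number quoted above, excluding splice variants

# Every protein tested against every protein, counting A-B and B-A separately
# (for example bait/prey orientation in a two-hybrid screen).
ordered_pairs = n_proteins ** 2

# Each pair of distinct proteins counted once.
unordered_pairs = n_proteins * (n_proteins - 1) // 2

# Unordered pairs plus self-interactions (homodimers).
unordered_with_self = unordered_pairs + n_proteins

print(f"ordered pairs:            {ordered_pairs:,}")
print(f"unordered pairs:          {unordered_pairs:,}")
print(f"unordered + self pairs:   {unordered_with_self:,}")
```

Either way the space runs to hundreds of millions of candidate pairs, which is why splitting it between labs makes sense.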


Friday, March 31, 2006

Get ready for the 1st of April storm

Tomorrow is April Fools' Day and there is a long tradition in the media of putting out jokes on this day. Some years ago this was, for me, almost not noticeable. I knew that the newscasts on the different TV channels would have at least one spoof story. Maybe I would notice the joke in one or two newspapers if I actually read one that day. These days I get almost everything from the internet and it is no longer just from a handful of sources; it comes from tons of media sites, blogs and aggregators. So every year that I am more connected, I notice the 1st of April more as the day when everybody goes nuts on the web. This year it even starts early, as you can see from this goldfish story in The Economist. Maybe Spitshine's post on quitting blogging was also an example of early April fools ;).

Tuesday, March 21, 2006

Wiki-Science

From Postgenomic (now on Seed Media Group servers), I picked up this post with some speculations on the future of science. It is a bit long but interesting. It was written by the former editor of Wired magazine, so it is naturally biased towards speculation on technological change.

My favorite prediction is what he called Wiki-Science:

"Wiki-Science - The average number of authors per paper continues to rise. With massive collaborations, the numbers will boom. Experiments involving thousands of investigators collaborating on a "paper" will commonplace. The paper is ongoing, and never finished. It becomes a trail of edits and experiments posted in real time - an ever evolving "document." Contributions are not assigned. Tools for tracking credit and contributions will be vital. Responsibilities for errors will be hard to pin down. Wiki-science will often be the first word on a new area. Some researchers will specialize in refining ideas first proposed by wiki-science."

I am trying to write a paper right now and just last week the thought crossed my mind of just doing it online in Nodalpoint's wiki pages and inviting some people to help/evaluate/review. However, I am not sure that my boss would agree with the idea and honestly I am a bit afraid of losing the chance of publishing this work as a first author. Maybe when I get this off my hands I'll try to start an open project on a particular example of network evolution.

Links on topic:
Nodalpoint - projects ; collaborative research post
Science 2.0

Looking back two years ago - M$ vs GOOG

I was reading a story today about the keynote lecture by Bill Gates at the Mix'06 conference and I remembered posting something on the blog when I first saw a story about Microsoft moving into the search market. One of the funny things about having the blog is that I can go back to what I was reading and thinking some time in the past. So from the previous post I guess Microsoft started reacting to the rise of Google more than two years ago. In retrospect it was really hard to predict the impact of web2.0 and the free software/ad model. Judging by Gates' speech, only now has Microsoft really completely turned in this direction, so I guess it takes some time to turn such a big boat. They managed before (see browser wars) to turn the company into the internet era and maintain dominance; let's see how they keep up this time with Google, Yahoo, Amazon, etc.

Looking back on some of the posts from that time I realize how I have changed my blogging habits. In the beginning I used the blog more like a link repository with short comments, while currently I tend to blog more about my opinion on a topic. I'll check again some years from now if I don't quit in the meantime :).

Wednesday, March 08, 2006

Comparative Interactomics on the rise

I am sorry for the buzzwords but I just wanted to make the point about the exaggerated trend. Following the post on Notes from the Biomass I picked up the paper from Gandhi et al in Nature Genetics. The authors analyzed the human interactome from the Human Protein Reference Database, comparing it to protein interaction networks from other species. Honestly I was a bit surprised to see so few new ideas in the paper and I agree with the post in Notes that they should have cited some previous works. For example, the paper by Cesareni et al in FEBS Letters includes a similar analysis between S. cerevisiae and D. melanogaster. Also, the people working on PathBlast have shown that it may be more informative to look for conserved sub-networks instead of the overlap between binary interactions. I am personally very interested in network evolution and I was hoping the authors would elaborate a bit more on the subject. As usual they just attribute the small overlap to low coverage. Is it so obvious that species that diverged 900 My to 1 By ago should have such similar networks?

As was the case with comparative genomics, the ability to compare cellular interaction networks of different species should be far more informative than looking at individual maps. Unfortunately it is still not as easy to map a cellular interaction network as it is to get a genome.

Just out of curiosity, I think the first time the buzzwords "comparative interactomics" were used in a journal was in a news and views by Uetz and Pankratz in 2004. Since then I think two papers have picked up on the term, as you can see in this PubMed search (might change with time).

Monday, March 06, 2006

Marketing and science

I just spent 48 minutes watching this video where Seth Godin spoke to Google about marketing. He talks a lot about how it is important to put out products that have a story, that compel people to go and tell their friends. This type of network marketing is usually referred to as viral marketing (related to memetics). It is a really nice talk (he knows how to sell his ideas :) and it got me thinking about marketing in science.

The number of journals and publications keeps growing at a fast pace. Out of curiosity I took from PubMed the number of publications published every year for the past decade and we can clearly see that the trend for the near future is, if anything, for further acceleration in the rate of publication.

The other important point is that the internet is probably changing the impact that an individual paper might have, irrespective of where it is published. It is easier with the internet than before for word-of-mouth (meaning emails, blogs, forums, etc) to raise awareness of good or controversial work.
So what I am getting at is that, on one hand, the internet will likely help individual publications get their deserved attention but, on the other hand, it will increase the importance of marketing in science. Before, to get attention, your work needed to be published in journals that were available in the libraries. Now, and I suspect increasingly so in the future, you will need to have people talking about your work so that it rises above the thousands of publications published every year. It is too soon to say for sure which I prefer.

Tuesday, February 28, 2006


Track your comments with coComment

I am finally giving the coComment service a try and I will be experimenting with it for a while here on the blog. coComment aims to help us track the conversations we have on other blogs by aggregating comments and checking for replies. You decide whether or not to track a comment before submitting it, and the tracked comments appear on your homepage at coComment. Alternatively you can read them via RSS feed or with a coComment box on your own blog. You can customize the box a lot, so you could do a much better job than I did in trying to integrate it into your blog :).

You can find a lot more about this in these two posts on TechCrunch.

Wednesday, February 15, 2006

Postgenomic

I have been wishing someone would come up with a science.memeorandum for some time and now there is one: Postgenomic. The site created by Stew (still in beta) aims to aggregate discussions going on in the life science blogs about papers, conferences and general science news. This adds needed feedback to the science blogosphere and therefore will, in my opinion, increase the quality of discussion.
This site can, for example, become an excellent repository for comments on papers. Instead of adding a comment on a paper on the journal website, now you can just blog about it and the content gets aggregated on Postgenomic. I am not sure, but I think we could make a Greasemonkey script to check the current web page for a DOI, see if there are reviews about it in Postgenomic and add a little link somewhere.
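
The first half of that idea, spotting DOIs in a page, is easy to sketch. A real Greasemonkey script would be written in JavaScript; the version below uses Python only to show the logic. The regular expression is a commonly used approximation of DOI syntax rather than a full specification, the sample page and its DOIs are made up, and how one would then query Postgenomic for reviews is left out because no lookup interface is described here.

```python
import re

# Approximate DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
# This is an assumption that covers most modern DOIs, not every valid one.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_dois(page_text):
    """Return the distinct DOIs found in page_text, in order of first appearance."""
    found = []
    for match in DOI_PATTERN.findall(page_text):
        doi = match.rstrip(".,;:")  # drop sentence punctuation glued to the end
        if doi not in found:
            found.append(doi)
    return found


# Hypothetical page content with made-up DOIs.
sample_page = (
    '<a href="http://dx.doi.org/10.1000/xyz123">doi:10.1000/xyz123</a>. '
    "See also 10.1000/abc.456."
)

print(find_dois(sample_page))
```

Each DOI found this way could then be used as the key to look up aggregated blog posts and add the link to the page.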

Some more links about it:
Nodalpoint
Notes and more Notes

Tuesday, February 14, 2006

The search wars turn ugly

What will convince you to change your search engine? So far it has been all about who gives the best results and who indexes the biggest number of pages. It looks like the number two (Yahoo!) and number three (MSN) search engines are considering paying you to switch. How does MSNSearchAndWin sound? I also thought it was some kind of joke but you can try it yourself.
To be fair, someone also mentioned that Google is thinking of paying Dell to have Google software pre-installed on new computers.

I would prefer it was about innovation and not just about who has more cash to give to the users. It even sounds a bit ridiculous. It is not only free, they are thinking of paying us to use it. Very competitive market.

Monday, February 13, 2006

BIND in the news

There is another editorial in the last issue of Nature Biotech about database funding. It focuses on BIND, explaining the growth and later decline (due to lack of funding) of this well-known interaction database. Last December, BIND and other Blueprint Initiative intellectual property was bought by Unleashed Informatics but, as far as I can understand, this deal merely keeps the database available on the site and there will be no further updating for now. Given that both BIND and Unleashed were created within the Blueprint Initiative led by principal investigator Christopher Hogue (also Chief Scientific Officer of Unleashed Informatics), this deal was probably just symbolic and a way to increase the value of the company.

According to the Nature Biotech article BIND used up "$17.3 million in federal and Ontario government funding and another $7.8 million from the private sector" to create its value. Without the details it looks strange that so much value, mostly built with public money, ends up in a private company. Unleashed had to agree to keep access to the existing data free for all and I guess it will use BIND to attract possible buyers to its tools.

Christopher Hogue posted a pessimistic comment here some time ago about the future of databases in general. This editorial in Nature Biotech argues that it would take two important steps to allow for more permanent databases. The first step would be for the major funding agencies to accept and discuss the need for longer-lived databases. The second step would be to create a mechanism to decide which databases should be recognized as mature standards.

I thought that with examples like PubMed, the sequence databases and the PDB the need for long-lived databases was obvious by now to the funding bodies. The second step is a bit more tricky. Creating a minimal and stable standard for a type of data is a complicated process and it is not obvious at what point a database supports enough of a community of researchers that it would make sense to give it maintenance funding.


Some thoughts from Neil, Spitshine

A similar discussion in Nodalpoint

Monday, February 06, 2006

Become a fonero and change the world

Today I read about FON, a global community of people that share wi-fi access. They just made the news because they announced support from several well-known companies (Google, Skype, Sequoia Capital, and Index Ventures) that will surely catapult FON into the sky. The basic idea is to turn any wi-fi router into a hotspot and have people share their internet connection by installing some software on their routers or buying pre-configured wireless routers from the company. You can only use other people's FON hotspots if you are paying for an ISP at home, so this is also good for the internet service providers. You can try to make money with your FON hotspot (they call these users Bills) or you can be more utopian and give away your internet connection for free (and be called a Linus). If you do not have a FON account you are called an alien, but you can still connect to a FON hotspot and you will have to pay just like at any hotspot (and the ISPs get some money from this as well).
At first glance it looks like an all-win scenario but only time will tell. It is certainly one case where the more people that join the better the service will become, and if this gets off the ground then once you pay for a connection at home you have it almost everywhere.

This is one of those simple utopian ideas with enough practical sense to make an impact so I think I will give it a try :).

Monday, January 30, 2006

I usually don't do this but ..

This is a really good blonde joke. Got to love infectious silly memes.

Sunday, January 29, 2006

BioPerl has a new site

If you use BioPerl go have a look at the re-designed site. From the full announcement at OBF:

"I am pleased to announce the release of a new website for BioPerl. The site is based on the mediawiki software that was developed for the wikipedia project. We intend the site to be a place for community input on documentation and design for the BioPerl project. There is also a fair amount of documentation started surrounding bioinformatics tools and techniques applicable to using BioPerl and some of the authors who created these resources."

Friday, January 27, 2006

Meta blogging

I changed a couple of things on the blog template. If anybody reads this with an aggregator and all previous posts appear as updated please let me know.
I added a new section on the right bar where I plan to keep some previous posts that might be interesting to discuss. I had this change in mind after reading this post in Notes from the Biomass about blogging. It is true that blogging platforms don't make it easy to revisit ideas. I'll try to find other ways to do this.

I also updated the blogroll with some links: Neil's blog and Yakafokon on bioinformatics, some tech blogs I particularly like and the blog of a Portuguese friend of mine.

Our Collective Mind II

Some time ago I posted an unusual short text about collective intelligence. I think it was motivated by the web2.0 explosion, all the blogging, social websites and the layer of other services tracking these human activities in real time. The developments in the last 2-3 years were not so much a question of technical innovation since most of the tools were already developed but it was mostly a massification effect. A lot more people started to participate online instead of just browsing. This participation is very easy to track and we have automatic services that can, for example, tell us what people are currently talking about. One can think of these services as a form of self awareness. If you go to tech.memeorandum you can see a computer algorithm tracking the currently most talked about subjects in technology and organizing them into conversations. This does not mean that the web can understand what is being talked about but it is self aware.

I read today a (very long) post by Nova Spivack about this subject of self awareness and how he proposes that we should build this on a large scale. Although I agree that these types of services are very useful, I am not sure that one should try to purposely build some form of collective intelligence on such abstract terms. This idea of having everything collected under the same service feels too restrictive and not very functional. I would prefer a diversity and selection approach: just let the web decide. There is a big market for web services right now and I don't see it fading any time soon. Therefore, if collective intelligence is possible and useful, then services will rapidly be built on top of each other to produce it.

If you have any interest in the topic and endorse his opinion, write a post and trackback to him.

Wednesday, January 18, 2006

Power law distributions

Almost every time a lot of hype is built around an idea there is a general backlash against the very same idea. In technology this happens regularly and it is maybe due to a snowball effect that leads to abuse. Initially a new concept is proposed that leads to useful new products and this in turn increases interest and funding (venture capital, etc). In response, several people copy the concept or merely tag their work with the same buzz to attract interest. Soon enough everyone is doing the same thing and the new concept reaches world fame. At this point it is hard to find actual good products based on the initial idea among all the noise. For a recent tech example just think of the web2.0 meme. Every startup nowadays releases their projects in beta with some sort of tagging/social/mash-up thing. The backlash is already happening for web2.0.

What about the title?
I had already mentioned a review article about power-law distributions. The author voiced some concern over the exaggerated conclusions researchers are drawing from the observation of these distributions in complex networks. Is the backlash coming for this hype?

Recently Oliveira and Barabasi published yet another paper on the ubiquity of power laws. This time it was about the correspondence patterns of Darwin and Einstein, where they claim that the time delays for the replies follow a power law. This work is similar to earlier work by Barabasi about email correspondence. Shortly after, a comment was published in Nature suggesting that the data is a better fit for the lognormal distribution and this generated some discussion on the web. There are also some claims that similar previous work using the same data was not properly cited.

The best summary of the whole issue comes in my opinion from Michael Mitzenmacher:
"While the rebuttal suggests the data is a better fit for the lognormal distribution, I am not a big believer in the fit-the-data approach to distinguish these distributions. The Barabasi paper actually suggested a model, which is nice, (...) anyone can come up with a power law model. The challenge is figuring out how to show your model is actually right."
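
For readers who have not seen what the "fit-the-data approach" looks like in practice, here is a minimal sketch. It fits both distributions by maximum likelihood and compares the resulting log-likelihoods. The data is synthetic (drawn from a lognormal with a fixed seed), not the Darwin or Einstein correspondence, and taking x_min as the smallest observation is a simplifying assumption; published analyses use more careful procedures.

```python
import math
import random


def power_law_loglik(data):
    """Log-likelihood of a continuous power law fitted by maximum likelihood.

    Density: p(x) = ((alpha - 1) / x_min) * (x / x_min) ** (-alpha) for x >= x_min,
    with x_min taken as the smallest observation (a simplifying assumption).
    """
    x_min = min(data)
    n = len(data)
    log_ratios = [math.log(x / x_min) for x in data]
    alpha = 1.0 + n / sum(log_ratios)  # maximum likelihood estimate of the exponent
    return n * math.log((alpha - 1.0) / x_min) - alpha * sum(log_ratios)


def lognormal_loglik(data):
    """Log-likelihood of a lognormal fitted by maximum likelihood."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    mu = sum(logs) / n
    sigma2 = sum((v - mu) ** 2 for v in logs) / n
    return -sum(logs) - 0.5 * n * math.log(2.0 * math.pi * sigma2) - 0.5 * n


# Synthetic "reply delays": lognormal by construction, fixed seed for repeatability.
rng = random.Random(0)
delays = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]

ll_power = power_law_loglik(delays)
ll_lognormal = lognormal_loglik(delays)
print("power law log-likelihood:", round(ll_power, 1))
print("lognormal log-likelihood:", round(ll_lognormal, 1))
print("better fit:", "lognormal" if ll_lognormal > ll_power else "power law")
```

On data this clean the comparison is easy. The difficulty raised in the quote is that on real, noisy data, especially when only the tail is fitted, the two candidates can be very hard to tell apart, which is why a mechanistic model matters more than the fit itself.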

Other papers have recently also raised questions about the quality of the data underlying some of these studies. Is life all log-normal after all :) ?

What I actually want to discuss is the hype. Going back to the beginning of the post, how can we keep science from generating such hype around particular memes? People like Barabasi are capable of captivating the imagination of a broad audience and of helping bring society closer to science, but usually at some cost. I think this is tied to science funding. What gets funded is what is perceived as the cutting edge, the trendy subjects. Trendy things get a lot of funding and more visibility until the whole thing crashes down under the weight of all the noise in the field.

In a brilliant paper (the one about a radio :) Lazebnik remembers some advice from David Papermaster:
"David said that every field he witnessed during his decades in biological research developed quite similarly. At the first stage, a small number of scientists would somewhat leisurely discuss a problem that would appear esoteric to others (...) Then, an unexpected observation (...) makes many realize that the previously mysterious process can be dissected with available tools and, importantly, that this effort may result in a miracle drug. At once, the field is converted into a Klondike gold rush with all the characteristic dynamics, mentality, and morals. (...) The assumed proximity of this imaginary nugget easily attracts both financial and human resources, which results in a rapid expansion of the field. The understanding of the biological process increases accordingly and results in crystal clear models that often explain everything and point at targets for future miracle drugs.(...) At some point, David said, the field reaches a stage at which models, that seemed so complete, fall apart, predictions that were considered so obvious are found to be wrong, and attempts to develop wonder drugs largely fail. (...) In other words, the field hits the wall, even though the intensity of research remains unabated for a while, resulting in thousands of publications, many of which are contradictory or largely descriptive."

Is this necessary? Is there something about the way science is made that leads to this? Can we change it?

Thursday, January 12, 2006

European Research Council (ERC)

For those of you who don't usually read about European research policies, the European Research Council is a projected European structure being designed to support basic research. It is now clear that the ERC will be formed but it is still unknown how much money the EU budget will reserve for it. Recently the Scientific Council of the future ERC was nominated and the chairman is none other than Fotis Kafatos, the former EMBL director. Kafatos's term as EMBL director ended in May last year and his nomination as chairman of the ERC will, in my opinion, strengthen the research council and hopefully help it attract the funding required.

For further reading:
Kafatos named Chairman of ERC Council (EMBL announcement)
Chairman explains Europe's research council (interview for Nature)
Election of Chairman of Scientific Council (press release hidden among several others)