Monday, January 11, 2010

In science, data without purpose is sometimes required

The title is probably flamebait, but it might get you to read my little rant about data production in science. It's something I have been meaning to write about for a while, but Deepak's post provided the extra incentive.

I think Deepak's post was a reminder that science is nothing without hypotheses, and I certainly agree with that. To put this into context, maybe it is worth pointing again to the Wired article about "The End of Theory", where Chris Anderson painfully tries to make the point that, with the deluge of data we are seeing, we don't need models or hypotheses; we just need to crunch the data to look for correlations.

I strongly disagree with this viewpoint. What would we learn about reality this way? At most we would see correlations and gain some predictive power over future events, but we would not know the mechanisms, and that's the interesting part.

So why is data without purpose sometimes justified? What I mean by this is that the capacity to produce data and the capacity to analyze it do not have to be centralized in the same place. My perspective (as a bioinformatician) is that of someone who has benefited a lot from the data deluge in biology and from the fact that data is (mostly) made available to others. This has allowed many studies that reuse pre-existing results to answer new questions.

I also work in a lab that develops genetic interaction screening methods, and I end up having some discussions about this topic. Many people dislike this sort of research, finding creative names like "fishing expedition" to describe it. The truth is that there are many types of data that we need to collect (genomes, gene expression, protein-protein interactions, etc.) that we know will be useful for understanding how cells work. We just need more accurate and cheaper methods to get them, and there is no other way but to have the focus of the research be the data production itself.