Tuesday, April 02, 2013

Benchmark the experimental data, not just the integration

A paper came out today in Molecular Systems Biology with a resource of kinase-substrate interactions obtained from in vitro kinase assays on protein microarrays. There is clearly a significant difference between what a kinase regulates inside a cell and what it could phosphorylate in vitro given appropriate conditions. In fact, reviewer number 1 in the attached comments (PDF) explains at length why these protein-array-based kinase interactions may be problematic. The authors are aware of this and integrate the protein-array data with additional data sources to derive a higher-confidence dataset of kinase interactions. They then provide computational and experimental benchmarks of the integrated dataset. What I have an issue with is that the original protein-array data itself is not clearly benchmarked in the paper. How are we to know how much that feature, and all of the hard experimental work behind it, contributes to the final integrated predictor?

A very similar procedure was used in a recent Cell paper where co-complex membership was predicted from the elution profiles of proteins detected by mass spectrometry. Here again, the authors do not present benchmarks of the interactions predicted solely from the co-elution data. Instead, they integrate it with around 15 other features before evaluating and studying the final result. In this case, the supplementary material gives some indirect indication of the value of the experimental data by itself, in the form of the rank each feature has in the predictor.

I don't think the papers are incorrect. In both cases the authors provide an interesting final result, with the integrated set of interactions benchmarked and analysed. However, in both cases we are left unsure of the value of the experimental data being presented. I don't think asking for that benchmark is an unreasonable request. There are many reasons why this information should be clearly presented before any additional data integration steps are applied. At the very least, it is important for other groups thinking about setting up similar experimental approaches.
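
To make the request concrete, here is a minimal sketch of the kind of comparison I have in mind: score the same gold-standard pairs once with the experimental feature alone and once with the integrated predictor, and report both. Everything in it is an assumption for illustration. The numbers are made-up toy values, not data from either paper, the names `array_signal` and `integrated_score` are hypothetical, and ROC AUC is just one possible benchmark, not the one the authors used.

```python
def roc_auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy gold standard: 1 = known interaction, 0 = assumed non-interaction.
labels = [1, 1, 1, 0, 0, 0]

# Hypothetical scores for the same six pairs (made-up values).
array_signal = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]        # experimental feature alone
integrated_score = [0.95, 0.8, 0.85, 0.3, 0.4, 0.1]  # after data integration

print("experimental feature alone:", roc_auc(array_signal, labels))
print("integrated predictor:      ", roc_auc(integrated_score, labels))
```

Reporting the two numbers side by side is the point: the gap between them is a rough indication of how much the integration adds on top of the new experimental data.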