Cancer datasets as a resource to study cell biology
The amazing resources that have been developed in the context of cancer biology can serve as tools to study "normal" cell biology. The genetic perturbations that happen in cancer can be viewed almost as natural experiments that we can use to ask varied questions. Different cancer consortia have produced, for the same patient samples or the same cancer cell lines, data that range from genomic information, such as exome sequencing, to molecular, cellular and disease traits including gene expression, protein abundance, patient survival and drug responses. These datasets are useful not just for studying cancer biology but, more broadly, for studying cell biology processes. If we were interested in the impact of knocking out a gene, we could look into these data to get at least an approximate guess of what could happen when this gene is perturbed. We can do this because, given a sufficiently large sample of tumours or cell lines, almost any given gene is likely to show copy number changes or deleterious mutations. Of course, there will be a whole range of technical issues to deal with, since it would not be a "clean" experiment comparing the KO with a control.
Studying complex assembly using protein abundance data
More recently the CPTAC consortium and other groups have released proteomics measurements for some of the reference cancer samples. Given the work that we have been doing in studying post-translational control, we started a few projects making use of these data. One idea that we tried, and have recently made available online via a pre-print, was to study gene dosage compensation. When there are copy number changes, how often are these propagated to changes in gene expression and then to protein level? This was work done by Emanuel Gonçalves (@emanuelvgo), jointly with the Julio Saez-Rodriguez lab. There were several interesting findings from this project. One of these was that we could identify members of protein complexes that indirectly control the degradation of other complex subunits. This was done by measuring, in each sample, how much of the change in a protein's abundance is not explained by the change in its gene expression. This residual abundance change is most likely explained either by changes in the translation or degradation rate of the protein (or noise). We think that, for protein complex subunits, this residual mainly reflects degradation rates. Emanuel then searched for complex members that had copy number changes that predicted the "degradation" rate of other subunits of the same complex. We think this is a very robust way to identify such subunits that act as rate-limiting factors for complex assembly.
Predicting E3 or protease targets
If what I described above works to find some subunits that control the "degradation" of other subunits of a complex, then why not use the exact same approach to find the targets of E3 ligases or proteases? Emanuel gave this idea a try, but in some (fairly quick) tests we could not see a strong predictive signal. We collected putative E3 targets from a few studies in the literature (Kim et al. Mol Cell Biol. 2015; Burande et al. Mol Cell Proteomics. 2009; Lee et al. J Biol Chem. 2011; Coyaud et al. Mol Cell Proteomics. 2015; Emanuele MJ et al. Cell 2011). We also collected protease targets from the MEROPS database. We then tried to find a significant association between the copy number or gene expression changes of a given E3 and the proxy for degradation, as described above, of any other protein. Using the significance of the association as the predictor, we would expect a stronger association between an E3 and its putative substrates than with other random genes. Using a ROC curve as a descriptor of the predictive power, we didn't really see robust signals. The figure above shows the results when using gene expression changes in the E3 to associate with the residuals (i.e. abundance change not explained by gene expression change) of the putative targets. The best result in this case was obtained for CUL4A (AUC=0.59), but overall the predictions are close to random.
A similarly poor result was generally observed for protease targets from the MEROPS database, although we didn't really make a strong effort to properly map the MEROPS interactions to all human proteins. Emanuel tried a couple of variations. For the E3s he tried restricting the potential target list to proteins that are known to be ubiquitylated in human cells, but that did not improve the results. Also, surprisingly, the genes listed as putative targets of these E3s are not very enriched in genes that increase in ubiquitylation after proteasome inhibition (from Kim et al. Mol Cell. 2011), with the clearest signal observed in the E3 targets proposed by Emanuele MJ and colleagues (Emanuele MJ et al. Cell 2011).
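To make the test described in this section more concrete, here is a minimal Python sketch of the idea. It is not the code used for the analysis. It assumes a simple linear fit of protein abundance on gene expression to get the residual, uses the absolute Pearson correlation as a stand-in for the significance of the association (for a fixed number of samples the two rank candidates in the same order), and computes the ROC AUC by direct pairwise comparison. All function names, protein names and numbers are made up for illustration.

```python
from statistics import mean


def residuals(protein, transcript):
    """Protein abundance not explained by a least-squares fit on transcript level."""
    mx, my = mean(transcript), mean(protein)
    sxx = sum((x - mx) ** 2 for x in transcript)
    sxy = sum((x - mx) * (y - my) for x, y in zip(transcript, protein))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(transcript, protein)]


def abs_pearson(a, b):
    """Absolute Pearson correlation, used here as the association score."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return abs(cov) / (va * vb) ** 0.5


def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, is_target in zip(scores, labels) if is_target]
    neg = [s for s, is_target in zip(scores, labels) if not is_target]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: expression of one E3 across six samples.
e3_expr = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Hypothetical candidates: (transcript levels, protein levels, listed as putative target?)
candidates = {
    "TARGET_A": ([2.0, 1.0, 3.0, 2.5, 1.5, 3.5], [1.8, 0.3, 2.2, 1.2, 0.1, 1.6], True),
    "TARGET_B": ([3.0, 2.0, 1.0, 2.0, 3.0, 1.0], [2.8, 1.7, 0.3, 1.2, 2.1, -0.3], True),
    "OTHER_C": ([1.0, 3.0, 2.0, 4.0, 2.5, 3.5], [1.2, 2.9, 2.0, 4.1, 2.3, 3.5], False),
    "OTHER_D": ([2.0, 2.5, 1.0, 3.0, 1.5, 2.0], [1.9, 2.7, 1.1, 2.8, 1.5, 2.1], False),
}

scores, labels = [], []
for name, (transcript, protein, is_target) in candidates.items():
    # Score each candidate by how strongly the E3 tracks its residual abundance.
    scores.append(abs_pearson(e3_expr, residuals(protein, transcript)))
    labels.append(is_target)

auc = roc_auc(scores, labels)
print(f"AUC for this E3: {auc:.2f}")
```

With real data, each E3 would get its own AUC computed over all measured proteins, with the literature-derived substrates as positives. An AUC near 0.5, like most of the values we saw, means the association score does not separate putative substrates from other proteins.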