Cellular Consequences of Genetic variation: personal genome


Wednesday, March 28, 2012

Individual genomics of yeast

Nature Genetics used to be one of my favorite science journals. It consistently had papers that I found exciting. That changed about 5 years ago when they made a very clear editorial shift towards genome-wide association studies (GWAS). Don't get me wrong, I think GWAS are important and useful, but I don't find it very exciting to have lists of regions of DNA that might be associated with a phenotype. I want to understand how variation at the level of DNA gets propagated through structures and interaction networks to cause these differences in phenotype. I mostly stayed out of GWAS since I was focusing on the evolution of post-translational networks using proteomics data, but I always felt that this line of research was not making full use of what we already know about how a cell works.

In this context, I want to tell you about a paper from Ben Lehner's lab that finally made me excited about individual variation, and why I think it is such a great study. I was playing around with a similar idea when the paper came out, so I will start with the (very) preliminary work I did and continue with their paper. I hope it can serve as a small validation of their approach.

As I just mentioned, I think we can make use of what we know about cell biology to interpret the consequence of genetic variation. Instead of using association studies to map DNA regions that might be linked to a phenotype, we can take a full genome and try to guess what could be deleterious changes and their consequences. It is clear that full genome sequences for individuals are going to be the norm so how do we start to interpret the genetic variations that we see ? For human genetic variation, this is a highly complex and challenging task.

Understanding the consequences of human genetic variation from DNA to phenotype requires knowledge of how variation will impact protein stability, expression and kinetics; how this in turn changes interaction networks; how this variation is reflected in the function of each tissue; and how it ultimately leads to a fitness difference, disease phenotype or response to drugs. Eventually we would like to be able to do all of this, but we can start with something simpler. We can take unicellular species (like yeast) and start by understanding cellular phenotypes before we move to more complex species.

To start we need full genome sequences for many different individuals of the same species. For S. cerevisiae we have genome sequences for 38 different isolates by Liti et al. We then need phenotypic differences across these different individuals. For S. cerevisiae there was a great study published in June last year by Warringer and colleagues where they tested the growth rate of these isolates under ~200 conditions. Having these data together we can attempt to predict how the observed mutations might result in the differences in growth. As a first attempt we can look at the non-synonymous coding mutations. For these 38 isolates there are something like 350 thousand non-synonymous coding mutations. We can predict the impact of these mutations on a protein either by analyzing sequence alignments or by using structures and statistical potentials. There are advantages and disadvantages to both approaches but I think they end up being complementary. The sequence analysis requires large alignments while the structural methods require a decent structural model of the protein. I think we will need a mix of both to achieve good coverage of the proteome.

I started with the sequence approach as it was faster. I aligned 2329 S. cerevisiae proteins with more than 15 orthologs in other fungal species and used MAPP from the Sidow lab at Stanford to calculate how constrained each position is. I got about 50K non-synonymous mutations scored with MAPP of which about 1 to 8 thousand could be called potentially deleterious depending on the cut-off. To these we can add mutations that introduce STOP codons, in particular if they occur early in the protein (~710 of these within the first 50 AAs of proteins).
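The STOP codon filter mentioned above is simple enough to sketch. This is only an illustration, not the script I used: the function names, the 50 codon window as a parameter and the toy sequences are assumptions made for the example, and it assumes coding sequences that start in frame and use the standard nuclear genetic code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def first_stop_codon(cds):
    """Return the 1-based codon index of the first in-frame stop, or None."""
    cds = cds.upper()
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None


def introduces_early_stop(ref_cds, alt_cds, window=50):
    """True if the variant sequence has a stop within the first `window`
    codons that comes earlier than any stop in the reference sequence."""
    alt_stop = first_stop_codon(alt_cds)
    if alt_stop is None or alt_stop > window:
        return False
    ref_stop = first_stop_codon(ref_cds)
    return ref_stop is None or ref_stop > alt_stop


# Toy example (made-up sequences): the variant gains a stop at codon 10.
reference = "ATG" + "GCT" * 60 + "TAA"
variant = "ATG" + "GCT" * 8 + "TAG" + "GCT" * 51 + "TAA"
print(introduces_early_stop(reference, variant))
```

A real pipeline would also need to handle frameshifts and lost start codons, which this sketch ignores.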

So up to here we have a way to predict if a mutation is likely to impact negatively on a protein's function and/or stability. How do we go from here to a phenotype like a decreased growth rate in the presence of stress X ? This is exactly the question that chemical-genetic studies try to address. Many labs, including our own, have used knock-out collections (of lab strains) to measure chemical-genetic interactions that give you a quantitative measure of the relative importance of each protein in a given condition. So, we can make the *huge* simplification that we can take all deleterious mutations and just sum up the effects, assuming a linear combination of the effects of the knock-outs.
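As a rough sketch of that simplification, the predicted defect of a strain in one condition is just the sum of the knock-out scores of the genes it is predicted to have damaged. The gene names, scores and strain labels below are made up for illustration, and the sign convention (higher score means worse growth of the knock-out) is an assumption of the example.

```python
def predict_growth_defect(deleterious_genes, knockout_scores):
    """Sum the knock-out effects of all genes carrying a predicted
    deleterious mutation. Genes without a knock-out score contribute 0."""
    return sum(knockout_scores.get(gene, 0.0) for gene in deleterious_genes)


# Hypothetical chemical-genetic scores for one condition:
# higher = the knock-out grows worse in this condition.
knockout_scores = {"GENE_A": 2.0, "GENE_B": 0.5, "GENE_C": 1.5}

# Hypothetical sets of genes predicted to be damaged in each strain.
strains = {
    "strain_1": {"GENE_A", "GENE_C"},
    "strain_2": {"GENE_B"},
    "strain_3": {"GENE_D"},  # no knock-out score available
}

predictions = {
    name: predict_growth_defect(genes, knockout_scores)
    for name, genes in strains.items()
}
print(predictions)
```

Treating a damaged protein as equivalent to a full knock-out, and ignoring any interaction between mutations, is exactly the huge simplification described above.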

To test this idea I picked 4 conditions (out of the ~200 mentioned above) for which we have chemical-genetic information (from Parsons et al.) and where there is a high growth rate variation across the 38 strains. With everything together I can test how well we can predict the measured growth rates under these conditions (relative to a lab strain):

Each entry in the plot represents 1 strain in a given condition. Higher values report worse predicted/experimental growth (relative to a lab strain). There is a highly significant correlation between measured and predicted growth defects (~0.57) overall, but cisplatin growth differences are not well predicted by these data. Given the many simplifications and the poor coverage of some of the methods used, I was surprised to see a correlation at all. This tells us that, at least for some conditions, we can use mutations found in coding regions and appropriately selected gene sets to predict growth differences.
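For reference, a comparison like this comes down to a correlation coefficient between the predicted and measured values across strain/condition pairs. The sketch below assumes a Pearson correlation, which is one reasonable choice and not necessarily the statistic behind the number quoted above; the function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

It would be called as `pearson(predicted, measured)`, with one predicted and one measured growth defect per strain and condition.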

This is exactly the message of Rob Jelier's paper from Ben Lehner's lab. When they started their work, the phenotypic dataset from Warringer and colleagues was not yet published, so they had to generate their own measurements for this study. In addition, their study is much more careful in several different ways. For example, they only used the sequences for 19 strains that they say have higher coverage and accuracy. They also tried to estimate the impact of indels and to increase the size of the alignments (a crucial step in this process) by searching for distant homologs. If you are interested in making use of "personal" genomes you should really read this paper.

Stepping back a bit, I think I was excited about this paper because it finally connects the work that has been done in high-throughput characterization of a model organism with the diversity across individuals of that species. It serves as a bridge for many people to come and work in this area. There are a large number of immediate questions, like how much do we really need to know to make good/better predictions ? What kind of interactions (transcriptional, genetic, conditional genetic) do we need to know to capture most of the variation ? Can we select gene sets and gene weights in other species without the conditional-genetics information (by homology) ?

As we are constantly told, the deluge of genome sequences will continue, so there are plenty of opportunities and data to analyze (I wish I had more time ;). Some recent examples of interest include the sequencing of 162 D. melanogaster lines with associated phenotypic data and the (somewhat narcissistic) personal 'omics study of Michael Snyder. To start to make the jump to human, I think it would be great to have cellular phenotypic data (growth rate/survival under different conditions) for the same cells/tissue across a number of human individuals with a sequenced genome. Maybe in a couple of years I won't be as skeptical as I am now about our fortune cookie genomes.

Monday, November 19, 2007

Linking out - Personalized medicine

Personalized medicine continues to climb the hype cycle. I have been getting most of the best news coverage on the subject from blogs.

- Bertalan Meskó reviews companies focused on personalized medicine (see part I and II)

- Attila Csordas and Deepak Singh cover the social aspects of personal health and the tie-in to 23andMe

- Gareth Palidwor reads into the details to speculate that the business model of 23andMe might be to sell the aggregated user data.

- Gene Sherpas puts on the brakes, describing the hype as Genomic Voyeurism

I am concerned that all the attention on the genomics side of personalized medicine will distort the relative importance of nature versus nurture. Everyone craves a peek at their own destiny and at their roots. These services hope to provide both of these by looking at our DNA. I don't think they can really do this reliably but nothing stops them from luring people.

Friday, October 19, 2007

The Fortune Cookie Genome

*in an imaginary future*

Today is the day I get the sequencing results back. It is going to be interesting to finally have a glimpse of my very own genome. At the same time I am afraid of the potential disease associations they might find in there. In any case I would rather know in time to do something about it. That's it ... I exhale and open the main door to the building, walking up to the desk.

- Hi. I have an appointment with my genetic adviser.
- Oh yes, go up to the 3rd floor, they are expecting you.

I walk up a DNA-shaped stairway and walk into the office of one of the attending specialists. He was the one who convinced me of how useful it would be to purchase the GenomeSurvey(TM) package.

- I got your email. The results are in ?
- Yes, we have your genome fully sequenced and uploaded into your service of choice. I see you have picked Google Health as your storage provider as part of the package.
- Is there any bad news ? Will I have a serious disease soon ?
- I understand your concern. There is really nothing too serious, but I will come to that in a moment. You may log in with your Google account here and I can guide you through some of the results.

I log in to my health page and I am confronted with the usual simple white-blue Google interface. I notice the addition of a genome tab and let my adviser tell me more about it.

- As you can see, your genome has been uploaded to your account. It has also been submitted as a John Doe genome to the NCBI personal genomics database. You may choose later to make your identity known and/or associate any of your personal history information with it.
- What about the disease associations ?
- Yes. So you can click here on the associations report to have a full listing of the phenotypic associations. You have a very healthy genome, no serious rare diseases. In your case the most important finding is that you have a 2% increased likelihood of developing a heart condition when you are above 60 and a 1% increased likelihood of having Alzheimer's disease after 65.
- That's it ? 2% ? 1% ?
- Well, that is assuming no prior knowledge on your diet and other personal history as established in the large HapMap version 10. From now on you may input into the forms provided in Google Health all your diet and other personal information on a daily basis and as the information accumulates the service will automatically update the probabilities. As your adviser I should tell you that this information can be used by Google to provide you with better targeted advertisement in all other Google products.
- Right ... is this it ? Does the package include anything else ?
- Of course ! As I mentioned to you before, you can click here on the prescription tab to get informal advice on how best to deal with the associations that were found for you. You should always discuss these suggestions with your doctor before doing anything. By company policy I cannot read this information with you, since we are not liable for this. You can read it at home when you get there.
- Well, if there is nothing else I will go.
- Thank you again for choosing our GenomeSurvey(TM) package. I am happy to have served you and I hope that you feel more empowered about your own health. Be well.

I go home feeling a bit cheated but obviously happy to have no serious disorder on the horizon. I rush to my home computer to read the prescription that will help me prevent my heart condition and Alzheimer's. I click the GoogleDoctor(TM) button and a clip-like avatar jumps around on the screen. A computerized voice reads aloud the text appearing on the screen:

Dear Pedro. You can call me clipy ! I will be your assistant for any of your health needs. In order to decrease the likelihood of the negative phenotypes associated with your genome, please consider abiding by the following rules:
- Do a lot of exercise
- Eat a healthy diet
- Find balance in your life

*in an imaginary present*

- Snap out of it, what does yours say ?
I look back to the small piece of paper in my hand and read:
- "You must find balance in your life", that's what it says.
- Well, these things are never wrong.

I drop the paper on my dish and finish eating the fortune cookie before leaving the Chinese restaurant with my friends.
- You won't believe what I thought of ...

Further reading
The Future of Personal Genomics (21 September 2007 Science)
How much information is there really in personal genomes and how much should patients know ? Extra points for citing a post from Eye on Dna in a Science Policy Forum.
The Science and Business of Genetic Ancestry Testing (10th October 2007 Science)
A discussion surrounding results of genetic ancestry tests and the commercialization of these tests.
Google Says Its Health Platform Is Due In Early 2008 (17 October InformationWeek)
Google is still trying to build a platform to host the health related information. Microsoft already launched a service called HealthVault (read about it from Deepak).
BMC Medical Genomics (17 October BMC blog)
BMC will launch a journal dedicated to Medical Genomics, covering articles "on functional genomics, genome structure, genome-scale population genetics, epigenomics, proteomics, systems analysis and pharmacogenomics in relation to human health and disease."
Do-it-yourself science (17 October Nature)
This editorial links up several news items, opinions and articles in the last issue of Nature to ask the question: how much involvement can patient advocates have in genetics? The most impressive article is the story of Hugh Rienhoff, a trained geneticist and biotechnologist who decided to personally research his daughter's disease (as in buying a PCR machine etc). (via Keith)
Common sense for our genomes (18 October Nature)
Steven E. Brenner explains the need for a Genome Commons. See discussion at bbgm.

Thursday, May 24, 2007

Nature vs. Nurture in personalized medicine

Personalized medicine aims to determine the best therapy for an individual based on personal characteristics. Given that family history is a risk factor for many diseases, there is a strong motivation to search for heritable genetic variation that might provide molecular explanations for diseases. In the last couple of years, improvements in sequencing technology have helped to scale up these efforts. The HapMap project is an example of these attempts at genome-wide characterization of human genetic variation. The project aims to create a haplotype map of the human genome. This map is important because correlating a disease with a haplotype can be used to pin-point the cause of a disease to a genome region. This map-based approach is done by first sequencing known sites of polymorphisms, spaced across the genome, in a large population and then associating disease with haplotypes (see a recent example).

Eventually sequencing costs will go down to a point when these map based approaches are replaced by full genome re-sequencing. It looks like there is a consensus that this is just a matter of time. Also, the main sequencing centers seem to be directing more of their efforts to studying variation. If sequencing full genomes is currently too expensive, sequencing coding regions is much more affordable. In two recent papers (Greenman et al. and Sjoblom et al.) researchers have tried to identify somatic mutations in human cancer genomes by sequencing. Greenman and colleagues focused on 518 kinases and searched for mutations in these genes in 210 different human cancers (see post by Keith Robison). Sjoblom and colleagues on the other hand sequenced fewer cancer types (11 breast and 11 colorectal cancers) but did so for 13023 genes. The challenge going forward is to understand what is the impact of these mutations on cellular function.
Instead of sequencing to find new polymorphisms, it is also possible to test the association of previously identified variation with disease by high-throughput profiling. Two recent papers focused on profiling known polymorphisms in cancer tissues using either microarrays or PCR plus mass spec.

Underlying all of these efforts is the idea of genetic determinism: that if I sequence my genome I should know how each variation impacts my health and what treatment I should use to correct it. This raises the question, however, of how much our health really depends on inherited genetic variation. This is the often revisited Nature vs. Nurture debate. The latest MSB paper highlights the impact of the environment on mammalian metabolic functions. François-Pierre J Martin and colleagues have studied how the microbial gut population affects mouse metabolism. They used NMR metabolic profiling in conventional mice and in germ-free mice colonized by human baby flora to study this question.

Metabolic analysis of liver, plasma, urine and ileal samples from both types of mice showed significant changes in metabolites in the different compartments, associated with the two microbial populations. This is a very clear example of how the environment must be taken into consideration in future efforts at personalized medical care.

This example also underscores the importance of studying the human microbial associations. As Jonathan Eisen discussed in his blog, maybe we should aim at a human microbiome program.

Nature or Nurture ? In either case, abundant streams of data are forthcoming as the sequencing centers crunch away and new omics tools get directed at studying disease. There will be a lot of work to do in order to understand causal relationships and suggest therapeutic strategies. That might be why Google is taking a look at this. They keep saying they want to organize the world's information, so why not health-related data ?

The picture was taken from the News and Views by Ian Wilson:
Top-down versus bottom-up—rediscovering physiology via systems biology? Molecular Systems Biology 3:113