Thursday, March 28, 2013

The glacial pace of innovation in scientific publishing


Nature made available today a collection of articles about the future of publishing. One of them is a comment by Jason Priem on "Scholarship: Beyond the paper". It is beautifully written and inspirational. Jason clearly has a finger on the pulse of the scientific publishing world and is passionate about it. He sees a future of "decoupled" journals, where modular, distributed data streams can be built into stories openly and in real time; where certification and filtering are not tied to the act of publishing and can happen on the fly by aggregating social peer review. While reading it I could not contain a sigh of frustration. This is a future that several of us, like Neil and Greg, debated at Nodalpoint many years ago. Almost 7 years ago I wrote in a blog post:

"The data streams would be, as the name suggests, a public view of the data being produced by a group or individual researcher.(...) The manuscripts could be built in wikis by selection of relevant data bits from the streams that fit together to answer an interesting question. This is where I propose that the competition would come in. Only those relevant bits of data that better answer the question would be used. The authors of the manuscript would be all those that contributed data bits or in some other way contributed for the manuscript creation. (...) The rest of the process could go on in public view. Versions of the manuscript deemed stable could be deposited in a pre-print server and comments and peer review would commence."

I hope Jason won't look back some 10 years from now and feel the same frustration I feel now at how little scientific publishing has changed. So what happened in the past 7 years? Not much, really. Nature ran an open peer review trial with no success. Publishers were slow to allow comments on their websites, and we have been even slower at making use of them. Euan had a fantastic science blog/news aggregator (Postgenomic), but it did not survive long after he went to Nature. Genome Biology and Nature both tried to create pre-print servers for biomed authors but ended up closing them for lack of users. We had a good run at an online discussion forum with FriendFeed (thank you Deepak) before Facebook took the steam out of that platform. For most publishers we can't even know the total number of times an article we wrote has been seen, something blog authors have taken for granted for many years. Even where progress has been made, it has taken (or is taking) way too long. The most obvious example is the unique author ID, where after many (oh so many) years there is finally a viable solution in sight. All that said, some progress was made in the past few years. Well, mainly two things - PLOS One and Twitter.

Money makes the world go round

PLOS One had a surprising and successful impact on the science publishing world. Its initial stated mission was to change the way peer review was conducted: the importance of a contribution would be judged by how readers rated or commented on the article. Only it turns out that few people take the time to rate or comment on papers. Nevertheless, thanks to some great management, first by Chris Surridge and then by Peter Binfield, PLOS One was a huge hit as a novel, fast, open access (at a fair price) journal. PLOS One's catch-all approach saw a steady increase in the number of articles published (and very healthy profits) and got the attention of all other publishers.

If open access is viable as a business model then funding sources might feel it is OK to mandate immediate open access. If that were to happen then only publishers with a structure similar to PLOS would survive. So, to make a profit and to hedge against a mandate for open access, all other publishers are creating (or buying) a PLOS One clone. This change is happening at an amazing pace. It is great for open access and it goes in the direction of a more streamlined and modular system of publishing. It is not so great for filtering and discoverability. I have said in the past that PLOS One should stop focusing on growth and go back to its initial focus on filtering and the related problem of credit attribution. To their credit, they are among the few actively advocating for the development of these tools. Jason, Heather, Euan and others are doing a great job of developing tools that report these metrics.

1% of the news scrolling by 

Of the different tools that scientists could have picked up to be more social, Twitter was the last one I would have expected to take off. 140 characters?! Seriously? How geeky is that? No threaded discussions, no groups, some weird hashtag-somethings. In what world is this picked up by established tenured university professors who don't have time to leave a formal comment on a journal website? I have no clue how it happened, but it did. Maybe it was the simple interface with a single use case; the asymmetric (i.e. flattering) network structure; the fact that updates don't accumulate like email. Whatever the reason, scientists are flocking to Twitter to share articles, discuss academia and science (within the 140 characters) and rebel against the Established System. It is not just the young naive students and disgruntled postdocs; established group leaders are picking up this social media megaphone. Some of them are attracting audiences that might rival some journals, so this alone might make them care less about that official seal of approval from a "high-impact" journal.

The future of publishing ? 

So after several years of debates about what the web can do for science we have: 1) a growing trend for "bulk" publishing with no solid metrics in place to help us filter and provide credit to authors; and 2) a discussion forum (Twitter) that is clunky for actual discussions but is at least being picked up by a large fraction of scientists. Where do we go from here? I still think that a more open and modular scientific process would be more productive and enjoyable (less scooping). I am just not convinced that scientists in general even care about these things. For my part, I am going to continue sharing ideas on this blog and, now that I coordinate a research group, start posting articles to arXiv. I hope that Jason is right and we will all start to take better advantage of the web for science.


Clock image adapted from tinyurl.com/cmy9fn5

Wednesday, December 26, 2012

My Californian Life

Warning: No science ahead
I am in Portugal for the holidays, having just left San Francisco. It is a part of academic life that we have to keep moving around with each new job, and after Germany and the US (California) I am moving to the UK in a few days. It's not easy to keep rebuilding your roots in new places, but it is certainly rewarding to experience new cultures. It has been great to spend almost 5 years in California and it was very (!) hard to leave. I decided to try to write down a few thoughts about life in the golden state. Maybe it will be useful for others considering moving there. I apologize in advance for the generalizations.


Geek heaven
It is impossible to live in Silicon Valley (I was in Menlo Park) without noticing the geekiness of the place. Just a few random examples: the Caltrain conductors frequently make geeky jokes ("warp speed to San Francisco"); the billboards on the 101 highway between San Francisco and San Jose often advertise products that only programmers would care about (e.g. types of databases); every time I went for a hot chocolate at the Coupa Cafe there was someone demoing or pitching an idea for a website or app. For someone who likes technology it is a great place to be. It is thrilling to find out that so many of the tech companies you read about are just around the corner. It is also very likely that the people you meet know about the latest tech trends, if they are not, in fact, developing them. Unfortunately, every nice thing has its downsides, and there is so much money around from tech salaries and companies that everything is horribly expensive if you don't work in the tech sector yourself.

The "can do" attitude and personal responsibility

It is nearly impossible to pin down what makes Silicon Valley such a breeding ground for successful companies, but one thing that impressed me was the winning attitude. It is more generally an American trait, not just found in California. People often believe their ideas can succeed to a point that borders on naivety. To paint a (somewhat) exaggerated picture: it is not enough to be good, you should strive to be number one. This attitude has many positive and negative consequences that are not easy to unpack. There are obvious advantages that come with all that drive and positive thinking. It also connects to the notion of personal responsibility - it should be up to each one of us to make our own success. As a negative consequence, failure is then also our individual responsibility, even when in reality it isn't.


I don't want to go into politics, but I will say that I have also learned a lot about the role of government. It was interesting to live in a place that emphasizes personal responsibility so much more than Portugal does. I think it served well to calibrate my own expectations of what the state should and should not be responsible for. As with many other things, I wish more people could have the experience of living in different countries and cultures.

Fitness freaks surrounded by amazing nature and obsessed with organic local food
Before going to the US I had many friends making jokes about how much weight I would gain, echoing the stereotypical view of overweight America. In fact, any generalization of S.F. or Silicon Valley would have to go in the opposite direction. I had friends waking up at crazy hours to exercise, and I even learned to enjoy running (yikes!). It helps that California is sunny and filled with beautiful nature, like the state parks and coastline (do drive California Route 1). The food is also great, although there is a slightly exaggerated obsession with locally grown organic food. The constant sunshine and great food are probably the two things I will miss most when I get to the UK. I might have to buy a sun lamp. The interest in the outdoors was not that different from Germany, but it is something I wish were more prevalent in Portugal. Portugal has such nice weather and outdoors that it is a waste that we don't take better advantage of them.

A thank-you note
Californians are amazingly friendly people. It is true that it sometimes feels superficial. In restaurants it can even be annoying when a waiter comes for the tenth time to ask if you are really enjoying your meal. Still, it was great to live there with the easy smiles and unexpected chit-chat and compliments. It was easy to feel at home and I never felt like a foreigner. As I have learned from these years of living in California, one should always send a polite thank-you note after an event. So thank you, California, for these wonderful years. It would be most appropriate to say that it was "awesome".


Tuesday, November 06, 2012

Scholarly metrics with a heart


I attended the PLOS workshop on Article Level Metrics (ALM) last week. As a disclaimer, I am part of the PLOS ALM advisory Technical Working Group (not sure why :). Alternative article-level metrics are any set of indicators that might be used to judge the value of a scientific work (or researcher, or institution, etc). As a simple example, an article that is read more than average might correlate with scientific interest or popularity of the work. There are many interesting questions around ALMs, starting with the simplest - do we need any metrics at all? The only clear observation is that more and more of the scientific process is captured online and measured, so we should at least explore the uses of this information.

Do we need metrics? What are ALMs good for?

Like any researcher, I dislike the fact that I am often evaluated by the impact factor (IF) of the journals I publish in. When a position has hundreds of applicants it is not practical to read each candidate's research and carefully evaluate it. As a shortcut, the evaluators (wrongly) estimate the quality of a researcher's work by the IFs of the journals. I won't discuss the merit of this practice since even Nature has spoken out against the value of IFs. So one of the driving forces behind the development of ALMs is this frustration with the current metrics of evaluation. If we cannot have a careful peer evaluation of our work, then the hope is that we can at least have better metrics that reflect its value/interest/quality. This is really an open research question, and as part of the ALM meeting PLOS announced a PLOS ONE collection of research articles on ALMs. The collection includes a very useful introduction to ALMs by Jason Priem, Paul Groth and Dario Taraborelli.

Beyond the need for evaluation metrics, ALMs should also be more broadly useful for developing filtering tools. A few years ago I noticed that articles being bookmarked or mentioned in blog posts had an above-average number of citations. This has now been studied in much more detail. Even if you are not persuaded by the value of quantitative metrics (number of mentions, PDF downloads, etc), you might be interested instead in referrals from trustworthy sources. ALMs might be useful here by tracking the identity of those reading, downloading or bookmarking an article. There are several researchers I follow on social media sites because they mention articles that I consistently find interesting. On the topic of identity, I also learned in the meeting that the ORCID author ID initiative finally has a (somewhat buggy) website that you can use to claim an ID. ALMs might further help filtering if they can be used, along with natural language processing methods, to improve automatic classification of an article's topic. This last point, on the importance of categorization, was brought up in the meeting by Jevin West, who had some very interesting ideas on the topic (e.g. clustering, automatic semantic labeling, tracking ideas over time). If the trend of growth for mega-journals (PLOS ONE, Scientific Reports, etc) continues, we will need these filtering tools to find the content that matters to us.

Where are we now with ALMs?

In order to work with different metrics of impact we need to be able to measure them, and they need to be made available. On the publishers' side, PLOS has led the way in making several metrics available through an API, and there is some hope that other publishers will follow. Nature, for example, has recently made public a few of the same metrics for 20 of their journals, although, as far as I know, they cannot be automatically queried. The availability of this information has allowed for research on the topic (see the PLOS ONE collection) and even the creation of several companies/non-profits that develop ALM products (Altmetric, ImpactStory and Plum Analytics, among others). Other established players have also been in the news recently. For example, the reference management tool Mendeley announced that it has reached 2 million users, whose actions can be tracked via their API, and Springer announced the acquisition of Mekentosj, the company behind the reference manager Papers. The interest surrounding ALMs is clearly on the rise as publishers, companies and funders try their best to gauge the usefulness of these metrics and position themselves to have an advantage in using them.
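
If you want to play with these data yourself, here is a minimal sketch of querying the PLOS ALM API from Python. The endpoint, parameter names and response fields are my assumptions about the v3 API, so check the documentation before relying on them:

```python
# Minimal sketch of pulling article-level metrics from the PLOS ALM API.
# The endpoint, parameters and field names below are assumptions about the
# v3 API; check the ALM documentation before relying on them.
import requests

ALM_URL = "http://alm.plos.org/api/v3/articles"  # assumed endpoint
API_KEY = "YOUR_API_KEY"                         # assumed: a free key from PLOS

def fetch_metrics(doi):
    """Return the ALM summary record for one DOI, or None on failure."""
    params = {"api_key": API_KEY, "ids": doi, "info": "summary"}
    response = requests.get(ALM_URL, params=params)
    if response.status_code != 200:
        return None
    return response.json()[0]  # assumed: one record per requested DOI

record = fetch_metrics("10.1371/journal.pone.0012345")  # placeholder DOI
if record:
    print(record.get("views"), record.get("shares"), record.get("citations"))
```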

The main topics at the PLOS meeting

It was in this context that we got together in San Francisco last week. I enjoyed the meeting format, with a mix of loose topics but strict requirements for deliverables. It was worth attending even just for that and for the people I met. After some introductions we got together in groups and quickly jotted down on post-its the sort of questions/problems we thought were worth discussing. The post-its were clustered on the walls by commonality and a set of broad problem areas was defined (see the list here).

Problems for discussion included:

  • how do we increase awareness of ALMs?
  • how do we prevent gaming (i.e. cheating to increase the metrics of my papers)?
  • what can be, and is worth, measuring?
  • how do we exchange metrics across providers/users (standards)?
  • how do we give context/meaning/story to the metrics?

We were then divided into parallel sessions where we further distilled these problems into more specific action lists and very concrete steps that can be taken right now.

Metrics with a heart

From my own subjective view of the meeting, it felt like we spent a considerable amount of time discussing how to give more meaning to the metrics. I think it was Ian Mulvany who wrote on the board in one of the sessions: "What does 14 mean?". The idea of context came up several times and from different viewpoints. We have some understanding of what a citation means, and from our own experience we can make some sense of what 10 or 100 citations mean (for different fields, etc). We lack a similar sense for any other metric. As far as I know, ImpactStory is the only one trying to give context to the metrics it shows, by comparing the metrics of your papers with those of random sets of papers from the same year. Much more can be done along these lines. We arrived at a similar discussion from the point of view of how we present ourselves as researchers to the rest of the world. Ethan Perlstein talked about how engaging his audience through social media and giving feedback on how his papers were being read and mentioned by others was enough to tell a story that increased interest in his work. The context and story (e.g. who is talking about my work) are more important than the number of views. We reached the same sort of discussion yet again when we talked about tracking and using the semantic meaning or identity/source of the metrics. For most use cases of ALMs we can think of, we would benefit from, or downright need, more context, and this is likely to drive the next developments and research in this area.
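
As a toy example of what such context could look like (with entirely made-up numbers), one could report a percentile within a same-year reference sample instead of a raw count:

```python
# Toy illustration of giving context to a raw count: instead of reporting
# "14 bookmarks", report where 14 falls among a reference sample of articles
# from the same year. The reference counts below are fabricated.
from scipy.stats import percentileofscore

same_year_bookmarks = [0, 0, 1, 1, 2, 3, 3, 5, 8, 9, 12, 20, 35, 60]  # made up
my_paper = 14

pct = percentileofscore(same_year_bookmarks, my_paper)
print(f"14 bookmarks ~ {pct:.0f}th percentile of this (made-up) same-year sample")
```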

The devil we don't know

Heather Piwowar asked me at some point if I had any reservations about ALMs. From the point of view of evaluation (and to a lesser extent filtering), it might turn out that we are replacing a poor evaluation metric (the journal impact factor) with an equally poor one - our capacity to project influence online. In this context it is interesting to follow some of the experiments being done in scientific crowdfunding. Ethan Perlstein has one running right now with a very catchy title: "Crowdfund my meth lab, yo". Success in crowdfunding should depend mostly on the capacity to project your influence or "brand" online - an exercise in personal marketing. Crowdfunding is an extreme scenario where researchers are trying to side-step the grant system and get funding directly from the wider public. However, I fear that evaluation by ALMs will tend to reward exactly the sort of skills that relate to online branding. Not that personal marketing is not important already - this is why researchers network at conferences and get to know editors - but ALMs might reward personal (online) branding to an even higher degree.

Thursday, July 19, 2012

I am starting a group at the EMBL-EBI


I signed the contract this week to start a research group at the European Bioinformatics Institute (EBI) in Cambridge, an outstation of the European Molecular Biology Laboratory (EMBL). After blogging my way through a PhD and postdoc, it is a pleasure to be able to write this blog post. In January I will be joining an amazing group of people working on bioinformatics services and basic research, where I plan to continue studying the evolution of cellular interaction networks. I am currently interested in the broad problem of understanding how genetic variability gets propagated through molecules and interaction networks to have phenotypic consequences. The two main research lines of the group will continue previous work on the evolution of protein interactions and post-translational modifications (see post) and the evolution of chemical genetics/personal genomics (see post 1 and post 2). I will continue to use this blog to discuss research ideas and ongoing work, and as always the content here reflects my own personal views and not those of my current/future employers.

I also take the opportunity to mention that I am looking for a postdoc to join the group in January to work on one of the two lines described above. If you know anyone who might be crazy/adventurous/interested, please send them a link to this post. Past experience (i.e. published work) in computational analysis of cellular interaction networks is required (e.g. post-translational modifications, mass-spectrometry, linear motif based interactions, structural analysis of interaction networks, genetic interactions, chemical genetics, drug-drug interactions, comparative genomics, etc). The work will be done in collaboration with experimental groups at EMBL-Heidelberg and the Sanger. Pending approval from EMBL, a formal application announcement will appear on the EMBL jobs page.

I also wanted to share a bit of my experience of trying to get a job after the postdoc “training” period. I have ranted in the past about the many problems of the academic track system, but the current statistics are informative enough: about 15% of biology-related PhDs get an academic position within 5 years. The competition is intense, and in the past year and a half I applied to 15 to 20 positions before taking the EBI job. On a positive note, I had the impression that better established places actually cared less about impact factors. Nevertheless, it has been a super stressful period, and even so I know I have been very lucky. I don’t mean this in any superstitious way but really in how statistically unlikely it is to have supportive supervisors and enough positive research outcomes (i.e. impactful publications) to land a job. I think we are training too many PhDs (or not taking advantage of the talent pool) and the incentives are not really changing. For my part I will try my best to contribute to changing the incentives behind this trend. We could have less funding for PhDs in bio-related areas, smaller groups and more careers within the lab besides the group leader. At the very least, PhDs should be aware of, and train for, alternative science-related paths.

Evolution and Function of Post-translational Modifications

A significant portion of my postdoctoral work is finally out in the latest issue of Cell (link to paper). In this study we tried to assign function to post-translational modifications (PTMs) derived from mass-spectrometry (MS). This follows directly from previous work where we looked at the evolution of phosphorylation in three fungal species (paper, blog post). We (and other groups) have seen that phosphorylation sites diverge rapidly, but we don't really know if this divergence of phosphosites results in meaningful functional consequences. In order to address this we need to know the function of post-translational modifications (if they have any). Since MS studies now routinely report several thousand PTMs per analysis, we have a severe bottleneck in the functional analysis of PTMs. These issues were the motivation for this latest work. We collected previously published PTMs (close to 200,000) and obtained some novel ubiquitylation sites for S. cerevisiae (in collaboration with Judit Villen's lab). We revisited the evolutionary analysis and set up a couple of methods to prioritize those modifications that we think are more likely to be functionally important.
As an example, we tried to assign function to PTMs by annotating those that likely occur at interface residues. One approach that turned out to be useful was to look for conservation of the modification sites within PFAM domain families. For example, in the figure above, under "Regulation of domain activity", I am depicting a kinase domain. Over 50% of the phosphorylation sites that we find in the kinase domain family occur in the well-known activation loop (arrow), suggesting that this is an important regulatory region. We already knew that the activation loop is an important regulatory region, but we think this conservation approach will be useful for studying the regulation of many other domains. In the article we give other examples and an experimental validation using the HSP70 domain family (in collaboration with the Frydman lab).
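
For the curious, the core of this alignment-column trick fits in a short sketch; the sequences and modified positions below are invented, not data from the paper:

```python
# Toy sketch of the domain-alignment idea: map each PTM to its column in a
# Pfam-style alignment and count how often each column is modified across
# family members. Sequences and site positions are made up.
from collections import Counter

# aligned domain sequences (gaps as '-') and, per sequence, the 0-based
# positions of modified residues counted on the ungapped sequence
alignment = {
    "kin1": "ST-PLKDS",
    "kin2": "SSAPL-DT",
    "kin3": "AT-PLKDS",
}
ptm_sites = {"kin1": [0, 6], "kin2": [1], "kin3": [1, 6]}

def seq_to_column(aligned_seq):
    """Map ungapped residue index -> alignment column index."""
    mapping, pos = {}, 0
    for col, aa in enumerate(aligned_seq):
        if aa != "-":
            mapping[pos] = col
            pos += 1
    return mapping

column_counts = Counter()
for name, sites in ptm_sites.items():
    mapping = seq_to_column(alignment[name])
    column_counts.update(mapping[s] for s in sites)

# columns modified in several family members are candidate regulatory hotspots
for col, n in column_counts.most_common():
    print(f"alignment column {col}: modified in {n} sequences")
```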

I won't describe the work in detail as you can (hopefully) read the paper. Leave a comment or send me an email if you can't and/or if you have any questions regarding the paper or analysis. I also put the predictions up in a database (PTMfunc) for those who want to look at specific proteins. It is still very alpha; I apologize for the bugs and will try to improve it as quickly as possible. If you want access to the underlying data just ask and I'll send the files. I am also very keen on collaborations with anyone collecting MS data or interested in the post-translational regulation of specific proteins, complexes or domain families.

Blogging and open science
Having a blog means I can also give you some of the thoughts that don't fit in a paper or press release. You can stop reading if you came for the sciency bits. One of the cool things I realized is that I have discussed three papers from the same research line in this blog, running through my PhD and postdoc. It is fun to be able to go back not just to the papers but to the way I was thinking about these ideas at the time. Unfortunately, although I try to use this blog to promote open science, this project was yet another failed open science project - failed in the sense that it started with a blog post and a lot of ambition but never gained any momentum as an online collaboration. Eventually I stopped trying to push it online, and as experimental collaborators joined the project I gave up on the open science side of it. I guess I will keep trying whenever it makes sense. This post closes project 1 (P1), but if you are interested in online collaborations have a look at project 2 (P2).

Publishable units and postdoc blues
This work took most of my attention during the past two years, and it is probably the longest project I have worked on. Two years is not particularly long, but it has certainly made me think about what an acceptable publishable unit is. As I described in the last blog post, this concept is very hard to define. While we would probably all agree that a factoid in a tweet is not something I should put on my CV, we allow and even cheer for publishing outlets that accept very incremental papers. The work I described above could easily have been sliced into smaller chunks, but would it have the same value? We would have put out the main ideas much faster, but it might have been impossible to convince someone to test them. I feel that the combination of the different analyses and experiments has more value as a single story, but an incremental approach would have been more transparent. Maybe the ideal situation would be to have the increments online in blogs, wikis and repositories and collect them into stories for publication. Maybe, just maybe, these thoughts are the consequence of postdoc blues. As I was trying to finish and publish this project I was also jumping through the academic track hoops, but I will leave that for a separate post.

Wednesday, May 09, 2012

The Minimal Publishable Unit

What constitutes a minimal publishable unit in scientific publishing? The transition to online publishing and the proliferation of journals are creating a setting where anything can be published. Every week spam emails almost beg us to submit our next research to some journal. Yes, I am looking at you, Bentham and Hindawi. At the same time, the idea of a post-publication peer review system also promotes an increase in the number of publications. With the success of PLoS ONE and its many clones we are in for another large increase in the rate of scientific publishing. Publish-then-sort, as they say.

With all these outlets for publication and the pressure to build up your CV, it is normal that researchers try to slice their work into as many publishable units as possible. One very common trend in high-throughput research is to see two to three publications that relate to the same work: the main paper for the dataset and biological findings, and 1 or 2 off-shoots that might include a database paper and/or a data analysis methods paper. Besides these quasi-duplicated papers there are the real small bites, especially in bioinformatics research. You know, those that you read and think to yourself must have taken no more than a few days to get done. So what is an acceptable publishable unit?

I mapped phosphorylation sites onto ModBase models of S. cerevisiae proteins and just sent this tweet with a small fact about protein phosphosites and surface accessibility:
Should I add that tweet to my CV? This relationship is expected and probably already published with a smaller dataset, but I could bet that it would not take much more to get a paper published. What is stopping us from adding trivial papers to the flood of publications? I don't have an actual answer to these questions. There are many interesting and insightful "small-bite" research papers that start from a very creative question that can be quickly addressed. It is also obvious that the amount of time/work spent on a problem is not proportional to the interest and merit of a piece of research. At the same time, it is very clear that the incentives in academia and publishing are currently aligned to increase the rate of publication. This increase is only a problem if we can't cope with it, so maybe instead of fighting against these aligned incentives we should be investing heavily in filtering tools.
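
For completeness, here is roughly what the analysis behind that tweet looks like as a sketch; the structure file and phosphosite positions are placeholders rather than the ModBase models and sites I actually used:

```python
# Minimal sketch: are phosphorylated S/T residues more solvent-exposed than
# other S/T residues in the same structure? 'model.pdb' and the phosphosite
# positions are placeholders. Requires Biopython >= 1.79 (Bio.PDB.SASA).
from statistics import median
from Bio.PDB import PDBParser
from Bio.PDB.SASA import ShrakeRupley
from scipy.stats import mannwhitneyu

structure = PDBParser(QUIET=True).get_structure("model", "model.pdb")
ShrakeRupley().compute(structure, level="R")  # stores per-residue SASA in residue.sasa

phosphosites = {45, 112, 256}  # placeholder residue numbers

phospho, other = [], []
for residue in structure.get_residues():
    if residue.get_resname() not in ("SER", "THR"):
        continue
    pos = residue.get_id()[1]
    (phospho if pos in phosphosites else other).append(residue.sasa)

# one-sided test: are phosphosites more exposed than the other S/T residues?
stat, pval = mannwhitneyu(phospho, other, alternative="greater")
print(f"median SASA: phospho {median(phospho):.1f} vs other {median(other):.1f}, p = {pval:.2g}")
```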


Wednesday, March 28, 2012

Individual genomics of yeast

Nature Genetics used to be one of my favorite science journals. It consistently had papers that I found exciting. That changed about 5 years ago, when they made a very clear editorial shift towards genome-wide association studies (GWAS). Don't get me wrong, I think GWAS are important and useful, but I don't find it very exciting to have lists of regions of DNA that might be associated with a phenotype. I want to understand how variation at the level of DNA gets propagated through structures and interaction networks to cause these differences in phenotype. I mostly stayed out of GWAS since I was focusing on the evolution of post-translational networks using proteomics data, but I always felt that this line of research was not making full use of what we already know about how a cell works.

In this context, I want to tell you about a paper from Ben Lehner's lab that finally made me excited about individual variation, and why I think it is such a great study. I was playing around with a similar idea when the paper came out, so I will start with the (very) preliminary work I did and continue with their paper. I hope it can serve as a small validation of their approach.

As I just mentioned, I think we can make use of what we know about cell biology to interpret the consequences of genetic variation. Instead of using association studies to map DNA regions that might be linked to a phenotype, we can take a full genome and try to guess what could be deleterious changes and their consequences. It is clear that full genome sequences for individuals are going to be the norm, so how do we start to interpret the genetic variation that we see? For human genetic variation, this is a highly complex and challenging task.

Understanding the consequences of human genetic variation from DNA to phenotype requires knowledge of how variation will impact a protein's stability, expression and kinetics; how this in turn changes interaction networks; how this variation is reflected in each tissue's function; and ultimately how it leads to a fitness difference, disease phenotype or response to drugs. Ultimately we would like to be able to do all of this, but we can start with something simpler. We can take unicellular species (like yeast) and start by understanding cellular phenotypes before we move to more complex species.

To start, we need full genome sequences for many different individuals of the same species. For S. cerevisiae we have genome sequences for 38 different isolates from Liti et al. We then need phenotypic differences across these individuals. For S. cerevisiae there was a great study published last June by Warringer and colleagues where they tested the growth rate of these isolates under ~200 conditions. With these data together we can attempt to predict how the observed mutations might result in the differences in growth. As a first attempt we can look at the non-synonymous coding mutations. For these 38 isolates there are something like 350 thousand non-synonymous coding mutations. We can predict the impact of these mutations on a protein either by analyzing sequence alignments or by using structures and statistical potentials. There are advantages and disadvantages to both approaches, but I think they end up being complementary. The sequence analysis requires large alignments while the structural methods require a decent structural model of the protein. I think we will need a mix of both to achieve good coverage of the proteome.

I started with the sequence approach as it was faster. I aligned 2329 S. cerevisiae proteins with more than 15 orthologs in other fungal species and used MAPP from the Sidow lab at Stanford to calculate how constrained each position is. I got about 50K non-synonymous mutations scored with MAPP, of which about 1 to 8 thousand could be called potentially deleterious depending on the cut-off. To these we can add mutations that introduce STOP codons, in particular those that occur early in the protein (~710 of these within the first 50 AAs).

So far we have a way to predict whether a mutation is likely to impact negatively on a protein's function and/or stability. How do we go from here to a phenotype like a decreased growth rate in the presence of stress X? This is exactly the question that chemical-genetic studies try to address. Many labs, including our own, have used knock-out collections (of lab strains) to measure chemical-genetic interactions that give you a quantitative relative importance of each protein in a given condition. So, we can make the *huge* simplification of taking all deleterious mutations and just summing up the effects, assuming a linear combination of the effects of the knock-outs.
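
In code, this simplification amounts to very little - a toy sketch, with all scores and variant calls made up:

```python
# A toy version of the *huge* simplification above: predict a strain's growth
# defect in one condition by summing the chemical-genetic scores of the genes
# that carry a predicted-deleterious variant. All numbers are made up.

# chemical-genetic score per gene in one condition (higher = the knock-out
# grows worse in this condition relative to wild type)
chemgen_scores = {"GENE1": 0.8, "GENE2": 0.1, "GENE3": 0.5}

# genes with a predicted-deleterious variant (e.g. MAPP score above cut-off,
# or an early STOP codon) in each strain
deleterious = {
    "strainA": {"GENE1", "GENE3"},
    "strainB": {"GENE2"},
}

def predicted_defect(strain):
    """Linear-combination prediction of the relative growth defect."""
    return sum(chemgen_scores.get(gene, 0.0) for gene in deleterious[strain])

for strain in sorted(deleterious):
    print(strain, predicted_defect(strain))
```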

To test this idea I picked 4 conditions (out of the ~200 mentioned above) for which we have chemical-genetic information (from Parsons et al.) and where there is high growth rate variation across the 38 strains. With everything together I can test how well we can predict the measured growth rates under these conditions (relative to a lab strain):
Each entry in the plot represents one strain in a given condition. Higher values report worse predicted/experimental growth (relative to a lab strain). There is a highly significant correlation between measured and predicted growth defects (~0.57) overall, but cisplatin growth differences are not well predicted by these data. Given the many simplifications and the poor coverage of some of the methods used, I was surprised to see any correlation at all. This tells us that, at least for some conditions, we can use mutations found in coding regions and appropriately selected gene sets to predict growth differences.

This is exactly the message of Rob Jelier's paper from Ben Lehner's lab. When they started their work the phenotypic dataset from Warringer and colleagues was not yet published, so they had to generate their own measurements for this study. In addition, their study is much more careful in several ways. For example, they only used the sequences of 19 strains that they say have higher coverage and accuracy. They also estimated the impact of indels, and they tried to increase the size of the alignments (a crucial step in this process) by searching for distant homologs. If you are interested in making use of "personal" genomes you should really read this paper.

Stepping back a bit, I think I was excited about this paper because it finally connects the work that has been done in the high-throughput characterization of a model organism with the diversity across individuals of that species. It serves as a bridge for many people to come work in this area. There are a large number of immediate questions, such as: how much do we really need to know to make good/better predictions? What kinds of interactions (transcriptional, genetic, conditional genetic) do we need to know to capture most of the variation? Can we select gene sets and gene weights in other species without the conditional-genetics information (by homology)?

As we are constantly told, the deluge of genome sequences will continue, so there are plenty of opportunities and data to analyze (I wish I had more time ;). Some recent examples of interest include the sequencing of 162 D. melanogaster lines with associated phenotypic data and the (somewhat narcissistic) personal 'omics study of Michael Snyder. To start to make the jump to human, I think it would be great to have cellular phenotypic data (growth rate/survival under different conditions) for the same cells/tissue across a number of human individuals with sequenced genomes. Maybe in a couple of years I won't be as skeptical as I am now about our fortune-cookie genomes.


Wednesday, February 29, 2012

Book Review - The Filter Bubble

Following my previous post, I thought it was on topic to mention a book I read recently called “The Filter Bubble”. The book, authored by Eli Pariser, discusses the many applications of personalization filters in the digital world. Like several books I have read in the past couple of years, I found it via a TED talk where the author neatly summarizes the most important points. Even if you are not too interested in technology it is worth watching. I am usually very optimistic about the impact of technology on our lives, but Pariser raises some interesting potential negative consequences of personalization filters.

The main premise of the book is that the digital world is increasingly being presented to us in a personalized way - a filter bubble. Examples include Facebook’s newsfeed and Google search, among many others. Because we want to avoid the flood of digital information, we willingly give away commercially valuable personal information that can be used for filtering (and targeted advertising). Conversely, the fact that so many people are giving out this information has created data mining opportunities in the most diverse markets. The book goes into many examples of how these datasets have been used by different players such as dating services and the intelligence community. The author also provides an interesting outlook on how these tracking methods might even find us in the offline world, a la Minority Report.

If sifting through the flood of information to find the most interesting content is the positive side of personalization, what might be the downside? Eli Pariser argues that this filter “bubble” that we increasingly find ourselves in isolates us from other points of view. Since we are typically unaware that our view is being filtered, we might get a narrow sense of reality. This would tend to reinforce our perceptions and personality. It is obvious that there are huge commercial interests in controlling our sense of reality, so keeping these filters in check is going to be increasingly important. This narrowing of reality may also stifle our creativity, since novel ideas are so often found at the intersection of different ways of thinking. So, directing our attention to what might be of interest can inadvertently isolate us and make us less creative.

As much as I like content that resonates with my interests, I get a lot of satisfaction from finding out about new ideas and getting exposed to different ways of thinking. This is why I like the TED talks so much. There are few things better than a novel concept well explained - a spark that triggers a re-evaluation of your sense of the world. Even ideas I strongly disagree with, as happens often with politics here in the USA, I want to know about if a significant proportion of people might think that way. So, even if the current filter systems are not effective to the point of isolating us, I think it is worth noting these trends and taking precautions.

The author offers immediate advice to those creating the filter bubble - let us see and tune your filters. One of the biggest issues he brings up is that the filters are invisible. I know that Google personalizes my search, but I have very little knowledge of how and why. The simple act of making these filters more visible should help us see the bubble. Also, if you are designing a filtering system, make it tunable. Sometimes I might want to get out of my comfort zone and see the world through a different lens.

Thursday, February 23, 2012

Academic value, jobs and PLoS ONE's mission

Becky Ward from the blog "It Takes 30" just posted a thoughtful comment regarding the Elsevier boycott. I like the fact that she adds some perspective as a former editor contributing to the ongoing discussion. This follows from a recent blog post by Michael Eisen regarding academic jobs and impact factors. The title very much summarizes his position: "The widely held notion that high-impact publications determine who gets academic jobs, grants and tenure is wrong". Eisen is trying to play down the value of the "glamour" high-impact-factor magazines and fighting for the success of open access journals. It should be a no-brainer, really. Scientific studies are mostly paid for by public money, they are evaluated by unpaid peers and published/read online. There is really no reason why scientific publishing should be behind pay-walls.

Obviously, it is never as simple as it might appear at first glance. If putting science online were the only role publishers played, I could just put all my work up on this blog. While I do write up some results as blog posts, I can guarantee you that I would soon be out of a job if that were all I did. So there must be other roles that scientific publishing plays, and even if these roles might be outdated or performed poorly, they are needed and must be replaced before we can have a real change in scientific publishing.

The value of scientific publishing

In my view there are 3 main roles that scientific journals currently play: filtering, publishing and providing credit. The act of publishing itself is very straightforward and these days could easily cost near zero if the publishers have access to the appropriate software. But if publishing itself has benefited greatly from the shift online, filtering and credit are becoming increasingly complex in the online world.

Filtering
Moving to the digital world created a great attention crash that we are still trying to solve. What great scientific advances happened last year in my field? What about in unrelated fields that I cannot evaluate myself? I often hear that we should be able to read the literature and come up with answers to these questions directly, without regard to where the papers were published. However, try to imagine for a second that there were no journals. If PLoS ONE and its clones get what they are aiming for, this might be on the way. A quick check on Pubmed tells me that 87134 abstracts were made available in the past 30 days. That is something like 2900 abstracts per day! Which of these are relevant for me? The current filtering system of tiered journals with increasing rejection rates is flawed, but I think it is clear that we cannot do away with it until we have another in place.

Credit attribution
The attribution of credit is also intimately linked to the filtering process. Instead of asking about individual articles or research ideas, credit is about assigning value to researchers, departments or universities. The current system is flawed because it overvalues the impact/prestige of the journals where the research gets published. Michael Eisen claims that impact factors are not taken into account when researchers are picked for group leader positions, but honestly this idea does not ring true to me. From my personal experience of applying for PI positions (more on that later), those I see getting shortlisted for interviews tend to have papers in high-impact journals. On Twitter, Eisen replied to this comment by saying "you assume interview are because of papers, whereas i assume they got papers & interviews because work is excellent". So either high-impact-factor journals are being incorrectly used to evaluate candidates, or they are working well to filter excellent work. In either case, if we are to replace the current credit attribution system we need some other system in place.

Article level metrics
So how do we do away with the current focus on impact factors for both filtering and credit attribution? Both could be solved if we focused on evaluating articles instead of journals. The mission of PLoS ONE was exactly to develop article-level metrics that would allow for a post-publication evaluation system. As they claim on their webpage, they want "to provide new, meaningful and efficient mechanisms for research assessment". To their credit, PLoS has been promoting the idea and making some article-level indicators easily accessible, but I have yet to see a concrete plan to provide readers with a filtering/recommendation tool. As much as I love PLoS and try to publish in their journals as much as possible, in this regard PLoS ONE has so far been a failure. If PLoS and other open access publishers want to fight Elsevier and promote open access, they have to invest heavily in filtering/recommendation engines. Partner with academic groups and private companies with similar goals (e.g. Mendeley?) if need be. With PLoS ONE they are contributing to the attention crash and (finally) making a profit off of it. It is time to change the tune: stop saying how big PLoS ONE is going to be next year and start saying how you are going to get back on track with your mission of post-publication filtering.

Summary
Without replacing the current filtering and credit attribution roles of traditional journals, we won't do away with the need for a tiered structure in scientific publishing. We could still have open access tiered systems, but the current trend among open access journals appears to be the creation of large journals focused on the idea of post-publication peer review, since this is economically viable. However, without filtering systems, PLoS ONE and its many clones only add to the attention crash problem and do not solve the issue of credit attribution. PLoS ONE's mission demands that they work on filtering/recommendation, and I hope that, if nothing else, they can focus their message, marketing efforts and partnerships on this problem.


Wednesday, February 22, 2012

The 2012 Bioinformatics Survey

I am interrupting my current blogging hiatus to point to a great initiative by Michael Barton. He is collecting information about those working in the fields of bioinformatics / computational biology in this survey. This is a repeat of a similar analysis done in 2008, and I think it is really worth getting a feeling for how things have been changing. We can all benefit from the end result. So far, after 2 weeks, there have been close to 400 entries in the survey, but the rate of new entries is slowing down. So, if you have not done so already, go and fill it out or bug a colleague to do so.

Wednesday, May 25, 2011

Predicting kinase specificity from phosphorylation data

Over the past few years, improvements in mass-spectrometry methods have resulted in a big increase in throughput for the identification of post-translational modifications (PTMs). It is hard to even keep up with all the phosphoproteomics papers and the accumulation of phosphorylation data. Most often, improvements in methods result in interesting challenges and opportunities. In this case, how can we make use of this explosion in PTM data? I will try to explore a fairly straightforward idea: using phosphorylation data to predict kinase substrate specificity. I'll describe the general idea and just a first stab at it, to show that I think it can work.

The inspiration for this is the work by Neduva and colleagues, who have shown that we can search for enriched motifs within proteins that interact with a domain of interest. For example, take a protein containing an SH3 domain and find all of its interaction partners; you will likely see that they are enriched for proline-rich motifs of the type PXXP (X = any amino-acid), the known binding preference for this domain. So the very obvious application to kinases would be to take the interaction partners of a kinase and find enriched peptide motifs. The advantage of looking at kinases, over any other type of peptide binding domain, is that we can focus specifically on phosphosites.
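
Stripped to its bare bones, the enrichment test can be sketched in a few lines; the motif and the ±5 phosphosite windows below are made up, and a real analysis would use a proper tool like Motif-X:

```python
# Not Motif-X, but the bare-bones version of the same question: is a candidate
# motif over-represented among phosphosite windows from a kinase's interaction
# partners relative to background phosphosites? All windows are invented
# +/-5 sequences centred on the phosphosite.
import re
from scipy.stats import fisher_exact

motif = re.compile(r"[ST]P..[KR]")  # candidate motif (a guess, for illustration)

interactor_windows = ["ASDSPLKKTES", "LLLSPAKKRDE", "GGGTPNNKAAA", "AAASAAAAAAA"]
background_windows = ["AAASAAAAAAA", "KKKSPDDKEEE", "LLLTLLLLLLL", "GGGSGGGGGGG"]

def hits_and_misses(windows):
    hits = sum(1 for w in windows if motif.search(w))
    return hits, len(windows) - hits

table = [hits_and_misses(interactor_windows), hits_and_misses(background_windows)]
odds, pval = fisher_exact(table, alternative="greater")
print(f"odds ratio = {odds:.1f}, p = {pval:.3f}")
```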

As a test case I picked S. cerevisiae Cdc28p (Cdk1), which is known to phosphorylate the motif [ST]PXK. I used the STRING database to identify proteins that functionally interact with Cdc28 with a cut-off of 0.9 and retrieved all currently known phosphosites within these proteins. As a quick check I used Motif-X to search for enriched motifs. The first try was somewhat disappointing, but after removing phosphosites supported by fewer than 5 MS spectra and/or experiments I got back this logo as the most enriched motif:

This was probably the easiest kinase to try, since it is known to typically phosphorylate its targets at multiple sites and it is heavily studied. Still, I think there is a lot of room for exploration here. If anyone is interested in collaborating on this, let me know. If you're doing computational work, I would be interested in some code/tools for motif enrichment. If you're doing experimental work, let me know about your favorite kinases/species.

Thursday, April 28, 2011

In defense of 'Omics

High-throughput studies tend to have a bad reputation. They are often derided as little more than fishing expeditions. Few have summarized these feelings as sharply as Sydney Brenner:
"So we now have a culture which is based on everything must be high-throughput.I like to call it low-input, high-throughput, no-output biology"
Having dealt with these types of data for so long, I am often in the strange position of having to defend the approaches. As I was in real need of procrastination, I decided to try to write some of these thoughts down.

Error rates
One of the biggest complaints directed at large-scale methods is that they have very high error rates. Usually these complaints come from scientists interested in studying system X or protein Y who dig into these datasets only to find that their protein of interest is missing. Are the error rates high? While this might be true for some methods, it is important to note that the error rates are almost always quantified and that those developing the methods keep pushing the rates down.

When thinking about 'small-scale' studies I could equally ask - why should I trust a single western blot image? How many westerns were put in the garbage bin before you got that really nice one featured in the paper? In fact, some methods for reducing error only become feasible when operating at high throughput. As an example, when conducting pull-down experiments to determine protein-protein interactions, unspecific binding becomes much easier to call across many experiments. This has led to the development of analysis tools that cannot be employed on single pull-down experiments.

So, by quantifying the error rates and driving them down via experimental or analysis improvements, 'omics research is, in fact, at the forefront of data quality. At the very least, you know what the error rate is and can use the information accordingly. Once the methods are improved to the extent that the errors are negligible or manageable, they are quietly no longer considered "omics". The best example of this, I think, is genome sequencing. Even with the current issues with next-gen sequencing, few put 'traditional' genome sequencing in the same bag as the other 'omics tools, although it has quantifiable errors.

Standardization
Related to error quantification is standardization. To put it simply, large-scale data is typically deposited in databases and is available for re-use. What is the point of having really careful experiments if they will only be available for re-use, in any significant way, when a (potentially sloppy) curator digs the information out of papers? This availability fuels research by others who are not set up to perform the measurements. This is one of the reasons why bioinformatics thrives. The limitations become the ideas, not the experimental observations/measurements. Anyone can sit down, think of a problem, and with some luck the required measurements (or a proxy for them) have been made by others for some unrelated purpose. This is why publications of large-scale studies are so highly cited: they are re-used over and over again.

Engineering mindset and costs
One other very common complaint about these methods is cost. It is common to feel that 'omics research is 'trendy', expensive and consumes too much of the science budget. While the part about budget allocation might be true, the issue with costs is most certainly not. Large-scale methods are developed by people with an engineering mindset. The problems in this type of research are typically about how to make the methods work effectively, which includes making them cheaper, smaller, faster, etc. 'Omics research drives costs down.

Cataloging diversity
Besides these technical comments, the highest barrier to deal with when discussing these methods with others is a conceptual one. Is there such a thing as 'hypothesis free' research? To address this point let me go off on a small tangent. I am currently reading a neuroscience book - Beyond Boundaries - by Miguel Nicolelis, a researcher at Duke University. I will leave a proper review for a later post but, at some point, Nicolelis talks about the work of Santiago Ramon y Cajal. Ramon y Cajal is usually referred to as the father of the neuron theory, which postulates that the nervous system is made up of fundamental discrete units (neurons). His drawings of the neuronal circuits of different species are famous and easily recognizable. The amazing level of detail and effort that he put into these drawings really underscores his devotion to cataloging diversity. These observations inspired a revolution in neuroscience, much the same way Darwin's catalogs of diversity impacted biology. Should we not build catalogs of protein interactions, gene expression, post-translational modifications, etc? I would argue that we must. 'Omics research drives errors and prices down and creates catalogs of easily accessible and re-usable observations that fuel research. I actually think that it frees researchers. While a few specialize in method development, others are free to dream up biological problems to solve, with the data gathering effort shortened to a digital query.

Misunderstandings
So why the negative connotations? Part of it is simple backlash against the hype. As we know, most technologies tend to follow a hype cycle where early exaggerated excitement is usually followed by disappointment and backlash when they fail to deliver. A second important aspect is simply a lack of understanding of how to make use of the available data. This model of data generation, separated from the problem solving and analysis, only makes sense if researchers can query the repositories and integrate the data into their research. It is sad to note that this capacity is far from universal. While new generations are likely to bring with them a different mindset, those developing the large-scale methods should also bear the responsibility of improving the re-usability of the data.

Thursday, March 03, 2011

Structure based prediction of kinase interactions

About a year ago Ben Turk's lab published a large-scale experimental effort to determine the substrate recognition preferences of most yeast kinases (Mok et al. Sci. Signal. 2010). They used a peptide screening approach to analyze 61 of the about 122 known S. cerevisiae kinases in order to derive, for each one, a position specific scoring matrix (PSSM) describing its substrate recognition preference. In the figure below I show an example for the Hog1 MAPK, where it is clear that this kinase prefers to phosphorylate peptides that have a proline next to the S/T that is going to be phosphorylated.

Figure 1 - Example of Hog1 substrate recognition preference derived from peptide screens. Each spot in the array contains a mixture of peptides that are randomized at all positions except at the marked position (-5 to +4 relative to the phosphorylatable residue). Strong signal correlates with a preference for phosphorylating peptides containing that amino acid at the fixed position.
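To make the PSSM idea concrete, here is a minimal Python sketch of how such a matrix can score a candidate phosphosite. The matrix values, peptides and function names are all made up for illustration; a real matrix from Mok et al. would cover positions -5 to +4 and all 20 amino acids.

```python
# Hypothetical sketch of PSSM scoring. The matrix here is a dict:
# position (relative to the phospho-acceptor S/T) -> amino acid -> score.
pssm = {
    -1: {"P": 0.1, "L": 0.4, "A": 0.2},
    +1: {"P": 2.0, "L": 0.1, "A": 0.3},  # proline-directed, like Hog1
    +2: {"P": 0.5, "L": 0.2, "A": 0.1},
}

def score_site(peptide: str, center: int, pssm) -> float:
    """Sum the matrix scores over every position covered by the PSSM.

    `center` is the index of the phosphorylatable S/T in `peptide`.
    Positions or residues missing from the matrix contribute nothing.
    """
    total = 0.0
    for offset, prefs in pssm.items():
        i = center + offset
        if 0 <= i < len(peptide):
            total += prefs.get(peptide[i], 0.0)
    return total

# A proline at +1 scores higher than a leucine at +1:
print(score_site("AASPA", 2, pssm))  # S followed by P -> high score (2.3)
print(score_site("AASLA", 2, pssm))  # S followed by L -> low score (0.4)
```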

As was previously known, most kinases don't appear to have very striking substrate binding preferences. Still, these matrices should allow for significant predictions of kinase-site interactions. They should also allow us to benchmark previous efforts by Neil and other members of the Kobe lab on structure-based prediction of kinase substrate recognition. For this, I obtained the predicted substrate recognition matrices from the Predikin server and known kinase-site interactions from the PhosphoGrid database. I used these data to compare the predictive power of the experimentally determined kinase matrices (Mok et al.) with the predicted matrices from Predikin. This analysis was done about a year ago when the Mok et al. paper was published, but I don't think PhosphoGrid has been significantly updated since then.

PhosphoGrid had 422 kinase-site interactions for the 61 kinases analyzed in Mok et al., of which ~50% have in-vivo evidence for kinase recognition. As expected, the known kinase-site interactions have stronger experimental matrix scores than random kinase-site assignments (Fig 2).

Figure 2 - The set of kinase-site interactions used, broken down by the kinases with the highest representation. These sites were scored using the experimental matrices, along with other randomly selected phosphosites, and the scores of both populations are summarized in the boxplots.


A random set of kinase-phosphosite interactions of equal size was used to quantify the predictive power of the experimental and the Predikin matrices with a ROC curve (Fig 3).
Figure 3 - Area under the ROC curve values for kinase-site predictions using both types of matrices.
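For readers curious about the mechanics, this is essentially the benchmark in a few lines of Python, assuming we already have matrix scores for the known PhosphoGrid pairs and for the random assignments. The score lists below are invented; scikit-learn does the ROC computation.

```python
# A minimal sketch of the AROC benchmark with illustrative numbers.
from sklearn.metrics import roc_auc_score

known_scores  = [2.3, 1.8, 2.9, 1.1, 2.5]   # known kinase-site pairs
random_scores = [0.4, 1.0, 0.2, 1.3, 0.6]   # random kinase-site assignments

# Label known pairs as positives (1) and random ones as negatives (0),
# then ask how well the matrix scores separate the two populations.
labels = [1] * len(known_scores) + [0] * len(random_scores)
scores = known_scores + random_scores

print("AROC:", roc_auc_score(labels, scores))
```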

Overall, the accuracy of the predicted matrices from Predikin matched reasonably well that of the matrices derived from the peptide array experiments, with only a small difference in AROC values. I also broke down the predictions for individual kinases with at least 10 known sites. Benchmarking on such low numbers becomes very unreliable but, aside from the Cka1 kinase, the performance of the Predikin matrices matched the experimental results reasonably well.

I am assuming here that Predikin was not updated with any information from the Mok et al. study to derive its predictions. If this is true, it would mean that structure-based prediction of kinase recognition preferences, as implemented in Predikin, is almost as accurate as preferences derived from peptide library approaches.

Friday, January 07, 2011

Why would you publish in Scientific Reports ?

The Nature Publishing Group (NPG) is launching a fully open access journal called Scientific Reports. Like the recently launched Nature Communications, this journal is online only and the authors cover (or, for Nat Comm, can choose to cover) the cost of publishing the articles in an open access format. Where Scientific Reports differs most is that the journal will not reject papers based on their perceived impact. From their FAQ:
"Scientific Reports publishes original articles on the basis that they are technically sound, and papers are peer reviewed on this criterion alone. The importance of an article is determined by its readership after publication."

If that sounds familiar it should. This idea of post-publication peer review was introduced by PLoS ONE, and Nature appears to be essentially copying the format of this successful PLoS journal. Even the reviewing practices are the same, whereby the academic editors can choose to accept/reject based on their own opinion or consult external peer reviewers. In fact, if I were working at PLoS I would have walked into work today with a bottle of champagne and celebrated. As they say, imitation is the sincerest form of flattery. NPG is increasing its portfolio of open access or open choice journals and hopefully it will start working on article-level metrics. In all, this is a victory for the open-access movement and for science as a whole.

As I mentioned in a previous post, PLoS has shown that one way to sustain the costs of open access journals with high rejection rates is for the publisher to also run higher-volume journals. Both BioMedCentral and, more recently, PLoS have also shown that high-volume open access publishing can be profitable, so Nature is now trying to get the best of both worlds: brand power from high-rejection-rate journals with a subscription model and a nice added income from higher-volume open access journals. If, by some chance, funders force a complete move to immediate open access, NPG will have a leg to stand on.

So why would you publish in Scientific Reports ? Seriously, can someone tell me ? Since the journal will not filter on perceived impact, they won't be playing the impact factor game. They did not go as far as naming it Nature X, so brand power will not be that high. It is similarly priced (until January 2012) to PLoS ONE and has less author feedback information (i.e. article metrics). I really don't see any compelling reason why I would choose to send a paper to Scientific Reports over PLoS ONE.

Updated June 2013 - Many of you reach this page searching for the impact factor of Scientific Reports. It is now out and it is ~3. Yes, it is lower than PLOS ONE's, so you have yet another reason not to publish there.

Friday, December 31, 2010

End of the year with chemogenomics

Image taken from jurvetson at www.flickr.com/photos/jurvetson/3156246099/
Around this time of the year it is customary to make an assessment of the year that is ending and to make a mental list of things we wish for in the year ahead. Here is my personal (but work related :) take on this tradition.

My academic year ended with the publication of two works related to chemogenomics. Chemogenomics, or chemical genomics, tries to study the genome-wide response to a compound. Usually, collections of knock-out strains or strains over-expressing a large number of genes are grown in the presence or absence of a small molecule to assess the fitness cost (or advantage) of each perturbation for the drug response. This is what was done in these two works.
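As a toy illustration of the kind of drug-gene interaction score that comes out of these screens, assuming fitness is summarized as a log2 ratio of growth with versus without the compound (the numbers and gene names below are invented):

```python
# Hypothetical drug-gene interaction scores from a chemogenomic screen.
import math

growth_no_drug = {"gene1": 1.00, "gene2": 0.95, "gene3": 0.90}
growth_drug    = {"gene1": 0.98, "gene2": 0.20, "gene3": 0.85}

# log2 ratio of knock-out growth with the drug vs without it.
drug_gene_score = {
    gene: math.log2(growth_drug[gene] / growth_no_drug[gene])
    for gene in growth_no_drug
}
# Strongly negative scores (gene2 here, ~-2.2) flag knock-outs that are
# hypersensitive to the compound, hinting at the drug's mode of action.
print(drug_gene_score)
```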

In the first one, Laura Kapitzky (a former postdoc colleague in the lab) used collections of KO strains in both S. cerevisiae and S. pombe to assay growth in the presence of different compounds. The objective was to study the evolution of the drug response in these distantly related fungi. In line with what was previously observed in the lab for genetic interactions and kinase-substrate interactions, we found that drug-gene functional interactions were poorly correlated across these two species. Perhaps one interesting highlight from this project was that we could combine data from both fungi to improve the prediction of the mode of action of the compounds.

The second project, in which I was only minimally involved, was a similar chemogenomic screen but at a much larger scale. As the title implies, "Phenotypic Landscape of a Bacterial Cell" (behind a paywall) is a very comprehensive study of the response of the whole E. coli knock-out library to an array of compounds and conditions. Robert, Athanasios and other members of the Carol Gross lab did an amazing job of creating this resource and picking some of the first gems from it.

Something I wanted to highlight here is not so much what was discovered but what I was left wanting. These sorts of growth measurements tell us a lot about drug-gene relationships. We also have a growing knowledge of how genes genetically interact, either from similar growth measurements in double mutants or from predictions (as in STRING). These should then allow us to make predictions about how drugs interact. If two drugs act in synergy to decrease the growth of a bug, we should be able to rationalize that in terms of drug-gene and gene-gene interactions. I find this a very interesting area of research, and a naive version of the idea is sketched below. These sorts of data should allow us to predict drug combinations that target a specific species (i.e. a pathogen) or a diseased tissue but not the host or the healthy tissue. Here is a scientific wish for 2011: that these and other related datasets will give us a handle on this interesting problem.
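A minimal sketch of the naive version, with all data invented: drugs whose drug-gene fitness profiles correlate are candidates for sharing a mode of action, which is one simple starting point for reasoning about how they might combine.

```python
# Toy comparison of drug-gene fitness profiles (scores as in the log2
# ratio example above, one value per gene, same gene order per drug).
from statistics import correlation  # requires Python 3.10+

drug_profiles = {
    "drugA": [-2.1, 0.1, -1.8, 0.3],
    "drugB": [-1.9, 0.2, -2.0, 0.1],
    "drugC": [0.2, -1.5, 0.1, -0.9],
}

def profile_similarity(d1: str, d2: str) -> float:
    """Pearson correlation between two drug-gene fitness profiles."""
    return correlation(drug_profiles[d1], drug_profiles[d2])

print(profile_similarity("drugA", "drugB"))  # high -> similar mode of action
print(profile_similarity("drugA", "drugC"))  # low  -> likely different targets
```

Rationalizing true synergy would of course need the gene-gene interaction layer on top of this, but profile similarity is the obvious first handle.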

As for the future, I am entering the final year of my current funding (thank you HFSP) so my attention is turning to finding either more funds or another job. I will continue working on the evolution of signalling systems, in particular trying to find the function of post-translational modifications (aka P1). Unfortunately the project failed as an open science initiative, something that I have mostly given up on for now. I think the main reason it didn't work was the lack of collaborators with similar (open) interests and non-overlapping skill sets, as Greg and Neil were discussing in the Nodalpoint podcast a while ago.

See you all in 2011 !

Tuesday, December 21, 2010

The GABBA program

I was recently at the annual meeting of my former PhD program, the GABBA program, a Graduate Program in Areas of Basic and Applied Biology in Portugal. I realized that I had never blogged about the Portuguese PhD programs and thought I would share their somewhat unusual concept with you.

As in other PhD programs, GABBA students start by taking courses during the first semester of the program. The semester is divided into week-long courses on different subjects (think Cell Cycle, Development, etc.) with invited teachers. What is different from most other programs I know of is that students then get to use their scholarship to do their research projects anywhere in the world. GABBA students get paid to do their research in any lab that accepts them, no strings attached. No return clause, not even a requirement to inform the program of research progress. There is an annual meeting where students (and alumni) get to go to Portugal to present their work, but no one is obliged to go. It is also a nice opportunity to exchange tips and, in some cases, even start collaborations.

The annual meeting is always organized around Christmas time, so most people end up going. I have kept going to the meetings after finishing my PhD, mostly because I enjoy seeing the people but also because of the cool science. As you can imagine, everyone is scattered around the world in very nice labs doing research on all sorts of biomedical subjects. This year there were a lot of talks about stem cells and an unusually high number of neurobiology-related works. Some cool research of note for me included the work of Martina Bradic (Borowsky lab at NYU) on the convergent evolution of blind cave fish and the talk by Andre Sousa (Sestan lab at Yale) on the transcriptional profiling of human brain regions during development (http://hbatlas.org/).

The GABBA program takes international students as well, but they are typically asked to do their research in Portugal. Applications usually open around June, so keep an eye out if you are interested in applying. Have a look at the admissions page for more information.

Wednesday, November 24, 2010

This holiday season, make them spit in a tube

Black Friday is upon us and everyone here in the US is going consumer crazy. Along with the traditional discounts in the offline world, there are also tempting promotions in many online stores. One great example is the discount that 23andMe is offering until next Friday. If you have not heard of 23andMe, they are a direct-to-consumer genetics company selling a SNP profiling service. You get to find out about your ancestry and your genetic propensity for certain traits and some diseases. The analysis usually costs $499 (plus a mandatory one-year, $5-per-month subscription) but they are offering a $400 discount (use promo code UA3XJH). What better way to spend Christmas than having everyone spit into a little tube.