
Wednesday, February 29, 2012

Book Review - The Filter Bubble

Following my previous post I thought it was on topic to mention a book I read recently called “The Filter Bubble”. The book, authored by Eli Pariser, discusses the many applications of personalization filters in the digital world. Like several books I have read in the past couple of years, I found it via a TED talk where the author neatly summarizes the most important points. Even if you are not too interested in technology it is worth watching. I am usually very optimistic about the impact of technology on our lives, but Pariser raises some interesting potential negative consequences of personalization filters.




The main premise of the book is that the digital world is increasingly being presented to us in a personalized way, a filter bubble. Examples include Facebook’s news feed and Google search, among many others. Because we want to avoid the flood of digital information, we willingly give out commercially valuable personal information that can be used for filtering (and targeted advertising). In turn, the fact that so many people are giving out this information has created data mining opportunities in the most diverse markets. The book goes into many examples of how these datasets have been used by different organizations, from dating services to the intelligence community. The author also provides an interesting outlook on how these tracking methods might even find us in the offline world, a la Minority Report.

If sifting through the flood of information to find the most interesting content is the positive side of personalization, what might be the downside? Eli Pariser argues that this filter “bubble” that we increasingly find ourselves in isolates us from other points of view. Since we are typically unaware that our view is being filtered, we might get a narrow sense of reality. This would tend to reinforce our existing perceptions and personality. It is obvious that there are huge commercial interests in controlling our sense of reality, so keeping these filters in check is going to be increasingly important. This narrowing of reality may also stifle our creativity, since novel ideas are so often found at the intersection between different ways of thinking. So, directing our attention to what might be of interest can inadvertently isolate us and make us less creative.

As much as I like content that resonates with my interests, I get a lot of satisfaction from discovering new ideas and being exposed to different ways of thinking. This is why I like the TED talks so much. There are few things better than a novel concept well explained - a spark that triggers a re-evaluation of your sense of the world. Even when these are ideas that I strongly disagree with, as often happens with politics here in the USA, I want to know about them if a significant proportion of people might think this way. So, even if the current filter systems are not yet effective enough to isolate us, I think it is worth noting these trends and taking precautions.

The author offers one piece of immediate advice to those creating the filter bubble – let us see and tune your filters. One of the biggest issues he raises is that the filters are invisible. I know that Google personalizes my search, but I have very little knowledge of how and why. The simple act of making these filters more visible would help us see the bubble. Also, if you are designing a filtering system, make it tunable. Sometimes I might want to get out of my comfort zone and see the world through a different lens.

Thursday, July 15, 2010

Review - The Shallows by Nicholas Carr

On a never-ending flight from Lisbon back to San Francisco I finished reading the latest book from Nicholas Carr: "The Shallows - What the Internet Is Doing to Our Brains". The book is a much extended version of an article Carr wrote a few years ago entitled "Is Google Making Us Stupid?" that can be read online. If you liked that article you will probably find the book interesting as well.

In the book (and article) Carr tries to convince the reader that the internet is reducing our capacity to read deeply. He acknowledges that there is no turning back to a world without the internet, and he does not offer any solutions, just the warning. He explains how the internet, like many other communication revolutions (the printing press, radio, etc.), changes how we perceive the world. In a very material way, it changes our brain as we interact with the web and learn to use it. He argues that the web promotes skimming the surface of every web page and that the constant distractions (email, social networks) are addictive. This addiction can even be explained by an ancient need of our species to constantly be on the lookout for changes in our environment. So, by promoting this natural and addictive shallow intake of information, the internet is pushing aside the hard, deep type of reading that has been one of mankind's greatest achievements.

After reading all of this I should be scared. I easily spend more than ten hours a day on these interwebs, and my job as a researcher depends crucially on my capacity to read other scientific works deeply, reason about them, come up with hypotheses, experiments, etc. So why am I still writing this blog post instead of sitting in some corner reading some very large book? Probably because I do not share Nicholas Carr's pessimistic view. I actually agreed with a lot more of the book than I was expecting to. I certainly believe that, like any other tool, the internet changes our brains as we use it. I also agree that reading online promotes the skimming behavior that the book describes; I observe the same from my own experience. What I find hard to believe is that the internet will result in the utter destruction of mankind as we know it (* unless saved by The Doctor).

It is just a personal experience but, despite my addiction to the internet, I haven't stopped reading "deeply". Not only is it a job requirement, I enjoy it. One of my favorite ways to spend Saturday mornings is to get something to read and have a long breakfast outside. At work I skim through articles and feeds to find what I need, and when I do, I print it out to read deeply. That is why I have piles of articles on my desk. This is just to say that I found a way around my personal difficulty with deep reading on a computer screen. In other words, if it is required, we will find a way to do it. The internet habits that might be less conducive to deep thought are no worse than many other addictions of our society, and we have learned to cope with those.

I cannot imagine going back to a time when I would need to go to a library and painfully look for every single scientific article I wanted. Not to mention the impossibility of easily re-using other people's data and code. So even if a small but significant number of people can't find a way to cope with the lure of the snippets, the advantages still overwhelmingly outweigh the disadvantages.

This topic and book have been covered extensively online. The fact that such a wealth of interesting and diverse opinions has shown up on the very technological platform that Carr criticizes is almost evidence in itself that he is wrong (granted that some of these are also newspapers :). Examples:

Mind Over Mass Media (by Steven Pinker)
Carr's reply

Interview with Nick Carr and New York Times blogger Nick Bilton

and for a different take on the topic here is an interview with Clay Shirky

Tuesday, September 02, 2008

Books: long tails and crowds

I read two interesting books recently that relate to how the internet is changing businesses and society in general.


“The Long Tail” by Chris Anderson ends up suffering from its own success. I was so exposed to the long tail meme before reading the book that there were very few novel ideas left to read. The book describes the business opportunities that come from having near-unlimited shelf space. While physical stores are forced to focus on the big hits, long tail businesses sell those big hits but also all the other niche products that only a few people will be interested in. The big challenge is guiding users to the niche products they will be interested in. Anderson provides examples of recommendation and reputation engines from several companies (e.g. Amazon, iTunes, eBay) that by now most of us are familiar with. Even for those well exposed to log-normal distributions and long tail businesses, the book is still worth getting as a resource and for the very interesting historical perspective on the origins of long tail businesses.

“Here Comes Everybody” is an excellent book by Clay Shirky that describes the huge decrease in the cost of group formation that we are currently living through. Through a series of stories Shirky demonstrates how the internet facilitates group formation and how collective actions that were impossible before are now becoming the norm. His stories range from ideas as simple as the photo collections on Flickr to the coordination of regime opposition in Byelorussia. I appreciate his somewhat neutral stance on the phenomenon. The book covers cases where online groups almost descend into a mob-like mentality and others where groups of consumers were able to stand up to corporations to guarantee their rights. The consequences of easy group formation for the future of society are not easy to predict, and this is well conveyed in the book.

The subjects and stories from these books are also interesting for scientists because they can influence the way we work. Science is a long tail of knowledge with many niche areas that only a few people in the world care about. The recommendation and reputation engines described could help us navigate the body of knowledge to find the bits that interest us the most. Also, easy group formation might one day shift the way we work, so that innovation and research are not determined by physical location but are instead organized around research problems.

Saturday, June 28, 2008

Capturing biology one model at a time

Mathematical and computational modeling is (I hope) a well accepted requirement in biology. These tools allow us to formalize and study systems of a complexity that is hard to grasp by reasoning alone. There have been great advances in our capacity to model different biological systems, from single components to cellular functions and tissues. Many of these efforts have been ongoing separately, each one dealing with a particular layer of abstraction (atoms, interactions, cells, etc.), and some of them are now reaching a level of accuracy that rivals some experimental methods. I will try to summarize, in a series of blog posts, the main advances behind some of these models and examples of integration between them, with particular emphasis on proteins and cellular networks. I invite others to post about models in their areas of interest, to be collected for a review.

From sequence to fold
Once produced, RNA and proteins adopt structures that have different functional roles. In principle, all the information required to determine the structure is in the DNA sequence that encodes the RNA/protein. Although there has been some success in the prediction of RNA structure from sequence, ab initio protein folding remains a difficult challenge (see the review by R. Das and D. Baker). A more pragmatic approach has been to use the increasing structural and sequence data available in public databases to develop sequence-based models for protein domains. In this way, for well studied protein folds it is possible to ask the reverse question: what sequences are likely to fold this way?
(To be expanded in a future post, volunteers welcome)
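To make that reverse question a bit more concrete, here is a minimal sketch (in Python, purely for illustration) of the kind of sequence-based model I mean: a position-specific scoring matrix built from a toy alignment of a hypothetical, well studied domain, used to score how profile-like new sequences are. The alignment, background frequencies and pseudocount are all made up; real profile methods such as PSI-BLAST or HMMER profile HMMs are far more sophisticated.

```python
# Minimal sketch: score candidate sequences against a profile (PSSM) built from
# a toy alignment of a hypothetical domain family. Alignment, background
# frequencies and pseudocount are illustrative only.
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = {aa: 1.0 / len(ALPHABET) for aa in ALPHABET}  # uniform background for simplicity

def build_pssm(alignment, pseudocount=1.0):
    """Column-wise log-odds scores from an ungapped alignment (equal-length strings)."""
    pssm = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        total = len(alignment) + pseudocount * len(ALPHABET)
        pssm.append({aa: math.log2((counts.get(aa, 0) + pseudocount) / total / BACKGROUND[aa])
                     for aa in ALPHABET})
    return pssm

def profile_score(sequence, pssm):
    """Sum of per-position log-odds; higher means the sequence looks more like the family."""
    return sum(column.get(aa, min(column.values())) for aa, column in zip(sequence, pssm))

# Toy alignment of a made-up four-residue motif from a "well studied" family
alignment = ["ACDE", "ACDE", "ACQE", "SCDE"]
pssm = build_pssm(alignment)
print(profile_score("ACDE", pssm))  # high score: profile-like
print(profile_score("WWWW", pssm))  # low score: unlikely member of this family
```

Note that a high score only says the sequence looks like the family in the alignment; it is a statistical statement about sequence similarity, not a folded structure.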

Protein binding models

I am particularly interested in how proteins interact with other components (mainly other proteins and DNA) and in trying to model these interactions from sequence to function. I will leave protein-compound interactions and metabolic networks to more knowledgeable people.
As mentioned above, even without a complete ab initio folding model, it is possible to predict the structure of some sequences, or to determine from comparative genomics analysis which protein/domain family a sequence belongs to. This by itself might not be very informative from a cellular perspective. We need to know how cellular components interact and how these interconnected components create useful functions in a cell.

Docking
Trying to understand and predict how two proteins interact in a complex has been the challenge of structural computational biology for more than two decades . The initial attempt to understand protein-interaction from computational analysis of structural data (what is known today as docking) was published by Wodak and Janin in 1978. In this seminal study, the authors established a computational procedure to reconstitute a protein complex from simplified models of the two interacting proteins. In the twenty-years that have followed the complexity and accuracy of docking methods has steadily increased but still faces difficult hurdles (see reviews Bonvin et al. 2006, Gray, 2006). Docking methods start from the knowledge that two proteins interact and aim at predicting the most likely binding interfaces and conformation of these proteins in a 3D model of the complex. Ultimately, docking approaches might one day also predict new interactions for a protein by exhaustively docking all other proteins in the proteome of the species, but at the moment this is still not feasible.
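To give a flavour of what a docking search involves, here is a deliberately crude sketch (again in Python, with invented coordinates and cutoffs): one protein is held fixed, the other is translated over a coarse grid, and each pose is scored by counting favourable contacts and penalizing clashes. Real docking methods add a rotational search, much finer sampling, flexibility and physics- or knowledge-based energy functions, which is exactly where the hard part lies.

```python
# Crude rigid-body docking sketch: slide one coarse-grained "protein" (a set of
# 3D bead coordinates) around another and score each translated pose.
# Coordinates and cutoffs are invented for illustration.
import itertools
import math

CONTACT_CUTOFF = 5.0   # distance (arbitrary units) counted as a favourable contact
CLASH_CUTOFF = 2.0     # distance below which two beads clash sterically

def pose_score(receptor, ligand, shift):
    """Score one translated pose: +1 per contact, -10 per clash."""
    score = 0
    for rec_bead in receptor:
        for lig_bead in ligand:
            moved = tuple(c + s for c, s in zip(lig_bead, shift))
            d = math.dist(rec_bead, moved)
            if d < CLASH_CUTOFF:
                score -= 10
            elif d < CONTACT_CUTOFF:
                score += 1
    return score

def dock(receptor, ligand, step=2.0, extent=10.0):
    """Exhaustive translational search on a coarse grid; returns the best shift and its score."""
    grid = [i * step for i in range(int(-extent / step), int(extent / step) + 1)]
    best = max(itertools.product(grid, repeat=3),
               key=lambda shift: pose_score(receptor, ligand, shift))
    return best, pose_score(receptor, ligand, best)

# Two invented coarse-grained bead models standing in for interacting proteins
receptor = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
ligand = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(dock(receptor, ligand))
```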

Interaction types
It should still be possible to use the 3D structures of protein complexes to understand at least particular interaction types. In a recent study, Russell and Aloy showed that it is possible to transfer structural information on protein-protein interactions by homology to other proteins with similar sequences (Aloy and Russell 2002). In this approach the homologous proteins are aligned to the sequences of the proteins in the 3D complex structure. Mutations in the homologous sequences are evaluated with an empirical potential to determine the likelihood of binding. A similar approach was described soon after by Lu and colleagues, and both have been applied in large-scale genomic studies (Aloy and Russell 2003; Lu et al. 2003). As with any other functional annotation by homology, this method is limited by how much the target proteins have diverged from the templates. Aloy and Russell estimated that interaction modeling is reliable above 30% sequence identity (Aloy et al. 2003). Substitutions can also be evaluated with more sophisticated energy potentials after a homology model of the interface under study is created. Examples of tools that can be used to evaluate the impact of mutations on binding propensity include Rosetta and FoldX.
Although the methods described above were mostly developed for domain-domain protein interactions, similar approaches have been developed for protein-peptide interactions (see for example McLaughlin et al. 2006) and protein-DNA interactions (see for example Kaplan et al. 2005).
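The core idea behind these homology-transfer approaches can be sketched in a few lines: take the residue pairs that are in contact across the template complex interface, map them onto the aligned homologous sequences, and sum an empirical pair potential over those positions. The contact list, sequences and potential values below are invented for illustration and merely stand in for the statistical potentials used in the published methods.

```python
# Toy homology-transfer scoring of a protein-protein interface.
# Pair potential values, template contacts and sequences are invented.
PAIR_POTENTIAL = {
    frozenset(("D", "K")): -1.5,   # salt bridge
    frozenset(("L", "L")): -1.0,   # hydrophobic packing
    frozenset(("E", "R")): -1.2,   # salt bridge
}
DEFAULT_ENERGY = 0.5               # mildly unfavourable by default

# Residue pairs in contact across the template complex interface,
# given as (position in chain A, position in chain B) of the template alignment.
TEMPLATE_CONTACTS = [(0, 2), (1, 1), (3, 0)]

def interface_energy(seq_a, seq_b, contacts=TEMPLATE_CONTACTS):
    """Sum the pair potential over the template interface contacts, assuming the
    homologous sequences are already aligned to the template (no gaps)."""
    total = 0.0
    for i, j in contacts:
        pair = frozenset((seq_a[i], seq_b[j]))
        total += PAIR_POTENTIAL.get(pair, DEFAULT_ENERGY)
    return total

# A homolog pair close to the template interface versus one with interface substitutions
print(interface_energy("DLKE", "RLK"))   # low (favourable) energy: binding likely conserved
print(interface_energy("ALKE", "RAK"))   # higher energy: substitutions disrupt the interface
```

The published methods differ in the potentials and in how they handle alignment gaps and divergence, but the logic of mapping template interface contacts onto homologous sequences is the same.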

In summary, the accumulation of protein-protein and protein-DNA interaction information, along with structures of complexes and the ever-increasing coverage of sequence space, allows us to develop models that describe binding for some domain families. In a future blog post I will try to review the different domain families that are well covered by these binding models.

Previous mini-reviews
Protein sequence evolution