(caution, fiction ahead)
I wake up in the middle of the night, startled by some noise. Pulse racing, I try to focus my attention outwards. Something breaking, glass shattering? Is someone out there? I reach out with my senses but an awkward feeling nags at me, bubbling up to my consciousness. I try hard to focus: it is coming from outside the room, someone is inside my house. I close my eyes but vertigo takes over and weightlessness overpowers me. I am in the living room cleaning the floor, picking up a broken glass. The nagging feeling finally assaults me fully. I am moving but I am not in control. Panic rises quickly as I watch, helpless, the simple and quiet actions of someone else. I stop picking up glass and I feel curious, only it is not exactly me; the feeling is there beside me.
- Hi, who are you ?
The voice catches me by surprise and my fear goes beyond rational control. All I can think of is to escape, to go away from here. For a second time I find myself floating as if searching for a way out. When I open my eyes again I am by the beach and I breathe a sigh of relief. The constant sound of the waves calms me down for a few seconds until my eyes start drifting to the side. No, stay there, I am in control! I look into the eyes of a total stranger who smiles back at me in recognition. Two voices ask me if I am enjoying the view and I can only scream back in confusion.
I wake up in the middle of the night, startled by some noise. I immediately flex my hands in front of my eyes to make sure it was nothing but a nightmare, trying hard to calm down. What a dream. I get up and check on the noise coming from the living room, realizing that it was just the storm outside. Feeling better, I fire up my laptop and grab a glass of water from the kitchen. I open Twitter and type away:
- I had the strangest dream !(cursor blinking) Our senses were all connected(enter)
I get up to open the window drinking another sip of water. After a couple of steps I feel a jabbing headache forcing me to stop and bright spots of light blur my vision. I close my eyes in pain and the voices of some unseen crowd thunder in my ears:
- I had the same dream - they all say in unison
The sound of glass shattering on the floor is the last thing I remember before collapsing.
I wake up in the middle of the night startled by some noise (...)
(Twistori was the main motivation for this post)
Previous fiction:
The Fortune Cookie Genome
Thursday, June 12, 2008
Tuesday, June 10, 2008
Why does FriendFeed work ?
I have been using FriendFeed for a while and I have to say that it works surprisingly well. It is hard to define what FriendFeed is so the only real way of understanding it is to try it for a while.
One common way to define FF would be as a life-stream aggregator. Each user defines a set of feeds (blog, Flickr, Twitter, bookmarks, comments, etc) providing all other users with a single view of all the online activities of that user. Anyone can select how much to share (even nothing at all) and subscribe to a number of other users. Each item (photo, blog post, bookmark) can serve then as spark for discussions. The users can mark items as interesting or comment on them and this propagates to all other people that subscribe to you. In addition we can select sources to hide if for some reason there is a particular part of a user's activities you don't enjoy. All of this creates a very personalized view of whoever you elect to interact with online.
I still find it striking that there are so many long threads of discussions around items that we share in FriendFeed, sometimes more than in the original site. A couple of examples:
Google code as a science repository (discussion in FF, blog post)
Into the Wonderful (discussion in FF, slideshare site)
Bursty work (discussion in FF, blog post)
Why does it work so well ? One possible reason could be that a group of early adopter scientists happened to get together around this website creating the required critical mass to start the discussions. Still, most of those commenting were already participating on blogs so that might not be it. There might be something about the interface, maybe it is the ease of adding comments and that these comments can be edited that increases the participation. Ongoing discussions get bumped higher in the view so every new comment brings the item back to your attention. In this way you know who saw the item and who is thinking about it. A bit like talking about a movie you saw or a book you read with a bunch of friends.
Anyone interested in the science aspects of it should check out the Life Scientists room with currently around 85 subscribers. Here is an introduction to some of these people, in particular on what they work on. Connecting to other scientists in this way lets you see what are the articles they find interesting and discuss current scientific news. Even maybe start a couple of side-projects for the fun of it.
Monday, June 09, 2008
Evaluation metrics and Pubmed Faceoff
I have recently been reading a lot about evaluation metrics for papers and authors. It started with a blog post in Action Potential (Nature Neuroscience's blog) showing a correlation between the number of downloads of a paper and its citations. From the comments on that blog post I found out about a forum in Nature Network about Citation in Science and also the recently published group of perspectives on "The use and misuse of bibliometric indices in evaluating scholarly performance".
It could have been a coincidence but Pierre sparked a long discussion in FriendFeed when he suggested it would be nice to be able to sort Pubmed queries by the impact factor of the journal. In reaction to this Euan set up a very creative interface to Pubmed that he named Pubmed Faceoff. He took several different factors into account (citations from Scopus, eigenfactor of the journal, the time the paper was published) and, for each paper returned from a Pubmed query, creates a face that describes the paper. The idea for the visualization is based on Chernoff Faces. It is really a creative idea and I wish Pubmed could spend more resources on coming up with alternative interfaces like this, something like a "labs" section where they could play with ideas or allow others to create interfaces that they would host.
I won't go into the whole debate about evaluation metrics here since there is already a lot of discussion going on in some of the links I mentioned.
Wednesday, May 14, 2008
Prediction of phospho-proteins from sequence
I want to be able to predict what proteins in a proteome are more likely to be regulated by phosphorylation and hopefully use mostly sequence information. This post is a quick note to show what I have tried and maybe get some feedback from people that might have tried this before.
The most straightforward way to predict the phospho-proteins is to use existing phospho-site predictors in some way. I have used the GPS 2.0 predictor on the S. cerevisiae proteome with medium cutoff and including only Serine/Threonine kinases. The fraction of tyrosine phosphosites in S. cerevisiae is very low so I decided not to try to predict tyrosine phosphorylation for now.
This produces a ranked list of 4E6 putative phosphosites for the roughly 6000 proteins, scored according to the predictor (each site is scored for multiple kinases). My question is how best to make use of these predictions if I mostly want to know which proteins are phosphorylated and not the exact sites. Using a set of known phosphorylated proteins in S. cerevisiae (mostly taken from expasy) I computed different final scores for each protein as a function of all its phospho-site scores:
1) the sum
2) the highest value
3) the average
4) the sum of putative scores if they were above a threshold (4,6,10)
5) the sum of putative phosphosite scores if they were outside ordered protein segments as defined by a secondary structure predictor and above a score threshold
The results are summarized with the area under the ROC curve (known phosphoproteins were considered positives and all other negatives) :

In summary, the sum of all phospho-site scores is the best way that I found so far to predict what proteins are phospho-regulated. My interpretation is that phospho-regulated proteins tend to be multi-phosphorylated and/or regulated by multiple kinases so the maximum site score will not work as well as the sum. As a side note, although there are abundance biases in mass-spec data (the source of most of the phospho-data) protein abundance is a very poor predictor of phospho-regulation (AROC=0.55).
Disregarding putative sites inside predicted secondary structured protein segments did not improve the predictions as I would have expected, but I should try a few disorder predictors.
Ideas for improvements are welcomed, in particular sequence based methods. I would also like to avoid comparative genomics for now.
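The comparison described above can be sketched in a few lines of Python. This is only a minimal illustration under assumptions: the input is taken to be a flat list of (protein, site score) pairs, the function names `protein_scores` and `auroc` are made up for this sketch, and the toy numbers are not real GPS 2.0 output. The AROC is computed directly as the probability that a known phosphoprotein outranks any other protein.

```python
from collections import defaultdict


def protein_scores(site_scores, mode="sum", threshold=None):
    """Collapse per-site scores into one score per protein.

    site_scores: iterable of (protein_id, score) pairs (hypothetical input format).
    mode: "sum", "max" or "mean".
    threshold: if given, only sites scoring above it are kept.
    """
    per_protein = defaultdict(list)
    for protein, score in site_scores:
        if threshold is None or score > threshold:
            per_protein[protein].append(score)
    reducers = {"sum": sum, "max": max, "mean": lambda v: sum(v) / len(v)}
    return {p: reducers[mode](v) for p, v in per_protein.items()}


def auroc(scores, positives, all_proteins):
    """Area under the ROC curve via pairwise comparison; ties count as half.

    Proteins without any retained site get a score of 0.0.
    """
    pos = [scores.get(p, 0.0) for p in all_proteins if p in positives]
    neg = [scores.get(p, 0.0) for p in all_proteins if p not in positives]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_sites = [("P1", 5.0), ("P1", 7.0), ("P1", 6.0), ("P2", 9.0),
             ("P3", 2.0), ("P4", 3.0), ("P4", 1.0)]
known = {"P1", "P4"}
proteins = ["P1", "P2", "P3", "P4"]

for mode in ("sum", "max", "mean"):
    print(mode, auroc(protein_scores(toy_sites, mode), known, proteins))
```

The pairwise AROC is quadratic in the number of proteins, which is fine for around 6000 proteins but would need a rank-based formula for much larger sets.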
Wednesday, May 07, 2008
Drug-drug interactions and network connectivity
How does the effect of drug-drug combinations relate to the cellular interactions of their targets ? Last year, Joseph Lehár and colleagues published a paper in MSB looking into this question.
One way to study the effect of drug combinations on the growth of a bacterium, for example, is to measure the inhibition of growth for all possible combinations of serially diluted doses of two drugs and to plot dose-matrices like the ones in figure 1 of the paper (shown here, adapted from the paper). In fig1A the authors show how increasing doses of two combined drugs inhibit the growth of a methicillin-resistant Staphylococcus aureus strain. Light colors correspond to strong growth inhibition. One observation from this figure is that the two drugs can inhibit the growth of this strain in an additive fashion. The question the authors tried to address in this paper is how much this sort of dose-matrix tells us about the possible interactions of the targets. The drugs could be interacting with the same target, different targets in the same pathway/complex, targets in different pathways both required for growth, etc.
In order to study this they first simulated an abstract metabolic network (using ODEs, see model file in Sup) with two different pathways required for growth, with branched and linear blocks and one negative feedback (see Fig3 in the paper). They simulated the effect of increasing drug doses in their models by decreasing the enzyme activities of the simulated targets. For each possible drug-drug combination they then calculated the predicted dose-matrix effect on growth (pathway output). They observed that by fitting the obtained dose-matrices to 4 types of classical dose-matrix models (described in Fig2) they could predict where in this network the two targets would most likely be.
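To make the idea of a reference dose-matrix model concrete, here is a minimal Python sketch of one widely used null model, Bliss independence, where the expected combined inhibition is fa + fb - fa*fb. This is an assumption chosen for illustration: the sketch does not claim to reproduce any of the four models in Fig2 of the paper, and the Hill-curve single-drug response, the function names and the parameter values are all hypothetical.

```python
def hill(dose, ec50, slope=1.0):
    """Fractional inhibition (0..1) of a single drug at a given dose (assumed Hill curve)."""
    if dose <= 0:
        return 0.0
    return dose**slope / (ec50**slope + dose**slope)


def bliss_matrix(doses_a, doses_b, ec50_a, ec50_b):
    """Expected inhibition for every dose pair if the two drugs act independently."""
    matrix = []
    for dose_a in doses_a:
        fa = hill(dose_a, ec50_a)
        row = []
        for dose_b in doses_b:
            fb = hill(dose_b, ec50_b)
            row.append(fa + fb - fa * fb)  # Bliss independence
        matrix.append(row)
    return matrix


def excess_over_bliss(observed, expected):
    """Observed minus expected; positive entries suggest synergy, negative antagonism."""
    return [[o - e for o, e in zip(row_o, row_e)]
            for row_o, row_e in zip(observed, expected)]


# Serially diluted doses (arbitrary units, invented for illustration).
doses = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0]
expected = bliss_matrix(doses, doses, ec50_a=1.0, ec50_b=1.0)
for row in expected:
    print(" ".join(f"{value:.2f}" for value in row))
```

Comparing a measured matrix against several such reference surfaces, and keeping the best fit, is the general shape of the analysis described above.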
As an example, two sequential targets in an unbranched section of the network embedded in a negative feedback produce a dose-matrix that best fits a potentiation model (shown here, adapted from Fig3).
Having established by simulations that there is information in the dose-matrices that relates to the interaction of their targets, they then tested the effect of 10 known antifungal drugs on the sterol pathway (also well established) of Candida glabrata. For each drug-drug combination they tried to fit the experimental dose matrices to the same 4 models and compared the best model fit to the one expected given the position of the targets in the sterol pathway. For many cases (72%) the best model fit was the same as predicted from the sterol pathway model but only 54% of the best-fit models were unambiguous. There were some cases where drug-with-itself dose matrices (positive control) did not appear additive as expected. The authors mention that this is due to the "instability in the measured potency of a drug" but I am not sure why a drug-with-itself matrix would not be reproducible.
Finally the authors further tested this relation between drug combinations and target interactions by experimentally measuring drug dose-matrices for 94 drugs/compounds in human (HCT116) tumor cells (see text for details).
In summary, even if the prediction accuracy is far from perfect, this work shows that it should be possible to either:
1 - use known pathway models plus drug dose-matrices to improve prediction of the most likely targets of the drugs
2 - use known drug-target relationships plus the drug dose-matrices to predict the network connectivity
One obvious complication is that a compound with multiple targets would reduce the usefulness of the predictions. Some interesting extensions could be to test drug-drug interactions in KO strains or in combination with RNAi knock-downs or protein over-expressions.
Thursday, April 24, 2008
SciFoo and BioBarCamp
(Via Attila) The invitations for the 3rd SciFoo have apparently been sent. It will be held from the 8th to the 10th of August at the Googleplex. There is also an idea floating around to organize a BarCamp at the same time as SciFoo. A BarCamp Check out the BioBarCamp wiki and discussion group. There are already several suggestions for venues to organize it and several people interested in attending.
On a side note it's fun to see something like this getting thought of and set up from Twitter/FriendFeed conversations. I have been trying out FriendFeed for a while now and although I am not a big fan of micro blogging (yet?) I really like the conversations around the feed streams.
Wednesday, April 16, 2008
The shuffle project

Most of my work in the last few years was computational, either looking at the evolution of protein-protein interactions or at the prediction of domain-peptide interactions. The nice thing about working in a lab where a lot of people were doing wet lab experiments was that I had the opportunity to, once in a while, grab some pipettes and participate in some of the work that was going on. One project that worked out well was published today (not open access, sorry). My contribution to this project was small but it was a lot of fun and I am very interested in the topic that we worked on. We called it the shuffle project in the lab.
The main objective of this work was to study how the addition of gene regulatory interactions impacts a cell's fitness. We introduced different combinations of existing E. coli promoters and transcription/sigma factors either as plasmids or integrated in the genome. In effect, each construct mimics a duplication of one of E. coli's sigma factors or transcription factors with a change in its promoter. We then tested the impact on fitness by measuring growth curves under different conditions or performing competition assays.
There were a couple of interesting findings but the two that I found most interesting were:
- The vast majority of the constructs had no measurable impact on growth even by testing different experimental conditions.
- A few constructs could out-compete the control in competition assays (stationary phase survival or passaging experiments in rich medium).
Both of these suggest that the gene regulatory network of E. coli is very tolerant to the addition of novel regulatory interactions. This is important because it tells us that regulatory networks are free to explore new interactions given that there is a limited impact on fitness. From this we could also argue that if there are many equivalent (nearly neutral) ways of regulating gene expression we can't expect to see individual gene regulatory interactions conserved across different species. There are several recent studies, particularly in eukaryotic species, showing that there is in fact a fast divergence of transcription factor binding sites (see recent review by Brian B. Tuch and colleagues) and many other examples that show that although the selectable phenotype is found to be conserved, the underlying interactions or regulations have diverged in different species (see Tsong et al. and Lars Juhl Jensen et al.).
There are a couple of questions that come from these and other related works. What fraction of cellular interactions is simply biologically irrelevant? Is it possible to predict to what degree purifying selection restricts changes at different levels of cellular organization? What is the extent of change in protein-protein interactions?
Having previously worked on the evolution of protein-protein interactions this is the direction that most interests me. This is why I am currently looking at the evolution of phospho-regulation and signaling in eukaryotic species.
Monday, April 14, 2008
Life Sciences Virtual Conference and Expo
IBM Deep Computing will hold a 2 day virtual conference on Innovations in Drug Discovery and Development (16th and 17th of April 2008). The talks will be recorded and available for playback for those that register. The focus of the talks will be on the impact of High Performance Computing for life science research. The current list of talks:
- Dr. Paul Matsudaira, Director Whitehead Institute Professor of Biology and Bioengineering, MIT : Advanced Imaging and Informatics Methods for Complex Life Sciences Problems
- Professor Jan-Eric Litton, Director of Informatics, Karolinska Institute - Biobanking : The Challenge of Infrastructure for Large Scale Population Studies
- Dr. Joel Saltz, Professor and Chair, Department of BioMedical Informatics, Ohio State University : The Cancer Biomedical Informatics Grid (caBIG™)
- Professor Peter J. Hunter, University of Auckland, Bioengineering Institute : Innovation in biological system simulations
- Dr. Ajay Royyuru, IBM Research, Computation Biology at IBM : Update on the IBM Genealogy Project co-sponsored with National Geographic
- Dr. Michael Hehenberger, Solutions Executive, Global Life Sciences : IT Architectures and Solutions for Imaging Biomarkers
Tuesday, April 08, 2008
Structure based prediction of SH2 targets
One of the last few things I worked on during the PhD is now available in PLoS Comp Bio. It is about the structure based prediction of binding of SH2 domains to phospho-peptide targets.
The SH2 domain (src homology domain 2) is a small domain of around 100 amino acids that has a strong preference for binding peptides that have phosphorylated tyrosines. The selectivity of each domain is typically further restricted by variable surfaces near the phospho-tyrosine binding pocket. See figure below:

The binding preference of each domain can be experimentally determined using for example peptide library screening, phage display or protein arrays. Alternatively we should be able to analyze the increasing amount of structural information and predict the binding specificity of peptide binding domains.
We tried to show here that given a structure of an SH2 domain in complex with a peptide it is possible to predict the binding specificity of this domain. It is also possible, to some extent, to predict how mutations in these domains might affect their binding preferences. Finally, combining predictions of specificity with known human phospho sites allows for very reasonable predictions of in vivo SH2-target interactions.
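The last step, combining a predicted specificity with known phospho sites, can be pictured with a small Python sketch. Everything here is an assumption for illustration: the specificity is represented as a simple position-specific score table for the three residues after the phospho-tyrosine, the values in `SPECIFICITY` are made up, and the function names are hypothetical. It is not the method from the paper, which derives specificity from structure.

```python
# Hypothetical specificity table: score of each residue at positions +1..+3
# after the phospho-tyrosine. Values are invented for illustration only.
SPECIFICITY = {
    1: {"E": 1.0, "D": 0.6},
    2: {"E": 0.8, "N": 0.5},
    3: {"I": 1.2, "L": 0.9, "V": 0.7},
}


def score_site(sequence, tyr_index, matrix=SPECIFICITY, default=-0.5):
    """Sum table scores for the residues following a phospho-tyrosine (0-based index)."""
    if sequence[tyr_index] != "Y":
        raise ValueError("position is not a tyrosine")
    total = 0.0
    for offset, preferences in matrix.items():
        position = tyr_index + offset
        if position >= len(sequence):
            continue  # site near the C-terminus: score only what is available
        total += preferences.get(sequence[position], default)
    return total


def rank_targets(phosphosites, matrix=SPECIFICITY):
    """phosphosites: iterable of (protein_id, sequence, tyr_index); best score first."""
    scored = [(score_site(seq, idx, matrix), pid, idx) for pid, seq, idx in phosphosites]
    return sorted(scored, reverse=True)


# Toy phospho sites, invented for illustration.
sites = [("protA", "AAYEEIAA", 2), ("protB", "AAYGGGAA", 2), ("protC", "GGYDNL", 2)]
for score, protein, index in rank_targets(sites):
    print(protein, index, round(score, 2))
```

Restricting the scan to tyrosines that are known to be phosphorylated is what keeps the candidate list short compared with scanning every tyrosine in the proteome.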
The obvious limitation here is that we need to start with a structure of the domain. We know from some unpublished work that for families with good structural coverage, homology models can produce specificity predictions that are as accurate as those from x-ray structures. The other limitation is that, given the lack of dynamics, only a single conformation of the interaction is modeled, even though dynamics should in part help determine the binding specificity. One possible solution to this problem that we have used with some success is to model different peptide conformations for each binding domain.
I should make clear that although I think there is an improvement over previous works there is already a considerable amount of research on this topic that we tried to cite in the introduction and discussion. I would say that some of the best previous work on structure based predictions of domain-peptide interactions has come from Wei Wang's lab (see for example McLaughlin et al. or Hou et al.)
This manuscript was the first (and only so far) I collaborated on with Google Docs. It worked well and I recommend it to anyone that needs to co-write a manuscript with other people. It saves a lot of emails and annotations on top of annotations.
Bio::Blogs#20 - the very late edition
I said I would organize the 20th edition of Bio::Blogs here on the 1st of April but April fools and my current work load did not allow me to get Bio::Blogs up on time.
There were a couple of interesting discussions and blog posts in March worth noting. For example, Neil mentioned a post by Jennifer Rohn that initiated what could be one of the longest threads in Nature Network: "In which I utterly fail to conceptualize". It started off as a small anti-Excel rant but turned in the comments to 1st) a discussion of bioinformatic tools to use, 2nd) a discussion of wet versus dry mindset and how much one should devote to learning the other. Finally it ended up as an exchange about collaborations and how a social networking site like Nature Network could/should help scientists find collaborators. There was even a group started by Bob O'Hara to discuss this last issue further.
I commented on the thread already but can try to expand a bit on it here. Nature Network is positioned as a social networking site for scientists. So far the best that it has to offer has been the blog posts and forum discussions. This is not very different from a "typical" forum. It facilitates the exchange of ideas around scientific topics but NN could try to look at all the typical needs of scientists (lab books, grant managing, lab managing, collaborations, protocols, paper recommendations, etc) and decide on a couple that they could work into the social network site. Ways to search for collaborators and maybe paper recommendation engines that take advantage of your network (network+connotea) are the most obvious and easiest to implement. Thinking long term, tools to help manage the lab could be an interesting addition.
Another interesting discussion started from a post by Cameron Neylon on a data model for electronic lab notebooks (part I, II, III). Read also Neil's post, and Gibson's reply to Cameron on FuGE.
How much of the day to day activities and results need to be structured? How heavy should this structure be to capture enough useful computer readable information? Although I find these questions and discussions interesting, I would guess that we are far from having this applied to any great extent. If most people are reluctant to try out new applications they will be even less willing to convey their day to day practices via a structured data model. I mentioned recently the experiment under way at the journal FEBS letters to create structured abstracts during the publishing process. As part of the announcement the editors commissioned reviews on the topic. It is worth reading the review by Florian Leitner and Alfonso Valencia on computational annotation methods. They argue for the creation of semi-automated tools that take advantage of both the automatic methods and the curators (authors or others). The problems and solutions for annotation of scientific papers are shared with digital lab notebooks. I hope that more interest in this problem will lead to easy to use tools that suggest annotations for users under some controlled vocabularies.
Several people blogged about the 15 year old bug found in the BLOSUM matrices and the uncertainty in multiple sequence alignments. See posts by Neil, Kay, Lars and Mailund.
Both cases remind us of the importance of using tools critically. The flip side of this is that it is impossible to constantly question every single tool we use since this would slow our work down to a crawl.
On the topic of Open Science, the proposal drafted by Shirley Wu and Cameron Neylon for the Pacific Symposium on Biocomputing was accepted in March as a 3 hour workshop consisting of invited talks, demos and discussions. The call for participation is here along with the important deadlines for submissions (talk proposals due June 1st and poster abstracts due the 12th of September).
On a related note Michael Barton has set up a research stream (explained here). He is collecting updates on his work, tagged papers and graphs posted to Flickr into one feed that gives an immediate impression of what he is working on at the present time. This is really a great set up. Even for private use within a lab or across labs for collaboration this would give everyone involved the capacity to tap into the interesting feeds. I would probably not like to have everyone's feeds, and maybe a supervisor should have access to some filtered set of feeds or tags to get only the important updates, but this looks like a step in the right direction. In the same way, machines could also have research feeds that I could subscribe to in order to get updates on some data source.
Also in March, Deepak suggested we need more LEAP (Lightly Engineered Application Products) in science. He suggests that it is better to have one tool that does a job very well than one that does many somewhat well. I guess we have a few examples of this in science. Some of the most cited papers of all time are very well known cases of a tool that does one job well (ex: BLAST).
Finally, some meta-news on Bio::Blogs. I am currently way behind on many work commitments and I don't think I can keep up the (light) editorial work required for Bio::Blogs so I am considering stopping Bio::Blogs altogether. It has been almost two years and it has been fun and hopefully useful. The initial goal of trying to knit together the bioinformatic related blogs and offering some form of highlighting service is still needed but I am not sure this is the best way going forward.
Still, if anyone wants to take over from here let me know by email (bioblogs at gmail.com).
Tuesday, April 01, 2008
(April fools update) Leveling the playing field – NIH to ban brain enhancing practices
Update - This post was part of an April 1st news but I am sure everyone got it :). Still the pressure in science is real and worth thinking about.
There has been quite a buildup of discussion surrounding the idea of brain enhancing drugs in the last couple of days. It started in early March with a New York Times piece “Brain Enhancement Is Wrong, Right?” and it has culminated with the recent announcement of the World Anti Brain Doping Authority (WABDA), a joint effort from the NIH and EU to initiate studies on the reach of brain enhancing practices in science today.
There are many points of view already expressed on the web, see for example:
·Chris Patil
·Bora
·Anna Kushnir
·Genome Technology
·Egghead
·Eye on DNA
·Bob O'Hara
·Martin Fenner
·Jennomics
My first reaction was of pure skepticism. This must be some kind of joke, I thought, so I tried to probe a little bit around the UCSF campus to see if anyone had heard of this as well. One of my supervisors mentioned that about a year ago he had to fill out an NIH survey addressing the current problem of very high rejection rates for NIH grants. It looks like within this survey there was a section regarding the problems of competition in science and some of the questions brushed around the topic of brain enhancing practices. It could be that at the time NIH was trying to measure how far people would go in an extremely competitive environment.
This really got me thinking about how we are engaged in an environment that is not that far removed from highly competitive sports. How many stories have we heard about data forgery and scandalous retractions in the last couple of years? To what extent will people go to secure their place in science? To be recognized?
So maybe NIH is right in being proactive. Even if the issue is not as serious in science as it is in sports, unless there is an amazing influx of money or a considerable decrease in the number of working scientists this might become an important problem. If nothing else we will get to know the current extent of these practices, and it highlights yet again how far we have deviated from course. The money society puts into scientific research is being wasted on overlapping competitive projects. Research agendas should be open and free for anyone to participate in. Maybe NIH should regulate that as well.
Monday, March 31, 2008
call for Bio::Blogs #20
The 20th edition of Bio::Blogs will be posted here by the end of tomorrow. This is very short notice but if anyone would like to contribute please send a few links of the most interesting things of the past month and I will put everything together (email bioblogs at gmail).
Friday, March 21, 2008
The structured abstract experiment at FEBS letters
The journal "FEBS letters" is starting a publishing experiment on structured abstracts. As described in the editorial the experiment is aimed at:
"integrating each manuscript with a structured summary precisely reporting, with database identifiers and predefined controlled vocabularies, the protein interactions reported in the manuscript."
The experiment will be a collaboration between FEBS letters and the interaction database MINT. It started at the beginning of this year and will last 6 months. It will try to evaluate the necessary tools and the authors' "degree of interest (and competence) to invest" in this annotation process.
It will be very interesting to see from the results of this experiment whether authors are willing to do this extra bit of work and how much this might facilitate the annotation efforts.
Saturday, March 08, 2008
Bio::Blogs #19 - Bioengineering
This month's edition of Bio::Blogs is now available at Duncan's blog and it is mostly focused on (bio)engineering. Click the link for a summary of interesting things that were blogged about in the past month.
I will be hosting issue number 20 here in the blog, without a clear topic but possibly with some emphasis on data integration. Email your top picks of the month until the end of March to bioblogs at gmail.com
Sunday, March 02, 2008
Design, mutate and freeze
Drew Endy talked about engineering biology for Edge. Most of the emphasis is still on standardization of biological parts and the importance of simplifying the process of creating a biological function. Still it would be nice to hear from him some new ideas about establishing processes of engineering biology. His whole speech seems focused on creating the hacker culture in biology, transposing into the biological sciences all the concepts that would allow us to re-create the explosive growth of tinkering and production that we saw for electronics and computer programming.
I agree with most of what he says, that we should: 1) focus on method development; 2) work on a registry of parts and 3) foster an "open source"/hacker culture in synthetic biology. In this text he did not mention for example the importance of modeling but it is implicit in the standardization of parts. Once you have a computer simulation of the process you wish to engineer, you should be able to reach into the parts list to implement it. The problem with this concept of standardized parts is the complexity that Drew Endy dislikes so much. There is still no way around it. We can take a part that has been very well defined in E. coli, plug it into a yeast plasmid and it might not work at all.
If we are still far away from the ideal plug and play maybe we could try to take advantage of what biology can do very well: evolve to a suitable solution. I would argue that we should develop engineering protocols that could take advantage of the evolutionary process.
<insert rambling>
Let's say we want to implement a function and I know beforehand that I will not be able to get perfect parts to implement it. Can we design this function in a way that it will have a large funnel of attraction for the design properties that I am interested in? Are there biological parts that are more amenable to a directed evolutionary experiment to reach that design goal? How can I increase the mutation rate for a controlled period of time and only for the stretch of DNA that I want to evolve? Maybe it is possible to place the parts in a plasmid and have the replication of this plasmid be under a different polymerase that is more error prone?
</insert rambling>
If we could answer some of these questions (maybe we have already), we could design the function of interest (modeling), pull parts that would be close to the solution, mutate/select until the best design is achieved and then freeze it by reducing the generation of diversity in some way.
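Just to make the design/mutate/select/freeze loop concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the "part" is a short DNA-like string, the "design goal" is a made-up target sequence, fitness is simply the number of matching positions, and the names (fitness, mutate, evolve) are hypothetical. Real directed evolution would select on a measured function, not on similarity to a known sequence.

```python
import random

def fitness(candidate, target):
    """Toy fitness: number of positions where the candidate matches the design target."""
    return sum(c == t for c, t in zip(candidate, target))

def mutate(candidate, rate, rng, alphabet="ACGT"):
    """Return a copy in which each position is redrawn at random with probability `rate`."""
    return "".join(rng.choice(alphabet) if rng.random() < rate else c for c in candidate)

def evolve(start, target, rate, generations, pool_size=50, seed=0):
    """Mutate/select loop: each generation keeps the fittest variant in the pool."""
    rng = random.Random(seed)
    best = start
    history = [fitness(best, target)]
    for _ in range(generations):
        pool = [mutate(best, rate, rng) for _ in range(pool_size)]
        pool.append(best)  # the parent stays in the pool, so fitness never decreases
        best = max(pool, key=lambda c: fitness(c, target))
        history.append(fitness(best, target))
    return best, history

# Hypothetical sequences, chosen only for the example.
target = "ATGGCTAGCAAAGGAGAAGAA"  # the designed function (modeling step)
start = "ATGGCAAGTAAGGGTGAGGAA"   # a part pulled from the registry that is "close"

# Mutate/select with an elevated mutation rate for a controlled number of generations...
evolved, history = evolve(start, target, rate=0.05, generations=30)
# ...then "freeze" by dropping the mutation rate to zero.
frozen, frozen_history = evolve(evolved, target, rate=0.0, generations=5)

print(history[0], "->", history[-1], "matching positions out of", len(target))
```

The only design choice worth pointing out is that the mutation rate is a parameter of the loop, so "freezing" is the same process run with the rate set to zero. That is the software analogue of switching the plasmid back to a faithful polymerase.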
Further reading:
Synthetic biology: promises and challenges
Molecular Systems Biology 3 Article number: 158 doi:10.1038/msb4100202
Tuesday, February 26, 2008
Jonathan Eisen@PLoS
PLoS Biology has a new Academic Editor in Chief who blogs, works on evolution and has been at SciFoo twice. Jonathan A. Eisen explains his reasons for accepting the job in an editorial available online. Among other things, he states:
"Second, I want to work with the professional staff at PLoS Biology, the Academic Editors, and anyone else in the community who shares my desire to build new initiatives that will keep PLoS Biology as a top-tier journal. These would include ideas like producing issues dedicated to particular themes, actively recruiting excellent papers in fields where OA is not yet common, producing more outreach and educational material, and engaging bloggers and fully embracing the Web 2.0 world."
I actually would like to get a bit more involved with what they are doing at PLoS, in particular with what they might be discussing for PLoS ONE and the hubs. Maybe I can pester them later on during the year. For some reactions on the news and more information, here is the related Postgenomic cluster.
I wonder if we will ever see the AEIC of Science/Nature/Cell blogging :). The editorials are the closest article format to a blog post but they insist on a somewhat exaggerated formality. Just as an example here is a link to the 2007 archives of the (great) editorials of Frank Gannon from EMBO reports.
Friday, February 22, 2008
Call for Bio::Blogs#19
Duncan Hull has volunteered to host the next issue of Bio::Blogs (a bioinformatic related monthly blog journal). It will be out in the beginning of March on the O'Really? blog. The suggested theme for this month is the relationship between Biology and Engineering, inspired by the interview published on Edge.org, "Engineering and Biology": A Talk with Drew Endy. Anyone can send links for this issue on this topic but also for other interesting bioinformatic posts to bioblogs at gmail.com
We could also try to format it automatically using FeedJournal as suggested by Neil.
Friday, February 08, 2008
Late Links: Bio::Blogs#18 + new blog
I have been away from the web for the last few weeks as I moved to San Francisco to start my first postdoc. I will be working at UCSF in the Lim Lab and the Krogan lab on the evolution of signaling in yeasts. I'll try to blog more about it later during the year. I am looking forward to getting to know the bay area and hopefully make the most of the great (and apparently relaxed) science & technology environment.
Early this month Michael Barton edited another great edition of Bio::Blogs mostly dedicated to open science. He also put together an essay on the subject that is worth reading and commenting on. The next edition of Bio::Blogs will probably come back here to Public Rambling on the 1st of March (unless there is another volunteer).
Also in these last few weeks Lars Juhl Jensen started blogging at Buried Treasure. I met Lars at EMBL while I was doing my PhD and he always had time to help me out when I had some work related question. As Roland Krause said, Lars is one of the most prolific researchers in computational biology I have ever met.
Saturday, January 26, 2008
Submissions for Bio::Blogs#18
I am slowly re-connecting to the online world again, trying to pick through the thousands of blog posts and other RSS feed alerts piled up in GReader. Way before I manage to do that (unless I press the read all button) the next edition of Bio::Blogs will be up at Bioinformatics Zen. Michael Barton has kindly agreed to host the 18th edition of Bio::Blogs with a particular emphasis on Open Science and Open Notebook Science. It is scheduled for February 1st and anyone can participate by sending links to their submissions to bioblogs at gmail.com.
To get in the spirit of the upcoming edition and to inspire some related blog posts go check out his recent movie. What do you think ? Will there be a significant increase of people sharing and collaborating online this year ?
Sunday, December 23, 2007
Disconnecting for a while
I am disconnecting from blogging for longer than usual. There will not be a Bio::Blogs edition on the 1st of January but there will be one dedicated to Open Science on the 1st of February. Before I go, congratulations to the chemoinformatics related blogging group that got a paper out of their combined efforts. Also, have a look at the new blog from Jason Kelly called Free Genes that will focus on synthetic biology and open science issues.
I'll be back sometime in the end of January. Happy celebrations to everyone and a good start to the new year.