Friday, September 05, 2014

Collaborative postdoc fellowship opportunities


I interrupt this long blogging hiatus to point out two potential postdoc fellowship opportunities to work with our group at the EMBL-EBI. One is the EIPOD program, an EMBL-wide interdisciplinary program. For this fellowship the project is a collaboration with the groups of Nassos Typas (genetics) and Jeroen Krijgsveld (proteomics) at the EMBL in Heidelberg. Successful candidates would study how Salmonella uses post-translational modification effector proteins to regulate and subvert the host cell. It is important to note that EIPOD applicants must be interested in doing both the computational and experimental aspects of the project. Applicants are only expected to have experience in one of the areas (bioinformatics, proteomics, genetics) and an interest in learning about the others. The deadline for the EIPOD application is 11 September.

The other fellowship opportunity is the newly created EBPOD program. This is a collaborative program set up between the EMBL-EBI and the NIHR Cambridge Biomedical Research Centre (BRC). As described on the program webpage, it is meant to explore and develop computational approaches for translational clinical research involving human subjects. Our project proposal (PDF link) aims to identify cell surface markers of primed/activated neutrophils obtained from patients with chronic inflammatory diseases. The project is a collaboration with Paul J Lehner and Edwin Chilvers. The applicant would be in charge of the computational analysis, which will focus on proteomics data capturing changes in protein composition of the membrane versus the total cell (as in Weekes et al. Cell 2014). Prior experience in protein-related computational research would be ideal.



Wednesday, December 18, 2013

State of the lab, year 1 – setting up

I have used this blog in the past to keep track of my academic life, where I can give a less formal perspective on papers I have published or ideas I am working on. Starting a group has made me think a bit about what I blog about. I have more responsibilities towards the people that have decided to work with me, towards the institution that has hired me (EMBL-EBI) and the funding sources that support our work. At least for now I have decided to keep sharing my personal view, and in that context I thought it could be interesting to write down my path as a group leader in academia. This might become a yearly “thing”.

I started at the EMBL-EBI on January 7 and in a blink of an eye one year has gone by. I have just arrived in Portugal for a conference and holidays, having said goodbye to four people that very courageously decided to work with an unknown newbie group leader. I could sum up what happened in this first year by saying that the group-leader title now makes sense – I am coordinating an actual group. Most of this year was spent applying for funding, recruiting and getting to know the different groups working on campus.

From an empty room to a research group

EMBL-EBI is really a great place to start a group. For those that don't know the EMBL system, group leaders are given very generous core funding for 5 years, plus an additional 4 years after a review process. The chances of failing the review are small but there is essentially no tenure. Core funding and additional “internal” postdoc fellowships are sufficient to run a small group without external grants. We are encouraged to apply for funding but money is not the most immediate source of stress. So for me, since I only started recruiting after arriving in January, facing that empty room where a group should be working was the first thing on my mind. Recruiting postdocs for an unknown and empty group is particularly challenging. I tried some of the obvious things, like emailing related groups that could have people about to finish their PhD and promoting the vacancies at conferences. It is hard to quantify but I do have the impression that my online presence has been an advantage in this. Once the first couple of people started and group meetings made sense, the empty-room stress went away. I know people starting experimental labs right now and I have to say that computational people have it way too easy. We can buy a few computers and the “lab” is set up.
I spent a considerable amount of time applying for funding, which is always somewhat frustrating. I don't mind writing grants but I am happier doing actual research. Around 6 months into the job I managed to re-start doing research and have kept working on a fairly constant basis. I hope I will keep having/making time for research for as long as possible.

Meet the gang

This year we got an HFSP CDA and an ESPOD fellowship which, together with the core funding, allowed me to grow the group fairly quickly. The first to join was David Ochoa (postdoc, @d0choa, webpage) who will be working initially on PTM dynamics under different conditions. He also introduced me to the amazing Black Mirror series, the best fiction I have seen in a long time. Vicky Kostiou (intern) joined next and is doing a great job of improving the PTMfunc website, which should be updated late January (stay tuned). The most recent arrivals were Romain Studer (postdoc, @RomainStuder, blog) and Brandon Invergo (postdoc, webpage). Romain will be using his phylogenetic and structural experience to study PTM evolution and Brandon was awarded the ESPOD fellowship to work with Jyoti Choudhary and malaria groups at the Sanger on Plasmodium PTMs. Omar Wagih (@omarwagih) will be the first PhD student, joining in January. Finally, although we have still not signed a contract, Marco Galardini (@mgalactus, webpage) will likely join in February to work on a collaborative project with Nassos Typas' group at the EMBL-Heidelberg.

To be, or not to be, an experimental group

One of my concerns when I joined the EMBL-EBI was that, although the Sanger is just next door, EBI is a purely computational institute. Doing computational work is pretty amazing but progress can often be limited by lack of data. High-throughput research is somewhat removing this limitation since there are probably more observations made than we can all analyze. Still, if you are really interested in going in a specific direction then an experimental group simply has more power to make the right observations. My solution to this problem, for now, will be to co-supervise people with experimental groups, including Brandon's ESPOD project, Marco's project with Nassos Typas and a future hire with Silvia Santos' lab in London. This is an experiment in itself and I guess in 2 to 3 years I will be able to evaluate how practical it is. One alternative is to make use of research services such as the ones listed on Science Exchange. I have discussed with a couple of companies what the prices would be for some of the work I am interested in doing. These are fairly expensive but might be a good complement to the collaborations.

Summary

So overall, the group is off to a good start. It is funded for a few years at a reasonable level and we have collaborations with other groups that share common interests. There were some things I wish had gone better. I didn't get all the funding I applied for, which is to be expected. I also didn't manage to submit the last two manuscripts that still contain work from my postdoc. It would have been great to start the second year with that off my back. Still, I am happy with how things look for the next few years. It is a privilege to be able to coordinate this group of people and this level of resources around topics that I find so interesting.



Friday, November 01, 2013

Introducing BMC MicroPub – fast, granular and revolutionary

(Caution: Satire ahead)

I am happy to be able to share some exciting science publishing news with you. As you know, in the past few years there has been tremendous progress in open access publishing. The author-paying model has been shown to be viable, in large part thanks to the pioneering efforts of BMC and PLOS. In particular, PLOS One has been an incredible scientific and business success story that many others are trying to copy. Although these efforts are a great step forward, they don't do enough to set all of the scientific knowledge free in a timely fashion. Sure, you can publish almost anything today, such as metadata, datasets, negative results and the occasional scientific advancement, but the publishing process still takes too much time. In addition, we are forced to build a story around the bits and pieces in some laborious effort to communicate our findings in a coherent fashion. Many of us feel that this publishing format is outdated and does not fit our modern quick-paced internet age. What I am sharing with you today is going to change that.

Introducing BMC MicroPub
In coordination with BMC we are soon going to launch the pilot phase of a new online-only publishing platform. It was thought out from the ground up to allow for the immediate publishing of granular scientific information. Peer review happens after online publication of the content and evaluation is not going to be based on trivial and outdated notions of scientific impact. Best of all, it is tightly integrated with the social tools we already use today. In fact, authors are asked to register with the system using their Twitter account and to link it to an ORCID author ID. From then on, their Twitter feed is parsed and any tweet containing the #micropub tag will be considered a submission. Authors are themselves reviewers, and any submission that gets re-tweeted by at least 3 other MicroPub-registered scientists is considered “peer-reviewed” and a DOI is issued for that content. An author can create a longer communication by replying to a previous #micropub tweet, creating a chain that the journal can track and group into MicroPub Stacks (TM). What the team involved here has done is nothing short of amazing. We are taking the same platform we use to share cute pictures of cats and revolutionizing scientific publishing. To start using the journal, authors pay a one-time registration fee followed by a modest publication charge for each published item. However, the journal is waiving all charges for the first 100 authors and the first 100 publications. We hit a snag in discussions with Pubmed but with your support we will be tracked by them starting next year.
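For the technically curious, here is a minimal sketch of the “peer review” logic (every bit as fictional as the journal itself; all names and thresholds are invented for this satire):

```python
from dataclasses import dataclass, field

MICROPUB_TAG = "#micropub"
REVIEW_THRESHOLD = 3  # re-tweets by registered scientists = "peer review"

@dataclass
class Tweet:
    author: str
    text: str
    retweeted_by: set = field(default_factory=set)

def is_peer_reviewed(tweet: Tweet, registered: set) -> bool:
    """A submission is "peer-reviewed" once at least REVIEW_THRESHOLD
    registered MicroPub scientists have re-tweeted it."""
    return len(tweet.retweeted_by & registered) >= REVIEW_THRESHOLD

def submissions_ready_for_doi(feed: list, registered: set) -> list:
    """Scan a parsed Twitter feed and return the tweets that qualify for a DOI."""
    return [t for t in feed
            if MICROPUB_TAG in t.text and is_peer_reviewed(t, registered)]

# Toy example: one submission, three registered re-tweeters -> "published"
registered = {"@alice", "@bob", "@carol", "@dave"}
feed = [Tweet("@dave", "Gel looks clean, n=1 #micropub",
              retweeted_by={"@alice", "@bob", "@carol"})]
print(submissions_ready_for_doi(feed, registered))
```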

Pioneering granularity
The project started a few months ago after a first attempt I covered in a previous blog post. Right now we also have an exciting experiment in granular sharing of genome content underway. You can follow the tweets of @GenomeTweet to get an idea of the future of this brave new world. The current front page of the journal gives you an indication of some of the cool science being published by early adopters. The site is currently only available to beta-testers so here is a screenshot of the current version:

I sat down with the open access advocate Dr Mark Izen from UC Merced to discuss the new journal.

Dear Mark, given your enthusiasm for open access what do you think of this initiative?
I think that experimentation in scientific publishing is fantastic. Any attempt to promote open access and get rid of the current established closed access, impact factor driven system is a great thing. One concern I have is that, although the content is published under a CC0 license, the publishing process is currently reliant on Twitter which is a closed proprietary technology. We should really ask ourselves if this goes all the way in terms of open science.

Some would say that they don't have time to read science at this level of granularity, so devoid of any significant advances. In the words of an anonymous tenured prof: “You must be joking, right!?”. What would you say to these naysayers?
To be blunt, I think they lack vision. Ultimately we owe it to all tax payers to be as transparent as possible about our publicly funded science. Now we can do that, 140 characters at a time. Moreover, the possibility of driving science forward by making information available as quickly as possible is amazing. Scientists are already using Twitter to share information; we are just going one step further here and starting to share science as it happens. You could get corrections on your protocol before your experiments have finished running! If you blink your eyes you may literally miss the next big discovery.

So you are not concerned that this increasing level of granularity, small-bite research, is going to drown everyone in noise and actually slow down the spread of information in science?
Absolutely not. It is a filter failure and we are sure that someone is bound to come up with a solution. In the future all sorts of different signals will allow us to filter through all this content and weed out the most interesting parts for you. You will be able to get to your computer and find in your email or on some website just the information that an algorithm thinks you should read. I am sure it is doable; it is a question of setting it up.


Disclaimer: If you have not noticed by now, this is a fictional post meant to entertain. 

Tuesday, October 29, 2013

Sysbio postdoc fellowship: spatio-temporal control of cell-cycle regulation


Funding is available for a 3-year postdoctoral fellowship to study spatio-temporal control in cell-cycle regulation. This is a joint project between our group at the EMBL-EBI and the Quantitative Cell Biology group headed by Silvia Santos at the MRC Clinical Sciences Centre in London. More information about the groups' interests can be found on the respective webpages.

The main objective of this project will be to study how the spatial and temporal control of key cell-cycle proteins change in different biological contexts. Examples of these different contexts include different differentiation states and/or different species.


We are looking for candidates interested in doing both experimental and computational work; previous experience in cell biology, microscopy, programming, image analysis and/or modelling of dynamical systems is considered an asset. We will consider candidates that have stronger expertise in either experimental or computational methods but are interested in learning and using both approaches. Additional information and the application link are here, with a closing date of 24 November 2013. We are available for further clarification regarding suitability of background or information about the projects.


Tuesday, October 22, 2013

Pubmed Commons - the new science water-cooler

Pubmed has decided to dip its toes into social activities by adding a commenting feature to its website (named Pubmed Commons). It will start off in a closed pilot phase, where you have to receive an invite in order to be able to comment, but it should eventually be widely available. The implementation is simple and everything works as you would expect. Here is a screenshot with an example comment:

As you would expect, you get the option to add a comment, to edit or delete previous comments you have made and to up-vote other comments. In future versions you will be able to reply to comments in a threaded discussion. The comments, at least for now, cannot be anonymous and in the pilot phase you have to be invited to join. It is also restricted to authors that have at least one abstract on Pubmed already. There are arguments for and against anonymity but I lean in favour of identifiable comments to keep the trolls at bay. In this way the comments are also associated with you (via your NCBI profile) and can be listed. Unfortunately, NCBI accounts cannot yet be linked to an ORCID ID but that should be easily fixed. You will be able to search for articles that have comments, and these will be made available through their APIs.

I am sure there will be several criticisms, such as the fact that it is invite-only or that you are adding comments to articles that you might not even have access to. Overall, I think this is a great development. Commenting systems have, for the most part, failed to work on the publishers' side and the hope is that this might finally create a discussion forum with higher participation. The advantages here are higher visibility and lower friction when compared with most publishers' existing commenting systems. For ALMs it might also be very positive, assuming this does increase the level of participation. I for one would like to have useful opinions attached to articles while I search for them online.


You can get the whole back-story from this post by Rob Tibshirani and from the many other blog posts and press releases that I am sure will be hitting the web today.



Monday, October 21, 2013

Project management (online) tools

I am currently looking for a tool to centralize project management across the group. I asked on Twitter for suggestions and received a number of useful tips. In case this is of use to others, here are a few notes I took when exploring a few of these options. The features I am particularly interested in are: low/no set-up or upkeep requirements, intuitive use, rich project notebooks with the possibility to add images, and back-up support. Nice features to have: possibility to share with the public; integration with Dropbox and/or Google Drive.

Here are the notes in no particular order with my preferences at the end.

Basecamp
Simple, intuitive and well designed project management and collaboration tool. Each project can have: project updates (activity list), text documents (simple text only, cannot add images), to-do lists (linked to the calendar) and discussion items (text and embedded images that can stand alone or be linked to any other item, including other discussions). The group view can quickly show you updates across all projects you are involved in. The group and projects view is great but it would be nicer to have notebooks within each project as implemented in Evernote. Discussions can be used as notebooks but they get mixed in with comments on any item, such as a to-do list item. All projects can be downloaded for back-up but automation requires a 3rd-party service or coding via the API. iOS app available, and Android via a 3rd-party app. No free account (60-day trial); plans start at $20/month (10 projects, 3GB limit) up to $3,000/year (unlimited projects, 500GB limit). Basecamp can be extended with additional services (mostly 3rd party) that tend to cost additional fees.

Freedcamp
Project views with to-do lists, discussions, milestones and file attachments. Dashboard view with group activity. Marketplace with additional group and project widgets to add (e.g. group chat and wikis). Free account with a 20MB limit, with paid accounts starting at $2.5/month for 1GB up to $40/month for unlimited storage. Fairly cheap, but below-average design and somewhat sluggish.

Evernote
This tool is centred on the idea of notebooks (collections of notes). Notes can contain text, embedded images, to-do lists and voice clips. Has a stand-alone program that facilitates copy-paste actions into the notebooks (Mac and Windows, but works well under Wine). Notebooks from free accounts cannot be edited by others. Premium accounts (£35 per year) can have notebooks edited by others; one premium account could be used to centralise group notebooks. Business accounts (£8.00/user/month) are needed for group management features. Limited tools for group interactions (no comments, chat or activity dashboard) when compared with the others.

Redmine
Free but but requires local installation. Fully fledged project management tool: activity, roadmap, issue tracker, gannt charts,calendar, news, documents, wiki, forum, files. Recommended by several people in twitter. I only had a quick look since I would prefer an online tool without set-up.

Trello
Card concept – each card can have activities (could be text descriptions of project entries), to-do lists, files, due dates and attachments (including Google Drive and Dropbox), and can be assigned to specific people. Cards can be stacked in groups, moved around, and tagged with colour codes, stickers and the individuals responsible for them. It looks nice but I don't like the design for project management. Android and iOS apps. 10MB limit standard, 250MB gold (plus additional customization features) at $5/month or $45 per year.

Teambox
Dashboard concept; users can be assigned to projects. The dashboard view has the list of tasks and notifications for the day. Projects can have activities, conversations, tasks, notes, files and members. Notes would be where the project/sub-project/task notes could be added; they have version history, can be shared publicly and can embed images. Additional group tools: calendar, Gantt chart, time tracking, video conference (by Zoom). iOS and Android apps available. Free for 5 users/5 projects; Pro accounts are $5 per user per month (annual billing, 20% discount; two years, 30% discount) with unlimited projects, Dropbox integration, workload views, group chat functionality and priority support.

Labguru
Project management with a specific focus on science labs. Very large number of features including: dashboard with activity feed, projects (organized into past/present/future milestones, notes with embedded and resizable images, attachments, Pubmed integration, automatic report generation) and lab equipment/reagent inventory. Organizing science into milestones makes more sense than into tasks as it better fits the spirit of research versus engineering. Android and iOS apps are meant to be used to follow protocols, take pictures, check storage, etc. Overkill for a computational group. Not very smooth, as every action results in a full webpage refresh. Expensive ($12 per person/month, yearly billing).

Projecturf
Dashboard view and project view. Projects have: overview, calendar, tasks, tickets, time tracking (could be useful for contract work or grant reporting), files, conversations and notes. Files can be integrated with Google Drive and Dropbox. Notes can have embedded images. Pricing starts at 5 projects/5GB for $20/month up to unlimited projects/100GB for $200/month (1 month free for annual billing). Very directed towards engineering code-based projects.

Summary
My favourites at this point are Basecamp, Teambox and Evernote. Evernote is clearly lacking as a group tool but has a nice focus on notebooks (as in lab notebooks). Basecamp is more polished and intuitive than Teambox but is missing a proper "notebook" within each project and is somewhat expensive. Teambox is not as well designed as Basecamp but should work well, is cheaper and has integration with Google Drive.

Saturday, October 19, 2013

Scientific Data - ultimate salami slicing publishing

Last week a new NPG journal called Scientific Data started accepting submissions. Although I have discussed this new journal with colleagues a few times, I realized that I never argued here why I think it is a very strange idea for a journal. So what is Scientific Data? In short, it is a journal that publishes metadata for a dataset along with data quality metrics. From the homepage:
Scientific Data is a new open-access, online-only publication for descriptions of scientifically valuable datasets. It introduces a new type of content called the Data Descriptor designed to make your data more discoverable, interpretable and reusable.
So what does that mean? Is this a journal for large-scale data analysis? For the description of methods? Not exactly. Reading the guide to authors we can see that an article "should not contain tests of new scientific hypotheses, extensive analyses aimed at providing new scientific insights, or descriptions of fundamentally new scientific methods". So instead one assumes that this journal is some sort of database where articles are descriptors of the data content and data quality. The added value of the journal would be to store the data and provide fancy ways to allow for re-analysis. That is also not the case, since the data is meant to be "stored in one or more public, community recognized repositories". Importantly, these publications are not meant to replace and do not preclude future research articles that make use of these data. Here is an example of what these articles would look like. This example more likely represents what the journal hopes to receive as submissions, so let's see how this shapes up in a year when people try to test the limits of this novel publication type.

In summary, articles published by this journal are mere descriptions of data with data quality metrics. This is the same information that any publication should already have, except that Scientific Data articles are devoid of any insight or interpretation of the data. One argument in favor of this journal would be that this is a step towards micro-publication and micro-attribution in science. Once the dataset is published anyone, not just the producers of the data, can make use of this information. A more cynical view would be that NPG wants to squeeze as much money as they can from scientists (and funding agencies) by promoting salami slicing publishing.

Why should we pay $1000 for a service that does not even handle data storage? That money is much better spent supporting data infrastructures (disclaimer: I work at EMBL-EBI). There is no added value from this journal that is not, or cannot be, provided by data repository infrastructures. Yet, this journal is probably going to be a reasonable success, since authors can essentially publish their research twice for an added $1000. In fact, anyone doing a large-scale data-driven project can these days publish something like 4 different papers: the metadata, the main research article, the database article and the stand-alone analysis tool that does 2% better than the others. I am not opposed to a more granular approach to scientific publication but we should make sure we don't waste money in this process. Right now I don't see any incentives to limit this waste nor any real progress in updating the way we filter and consume this more granular scientific content.


Monday, September 23, 2013

Single-cell genomics: taking noise into account

[Figure: technical variation versus average read counts. Reprinted by permission from Macmillan Publishers Ltd: Nat Methods, advance online (doi:10.1038/nmeth.2645)]
Sequencing throughput and amplification strategies have improved to the point where single-cell sequencing has become feasible. There was a recent review in Nat Rev Gen covering the progress in single-cell genomics and some of its potential applications that is worth a read. However, the required amplification steps are likely to introduce significant variation for small amounts of starting material. A group of investigators from the EMBL-Heidelberg, EMBL-EBI and the Sanger had a look at this problem and developed an approach to quantify and account for such technical variability. The method is described in a paper that is now in press and makes use of spike-ins to estimate technical variation across a range of different mean expression strengths (see figure). As with most of these short communications, a lot of work is included in the supplementary materials, including a detailed R workflow description that should allow anyone to recreate the main figures from the paper.
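To illustrate the general idea, here is a minimal sketch of my own (in Python, not the authors' actual R workflow; the fitting form, function names and cutoff are simplifications): spike-ins carry no biological variability, so the dependence of their squared coefficient of variation on mean expression estimates the technical noise expected at each expression strength.

```python
import numpy as np

def fit_technical_noise(spike_means, spike_cv2):
    """Least-squares fit of the spike-in relationship CV^2 ~ a1/mean + a0.
    Spike-ins have no biological variability, so this curve estimates the
    technical variation expected at each mean expression strength."""
    X = np.column_stack([1.0 / spike_means, np.ones_like(spike_means)])
    (a1, a0), *_ = np.linalg.lstsq(X, spike_cv2, rcond=None)
    return a1, a0

def flag_variable_genes(gene_means, gene_cv2, a1, a0, ratio=1.5):
    """Genes whose CV^2 clearly exceeds the technical expectation are
    candidates for genuine biological variability (cutoff is arbitrary)."""
    expected_cv2 = a1 / gene_means + a0
    return gene_cv2 > ratio * expected_cv2
```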

This paper is a starting point for more things to come. It is focused on the method and there are clearly a lot of biological findings to be made from those data. More broadly, the Sanger and the EMBL-EBI have recently set up a joint single-cell genomics centre to acquire and develop the required technology. From the EBI side this is headed by Sarah Teichmann (also affiliated with the Sanger) and John Marioni. Unfortunately, for my interests in post-translational regulation, single-cell proteomics is still lagging way behind. CyTOF comes closest but still requires antibodies for detection.



Tuesday, July 02, 2013

Interdisciplinary EMBL postdoc fellowship in genome evolution and chemical-biology

The EMBL Interdisciplinary Postdocs (EIPOD) program is now accepting applications (deadline 12 September). This program funds interdisciplinary research projects between different units of the EMBL. Applicants are encouraged to discuss self-defined project ideas with EMBL scientists or to select up to two of the project ideas available on the EIPOD website.

One of the project ideas listed this year is for a joint project between our group (EMBL-EBI) and the group of Nassos Typas at the EMBL Genome Biology Unit in Heidelberg. Here is a short description of the project idea, entitled "Modeling genotype-to-phenotype relationships in a bacterial cell":
Understanding how phenotypic variability originates from mutations at the level of the DNA is one of the fundamental problems in biology. Sequencing of genomes for multiple individuals along with rich phenotypic profiling data allows us to pose the question of how the sum of mutations in each individual genome results in the observed phenotypic differences. The goal of this project is to develop computational methods to predict the consequences of mutations and gene-content variation on fitness in different conditions for different strains of E. coli.
The Typas group develops high-throughput approaches to study gene function via chemical-genetics and genetic-interaction screening. Previous publications and current research interests are listed on the group webpage. Our group is generally interested in studying the evolution of cellular interaction networks and, in this context, in understanding how mutations and gene-content variation result in phenotypic consequences for different individuals.

Potential applicants are encouraged to get in touch to discuss a project proposal that relates to this topic. We are particularly keen on applicants with previous experience in any of the following: chemical-informatics, chemical-biology, protein and genome evolution, sequence/structural based prediction of effect of mutations, bacterial pan-genome studies.

Saturday, June 08, 2013

Doing away with scientific journals

I got into a bit of an argument with Björn Brembs on Twitter last week because of a statement I made in support of professional editors. I was mostly saying that professional editors are no worse than academic editors, but our discussion drifted into the general usefulness of scientific journals. Björn was arguing his position that journal rankings, in the form of the well known impact factor, are absolutely useless. I was trying to argue that (unfortunately) we still need journals to act as filters. Having a discussion on Twitter is painful, so I am giving my arguments some space in this blog post.

Björn's arguments are based on a recently published review regarding the value of journal rank (see paper and his blog post). The one-line summary would be:
"Journal rank (as measured by impact factor, IF) is so weakly correlated with the available metrics for utility/quality/impact that it is practically useless as an evaluation signal (even if some of these measures become statistically significant)."
I covered some of my arguments regarding the need of journals for filtering before, here and here. In essence, I think we need some way to filter the continuous stream of scientific literature and the *only* current filter we have available is the journal system. So let's break this argument into parts. Is it true that: we need filters; journals are working as filters; and there are no working alternatives?

We need filters

I hope that few people will try to argue that we have no need for filters in scientific publishing. On PubMed there are 87,551 abstract entries for May, which is getting close to 2 papers per minute. It is easy to see that the rate of publishing is not going down any time soon. All current incentives on the author and publishing side will keep pushing this rate up. One single unfiltered feed of papers would not work and it is clear we need some way to sort out what to read. The most immediate way to sort would be by topic. Assuming authors would play nice and not try to tag their papers as broadly as possible (yeah right), this would still not solve our problem. For the topics that are very close to what I work on I already have feeds with fairly broad PubMed queries that I go through myself. For topics that are one or several steps removed from my area of work I still want to be updated on method developments and discoveries that could have an impact on what I am doing. I already spend an average of 1 to 2 hours a day scanning abstracts; I don't want to increase that.

Journals as filters

If you follow me this far then you might agree that we need filtering processes that go beyond simple topic tagging. Even without considering journal "ranking", journals already do more than topic tagging, since they are also communities that form around areas of research. To give a concrete example, both Bioinformatics and PLOS Computational Biology publish papers in bioinformatics, but while the first tends to publish more methods papers, the latter tends to publish more biological discoveries. Subjectively, I tend to prefer the papers published in the PLOS journal due to its community, and that has nothing to do with perceived impact.

What about impact factors and journal ranking? In reviewing the literature Björn concludes that there is almost no significant association between impact factors and future citations. This does not agree with my own subjective evaluation of the different journals I pay attention to. To give an example, the average paper in journals of the BMC series is not the same to me as the average paper published in Nature journals. Are there many of you that have a different opinion? Obviously, this could just mean that my subjective perception is biased and incorrect. It would also mean that journal editors are doing a horrible job and the time they spend evaluating papers is useless. I have worked as an editor for a few months and I can tell you that it is hard work; it is not easy to imagine that it is all useless. In his review Björn points to, for example, the work by Lozano and colleagues. In that work the authors correlated the impact factor of a journal with the future citations of each of its papers in a given year. For biomedical journals the coefficient of determination has been around 0.25 since around 1970. Although the correlation between impact factor and future citations is not high (r ~ 0.5, the square root of that R²), it is certainly highly significant given that they looked at such large numbers (25,569,603 articles for biomed). Still, this also tells us that evaluating the impact/merit of an individual publication by the journal it is published in is prone to error. However, what I want to know is: given that I have to select what to read, do I improve my chances of finding potentially interesting papers by restricting my attention to subsets of papers based on the impact factor?

I tried to get my hands on the data used by Lozano and colleagues but unfortunately they could not give me the dataset they used. Over email, Lozano said I would have to pay Thomson Reuters on the order of $250,000 for access (not so much reproducible research). I wanted to test the enrichment over random of highly versus lowly cited papers in relation to impact factors. After a few other emails Lozano pointed me to this other paper where they calculated enrichment for a few journals in their Figure 4, which I am reproducing here under a Don't Sue Me licence. For these journals they calculated the fraction of a journal's papers that are among the top 1% most cited, divided by the fraction expected at random (1%). This gives you an enrichment over random expectation that for journals like Science/Cell/Nature turns out to be around 40 to 50. So there you go: high impact factor journals, on average, tend to be enriched in papers that will be highly cited in the future.
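To make the arithmetic concrete, here is a toy version of that enrichment calculation (the numbers are invented for illustration, not taken from the paper):

```python
# Toy enrichment-over-random calculation with invented numbers.
journal_papers = 800        # papers the journal published in the window
in_global_top1pct = 360     # how many landed in the global top 1% most cited

observed_fraction = in_global_top1pct / journal_papers  # 0.45
expected_fraction = 0.01                                # top 1% by definition

enrichment = observed_fraction / expected_fraction
print(f"Enrichment over random: {enrichment:.0f}x")     # -> 45x
```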

As an author I hate being evaluated by the journals I publish in instead of the actual merit of my work. As a reader I admit to my limitations: I need some way to direct my attention to subsets of articles. Both the data and my own subjective evaluation tell me that journal impact factors can be used as a way to enrich for potentially interesting articles.

But there are better ways ... 

Absolutely! The current publishing system is a waste of everyone's time as we try to submit papers down a ladder of perceived impact. Papers get reviewed multiple times in different journals, reviewers think that articles need to be improved with year-long experiments and discoveries stay hidden in this reviewing limbo for too long. We can do better than this, but I would argue that the best way to do away with the current journal system is to replace it with something else. Instead of just shouting for the destruction of journal hierarchies and the death of the impact factor, talk about how you are replacing them. I try out every filtering approach I can find and I will pay for anything that works well and saves me time. Google Scholar has a reasonably good recommendation system and it is great to see people developing applications like the Recently app. PLOS is doing a great job of promoting the use of article-level metrics that might help others to build recommendation systems. There is work to do but the information and technology for building such recommendation systems is all out there already. I might even start using some of my research budget to work on this problem, just out of frustration. I have some ideas on how I would go about this but this blog post is already long; if anyone wants to chat about it, drop me a line. At the very least we can all start using preprint servers and put our work out before we bury it for a year in the publishing limbo.


  

Monday, May 13, 2013

EBI-Sanger postdoctoral fellowship on Plasmodium kinase regulatory networks

I am happy to announce a call for applications for an EBI-Sanger postdoctoral fellowship to study kinase regulatory networks in Plasmodium. This is one of four currently open calls in the EBI–Sanger Postdoctoral (ESPOD) Programme and the call closes on the 26th of July. This interdisciplinary programme is meant to foster collaborations between the EBI and the Wellcome Trust Sanger Institute, both at the Genome Campus near Cambridge, UK. Our project is a collaboration between myself (EBI), Jyoti Choudhary (mass-spectrometry group leader at the Sanger) and Oliver Billker (group leader at the Sanger studying malaria parasites). The postdoctoral fellow would have the opportunity to work at the interface between bioinformatics, mass-spectrometry (MS) and Plasmodium biology. A description of the project can be found online (PDF); briefly, the objective is to characterize the kinase regulatory network of the malaria parasite by combining quantitative phosphoproteomics with computational analysis. There will be a strong emphasis on the computational analysis of the MS data, so prior computational experience is a plus. The ideal candidate would have prior experience in phosphoproteomics with a strong interest in learning the required computational aspects, or prior experience in the relevant computational skills and an interest in learning/performing some of the experimental work. Feel free to contact me if you require more information about the project or the ESPOD fellowship.

Sunday, April 07, 2013

The case for article submission fees

For scientific journal articles the cost of publishing is almost exclusively covered by the articles that are accepted for publication, paid either by the published authors or by the libraries. Advertisement and other items, like the organization of conferences, are probably not a very significant source of income. I don't want to argue here again the value of publishers and how we should be decoupling the costs of publishing (close to zero) from peer review, accreditation and filtering. Instead I just want to explore a very obvious form of income that is not used: submission fees. Why don't journals charge all potential authors a fixed cost per submission, even if the article ends up being rejected? I am sure publishers have considered this option and reached the conclusion that it is not viable. I would like to know why, and maybe someone reading this can give a strong argument against it. Hopefully someone from the publishing side who has crunched the numbers.

The strongest reason against it that I can imagine would be a reduction in submission rates. If only some publishers adopt this fee, authors will send their papers to journals that don't charge for submission. Would the impact be that significant? For journals with high rejection rates this might even be useful, since it would preferentially deter authors that are less confident about the value of their work. For journals with lower rejection rates the impact of the fee would be small, since authors are less concerned with a rejection. Publishers might even benefit from implementing a submission charge in the form of a lock-in effect if they do not charge when transferring articles between their journals. Publishers already use this practice of transferring articles and peer-review comments between their journals. It already functions as a form of lock-in, since authors, wishing to avoid another lengthy round of peer review, will tend to accept. If the submission fee is only charged once, authors are even more likely to keep their articles within the publisher. Given the current trend of publishers trying to own the full stack of high-to-low rejection rate journals, these lock-in effects are going to be increasingly valuable.
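A toy calculation (with invented numbers) illustrates why high-rejection journals would have the most to gain from such a fee:

```python
# Toy model with invented numbers: revenue per *published* article when
# every submission, accepted or rejected, pays a fixed submission fee.
def revenue_per_published(apc, submission_fee, rejection_rate):
    submissions_per_acceptance = 1 / (1 - rejection_rate)
    return apc + submission_fee * submissions_per_acceptance

# A selective journal (90% rejected) collects ten fees per accepted paper...
print(revenue_per_published(apc=1350, submission_fee=100, rejection_rate=0.9))  # 2350.0
# ...while a less selective one (30% rejected) collects far fewer.
print(revenue_per_published(apc=1350, submission_fee=100, rejection_rate=0.3))  # ~1492.9
```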

The overall benefit would be an increased viability of open access. A submission fee might also accelerate the decoupling of peer review from the act of publishing: if we get used to paying separately for publishing and for submission/evaluation, we might get used to having these activities performed by different entities. Finally, if it also results in less slicing into ever smaller publishable units, we might all benefit.


Update: Anna Sharman sent me a link to one of her blog posts where she covers this topic in much more detail.

Photo adapted from: http://www.flickr.com/photos/drh/2188723772/

Tuesday, April 02, 2013

Benchmark the experimental data not just the integration

There was a paper out today in Molecular Systems Biology with a resource of kinase-substrate interactions obtained from in-vitro kinase assays using protein microarrays. It is clear that there is a significant difference between what a kinase regulates inside a cell and what it could phosphorylate in vitro given appropriate conditions. In fact, reviewer number 1, in the attached comments (PDF), explains at length why these protein-array-based kinase interactions may be problematic. The authors are aware of this and integrate the protein-array data with additional data sources to derive a higher-confidence dataset of kinase interactions. The authors then provide computational and experimental benchmarks of the integrated dataset. What I have an issue with is that the original protein-array data itself is not clearly benchmarked in the paper. How are we to know what that feature, and all of the hard experimental work behind it, contributes to the final integrated predictor?

A very similar procedure was used in a recent Cell paper where co-complex membership was predicted based on the elution profiles of proteins detected by mass-spectrometry. Here again, the authors do not present benchmarks of the interactions predicted solely from the co-elution data. Instead they integrate it with around 15 other features before evaluating and studying the final result. In this case, the supplementary material gives some indirect indication of the value of the experimental data by itself, by providing the rank each feature has in the predictor.

I don't think the papers are incorrect. In both cases the authors provide an interesting final result, with the integrated set of interactions benchmarked and analysed. However, in both cases, we are left unsure of the value of the experimental data that is presented. I don't think it is an unreasonable request. There are many reasons why this information should be clearly presented before additional data integration steps are used. At the very least it is important for other groups thinking about setting up similar experimental approaches.
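To be concrete about what I am asking for, here is a minimal sketch (on synthetic data, with invented feature names, so an illustration rather than either paper's actual pipeline) of the side-by-side benchmark that would settle the question: score the experimental feature alone, then the integrated predictor, against the same gold standard.

```python
# Synthetic illustration: benchmark one experimental feature on its own
# versus an integrated predictor trained on all features together.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, n)                    # gold-standard interaction labels
array_score = y + rng.normal(0, 2.0, n)      # the noisy experimental feature
other_evidence = np.column_stack(
    [y + rng.normal(0, 1.5, n) for _ in range(5)])  # extra data sources
X = np.column_stack([array_score, other_evidence])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The experimental feature benchmarked by itself...
auc_single = roc_auc_score(y_te, X_te[:, 0])
# ...versus the integrated predictor.
model = LogisticRegression().fit(X_tr, y_tr)
auc_integrated = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"experimental feature alone: AUC = {auc_single:.2f}")
print(f"integrated predictor:       AUC = {auc_integrated:.2f}")
```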



Thursday, March 28, 2013

The glacial pace of innovation in scientific publishing


Nature made available today a collection of articles about the future of publishing. One of these is a comment by Jason Priem on "Scholarship: Beyond the paper". It is beautifully written and inspirational. It is clear that Jason has a finger on the pulse of the scientific publishing world and is passionate about it. He sees a future of a "decoupled" journal, where modular distributed data streams can be built into stories openly and in real time; where certification and filtering are not tied to the act of publishing and can happen on the fly by aggregating social peer review. While I was reading it I could not contain a sigh of frustration. This is a future that several of us, like Neil and Greg, debated at Nodalpoint many years ago. Almost 7 years ago I wrote in a blog post:

"The data streams would be, as the name suggests, a public view of the data being produced by a group or individual researcher.(...) The manuscripts could be built in wikis by selection of relevant data bits from the streams that fit together to answer an interesting question. This is where I propose that the competition would come in. Only those relevant bits of data that better answer the question would be used. The authors of the manuscript would be all those that contributed data bits or in some other way contributed for the manuscript creation. (...) The rest of the process could go on in public view. Versions of the manuscript deemed stable could be deposited in a pre-print server and comments and peer review would commence."

I hope Jason won't look back some 10 years from now and feel the same sort of frustration I feel now with how little scientific publishing has changed. So what happened in the past 7 years? Not much, really. Nature had an open peer review trial with no success. Publishers were slow to allow comments on their websites and we have been even slower at making use of them. Euan had a fantastic science blog/news aggregator (Postgenomic) but it did not survive long after he went to Nature. Genome Biology and Nature both tried to create preprint servers for biomed authors but ended up closing them for lack of users. We had a good run at an online discussion forum with Friendfeed (thank you Deepak) before Facebook took the steam out of that platform. For most publishers we can't even know the total number of times an article we wrote has been seen, something that blog authors have taken for granted for many years. Even in cases where progress has been made, it has taken (or is taking) way too long. The most obvious example is the unique author ID, where after many (oh so many) years there is finally a viable solution in sight. All that said, some progress was made in the past few years. Well, mainly two things - PLOS One and Twitter.

Money makes the world go round

PLOS One had a surprising and successful impact on the science publishing world. Its initial stated mission was to change the way peer review was conducted: the importance of a contribution would be judged by how readers rated or commented on the article. Only it turns out that few people take the time to rate or comment on papers. Nevertheless, thanks to some great management, first by Chris Surridge and then by Peter Binfield, PLOS One was a huge hit as a novel, fast, open access (at a fair price) journal. PLOS One's catch-all approach saw a steady increase in the number of articles published (and very healthy profits) and got the attention of all other publishers.

If open access is suitable as a business model then funding sources might feel that it is OK to mandate immediate open access. If that were to happen then only publishers with a structure similar to PLOS would survive. So, to make a profit and to hedge against a mandate for open access, all other publishers are creating (or buying) a PLOS One clone. This change is happening at an amazing pace. This is great for open access and it goes in the direction of a more streamlined and modular system of publishing. It is not so great for filtering and discoverability. I have said in the past that PLOS One should stop focusing on growth and go back to its initial focus on filtering and the related problem of credit attribution. To their credit, they are one of the few very actively advocating for the development of these tools. Jason, Heather, Euan and others are doing a great job of developing tools that report these metrics.

1% of the news scrolling by 

Of the different tools that scientists could have picked up to be more social, Twitter was the last one I would expect to see taking off. 140 characters?! Seriously? How geeky is that? No threaded discussions, no groups, some weird hashtagsomethings. In what world is this picked up by established tenured university professors that don't have time to leave a formal comment on a journal website? I have no clue how it happened, but it did. Maybe the simple interface with a single use case; the asymmetric (i.e. flattering) network structure; the fact that updates don't accumulate like email. Whatever the reason, scientists are flocking to Twitter to share articles, discuss academia and science (within the 140 characters) and rebel against the Established System. It is not just the young naive students and disgruntled postdocs. Established group leaders are picking up this social media megaphone. Some of them are attracting audiences that might rival some journals, so this alone might make them care less about that official seal of approval from a "high-impact" journal.

The future of publishing?

So after several years of debates about what the web can do for science we have: 1) a growing trend for "bulk" publishing with no solid metrics in place to help us filter and provide credit to authors; and 2) a discussion forum (Twitter) that is clunky for actual discussions but is at least being picked up by a large fraction of scientists. Where are we going from here? I still think that a more open and modular scientific process would be more productive and enjoyable (less scooping). I am just not convinced that scientists in general even care about these things. For my part I am going to continue sharing ideas on this blog and, now that I coordinate a research group, start posting articles to arXiv. I hope that Jason is right and we will all start to take better advantage of the web for science.


Clock image adapted from tinyurl.com/cmy9fn5

Wednesday, December 26, 2012

My Californian Life

Warning: No science ahead
I am in Portugal for the holidays, having just left San Francisco. It is a part of academic life that we have to keep moving around with each new job, and after Germany and the US (California) I am moving to the UK in a few days. It's not easy to keep rebuilding your roots in new places but it is certainly rewarding to experience new cultures. It has been great to spend almost 5 years in California and it was very (!) hard to leave. I decided to try to write down a few thoughts about life in the golden state. Maybe it will be useful for others considering moving there. I apologize in advance for the generalizations.


Geek heaven
It is impossible to live in Silicon Valley (I was in Menlo Park) without noticing the geekiness of the place. Just a few random examples: the Caltrain conductors frequently make geeky jokes ("warp speed to San Francisco"); the billboards on the 101 highway between San Francisco and San Jose often advertise products that only programmers would care about (e.g. types of databases); every time I went for a hot chocolate at the Coupa Cafe there was someone demoing or pitching an idea for a website or app. For someone that likes technology it is a great place to be. It is thrilling to find out that so many of the tech companies you read about are just around the corner. It is also very likely that the people you meet know about the latest tech trends, if they are not, in fact, developing them. Unfortunately, every nice thing has its downsides, and there is so much money around from tech salaries and companies that everything is horribly expensive if you don't work in the tech sector yourself.

The "can do" attitude and personal responsibility

It is nearly impossible to pin down what makes Silicon Valley such a breeding ground for successful companies, but one thing that impressed me was the winning attitude. It is more generally an American trait and not just found in California. People often believe their ideas can succeed to a point that borders on naivety. To paint a (somewhat) exaggerated picture: it is not enough to be good, you should strive to be number one. There are many positive and negative consequences of this attitude that are not easy to unwrap. There are obvious advantages that come with all that drive and positive thinking. This connects also to the notion of personal responsibility - it should be up to each one of us to make our own success. As a negative consequence, failure is then also our individual responsibility, even when in reality it isn't.


I don't want to go into politics but I will say that I have learned a lot also about the role of government. It was interesting to live in a place that emphasized personal responsibility so much more than Portugal. I think it served well to calibrate my own expectations of what the state should and should not be responsible for. As for many other things, I wish more people could have the experience of living in different countries and cultures.

Fitness freaks surrounded by amazing nature and obsessed with organic local food
Before going to the US I had many friends making jokes about how much weight I would gain, playing on the stereotypical view of overweight America. In fact, any generalization of S.F. or Silicon Valley would have to go in the opposite direction. I had friends waking up at crazy hours to exercise and I even learned to enjoy running (yikes!). It helps that California is sunny and filled with beautiful nature, like the state parks and the coastline (do the California Route 1). Also, the food is great, although there is a slightly exaggerated obsession with locally grown organic food. The constant sunshine and great food are probably the two things I will miss most when I get to the UK. I might have to buy a sun lamp. The interest in the outdoors was not that different from Germany, but it is something I wish was more prevalent in Portugal. Portugal has such nice weather and outdoors that it is a waste that we don't take better advantage of them.

A thank-you note
Californians are amazingly friendly people. It is true that sometimes it feels superficial. In restaurants it can even be annoying when a waiter comes for the tenth time to ask if you are really enjoying your meal. Still, it was great to live there with the easy smiles and unexpected chit-chat or compliments. It was easy to feel at home and I never felt like a foreigner. As I have learned from these years of living in California, one should always send a polite thank-you note after an event. So thank you California for these wonderful years. It would be most appropriate to say that it was "awesome". 


Tuesday, November 06, 2012

Scholarly metrics with a heart


Last week I attended the PLOS workshop on Article Level Metrics (ALM). As a disclaimer, I am part of the PLOS ALM advisory Technical Working Group (not sure why :). Alternative article-level metrics refer to any set of indicators that might be used to judge the value of a scientific work (or researcher, or institution, etc). As a simple example, an article that is read more than average might correlate with scientific interest or popularity of the work. There are many interesting questions around ALMs, starting even with the simplest: do we need any metrics? The only clear observation is that more of the scientific process is captured online and measured, so we should at least explore the uses of this information.

Do we need metrics? What are ALMs good for?

Like any researcher, I dislike the fact that I am often evaluated by the impact factor (IF) of the journals I publish in. When a position has hundreds of applicants it is not practical to read each candidate's research and carefully evaluate them. As a shortcut, the evaluators (wrongly) estimate the quality of a researcher's work by the IFs of the journals. I won't discuss the merit of this practice, since even Nature has spoken out against the value of IFs. So one of the driving forces behind the development of ALMs is this frustration with the current metrics of evaluation. If we cannot have a careful peer evaluation of our work then the hope is that we can at least have better metrics that reflect the value/interest/quality of our work. This is really an open research question and, as part of the ALM meeting, PLOS announced a PLOS ONE collection of research articles on ALMs. The collection includes a very useful introduction to ALMs by Jason Priem, Paul Groth and Dario Taraborelli.

Beyond the need for evaluation metrics, ALMs should also be more broadly useful for developing filtering tools. A few years ago I noticed that articles that were being bookmarked or mentioned in blog posts had an above-average number of citations. This has now been studied in much more detail. Even if you are not persuaded by the value of quantitative metrics (number of mentions, PDF downloads, etc), you might be interested instead in referrals from trustworthy sources. ALMs might be useful here by tracking the identity of those reading, downloading or bookmarking an article. There are several researchers I follow on social media because they mention articles that I consistently find interesting. In relation to identity, I also learned at the meeting that the ORCID author ID initiative finally has a (somewhat buggy) website that you can use to claim an ID. ALMs might also be useful for filtering if they can be used, along with natural language processing methods, to improve the automatic classification of an article's topic. This last point, on the importance of categorization, was brought up in the meeting by Jevin West, who had some very interesting ideas on the topic (e.g. clustering, automatic semantic labeling, tracking ideas over time). If the trend of growth for mega-journals (PLOS ONE, Scientific Reports, etc) continues, we will need these filtering tools to find the content that matters to us.
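As a toy illustration of that bookmarks-versus-citations observation, here is a minimal sketch of the kind of check involved. The numbers are entirely made up; real inputs would come from an ALM provider:

```python
# Toy check: do early bookmark counts correlate with later citations?
# All numbers below are invented for illustration only.
from scipy.stats import spearmanr

bookmarks = [0, 2, 5, 1, 10, 3, 8, 0, 15, 4]    # bookmarks per article
citations = [1, 4, 12, 2, 30, 5, 20, 0, 42, 9]  # citations per article

rho, pval = spearmanr(bookmarks, citations)
print(f"Spearman rho = {rho:.2f} (p = {pval:.3f})")
```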

Where are we now with ALMs?

In order to work with different metrics of impact we need to be able to measure them, and they need to be made available. On the publishers' side, PLOS has led the way in making several metrics available through an API, and there is some hope that other publishers will follow. Nature, for example, has recently made public a few of the same metrics for 20 of their journals although, as far as I know, they cannot be queried automatically. The availability of this information has allowed for research on the topic (see the PLOS ONE collection) and even the creation of several companies/non-profits that develop ALM products (Altmetric, ImpactStory, Plum Analytics, among others). Other established players have also been in the news recently. For example, the reference management tool Mendeley recently announced that they have reached 2 million users, whose actions can be tracked via their API, and Springer announced the acquisition of Mekentosj, the company behind the reference manager Papers. The interest surrounding ALMs is clearly on the rise as publishers, companies and funders try their best to gauge the usefulness of these metrics and position themselves to have an advantage in using them.
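For the curious, pulling these numbers down is a small script. Below is a minimal sketch of querying the PLOS ALM API for one article; the endpoint version, parameter names, DOI and key are assumptions/placeholders for illustration, so check the current API documentation before relying on them:

```python
# Minimal sketch: fetch article-level metrics for one DOI from the
# PLOS ALM API. Endpoint, version and parameter names are assumptions
# for illustration; consult the current ALM API docs for the real ones.
import requests

ALM_URL = "http://alm.plos.org/api/v3/articles"  # assumed endpoint

def get_metrics(doi, api_key):
    """Return the JSON summary (views, downloads, citations...) for a DOI."""
    params = {"api_key": api_key, "ids": doi, "info": "summary"}
    resp = requests.get(ALM_URL, params=params)
    resp.raise_for_status()
    return resp.json()

# Placeholder DOI and key -- substitute your own.
print(get_metrics("10.1371/journal.pone.0000000", "YOUR_API_KEY"))
```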

The main topics at the PLOS meeting

It was in this context that we got together in San Francisco last week. I enjoyed the meeting format, with a mix of loose topics but strict requirements for deliverables. It was worth attending even just for that and the people I met. After some introductions we got together in groups and quickly jotted down on post-its the sort of questions/problems we thought were worth discussing. The post-its were clustered on the walls by commonality and a set of broad problem sets was defined (see the list here).

Problems for discussion included:

  • how do we increase awareness of ALMs?
  • how do we prevent gaming (i.e. cheating to inflate the metrics of my papers)?
  • what can be, and is worth, measuring?
  • how do we exchange metrics across providers/users (standards)?
  • how do we give context/meaning/story to the metrics?

We were then divided into parallel sessions where we further distilled these problems into more specific action lists and very concrete steps that can be taken right now.

Metrics with a heart

From my own subjective view of the meeting, it felt like we spent a considerable amount of time discussing how to give more meaning to the metrics. I think it was Ian Mulvany who wrote on the board in one of the sessions: "What does 14 mean?". The idea of context came up several times and from different viewpoints. We have some understanding of what a citation means and, from our own experience, we can make some sense of what 10 or 100 citations mean (for different fields, etc). We lack a similar sense for any other metric. As far as I know, ImpactStory is the only one trying to give context to the metrics it shows, by comparing the metrics of your papers with those of random sets of papers from the same year. Much more can be done along these lines. We arrived at a similar discussion from the point of view of how we present ourselves as researchers to the rest of the world. Ethan Perlstein talked about how engaging his audience through social media, and giving feedback on how his papers were being read and mentioned by others, was enough to tell a story that increased interest in his work. The context and story (e.g. who is talking about my work) are more important than the number of views. We reached the same sort of discussion yet again when we talked about tracking and using the semantic meaning or identity/source of the metrics. For most use cases of ALMs we can think of, we would benefit from, or downright need, more context, and this is likely to drive the next developments and research in this area.
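That same-year comparison is simple to sketch. Here is a minimal, hypothetical example of how a raw number like 14 could be turned into a percentile against a baseline sample of same-year papers (my paraphrase of the idea, not anyone's actual code; all numbers invented):

```python
# Sketch: give context to a raw metric by placing it within the
# distribution of a baseline sample of papers from the same year.
def percentile_rank(value, baseline):
    """Percentage of baseline papers with a metric below `value`."""
    below = sum(1 for b in baseline if b < value)
    return 100.0 * below / len(baseline)

# Hypothetical bookmark counts for a random sample of same-year papers.
same_year = [0, 1, 1, 2, 3, 5, 8, 12, 20, 40]

# "14 bookmarks" alone means little; "more bookmarks than 80% of
# same-year papers" starts to tell a story.
print(f"{percentile_rank(14, same_year):.0f}th percentile")
```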

The devil we don't know

Heather Piwowar asked me at some point if I had any reservations about ALMs. In particular, from the point of view of evaluation (and to a lesser extent filtering), it might turn out that we are substituting a poor evaluation metric (the journal impact factor) with an equally poor evaluation criterion: our capacity to project influence online. In this context it is interesting to follow some experiments being done in scientific crowdfunding. Ethan Perlstein has one running right now with a very catchy title: "Crowdfund my meth lab, yo". Success in crowdfunding should depend mostly on the capacity to project your influence or "brand" online: an exercise in personal marketing. Crowdfunding is an extreme scenario where researchers are trying to side-step the grant system and get funding directly from the wider public. However, I fear that evaluation by ALMs will tend to reward exactly the sort of skills that relate to online branding. Not that personal marketing is not important already (this is why researchers network at conferences and get to know editors), but ALMs might reward personal (online) branding to an even higher degree.

Thursday, July 19, 2012

I am starting a group at the EMBL-EBI


I signed the contract this week to start a research group at the European Bioinformatics Institute (EBI) in Cambridge, an outstation of the European Molecular Biology Laboratory (EMBL). After blogging my way through a PhD and postdoc, it is a pleasure to be able to write this blog post. In January I will be joining an amazing group of people working on bioinformatics services and basic research, where I plan to continue studying the evolution of cellular interaction networks. I am currently interested in the broad problem of understanding how genetic variability gets propagated through molecules and interaction networks to have phenotypic consequences. The two main research lines of the group will continue previous work on the evolution of protein interactions and post-translational modifications (see post) and the evolution of chemical genetics/personal genomics (see post 1 and post 2). I will continue to use this blog to discuss research ideas and ongoing work and, as always, the content here reflects my own personal views and not those of my current/future employers.

I also take the opportunity to mention that I am looking for a postdoc to join the group in January to work on one of the two lines described above. If you know anyone crazy/adventurous/interested enough, please send them a link to this post. Past experience (i.e. published work) in the computational analysis of cellular interaction networks is required (e.g. post-translational modifications, mass-spectrometry, linear-motif-based interactions, structural analysis of interaction networks, genetic interactions, chemical genetics, drug-drug interactions, comparative genomics, etc). The work will be done in collaboration with experimental groups at EMBL Heidelberg and the Sanger. Pending approval from EMBL, a formal application announcement will appear on the EMBL jobs page.

I also wanted to share a bit of my experience of trying to get a job after the postdoc "training" period. I have ranted enough in the past about the many problems of the academic track system, but the current statistics are informative enough: about 15% of biology-related PhDs get an academic position within 5 years. The competition is intense, and in the past year and a half I applied to 15 to 20 positions before taking the EBI job. On a positive note, I had the impression that better-established places actually cared less about impact factors. Nevertheless, it has been a super stressful period and even so I know I have been very lucky. I don't mean this in any superstitious way but really in how statistically unlikely it is to have supportive supervisors and enough positive research outcomes (i.e. impactful publications) to land a job. I think we are training too many PhDs (or not taking advantage of the talent pool) and the incentives are not really changing. For my part, I will try my best to help change the incentives behind this trend. We could have less funding for PhDs in bio-related areas, smaller groups and more careers within the lab besides group leader. At the very least, PhD students should be made aware of, and train for, alternative science-related paths.

Evolution and Function of Post-translational Modifications

A significant portion of my postdoctoral work is finally out in the latest issue of Cell (link to paper). In this study we have tried to assign function to post-translational modifications (PTMs) derived from mass-spectrometry (MS). This follows directly from previous work where we looked at the evolution of phosphorylation in three fungal species (paper, blog post). We (and other groups) have seen that phosphorylation sites diverge rapidly, but we don't really know if this divergence of phosphosites results in meaningful functional consequences. In order to address this we need to know the function of post-translational modifications (if they have any). Since MS studies now routinely report several thousand PTMs per analysis, we have a severe bottleneck in the functional analysis of PTMs. These issues are the motivation for this latest work. We collected previously published PTMs (close to 200,000) and obtained some novel ubiquitylation sites for S. cerevisiae (in collaboration with Judit Villen's lab). We revisited the evolutionary analysis and set up a couple of methods to prioritize those modifications that we think are more likely to be functionally important.
As an example, we have tried to assign function to PTMs by annotating those that likely occur at interface residues. One approach that turned out to be useful was to look for conservation of the modification sites within PFAM domain families. For example, in the figure above, under "Regulation of domain activity", I am depicting a kinase domain. Over 50% of the phosphorylation sites that we find in the kinase domain family occur in the well-known activation loop (arrow), suggesting that this is an important regulatory region. We already know that the activation loop is an important regulatory region, but we think this conservation approach will be useful for studying the regulation of many other domains. In the article we give other examples and an experimental validation using the HSP70 domain family (in collaboration with the Frydman lab).
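For readers who want the gist of the domain-level analysis in code form, here is a minimal sketch of how PTM sites can be pooled onto the columns of a domain alignment to find frequently modified positions. The function names and input structures are hypothetical; this is my simplified illustration, not the pipeline used in the paper:

```python
# Sketch: pool PTM sites from many proteins onto the columns of a
# Pfam domain alignment and count how often each column is modified.
# Input structures and names are hypothetical illustrations.
from collections import Counter

def modified_column_counts(ptm_sites, residue_to_column):
    """
    ptm_sites: {protein_id: set of modified residue positions}
    residue_to_column: {protein_id: {residue position: alignment column}}
    Returns a Counter mapping alignment columns to the number of
    proteins carrying a modification at that column.
    """
    counts = Counter()
    for protein, sites in ptm_sites.items():
        mapping = residue_to_column.get(protein, {})
        for site in sites:
            if site in mapping:
                counts[mapping[site]] += 1
    return counts

# Columns modified in many different family members (e.g. the kinase
# activation loop) stand out as candidate regulatory hotspots.
```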

I won't describe the work in detail, as you can (hopefully) read the paper. Leave a comment or send me an email if you can't and/or if you have any questions regarding the paper or analysis. I also put the predictions up in a database (PTMfunc) for those who want to look at specific proteins. It is still very alpha; I apologize for the bugs and will try to improve it as quickly as possible. If you want access to the underlying data just ask and I'll send the files. I am also very keen on collaborations with anyone collecting MS data or interested in the post-translational regulation of specific proteins, complexes or domain families.

Blogging and open science
Having a blog means I can also give you some of the thoughts that don't fit in a paper or press release. You can stop reading if you only came for the sciency bits. One of the cool things I realized is that I have discussed in this blog three papers in the same research line, running through my PhD and postdoc. It is fun to be able to go back not just to the papers but to the way I was thinking about these ideas at the time. Unfortunately, although I try to use this blog to promote open science, this project was yet another failed open science project. Failed in the sense that it started with a blog post and a lot of ambition but never gained any momentum as an online collaboration. Eventually I stopped trying to push it online and, as experimental collaborators joined the project, I gave up on the open science side of it. I guess I will keep trying whenever it makes sense. This post closes project 1 (P1) but if you are interested in online collaborations have a look at project 2 (P2).

Publishable units and postdoc blues
This work took most of my attention during the past two years and it is probably the longest project I have worked on. Two years is not particularly long, but it has certainly made me think about what an acceptable publishable unit is. As I described in the last blog post, this concept is very hard to define. While we probably all agree that a factoid in a tweet is not something I should put on my CV, we allow and even cheer for publishing outlets that accept very incremental papers. The work I described above could easily have been sliced into smaller chunks, but would it have had the same value? We would have put out the main ideas much faster, but it could have been impossible to convince someone to test them. I feel that the combination of the different analyses and experiments has more value as a single story, but an incremental approach would have been more transparent. Maybe the ideal situation would be to have the increments online in blogs, wikis and repositories and to collect them into stories for publication. Maybe, just maybe, these thoughts are a consequence of postdoc blues. As I was trying to finish and publish this project I was also jumping through the academic-track hoops, but I will leave that for a separate post.