
Monday, July 09, 2007

Filtering papers on number of downloads

I was having a look at the highly accessed papers for BMC Bioinformatics. At BMC, every journal has a page with statistics on the most highly accessed papers of the last month, and several other journals now provide a similar service. The cool thing about BMC is that they even tell you how many views each paper got (the sum of abstract, full text and PDF accesses on BioMed Central in the last 30 days). Not only that, the information is in the RSS feed they provide. That makes it very easy to feed into a pipe and set a threshold on the number of views above which a paper shows up in the filtered feed.

Here is a pipe example that filters out BMC Bioinformatics papers with fewer than 1000 views. The only problem is that the information is not stored as a number (for example: "Number of accesses: 1226"), which is why I used the regular expression [1-9][0-9][0-9][0-9]$ instead of a numeric filter. I also don't know if the numbers are updated every day, but I hope so.
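For anyone who prefers to do this filtering locally rather than in Yahoo Pipes, here is a rough Python sketch of the same idea; the feed URL and the exact wording of the accesses line are assumptions on my part, so adjust them to whatever BMC actually serves:

```python
import re
import feedparser  # pip install feedparser

# Assumed feed URL -- replace with the journal's actual RSS feed
FEED_URL = "http://www.biomedcentral.com/bmcbioinformatics/rss"
THRESHOLD = 1000  # minimum number of accesses to keep a paper

feed = feedparser.parse(FEED_URL)
for entry in feed.entries:
    # Assumes the item description contains a line like "Number of accesses: 1226"
    match = re.search(r"Number of accesses:\s*(\d+)", entry.get("summary", ""))
    if match and int(match.group(1)) >= THRESHOLD:
        print("%s (%s views)" % (entry.get("title", ""), match.group(1)))
        print(entry.get("link", ""))
```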

Even better would be some kind of service where, given a DOI, BMC would return exactly this information in a structured form. If other repositories provided a similar service, there would be no point in worrying about open access diluting the number of page views, because we could just sum the views on the publisher's site with those on PubMed Central, etc.

Wednesday, July 04, 2007

RSS feed for BioMed Central comments

As if there were not enough things to read these days, I thought it would be interesting to provide an RSS feed for BioMed Central comments. I tried to use openKapow to scrape the information from the webpage, but for some reason the feed only worked a couple of times after being published. Instead I used Dapper, which, amazingly enough, produced a more stable feed. The full, unfiltered feed can be found here.

The feed includes the title (with a URL to the comment page, where there is a DOI for the commented paper), the short description from the main webpage, and the journal (stored in the date field). The feed can be filtered for particular journals with this simple Yahoo Pipes pipe, which is currently set to BMC Genomics or BMC Bioinformatics.
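If you would rather not clone the pipe, the same journal filtering can be sketched locally; the feed URL below is a placeholder for the Dapper feed linked above, and I am assuming the journal name really does sit in the item's date element as described:

```python
import feedparser  # pip install feedparser

# Placeholder -- point this at the Dapper feed linked above
FEED_URL = "DAPPER_FEED_URL_HERE"
JOURNALS = ("BMC Genomics", "BMC Bioinformatics")

feed = feedparser.parse(FEED_URL)
for entry in feed.entries:
    # Assumes the journal name is stored in the item's date element
    journal = entry.get("published", "") or entry.get("updated", "")
    if any(name in journal for name in JOURNALS):
        print(journal, "-", entry.get("title", ""))
        print(entry.get("link", ""))
```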

Friday, March 16, 2007

Bioinformatic web scraping/mash-ups made easy with kapow

In bioinformatics we often need to query the same web resource many times. Ideally, whoever built the web service provided a way to query the site automatically via an API. Unfortunately, Lincoln Stein's dream of a bioinformatics nation is still not a reality. When there is no programmatic interface and the underlying database is not available for download, it is usually necessary to write some code to scrape the content from the web pages.
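As a toy illustration of what that scraping code usually looks like, here is a minimal Python sketch; the URL and the regular expression are placeholders for whatever page and field you actually need:

```python
import re
import urllib.request

def scrape_field(url, pattern):
    """Fetch a page and pull a single field out of it with a regular expression."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    match = re.search(pattern, html, re.DOTALL)
    return match.group(1).strip() if match else None

# Placeholder URL and pattern -- substitute the page and field you actually need
value = scrape_field("http://example.org/gene?id=ENSG00000139618",
                     r'<td class="description">(.*?)</td>')
print(value)
```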

In comes openKapow, a free tool to (easily) build and publish robots that turn any website into a real web service. To illustrate how easy it is to use, I built a Kapow robot that, for any human gene ID, returns a list of orthologs (with species and IDs). I downloaded the robotmaker and tried it on the Ensembl database. To be fair, Ensembl is probably one of the best bioinformatics resources, with an available API and easy data-mining tools like BioMart, so this is just an example.

You start the robot by defining the initial webpage and the service's inputs and outputs. I decided to create a REST service that takes an Ensembl gene ID and outputs pairs of gene ID/species name. The robotmaker application is intuitive to use for anyone with moderate experience with HTML. The robot is created by setting up the steps that transform the input into the desired output. For example, we have to define where the input should be entered by clicking on the search box:
From here, a set of loops and conditional statements can be added to collect the list of orthologs:

We can run through the robot's steps with a test input and debug it graphically. Once the robot is working, it is possible to host it on the openKapow web page, apparently also free of charge. Here is the link to this simple robot (the link might go down in the future). Of course, it is also possible to build new robots that use robots already published on openKapow. This example uses only a single webpage, but it would be more interesting to use this approach to mash up different services together.
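For completeness, calling the published robot from a script should amount to a plain HTTP GET; the endpoint and parameter name below are placeholders (the real URL is behind the link above), so treat this as a sketch:

```python
import urllib.request

# Placeholder endpoint and parameter name -- substitute the published robot's REST URL
ROBOT_URL = "ROBOT_REST_ENDPOINT_HERE"
GENE_ID = "ENSG00000139618"  # an example human Ensembl gene ID

with urllib.request.urlopen(ROBOT_URL + "?geneid=" + GENE_ID) as response:
    # The robot was set up to return pairs of ortholog gene ID / species name
    print(response.read().decode("utf-8"))
```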