Tuesday, March 27, 2007

Google Base API

While we are waiting for freebase to give us a chance to preview their service we can go ahead and try something that probably is very similar in spirit to freebase. Google Base has been up a long time but only recently have they opened it up for automatic access (see Google Base API). There are some restrictions but in the end we can think of it as a free online database that we can use remotely.

How easy is it to use ? If you like Java, C# or PHP you are in luck because they have client libraries to help you get started.

I also found this Google Base code in CPAN and decided to give this a try. After reading some of the information in the API website and having a look at the code it comes down to 3 main tasks: 1)authentication; 2)insert/delete/update; 3)query

Having installed the above mentioned CPAN module the authentication step is easy:

use WWW::Google::API::Base;
my $api_user = "username"; #Google user name
my $api_pass = "pass"; #Google pass
my $api_key = "API_KEY";
#any program using the API must get a key


my $gbase = WWW::Google::API::Base->new(
{ auth_type => 'ProgrammaticLogin',
api_key => $api_key,
api_user => $api_user,
api_pass => $api_pass },
{ } );

That's it, $gbase is authorized to use that google account in Gbase.

Now to insert something useful in the database requires a bit more effort. The CPAN module comes with an example on how to insert recipes. I am not that great a cook so I created a new function in Base.pm that comes with the module. I called it insertSequence

sub insertSequence {
my $self= shift;
my $id = shift;
my $seq_string = shift;
my $seq_type = shift;
my $spp = shift;
my $l=shift;
$self->client->ua->default_header('content-type',
'application/atom+xml');
my $xml = <<EOF;

<?xml version='1.0'?>
<entry xmlns='http://www.w3.org/2005/Atom'
xmlns:g='http://base.google.com/ns/1.0'
xmlns:c='http://base.google.com/cns/1.0'>
<author>
<name>API insert</name>
</author>
<category scheme='http://www.google.com/type' term='googlebase.item'/>
<title type='text'>$id</title>
<content type='text'>$seq_string</content>
<g:item_type>sequence</g:item_type>
<g:spp type='text'>$spp</g:spp>
<g:length type='int' >$l</g:length >
<g:sequence_type type='text'>$seq_type</g:sequence_type>
</entry>

EOF

my $insert_request = HTTP::Request->new(
POST => 'http://www.google.com/base/feeds/items',
$self->client->ua->default_headers,
$xml);
my $response;
eval {
$response = $self->client->do($insert_request);
};
if ($@) {
my $error = $@;
die $error;
}
my $atom = $response->content;
my $entry = XML::Atom::Entry->new(\$atom);
return $entry
}


The function takes in information on the sequence like the ID, the sequence string, type , species, length and creates and XML entry to submit to Google Base according to the specifications they provide in the website. In this case it will be an entry of type "sequence" (that is non standard for GBase). The only detail in this was that I could not get the sequence string into an item attribute of type text because there seams to be a size limit in these. This is why the sequence is in the description.

Ok, with this new function adding a sequence to the database is easy. After the authentication code as above we just need to do:
$gbase->insertSequence($simple ,$seq_str,
"protein","s.cerevisiae",$l);

After getting the information from somewhere to populate the variables. According to the Google API faq, there is a limit of 5 queries per second. In about 25 lines we can get a FASTA to GBase pipe. Here is an example of protein sequence in Gbase (it might get deleted in time).

Now I guess one of the interesting parts is that we can use Google to filter results using the Google Base query language. The CPAN module above already has a query tool. It is still very simple but it gets the results of a search into an ATOM object. Here is a query that returns items from S.cerevisiae that have length between 350 and 400:
my $new_id="http://www.google.com/base/feeds/items
?bq=[spp:s.cerevisiae][length(int) : 350..400]";
my $select_inserted_entry;
eval {
$select_inserted_entry =$gbase->select($new_id);
print $select_inserted_entry->as_xml;# The output in XML format
};
if ($@) {
my $e = $@;
die $e->status_line; # HTTP::Response
}
I am not sure yet if these items are available to other users to query not the code that would do it. I think this example here only gets the items in my account. This was as far as I got with this. The last step would be to have an XML parser turn the returned ATOM object into something more useful.