Friday, March 29, 2013

JSON processing functions and the new JSON parser API

Today, with a great sense of relief, I committed the second of my major JSON patches for this release cycle, just beating the imminent close of the current commit-fest.

Most of the attention has focussed on the new JSON processing functions, which you can see details of in the dev docs. But the most important part in my mind is really the less visible part: the scaffolding on which these new functions were built. Essentially this took the original (and, for its purpose, very good) stack-based JSON parser that Robert Haas wrote for release 9.2, and turned it into a classic recursive descent parser. In doing so, it gives functions that call the parser the ability to supply callback hooks for certain parsing events, such as the beginning or end of a JSON object or array. It's kind of the JSON equivalent of an XML SAX parser. That means it's quite easy to construct new JSON processing functions without having to worry at all about the nitty gritty of JSON parsing. I expect to see a number of JSON processing extensions based on this (and I expect to write a few myself).
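To give a flavour of the new processing functions themselves, here are a couple of small examples run in psql (the function names are from the dev docs; results shown as comments):

```sql
-- expand a JSON object into key/value rows
select * from json_each('{"a":1,"b":[2,3]}');
--  key | value
-- -----+-------
--  a   | 1
--  b   | [2,3]

-- pull a text value out of a nested structure by path
select json_extract_path_text('{"f1": {"f2": "bar"}}', 'f1', 'f2');
-- bar
```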

I'm hoping to give a talk at a conference later in the year about how to build these processing functions.

Sunday, March 24, 2013

Parallel pg_dump committed

Today I committed Joachim Wieland's patch to add a parallel option to pg_dump when outputting in directory format. Barring some unexpected and evil occurrence, this should be in the forthcoming 9.3 release. This is the culmination of several years of work. I started talking about it shortly after I did the parallel pg_restore work for release 8.4.

This was a lot more difficult and complex than parallel pg_restore. Three pieces were required to make it work: an archive format that we could sensibly write to in parallel, a way of making sure that a dump done in parallel was consistent, and the actual code to use those features to achieve parallel dumping. The first piece was the directory archive format, which was introduced in release 9.1. The second piece was snapshot cloning, which was introduced in release 9.2. This final piece builds on those two earlier pieces of work to make parallel dumps a reality.

This development throws away some of my earlier work. We now have a common parallel processing infrastructure that is used for both pg_dump and pg_restore. The new code pre-creates the required number of workers (threads on Windows, processes on Unix) and farms out multiple pieces of work to them. This is quite different from the earlier way pg_restore worked, which created workers on demand and allocated exactly one piece of work to each.
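In use it's simple: the new -j (--jobs) switch does the work, and it requires the directory output format. Something like this (database and path names here are made up):

```shell
# dump with 4 worker processes into a directory-format archive
pg_dump -Fd -j 4 -f /backups/mydb.dir mydb

# the same infrastructure restores in parallel, as before
pg_restore -j 4 -d mydb_copy /backups/mydb.dir
```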

Congratulations to Joachim on this milestone.

Friday, March 15, 2013

Recent feature requests

In the last few days customers have wanted the following PostgreSQL features:
  • A time zone setting of "system" which would just pick up whatever the system's time zone setting is at server start. This would be useful for appliance makers who pre-install a database and then ship the appliance to a customer who can change the time zone, as one of our customers does.
  • The ability to set a default tablespace for a schema, and to move all the objects in the schema to that tablespace. This was apparently something that got ripped out in the very late stages of the 8.0 development cycle, but it doesn't seem like a terribly bad idea at first glance.

Wednesday, March 13, 2013

Writeable Foreign Tables

There are lots of cool things coming in 9.3, but the one I'm licking my lips over is one that was committed the other day by Tom Lane: Writeable Foreign Tables by KaiGai Kohei. As soon as I get done with the work I have in hand for core 9.3 I will be an early adopter, adding this to the Redis FDW. This is going to be very cool.
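To illustrate what this enables, here is roughly what it looks like with postgres_fdw (also new in 9.3); the server, database, and table names below are invented for the example:

```sql
create extension postgres_fdw;

create server remote_pg foreign data wrapper postgres_fdw
    options (host 'remote.example.com', dbname 'appdb');

create user mapping for current_user server remote_pg
    options (user 'app', password 'secret');

create foreign table remote_items (id int, name text)
    server remote_pg options (table_name 'items');

-- with writeable foreign tables, DML now works too:
insert into remote_items values (1, 'widget');
update remote_items set name = 'gadget' where id = 1;
delete from remote_items where id = 1;
```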

Saturday, March 9, 2013

Loading Useful Modules in PLV8

PLV8 is a trusted language. In fact it's arguably safer than pretty much any non-core loadable language, since the interpreter is naturally sandboxed. So there's no way to load external processing modules from the file system. However, pure JavaScript modules can be cleanly loaded from the database itself. A couple of extremely useful ones I have found are Underscore and node-jPath. The former can be loaded with no modification whatever. The latter is written for use with node.js and needs a small amount of adjustment before it's loaded in plv8.

So how do you do this? Well, first you need to grab the code:
curl -o underscore-min.js
curl -o jpath.js
Then edit jpath.js so that, at the bottom, instead of assigning to module.exports it creates an object called jpath and adds the select and filter methods to it.

Then you'll need a table to load the modules from, and to put the modules in this table. You'll also need a function to load them. These can be accomplished with the following psql script:
\set underscore `cat underscore-min.js`
\set jpath `cat jpath.js`

create table plv8_modules(modname text primary key, load_on_start boolean, code text);

insert into plv8_modules values ('underscore',true,:'underscore'),
                                ('jpath',true,:'jpath');

create or replace function plv8_startup()
returns void
language plv8
as $$
load_module = function(modname)
{
    var rows = plv8.execute("SELECT code from plv8_modules " +
                            " where modname = $1", [modname]);
    for (var r = 0; r < rows.length; r++)
    {
        var code = rows[r].code;
        eval("(function() { " + code + "})")();
    }
};
$$;

Now test it out:
select plv8_startup();
do language plv8 ' load_module("underscore"); load_module("jpath"); ';
-- test the underscore module's extend function
do language plv8 $$
   var x = {'a':1};
   _.extend(x, {'a':2,'b':4,'c':5});
   plv8.elog(NOTICE, JSON.stringify(x));
$$;
-- test jpath module's filter function
do language plv8 $$
  var jsonData = {
    people: [
        {name: "John", age:26, gender:"male"},
        {name: "Steve", age:24, gender:"male"},
        {name: "Susan", age:22, gender:"female"},
        {name: "Linda", age:30, gender:"female"},
        {name: "Adam", age:32, gender:"male"}
    ]
  };
  // We want to get all males younger than 25
  var match = jpath.filter(jsonData, "people[gender=male && age < 25]");
  plv8.elog(NOTICE, JSON.stringify(match));
$$;
You should see output like:
NOTICE:  {"a":2,"b":4,"c":5}
NOTICE:  [{"name":"Steve","age":24,"gender":"male"}]
Now, the final step is to make these get loaded automatically. Alter your plv8_startup() function by adding the following at the end:
// now load all the modules marked for loading on start
var rows = plv8.execute("SELECT modname, code from plv8_modules where load_on_start");
for (var r = 0; r < rows.length; r++)
{
    var code = rows[r].code;
    eval("(function() { " + code + "})")();
}
and finally add
plv8.start_proc = 'plv8_startup'
to your postgresql.conf file, and restart. You should now be able to use these modules anywhere in your PLV8 code.