Thursday, January 31, 2013

Json object from an array of key value pairs

I mentioned this the other day, and I now have an extension with a simple function to turn an array of key value pairs into a json object, much as you can turn such a thing into an hstore.

The motivation for this came from a discussion on what to do with structured data returned by the Redis Foreign Data Wrapper. Originally we discussed making it json or an hstore, but on reflection it seemed best to me simply to return it as a standard SQL array and use the database utilities to transform it into whatever is needed. So this function works pretty much like the hstore function except that here it is the caller's responsibility to ensure that the keys are unique. When used with the Redis FDW that won't be a problem - they will always be unique.
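
As a minimal sketch of how this might look, assuming the extension exposes the function under a name like json_object (the name and the exact output formatting here are illustrative, not definitive):

-- Hypothetical usage: build a json object from a flat array of
-- alternating keys and values, much as hstore(text[]) does for hstore.
SELECT hstore(ARRAY['a','1','b','2']);
--  "a"=>"1", "b"=>"2"
SELECT json_object(ARRAY['a','1','b','2']);
--  might yield something like {"a" : "1", "b" : "2"}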

Tuesday, January 29, 2013

Redis FDW update.


Today I've published a major revision to the Redis Foreign Data Wrapper code for use with PostgreSQL versions 9.2 and later. There are two new features: support for Redis structured data types and support for restricted keyspace search.

The structured data type support is enabled by using the table option "tabletype", which can take one of the values "hash", "list", "set" or "zset". If this option is used, the data comes back as an array rather than as a simple scalar value. In the case of "hash" tables, the values are a sequence of key/value pairs. For the other types they are simply the elements of the relevant structure. If the values column of these tables is defined to be of type "text[]" (i.e. array of text) then an actual array is returned. If the column is a plain text column, an array literal is returned. Hash table arrays in particular can be turned into records via hstore's "populate_record" functionality, or transformed in other useful ways.
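
As a sketch (the server name redis_server and the table and column names are assumptions for illustration, not requirements of the FDW), a hash-typed table might be declared and used like this:

-- Assumes a foreign server named redis_server has already been created
-- for the Redis FDW. The second column is declared text[] so each hash
-- comes back as a real array of alternating keys and values.
CREATE FOREIGN TABLE redis_hash_data (key text, value text[])
    SERVER redis_server
    OPTIONS (tabletype 'hash');

-- hstore(text[]) turns the key/value array into an hstore, which
-- populate_record can map onto a composite type (session_type is a
-- hypothetical type whose fields match the hash's keys).
SELECT key, populate_record(null::session_type, hstore(value)) AS rec
FROM redis_hash_data;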

Restricted keyspace search is enabled by using one of the table options "tablekeyprefix" or "tablekeyset". These are mutually exclusive. "tablekeyprefix" restricts the search to keys with the given prefix. However, in a very large Redis database this might still be expensive. In that case, it might be better to keep a list of the keys in a separate set, and this is supported using the "tablekeyset" option. When this is used the global keyspace isn't searched at all, and the list of keys is simply taken as the members of the set.
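
For example (again with an assumed server name and column layout), the two options might be used like this:

-- Restrict the keyspace search to keys beginning with 'web.session.'
CREATE FOREIGN TABLE sessions_by_prefix (key text, value text[])
    SERVER redis_server
    OPTIONS (tabletype 'hash', tablekeyprefix 'web.session.');

-- Or skip the keyspace search entirely and take the key list from the
-- members of a Redis set ('web.session.keys' is an illustrative name).
CREATE FOREIGN TABLE sessions_by_keyset (key text, value text[])
    SERVER redis_server
    OPTIONS (tabletype 'hash', tablekeyset 'web.session.keys');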

The new functionality is demonstrated in the test script (which is also new).

I am also working on some facilities to push data into Redis and similar things, but these fall outside what a Foreign Data Wrapper can currently do, and will be announced separately.

Credit note: this work was supported by IVC.

Monday, January 28, 2013

Array of Key Value pairs to Json

As I'm working on making the Redis Foreign Data Wrapper actually do something useful, it's occurred to me that one of the Json functions we're missing is the ability to turn key value arrays into Json, as we can with hstore. There's a bit of a mismatch in that Json's treatment of numbers and booleans involves distinguishing these from strings, but it would be possible to have a "use heuristics" option as I recently did with hstore to Json conversion.
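
To make the mismatch concrete: a converter working from all-string input has to choose between renderings like these two, both of which are perfectly valid json:

-- In a text array (or an hstore) every value is a string, but json
-- distinguishes strings from numbers and booleans, so a converter has
-- to pick one of these:
SELECT '{"a": "1", "b": "true"}'::json;   -- everything quoted
SELECT '{"a": 1, "b": true}'::json;       -- heuristics applied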

When I get done with the Foreign Data Wrapper I will probably whip up a quick extension to add this functionality.

Tuesday, January 22, 2013

Handling Redis Hashes

IVC, a great boutique software house that I have worked with for quite a few years, has been shifting to a new, more modern, software stack, part of which is Redis. They have asked me to work on improving the interface between Redis and PostgreSQL. Last week they mentioned that they are doing a lot with Redis hashes, and asked me to improve access to them. These are basically sets of key/value pairs. In PostgreSQL terms, think of hstore as an equivalent.

So I came up with a simple idea of mapping a foreign table to a Redis hash. This actually isn't too hard to implement, and I got it working more or less yesterday:

andrew=# \! echo type foo | redis-cli
hash
andrew=# \d xxx
       Foreign table "public.xxx"
 Column | Type | Modifiers | FDW Options 
--------+------+-----------+-------------
 key    | text |           | 
 value  | text |           | 
Server: localredis
FDW Options: (hashkey 'foo')

andrew=# select * from xxx;
   key   | value 
---------+-------
 bar     | baz
 blurflk | asdf
 qqq     | ttt
(3 rows)

So far so good. You can treat a Redis hash as a little table and select from it. And we could perform similar operations for the other Redis data structures (lists, sets and zsets).

But I'm wondering if this is going to fit well with actual Redis practice. Let's say we're dealing with web session objects. In PostgreSQL you would probably have a table for these with a session id and an hstore or maybe a piece of json (especially after my json extractor functions get committed :-) ). But in Redis I think you're at least as likely to have one hash per session object rather than a hash of session objects, and Redis data structures don't nest, so there's no hash of hashes. So if you have thousands or millions of tiny session object hashes in Redis, with names like "web.session.a9f3c1763d" where the last part of the name is some fairly opaque session name, do you really want to have to map each one of those to its own foreign table just to read the values nicely? I don't think so.

Following this thought I came up with the idea of a related FDW which would give you access to the whole of the hashspace in Redis. The tables would have three fields instead of two, say (hashname, key, value). We could even impose a restriction on queries requiring you to specify a hashname, although that's possibly a bit ugly, and maybe we should just warn you when no hashname qualification is supplied.
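
A hypothetical sketch of the idea (nothing like this exists yet; the table definition and query are purely illustrative):

-- One foreign table covering the whole hashspace, rather than one
-- foreign table per Redis hash.
CREATE FOREIGN TABLE all_hashes (hashname text, key text, value text)
    SERVER localredis;

-- Reading a single session object then just means qualifying on
-- hashname, which the FDW could turn into an HGETALL on that hash.
SELECT key, value
FROM all_hashes
WHERE hashname = 'web.session.a9f3c1763d';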

So, which of these patterns is likely to be most useful? They are not actually mutually exclusive, and it's perfectly possible to implement both. But I would like to know what seems most useful to Redis users generally, especially if they are also using PostgreSQL.

Another possibility would be not to use the Foreign Data Wrapper interface at all, and just provide a set returning function which would take some connection parameters and a hash name and return the key/value pairs. This too would be reasonably easy to do, as I already have (and will soon publish) a framework for generalised Redis access. On the other hand, one significant advantage of Foreign Data Wrappers is that handling pushed down quals ("where" conditions, in effect) is rather easier to do, from what I can see.
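
A sketch of what the set returning function route might look like (the function name and signature are invented here for illustration, and assume it is declared to return TABLE(key text, value text)):

-- Hypothetical set returning function: connection details plus a hash
-- name in, the hash's key/value pairs out.
SELECT key, value
FROM redis_fetch_hash('127.0.0.1', 6379, 'web.session.a9f3c1763d');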

Thursday, January 17, 2013

Mingw64 fail

Last week Gurjeet Singh asked me about getting the MinGW64 compiler working. When I looked I noticed that they had not made an official build for MinGW hosts (as opposed to Linux hosts and Darwin hosts) since late 2011. I went on IRC and asked why, and was told that they had switched to GYP to generate builds and it was broken on MinGW. Apparently nobody in the GYP project is interested in fixing it because MinGW isn't one of their supported platforms. Someone gave me this URL that describes the problem: http://code.google.com/p/v8/issues/detail?id=2197#c6

I was, as they say, gobsmacked. Nobody even really seemed to care that much.

If the build tool doesn't support one of your three main target platforms, and moreover it's the only target which isn't cross-compiling, STOP USING IT.

Tuesday, January 15, 2013

Inline is useful

As I was looking for ways yesterday to claw back some of the cost of switching to a recursive descent parser for json, I was interested to see I could get a small but quite observable improvement simply by declaring a few hotspot functions as "inline". This means that the compiler will expand calls to them rather than generating a function call, thus saving the overhead of setting up a stack frame and doing a procedure call. The benefits of doing this vary depending on the compiler and the architecture, but it can be substantial.

Not everything is a candidate for inlining - external functions can't be inlined because their bodies aren't available when the call sites are compiled, and recursive functions can't either, because inlining them could cause infinite regress.

One downside is that it can make debugging harder, so it's often best to leave inlining to late in the development process.

Anyway, I'll probably pay rather more attention to it in future.

Monday, January 14, 2013

Json performance

Robert Haas wanted to know how much slower the recursive descent JSON parser would be than the explicit stack based one it's replacing. It's a reasonable question, and no doubt one close to his heart as he was the author of the original :-)

So we need a test, and I devised one by making a single large piece of json like this:
create temp table myjson as 
    select array_to_json(array_agg(q)) as jsontext 
    from (select * from pg_class) q;
\copy myjson to jsontext.csv csv
and a test script that looks like this:
create temp table myjson(j text);
\copy myjson from jsontext.csv csv
\timing
do $$ declare lj text; begin select into lj j from myjson; 
  for i in 1 .. 10000 loop perform json_in(textout(lj)); 
  end loop; end; $$;

(In the actual test file the last line is repeated several times).

Robert had also pointed out that the parser was copying some strings in places where it didn't need to, and I quickly fixed that. But even after that change I was a bit dismayed to discover that there was a small but quite consistent performance degradation, so I went looking for things to tweak. It didn't take long to see that some things I had written as "if" statements repeatedly calling the routine that looks at the next lookahead token could be much better written as a single call to that routine plus a "switch" statement.

The test script and data files are published at https://bitbucket.org/adunstan/pgdevel/downloads/jsontest.sql and https://bitbucket.org/adunstan/pgdevel/downloads/jsontext.csv. The latest code with the above fixes can be pulled from https://bitbucket.org/adunstan/pgdevel.git, and you're welcome to play along. When this is all sorted out I will publish a new patch to the mailing list. Right now I can no longer get a consistently better result using release 9.2 than I get with my code. One of my colleagues is testing on a more stable platform than I have available. But I'm quite hopeful now that at worst the performance is fairly close.

Friday, January 11, 2013

Version 4.10 of the buildfarm client has been released.

Following GitHub's abandonment of their download feature, releases will now be published on the buildfarm server. The latest release will always be available at http://www.pgbuildfarm.org/downloads/latest-client.tgz, and this particular release is available at http://www.pgbuildfarm.org/downloads/releases/build-farm-4_10.tgz.

The main feature of this release is that it does better logging of pg_upgrade failures.

The rest is minor bug fixes and very small enhancements.

Tuesday, January 8, 2013

Oh for decent recruiters in our industry.

The cluelessness of most IT recruiters never ceases to astound me. I received this today:
My name is (deleted) and I'm an IT recruiter at (deleted). Our records show that you are an experienced IT professional with experience relevant to one of my current contract openings.(deleted)
The job is located in (deleted) with one of our Fortune direct client. They are looking for a GUI Developer
Not only am I not qualified for this job, but I have proclaimed loudly my intention never again to do GUI work.

Plonk.

Monday, January 7, 2013

Fixing pgindent

It's sometimes amazing how hard finding a single misplaced or wrong character can be. I was puzzled and annoyed by some very odd behaviour of pgindent with the Json code I've been working on, and finally got a clue to what was going wrong when I found it indenting the whole source tree when given a single file argument. That led me to the source of the problem, and a fix which involved removing a single spurious backslash from the code. Since I wrote it I have no-one to blame but myself for the wasted 90 minutes. Those are the sorts of things that make you give yourself a mental kick in the pants.

Thursday, January 3, 2013

GitHub kicks an own goal

When I moved the Buildfarm code from pgFoundry to GitHub, I started distributing release tarballs from there too. But now GitHub has decided that they won't support that any longer. I suspect that they are being a bit disingenuous about their reasons, which probably have more to do with abuse than confusion.

I will probably move the releases to the buildfarm server - they are tiny and there is not a huge demand. But GitHub isn't the only kid on the block. They need to remember that. I know for sure I'm not the only person annoyed about it.