Wednesday, June 18, 2014

Buildfarm Client version 4.13 released

I have released version 4.13 of the PostgreSQL Buildfarm client.

This can be downloaded from http://www.pgbuildfarm.org/downloads/releases/build-farm-4_13.tgz

Changes in this release (from the git log):
  • fcc182b Don't run TestCollateLinuxUTF8 on unsupported branches.
  • 273af50 Don't run FileTextArrayFDW tests if not wanted.
  • 9944a4a Don't remove the ccache directory on failure, unless configured to.
  • 73e4187 Make ccache and vpath builds play nicely together.
  • 9ff8c22 Work around path idiocy in msysGit.
  • 0948ac7 revert naming change for log files
  • ca68525 Exclude ecpg/tests from find_typedefs code.

If you are using ccache, please note that there are adjustments to the recommended use pattern. The sample config file no longer suggests that the ccache directory have the branch name at the end. It is now recommended that you use a single cache for all branches for a particular member. To do this, remove "/$branch" from the relevant line in your config file, if you have it, and remove those per-branch directories in the cache root. Your first run on each branch will rebuild some or all of the cache. My unified cache on crake is now at 615 MB, rather than the multiple GB it had been previously.

I recommend that all users deploy this release fairly quickly, because it reverts the log file naming change, which was causing some quite important log files to be discarded.

Sunday, June 15, 2014

Sunday, June 8, 2014

buildfarm vs vpath vs ccache

I think we've got more or less to the bottom of the ccache mystery I wrote about recently. It turns out that the problem of close to 100% cache misses occurs only when the buildfarm is doing a vpath build, and then only because the buildfarm script sets up a build directory that is different each run ("pgsql.$$"). There is actually no need for this. The locking code makes sure that we can't collide with ourselves, so a hardcoded name would do just as well. This was simply an easy choice I made, I suspect without much thought, 10 years ago or so, before the buildfarm even supported vpath builds.
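To illustrate why a per-run directory name could defeat the cache, here is a toy model in Python. It is only a sketch: the compile command, the paths, and the function toy_cache_key are all made up for illustration, and the hashing is not ccache's real key derivation. It simply assumes that the build directory path ends up somewhere in what gets hashed.

```python
import hashlib

def toy_cache_key(build_dir, source_file):
    """Hash a made-up compile command; NOT ccache's real key derivation."""
    # Assumption: the build directory path appears in the hashed inputs.
    command = f"gcc -I{build_dir}/src/include -c {source_file}"
    return hashlib.sha256(command.encode()).hexdigest()

SOURCE = "/home/bf/pgsql/src/backend/parser/gram.c"  # hypothetical path

# PID-suffixed build directory ("pgsql.$$"): a different name on every run.
run1 = toy_cache_key("/home/bf/HEAD/pgsql.1234", SOURCE)
run2 = toy_cache_key("/home/bf/HEAD/pgsql.5678", SOURCE)

# Fixed ".build" suffix: the same name on every run.
run3 = toy_cache_key("/home/bf/HEAD/pgsql.build", SOURCE)
run4 = toy_cache_key("/home/bf/HEAD/pgsql.build", SOURCE)

print(run1 == run2)  # False: the keys differ, so the second run misses
print(run3 == run4)  # True: the keys match, so the second run can hit
```

Under that assumption, nothing about the source file has to change for every lookup to miss; the changing directory name alone is enough.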

It also turns out there is no great point in keeping a separate cache per branch. That was a bit of a thinko on my part.

So, on my lab machine ("crake") I have made these changes: the build directory name is hardcoded with a ".build" suffix rather than using the PID, and the machine keeps a single cache, not one per branch. After making these changes, warming the new cache, and zeroing the stats, I did fresh builds on each branch. Here's what the stats looked like (cache compression is turned on):
cache directory                     ccache
cache hit (direct)                  5988
cache hit (preprocessed)             132
cache miss                             0
called for link                     1007
called for preprocessing             316
compile failed                       185
preprocessor error                    69
bad compiler arguments                 6
autoconf compile/link                737
no input file                         25
files in cache                     12201
cache size                         179.8 Mbytes
max cache size                       1.0 Gbytes

So I will probably limit this cache to, say, 300MB or so. That will be a whole lot better than the gigabytes I was using previously.

As for the benefits: on HEAD "make -j 4" now runs in 13 seconds on crake, as opposed to 90 seconds or more previously.

If we have a unified cache, it makes sense to disable the removal of the cache in failure cases, which is what started me looking at all this. We will just need to be a bit vigilant about failures, as many years ago there was at least some suspicion of persistent failures due to ccache problems.

All this should be coming to a buildfarm release soon, after I have let this test setup run for a week or so.

Friday, June 6, 2014

ccache mysteries

ccache is a nifty little utility for speeding up compilations by caching results. It's something we have long had support for in the buildfarm.

Tom Lane pinged me a couple of days ago about why, when there's a build failure, we remove the ccache. The answer is that a long time ago (about 8 years), we had some failures that weren't completely explained but where suspicion arose that ccache was returning stale compilations when it shouldn't have been. I didn't have a smoking gun then, and I certainly don't have one now. Eight years ago we just used this rather elephant-gun approach and moved on.

But now I've started looking at the whole use of ccache. And the thing I find most surprising is that the hit rate is so low. Here, for example, are the stats from my FreeBSD animal nightjar, a week after its last failure:

cache directory                     HEAD
cache hit (direct)                  2540
cache hit (preprocessed)              45
cache miss                         32781
called for link                     5571
called for preprocessing            1532
compile failed                       899
preprocessor error                   248
bad compiler arguments                31
autoconf compile/link               3990
no input file                        155
files in cache                     25114
cache size                         940.9 Mbytes
max cache size                       1.0 Gbytes

So I'm a bit puzzled. Most changes that trigger a build leave most of the files intact. Surely we should have a higher hit rate than 7.3%. If that's the best we can do, it seems like there is little value in using ccache for the buildfarm. If it's not the best we can do, I need to find out what I need to change to get that best. But so far nothing stands out.
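For reference, the 7.3% figure is hits (direct plus preprocessed) divided by hits plus misses. Here is a minimal Python sketch that reads the relevant lines of the stats listing and does that arithmetic. The helper names parse_stats and hit_rate are my own for this example, and splitting on runs of two or more spaces is an assumption about the layout of the listing.

```python
import re

# The relevant lines from the nightjar stats listing.
STATS = """\
cache directory                     HEAD
cache hit (direct)                  2540
cache hit (preprocessed)              45
cache miss                         32781
"""

def parse_stats(text):
    """Split each line into a label and a value on runs of 2+ spaces."""
    stats = {}
    for line in text.splitlines():
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) == 2:
            stats[parts[0]] = parts[1]
    return stats

def hit_rate(stats):
    """Hits (direct + preprocessed) as a fraction of hits plus misses."""
    hits = int(stats["cache hit (direct)"]) + int(stats["cache hit (preprocessed)"])
    misses = int(stats["cache miss"])
    total = hits + misses
    return hits / total if total else 0.0

rate = hit_rate(parse_stats(STATS))
print(f"{rate:.1%}")  # 7.3%
```

That is 2585 hits out of 35366 lookups that could have been served from the cache.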

Tom also complained that we keep a separate cache per branch. The original theory was that we would be trading disk space for a higher hit rate, but that seems less tenable now, with some hindsight.