At the moment, generic-worker supports cache files and directories, which can optionally be pre-populated from Taskcluster artefacts.
However, our client builds have very large caches (the Unity project’s Library directory, for example, can run to tens of gigabytes). I don’t think it makes much sense for us to upload a cache that size as an artefact: the cost of compressing, uploading, and downloading it would eat into the benefits considerably.
On cloud providers, though, we can generally attach multiple disks to a machine and change the attachments whilst it is running; this is the case for at least GCE.
This means we could potentially implement generic-worker cache directories as GCE persistent disks (for example) rather than local directories in this situation. That would increase the chance of cache hits, especially when the build machines themselves are short-lived.
Roughly, I guess this could be implemented like so:
- Use labels in the cloud provider to mark & locate cache disk volumes.
- Add APIs to worker-manager for attaching cache volumes to workers. (This avoids workers needing credentials for the cloud provider themselves.)
- Add support in generic-worker to use this API (and then potentially prepare/mount the given volume).
- Use bind mounts for cache directories (or junctions on Windows) to minimise copying of the cache directory.
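To make the first two steps concrete, here’s a minimal sketch of the volume-selection logic worker-manager might run when a worker asks for a cache volume. Everything here is hypothetical (the function name, the `cache-name` label key, and the record shape, which only loosely mimics GCE disk metadata with its `labels` map and `users` attachment list); the real implementation would live in worker-manager’s Go codebase and talk to the GCE API.

```python
# Hypothetical sketch: pick a free cache disk by label, roughly what
# worker-manager might do when a worker requests a cache volume.
# Disk records loosely mimic GCE disk metadata: a "labels" map plus a
# "users" list of instances the disk is currently attached to.

def pick_cache_disk(disks, cache_name):
    """Return the first unattached disk labelled for this cache, or None."""
    for disk in disks:
        if disk.get("labels", {}).get("cache-name") != cache_name:
            continue  # wrong cache (or not a cache disk at all)
        if disk.get("users"):
            continue  # already attached to another worker
        return disk
    return None  # caller would create and label a fresh disk


disks = [
    {"name": "cache-1", "labels": {"cache-name": "unity-library"},
     "users": ["instances/worker-7"]},  # in use elsewhere
    {"name": "cache-2", "labels": {"cache-name": "unity-library"},
     "users": []},                      # free: this one wins
    {"name": "scratch", "labels": {}, "users": []},
]

print(pick_cache_disk(disks, "unity-library")["name"])  # cache-2
```

If no free disk matches, worker-manager would create a new persistent disk, label it, and attach that instead, so the pool of cache disks grows organically with demand.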
It could have some interesting caveats (for example, copying or moving files out of cache directories would be a fair bit slower, since the data would be crossing a filesystem boundary).
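To illustrate that caveat: a move within one filesystem is a cheap metadata-only rename, but a move across a filesystem boundary degrades to a full copy plus delete. The sketch below shows the usual fallback pattern (this is essentially what Python’s `shutil.move` does internally; a build moving artefacts out of a disk-backed cache directory would always hit the slow path):

```python
import os
import shutil
import tempfile

def move_out_of_cache(src, dst):
    """Move a file, falling back to copy+delete across filesystem boundaries."""
    try:
        os.rename(src, dst)      # cheap: metadata-only rename on the same FS
    except OSError:              # e.g. EXDEV when crossing a mount point
        shutil.copy2(src, dst)   # full byte-by-byte copy: the slow path a
        os.remove(src)           # separately-mounted cache dir would take

# Demo on a temp directory (same filesystem here, so the fast path runs).
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "artefact.bin")
    dst = os.path.join(d, "out.bin")
    with open(src, "wb") as f:
        f.write(b"build output")
    move_out_of_cache(src, dst)
    print(os.path.exists(dst), os.path.exists(src))  # True False
```

For a multi-gigabyte Library directory the rename-versus-copy difference is substantial, so tasks would want to avoid moving data out of the cache where they can.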
This mostly came out of a discussion I had with @pmoore, but I’d be very interested to hear whether anyone else has thoughts. (Some of it also follows on from other conversations I’ve had about longer-lived caches.)
If anyone has any other thoughts, I’d love to hear them!