Monitoring and culling stale GitLab Runner instances

After I set up GitLab’s continuous integration with DigitalOcean Droplets for autoscaling, I noticed that a runner would sometimes fail, abandoning a Droplet that I’d then need to destroy manually. While this problem was infrequent, it was troublesome enough to warrant an automated solution. Not finding anything readily available, I used DigitalOcean’s godo library to put together a Go program that periodically culls stale Droplets.

Very creatively titled GitLab Runner DO Monitor, the app retrieves a list of Droplets via the DigitalOcean API and deletes any created more than a certain time ago. It provides only two options: the staleness threshold, and whether to only report stale Droplets or to report and delete them. The program is designed to be called via cron, with reporting happening via a log file.

The current version doesn’t remove the stale machines from docker-machine’s registry, but that’s a planned feature. After all, my primary concern was related to billing for zombie Droplets.

The code is available at https://git.ethitter.com/debian/gitlab-runner-do-monitor, with binaries available for download from https://git.ethitter.com/debian/gitlab-runner-do-monitor/tags/v0.1.0.

Aside: I created Ansible roles for both the runner monitor and my log-alerting program, eth-log-alerting. Both are available at https://git.ethitter.com/ansible/, along with my role for GitLab Runner.