Along with the joy and burden of running my own servers comes a great deal of paranoia. Are my machines secured against unauthorized access? Is my mailserver an open relay? Will DNS for ethitter.com keep working if my primary machine is down? What happens if something crashes? Do I have all of my configurations tracked should I need to rebuild one of the boxes?
These, and many similar questions, are so frequently thoughts of mine that I had no choice but to establish many layers of redundancy and backups, lest I be unable to focus on anything else.
My paranoia essentially presents two considerations: having another server to handle certain requests when the primary goes down, and protecting against lost data should a server crash in an unrecoverable way.
As I wrote about last weekend, my primary machine is hosted at Linode. That box runs this WordPress site, my mailserver, and the nameservers for ethitter.com and a few dozen other domains. Under normal conditions, the overwhelming majority of requests are handled by the Linode VPS.
To continue receiving mail and delivering DNS records when my Linode machine isn’t operational, I run two additional machines with Digital Ocean, one in Europe and one in Asia. Their configurations are identical, and both are very specific to their purpose: they run only Postfix and the nsd nameserver daemon. In other words, the redundancy I’ve introduced is focused on the essentials only. This website and the other services I host with Linode aren’t as critical as email and DNS; to me, the effort involved in mirroring those other services simply isn’t worth it. Were Linode less stable, I might have made a different decision.
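On the DNS side, each Digital Ocean box acts as a secondary for every zone, accepting notifies and pulling zone transfers from the primary. A rough sketch of what the relevant nsd.conf fragment looks like (nsd 4 syntax; the primary’s IP here is a documentation placeholder, not my actual address):

```
zone:
    name: "ethitter.com"
    # Accept change notifications from the primary nameserver,
    # and request zone transfers (AXFR) from it.
    allow-notify: 203.0.113.10 NOKEY
    request-xfr: 203.0.113.10 NOKEY
```

One such block exists per zone, so the secondaries stay current without any manual copying of zone files.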
Setting redundancy concerns aside, when it comes to backups, I’m testing the adage that one can’t be too careful.
To start, Linode offers an automated backup service, which I subscribe to. This provides daily and weekly backups of the whole server, plus I can take snapshots at will. The snapshots option is particularly nice if I’m upgrading something or testing some other major change, as I can quickly roll back to the last working state; I’ve fortunately only had to do so once.
Similarly, Digital Ocean offers a backup service, though it’s considerably less comprehensive: it only takes weekly backups. Still, I’ve enabled it just to be safe. A snapshot feature like Linode’s would be one small way to overcome the infrequency of those backups, but I digress.
Lastly, there is a fourth server in my network whose sole purpose is to perform backups of the other three VPSes. Using BackupPC, this machine takes hourly incremental snapshots and makes a full backup every three days. Hourly snapshots are retained long enough to ensure that one exists for every hour since the last full backup. Those 72 hourly incrementals, plus the 60 full backups retained for each server, are possible because the provider offers a “storage VPS” with less RAM and fewer CPU resources, but massive amounts of disk space; for my particular needs, I’ve chosen the tier that provides 750GB of RAID 50 storage.
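In BackupPC terms, that schedule corresponds roughly to these settings in config.pl (a sketch, not my exact file; BackupPC measures its periods in days, hence the slightly odd fractional values):

```perl
# Approximate BackupPC schedule and retention for the setup described above.
$Conf{WakeupSchedule} = [0..23];   # consider each host for backup every hour
$Conf{IncrPeriod}     = 0.04;      # ~1 hour between incremental backups
$Conf{FullPeriod}     = 2.97;      # a full backup roughly every 3 days
$Conf{IncrKeepCnt}    = 72;        # one incremental per hour since the last full
$Conf{FullKeepCnt}    = 60;        # retain 60 full backups per host
```

The periods are set just under their nominal values (0.04 days, 2.97 days) so BackupPC doesn’t skip a cycle due to rounding.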
Managing four servers means dealing with lots of different configuration files. Tracking their history becomes necessary for many reasons: being able to undo a breaking change, and restoring a setting lost during upgrade, are just two examples.
Many tools exist specifically for the purpose of maintaining server configurations. Chef and Puppet come to mind; Amazon’s AWS provides CloudFormation and Elastic Beanstalk. Automattic’s home-grown solution, Servermattic, is even open-source. Despite all of these purpose-built solutions, I took a far more basic approach: git.
I’ve initialized a git repository in the root of each server, to which I add a program’s default configuration file(s) before I modify anything. With this approach, I can include links to tutorials and other resources in each commit message; if I used a guide from Digital Ocean, for example, I can note that for future reference.
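A minimal sketch of that workflow, using a scratch directory in place of the server’s actual root filesystem (the file contents, commit messages, and tutorial URL are all illustrative):

```shell
# Illustrative only: /tmp/config-repo stands in for the server's root.
mkdir -p /tmp/config-repo/etc
cd /tmp/config-repo
git init
git config user.name "Example"           # placeholder identity so commits work
git config user.email "me@example.com"

# Commit the stock config before changing anything...
printf 'relayhost =\n' > etc/postfix-main.cf   # stand-in for /etc/postfix/main.cf
git add etc/postfix-main.cf
git commit -m "postfix: default main.cf, pre-modification"

# ...then commit each change with a link to whatever guide prompted it.
printf 'relayhost = [smtp.example.com]\n' > etc/postfix-main.cf
git commit -am "postfix: set relayhost, per https://www.digitalocean.com/community/tutorials/example"
```

The diff between any two commits then shows exactly what changed and, via the message, why.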
To centrally store these configurations, my Linode machine runs an instance of GitLab, an open-source GitHub clone (the GL instance is used for other things, too). Each server’s git repo has a remote on my GitLab instance, which is in turn backed up as part of the Linode box. Additionally, remotes are synchronized on GitHub and Bitbucket so that if I break GitLab during an upgrade (it’s happened), all is not lost. While I am a bit wary of my configurations being stored with these hosted services, two-step authentication and other security measures give me a reasonable level of confidence, as does the generally non-sensitive nature of the configuration files themselves.
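The remote setup looks roughly like this (host names and repository paths are hypothetical placeholders; the push loop is shown as a comment since it needs the real remotes and SSH keys):

```shell
# Placeholder repo standing in for one server's root repository.
mkdir -p /tmp/remotes-demo && cd /tmp/remotes-demo
git init

# One remote per hosted service; all URLs here are hypothetical.
git remote add gitlab    git@git.example.com:configs/server-one.git
git remote add github    git@github.com:example/server-one-configs.git
git remote add bitbucket git@bitbucket.org:example/server-one-configs.git

# Pushing all branches to each remote keeps the mirrors current:
#   for r in gitlab github bitbucket; do git push "$r" --all; done
```

With three independent hosts holding the same history, losing any one of them (a botched GitLab upgrade included) costs nothing.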
Yet More Backups
At this point, I have multiple, regular backups of each server’s contents, plus several copies of their configurations as tracked in git repositories distributed across providers. If all is functioning as designed, I should have multiple backups and little chance of data loss. But again, I’m paranoid, which warrants application-specific backups too.
After email, WordPress content is the most important data I need to protect. Content dates back to at least 2009 and includes several dozen gigabytes of images and other uploads that I’d struggle to reproduce. While all of this content is part of the full-server backups, I wanted to be sure that anything authored between scheduled backups was also protected. Enter the VaultPress plugin from my employer Automattic. It detects changes in the WordPress database, uploads directory, and file structure, and backs those differences up in real time. As an added bonus, it also provides security scanning. With this, I now have a third backup of my WordPress content (after Linode’s full backups, plus those I’m making of that machine).
Most of the applications running on my network rely on a MySQL database, and in some ways the database server’s contents are therefore more important than what’s contained in static files on the machine. Granted, the binary files in which MySQL holds its data are part of my myriad server backups, but an improper server shutdown is just one event that could corrupt those files, making recovery difficult or impossible.
To satisfy my concern over corrupted MySQL storage, I employ the Automysqlbackup script. It makes daily, weekly, and monthly SQL dumps, which are automatically uploaded to Amazon S3 using the s3cmd utility. Since they’re present on the server, the aforementioned backup methods also capture these MySQL exports, so they’re represented in three places.
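The moving parts amount to two cron jobs, something like the following (the schedule, paths, and bucket name here are illustrative, not my actual settings):

```shell
# /etc/cron.d/mysql-backups (illustrative)
# 1) Automysqlbackup writes daily/weekly/monthly SQL dumps to local disk.
30 2 * * * root /usr/local/bin/automysqlbackup
# 2) s3cmd then pushes the dump directory up to the S3 bucket.
45 2 * * * root /usr/bin/s3cmd sync /var/lib/automysqlbackup/ s3://example-db-backups/
```

Because the dumps are plain SQL rather than MySQL’s binary files, they can be replayed into a fresh server even if the original data directory is corrupt.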
GitLab’s upgrade routine has long included a step that generates a complete backup before the upgrade proceeds. Starting in a recent version, however, an option was added to upload those backups to Amazon S3 rather than simply keeping them on the file system. As a result, my GitLab instance (including all of my server configurations) is also backed up at least three times: on S3, in Linode’s backups, and on my backup host.
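For source installs of that vintage, the S3 upload is enabled in GitLab’s config/gitlab.yml; a hedged sketch (key names can differ between GitLab versions, and the credentials and bucket below are placeholders):

```yaml
# Fragment of config/gitlab.yml; values are placeholders.
backup:
  path: "tmp/backups"
  upload:
    connection:
      provider: AWS
      aws_access_key_id: "AKIAEXAMPLE"
      aws_secret_access_key: "example-secret"
    remote_directory: "example-gitlab-backups"   # the S3 bucket name
```

With this in place, each run of the backup Rake task ships its archive to S3 automatically.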
At what point have I implemented too many backups? I’m honestly not sure yet, but I must be getting close.
I did stop myself from adding a second backup server to back up the primary backup server. That could still come, though; or I might put s3cmd on the backup host to keep it synced up to S3.