Two weeks ago, the VPS that hosts this site moved to a machine that had been patched for the Spectre vulnerabilities. Immediately, I began receiving warnings about high load, and these alerts continued unabated for over a week. I tried moving services to other hosts, and I reduced the resources allocated to php-fpm, all to no avail.
As I continued to monitor and debug the situation, fail2ban regularly appeared among the top resource consumers, but I didn’t think much of it. fail2ban has always been a voracious resource user, and it’s an indispensable tool, so removing it wasn’t an option.
This past Thursday, I was running out of ideas and beginning to accept that the Spectre patches would have an outsized impact, when it occurred to me that fail2ban is written in Python, and that perhaps the language version was contributing to my load issues. While I’m not a regular Python user, the home automation tool I use is also written in Python, and when I switched it to Python 3, performance greatly improved. Since my fail2ban install was also behind the latest release, I decided to re-install it using python3 (3.4.2, for what it’s worth). Oh, how I wish I’d thought to do so earlier:
I've spent too much of the last week trying to restore one of my VPS to its pre-Spectre performance.
Turns out, I could've saved myself a lot of time by switching fail2ban from Python 2 to 3. That alone had a greater impact than relocating services.🤦‍♂️
— Erick Hitter (@ethitter) May 10, 2018
I’ve decided not to move any services back to my main VPS, but feel confident I could do so if I needed to. After a rather distracting two weeks, I’m ready to ignore my infrastructure for a bit.