Two weeks ago, the VPS that hosts this site moved to a machine that had been patched for the Spectre vulnerabilities. Immediately, I began receiving warnings about high load, and these alerts continued unabated for over a week. I tried moving services to other hosts, and I reduced the resources allocated to nginx and php-fpm, all to no avail.
As I continued to monitor and debug the situation, fail2ban regularly appeared among the top resource consumers, but I didn’t think much of it; fail2ban has always been a voracious resource user, but it’s an indispensable tool, so removing it wasn’t an option.
This past Thursday, I was running out of ideas and beginning to accept that the Spectre patches would have an outsized impact, when it occurred to me that fail2ban is written in Python, and that perhaps the language version was contributing to my load issues. While I’m not a regular Python user, the home automation tool I use is also written in Python, and when I switched it to Python 3, its performance improved greatly. Since my fail2ban install was also behind the latest release, I decided to reinstall it using python3 (3.4.2, for what it’s worth). Oh, how I wish I’d thought to do so earlier:
I've spent too much of the last week trying to restore one of my VPS to its pre-Spectre performance.
Turns out, I could've saved myself a lot of time by switching fail2ban from Python 2 to 3. That alone had a greater impact than relocating services.🤦‍♂️
— Erick Hitter (@ethitter) May 10, 2018
I’ve decided not to move any services back to my main VPS, but feel confident I could do so if I needed to. After a rather distracting two weeks, I’m ready to ignore my infrastructure for a bit.