Open-source as my path to data ownership

I find myself in an interesting position these days. On the one hand, there are excellent, often free, services to provide nearly anything I need in my digital life. On the other hand, being at the mercy of a business’s whims and Terms of Service is disconcerting. A few years ago, despite the excellent quality of many hosted services, I began transitioning to open-source solutions that I can host myself.

My interest in open-source alternatives is, unsurprisingly, related to my career and current employer. WordPress was my introduction to the concept of open-source software, and the flexibility I’ve had in using it to run this website certainly endeared me to its ideals. It is from this experience that I’ve resolved to find my own solutions to the web services that I use most often.

Two that I’ve written about previously, so won’t cover here, are my mailserver and DNS. I transitioned those away from Gmail and Amazon’s Route 53, respectively. Before those moves, though, I first supplanted GitHub, Google Analytics, and others.

That I replaced any of the services discussed below shouldn’t itself be interpreted as a judgement against those providers–my decisions were driven purely by a desire for control and data ownership.

GitLab

GitHub is an amazing service, and it’s had a transformative impact on how software is developed. That said, it’s also a paid service, and its use is subject to their terms and conditions. Its availability is dependent on the company continuing to exist, and continuing to offer hosted version control. While I have no expectation that GitHub will disappear tomorrow, I simultaneously don’t want to risk a single service’s misfortune eliminating the myriad projects I host with them. Still, as noble as the preceding may be, ideology wasn’t the initial impetus.

Instead, I first looked to replace GitHub as I tired of paying yet more each month for a growing number of private repositories. I briefly considered competitors to GitHub, but the options were limited and few addressed my concern over cost; none would have resolved the risk of relying on a third party. It was in this context that I discovered GitLab.

GitLab is a fully-functional clone of GitHub, which also incorporates the continuous-integration abilities of Travis CI and the like. Though delivered in its own style, it provides the same capabilities as the GitHub web interface. GitLab also offers a GitHub integration, to both ease migration and simplify working with codebases that have remotes on both providers.

Pydio

Replacing Google Drive presented an interesting challenge. First, I had to decide what exactly I was replacing: remote storage, collaborative editing, document sharing, etc.

This consideration proved more difficult than selecting an alternative, honestly. I initially thought that I needed a total replacement for Google Drive, perhaps because I was thinking that I wouldn’t commit if it wasn’t a complete solution. I then realized that of the myriad things that Google Drive does, I utilize relatively few. My primary need was for remote storage and synchronization of files. While I do use its document editing features, I do so rarely–so rarely that it wasn’t a requirement.

With my focus on remote file storage, I identified both ownCloud and Pydio as potential solutions. I was initially drawn to ownCloud because it could’ve exceeded my needs by replacing all of what Google Drive offers (as well as Gmail, Google Calendar, and Google Contacts). After a few months of using it, however, I realized that it was excessive and I needed a focused solution–Pydio turned out to be the answer.

Unlike ownCloud, Pydio only provides remote storage, and the project benefits from that singular focus. The web interface is excellent, and applications exist for all major platforms, paralleling what Google Drive offers. Pydio also provides a key feature that Google Drive has thus far overlooked: workspaces.

In Google Drive, a single folder on a computer corresponds with the contents of a Drive account. In other words, Drive doesn’t allow me to pick several different local folders to synchronize, instead requiring that I move to its directory anything I want to upload. Pydio doesn’t suffer from this limitation. Purpose-specific workspaces can be added, and Pydio clients configured to synchronize specific directories with those workspaces. This capability is crucial to my transition away from Google Drive, as I’ve configured a workspace to mirror my files there, ensuring that Pydio contains a replica of my Drive’s contents. I’ve also added workspaces for other directories that I commonly leave important files in, like the downloads and desktop locations. Perhaps most notably, this workspace approach means that I don’t need to consciously copy files between Pydio and Google Drive.

Piwik

When I learned about the Piwik project in 2012, my desire to own my data was quite nascent still, but the possibility of gathering visitor data independent of Google Analytics intrigued me. With relatively little effort, I’ve run Piwik alongside its inspiration and Jetpack Stats since January 2013, providing me ongoing access to more data than either of the other services allows. As neat as this information is, it’s largely useless, but I like that I have it if I want to review it.

YOURLS

I hesitated to include YOURLS, until I appreciated how frequently I use it. YOURLS is a URL shortener, a type of service that was born in response to Twitter’s 140-character limit and the general need to shrink URLs to a human-manageable size. bit.ly and similar services are largely unnecessary nowadays, as Twitter and others found better ways to deal with long URLs, but I still prefer the option of a short URL, with the flexibility and analytics it can provide.
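For the curious, creating a short URL on a self-hosted YOURLS install can be scripted. The following is only a minimal sketch, assuming the install exposes the standard `yourls-api.php` endpoint and accepts a secret signature token; the hostname `sho.rt`, the token `abc123`, and the helper name `build_shorten_url` are placeholders I made up for illustration, not details of my own setup.

```python
from urllib.parse import urlencode


def build_shorten_url(api_endpoint, signature, long_url, keyword=None):
    """Build the GET request URL that asks YOURLS to shorten long_url.

    api_endpoint, signature, and keyword are illustrative inputs: the
    endpoint and token come from whatever YOURLS install is being used.
    """
    params = {
        "signature": signature,  # secret token from the YOURLS admin tools page
        "action": "shorturl",    # ask YOURLS to create a short URL
        "format": "json",        # request a JSON response
        "url": long_url,         # the long URL to shorten
    }
    if keyword:
        # Optional custom short keyword instead of an auto-generated one
        params["keyword"] = keyword
    return f"{api_endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint and token; substitute real values before use.
    print(build_shorten_url(
        "https://sho.rt/yourls-api.php",
        "abc123",
        "https://example.com/a/very/long/path",
    ))
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) should return a JSON document describing the new short link; the sketch stops short of making the request so that it can run without a live install.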

Instagram

Instagram earned a spot on my list following their Terms of Service debacle in 2012 (details). Even though they reversed the changes in response to widespread backlash, I’d already moved my images to my new photoblog and found no reason to resume posting to the service. Four years on, I’m glad I made this decision–Instagram is now so overrun with spammers that I made my account private to alleviate their harassment.

In place of Instagram, I publish my images using, no surprise, WordPress! Besides being unimpeded by Facebook’s Terms of Service whims, self-hosting my photoblog gives me incredible freedom in how I organize and present my images (and they remain my images in perpetuity). Case in point, I added custom taxonomies to better organize where photos were taken: https://i.ethitter.com/protected-area/national-forest/los-padres-national-forest/ and https://i.ethitter.com/location/portugal/.

BackupPC

Owing to this last solution, I’m comfortable storing important data on my own servers, despite the potential for loss that doing so poses. As I noted in my post “Assuaging my paranoia with redundancy and many, many backups,” my network includes a fourth VPS solely for backups. Running BackupPC, that VPS makes hourly snapshots of my other servers, such as those that host GitLab, Pydio, and my email.

By synchronizing all important files via Pydio, and ensuring that my Pydio host is backed up by BackupPC, neither my laptop nor Google Drive represents a single point of potential data loss.

What’s Next?

Now that I’m hosting my own solutions for DNS, email, backups, document storage, version control, and even publishing and website analytics, the inevitable question may be, “What will I replace next?”

Never without an idea to experiment with, I’m currently exploring alternatives to Pocket and Instapaper. Wallabag is promising.
