Backups

Tuesday, May 31st, 02022 at 13:31 UTC

Everyone knows they should have a backup! 

Most of us do have some sort of passive backup. If you have an iPhone, you are probably using iCloud to sync contacts, calendars, etc. You might even back up your phone to the cloud so you can easily get started on your next device, or restore it if the current one is lost or stolen. Then there are tools like Dropbox, Box and OneDrive, which make your local files available in the “cloud” (someone else’s computer) as well.

But what about everything else?

Here is our current setup. There is plenty of room to improve and we’re always making tweaks as needed, but maybe it will give you a few ideas, or remind you of things that need backing up that you’d forgotten about!

We have computers, devices, online services, remote servers and more to look after.

Devices

These are mobile devices such as phones, tablets and fitness trackers. Our other devices either don’t have much in the way of an operating system or data that needs backing up, are already synced with another device, or are cloud-only (smart speakers).

To keep our devices backed up, we use iCloud Backup to periodically copy their data to the cloud. If/when they get plugged into a computer, they are also backed up locally. Sadly, we found that an iPhone/iPad could only be restored from a local backup onto the same device, whereas a new device could be restored from the cloud-based backup.

Computers

The Mac computers run backups via Time Machine. Every 30-60 minutes, changed files are versioned and backed up to a hard drive on the network. If a computer were to crash, we have a quick local backup to restore from, whether that is a full restore or just individual files.

Every once in a while, the power might go out or the router might fail, and any in-progress backup can be corrupted. This isn’t hugely problematic (we just have to start a fresh backup from scratch), but for those few hours, we are backup-less! So we bought a second, “air-gapped” hard drive that gets plugged into the computer roughly once a month for a Time Machine backup. This has two advantages: it gives us a second backup, and since it is not on the network, it (hopefully) won’t get us into a malware/ransomware backup loop. If backups run very frequently, an infection can end up copied into the backup as well. The less frequent, air-gapped backup (hopefully) mitigates this.

The computers also use Dropbox to share files with external clients. These files are stored both locally and off-site. The same goes for any code in a git repository.

We looked into using rsync to do the same as Time Machine and only back up files that have actually changed, but we haven’t found a cost-effective, cloud-based solution (yet).
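For illustration, here is a minimal Python sketch of the “only copy what changed” idea; it is not our actual setup. It borrows the quick check rsync uses by default (a file counts as changed when its size or modification time differs) and copies into a local destination folder. Getting the copy into a cloud bucket is left out, and the function name `incremental_copy` is hypothetical.

```python
import shutil
from pathlib import Path


def incremental_copy(src: Path, dst: Path) -> list[Path]:
    """Copy files from src to dst only if they are new or changed.

    Mirrors rsync's default "quick check": a file is treated as changed
    when its size or modification time differs from the copy in dst.
    Returns the relative paths that were copied.
    """
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dst / rel
        s = path.stat()
        if target.exists():
            t = target.stat()
            # Same size and same mtime (to the second): skip it.
            if t.st_size == s.st_size and int(t.st_mtime) == int(s.st_mtime):
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        # copy2 also copies the modification time, so the next run can compare it.
        shutil.copy2(path, target)
        copied.append(rel)
    return copied
```

A call might look like `incremental_copy(Path.home() / "Documents", Path("/Volumes/Backup/Documents"))`, where both paths are placeholders. Unlike Time Machine, this keeps only the latest version of each file, with no history.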

We have backups and a system in place, but the majority of the data is still saved locally. Getting our data off-site is one big way to improve.

Services

We use online services like Twitter, Trello and others. How often do you back up your data from them, and where do you store it? Many of these tools have nice export features thanks to EU data protection law. Periodically we back up our Twitter data, but not much else. The export goes into a ‘backups’ folder in Dropbox, so it is safely in the cloud and part of the regular backups.

We could certainly improve by making a better list of all the online services we use regularly that hold company data that would be annoying to retype or reconstruct if needed.

We tend not to worry about these services too much, because we assume THEY have a good backup plan in place.

In the past, a company we were working for went bankrupt. It locked us out of these services before we thought to export our data, and it downgraded its service plans, so older records were lost (unless the company re-upped its subscription).

With regular backups, what we lost access to would have been minimal. (We needed those records to prove to the bankruptcy lawyers that we were actually contractors. Thankfully, the lawyers were understanding, and meeting invites sent to our non-corporate email plus the word of the owners were enough, but it could have gotten messy without much “proof”.)

We do wish we still had access to, or had made copies of, some of the presentations we made on their behalf!

Remote

This is the area that is both fully automated and least remembered. For many of the company projects we use Heroku hosting, which provides both daily database backups and rolling database snapshots. Those are saved in Amazon S3, which we trust won’t magically disappear (for now). All the code on Heroku is deployed via git, so we are fully confident that both the code and the data are backed up.

But Heroku isn’t our only hosting. We have simple VPS hosting for this website and others, and that data is backed up far less regularly than it should be. We have copies of the files locally, but the WordPress installs, their templates and configs, file uploads, etc. only exist on the server. We assume the hosting provider is doing some backups, but we’ve been wrong about that in the past (TextDrive!).

One solution was to write a small shell script that uses cURL to fetch every page on our domain. That flattened each page to static HTML and put it into version control. You can see the archive script on GitHub and fork your own. If something goes horribly wrong, we always have the flattened HTML files, which are more technology/version agnostic, and we can get the raw text out with minimal effort.
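Our actual script is shell and cURL. The Python sketch below only illustrates the same idea: fetch each page and save it as a static HTML file in a folder that can be committed to git. It assumes you already have a list of page URLs (a real crawl would follow links or read a sitemap), and the function names and example.com URLs are placeholders.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def url_to_path(url: str, out_dir: Path) -> Path:
    """Map a page URL to a local HTML file, e.g. /about/ -> about/index.html."""
    path = urlparse(url).path
    if path == "" or path.endswith("/"):
        path += "index.html"
    elif not path.endswith(".html"):
        path += ".html"
    return out_dir / path.lstrip("/")


def archive_pages(urls: list[str], out_dir: Path) -> None:
    """Fetch each URL and write the rendered HTML under out_dir."""
    for url in urls:
        with urlopen(url, timeout=30) as response:
            html = response.read()
        target = url_to_path(url, out_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(html)
```

After something like `archive_pages(["https://example.com/", "https://example.com/about/"], Path("archive"))`, a `git add archive` and commit records a snapshot, and git’s history shows how the site changed between runs.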

Archives

We also have four external hard drives (two sets of two identical drives) with archive files on them. One set holds videos, slides, pictures, etc. from Material Conference. The other set contains old work projects, movies, TV shows and more: things we don’t use on a daily basis.

The drives in each set are duplicated manually. This is another area to improve: we could use rsync or another tool to make sure they are truly identical and that we didn’t miss anything.
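As a sketch of that kind of check (our own illustration, not a specific tool), this Python snippet hashes every file on both drives with SHA-256 and reports anything missing from one side or differing in content. The function names are hypothetical, and mount points like `/Volumes/ArchiveA` would be whatever your drives are called.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large videos don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_drives(a: Path, b: Path) -> dict[str, list[str]]:
    """List files missing from either copy and files whose contents differ."""
    da, db = tree_digests(a), tree_digests(b)
    return {
        "only_in_a": sorted(da.keys() - db.keys()),
        "only_in_b": sorted(db.keys() - da.keys()),
        "different": sorted(k for k in da.keys() & db.keys() if da[k] != db[k]),
    }
```

Hashing reads every byte, so it is slower than rsync’s size-and-timestamp check, but it also catches silent corruption. On macOS, expect hidden files such as `.DS_Store` to show up as differences unless you filter them out.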

We can certainly improve here too: while the archives are stored twice in case of hardware failure, both copies are on the same physical premises, so they are subject to fire and earthquake damage. To be absolutely safe, one drive from each set should be off-site! (Much of the Material Conference data is also in Nextcloud, somewhere in Germany.)

Email IMAP versus POP3

All of our email is set up with IMAP, as most people’s probably is. This allows us to check the same mailbox on the web, our devices, desktops and elsewhere without messages disappearing from the other places. But do you have a backup of your mailbox? Locally, your app will have retrieved some of the messages, maybe just the sender and subject info. The majority of emails probably exist only on the server.

Having an email backup is something you’d expect your service provider to have in place, and they probably do, but it’s best to keep your own copy as well (if possible).

And there are situations where hosted backups wouldn’t help! When one of our clients went bankrupt, all of our email correspondence on the corporate email address disappeared when we got locked out. Had those emails been downloaded via POP3 rather than only read in webmail, we would have had a copy for the lawyers.
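If your provider allows IMAP access, you don’t have to switch to POP3 to keep a local copy. This minimal sketch uses Python’s standard `imaplib` module to save every message in one mailbox as a `.eml` file. The host, credentials and function name are placeholders, and it opens the mailbox read-only so fetching doesn’t mark anything as read on the server.

```python
import imaplib
from pathlib import Path


def backup_mailbox(host: str, user: str, password: str,
                   out_dir: Path, mailbox: str = "INBOX") -> int:
    """Save every message in one IMAP mailbox as a .eml file; return the count."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = 0
    with imaplib.IMAP4_SSL(host) as imap:  # logs out automatically on exit
        imap.login(user, password)
        # readonly=True selects with EXAMINE, so messages are not flagged \Seen.
        imap.select(mailbox, readonly=True)
        _, data = imap.uid("SEARCH", None, "ALL")
        for uid in data[0].split():
            _, msg_data = imap.uid("FETCH", uid, "(RFC822)")
            raw = msg_data[0][1]  # the full message bytes, headers and body
            (out_dir / f"{uid.decode()}.eml").write_bytes(raw)
            saved += 1
    return saved
```

Files are named by message UID, which is only unique within one mailbox, so use a separate `out_dir` per folder. Most mail apps can open or import `.eml` files directly.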

Conclusion

At a previous job, someone on the ops team once said, “You don’t have a backup until you restore it.” And that’s absolutely true! That’s the elephant in the room. Sure, we go through the steps to make sure our important data is on an external hard disk or saved remotely, but until we actually try to restore from that backup, we aren’t 100% certain it will work.

Schrödinger’s backup box: until you restore, it is both simultaneously a working backup and not.
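A restore test doesn’t have to be elaborate. This Python sketch uses a plain `.tar.gz` archive rather than Time Machine, and its function names are hypothetical. It makes a backup, extracts it into a throwaway directory, and checks every file’s SHA-256 digest against the original. In other words, it opens the box.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 of its contents.

    Reads each file whole, which is fine for a sketch but not for huge files.
    """
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def make_backup(source: Path, archive: Path) -> None:
    """Write every file under source into a gzip-compressed tar archive."""
    with tarfile.open(archive, "w:gz") as tar:
        for p in sorted(source.rglob("*")):
            if p.is_file():
                tar.add(p, arcname=p.relative_to(source).as_posix())


def restore_test(source: Path, archive: Path) -> bool:
    """Extract the archive into a scratch directory and compare it with source."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons: refuse unsafe members such as ../ or absolute paths.
                tar.extractall(scratch, filter="data")
            else:
                tar.extractall(scratch)
        return digest_tree(source) == digest_tree(Path(scratch))
```

A `False` result means either the backup is damaged or the source has changed since the backup was made. Both are worth knowing before you actually need the restore.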