Backup 2.0 DAS Edition

Tuesday, March 19th, 02024 at 13:31 UTC

Back in 02022, we wrote up our backup setup. Since then, we’ve shuffled a few things around to keep improving it.

If you haven’t read it, you should first read Backups.

3-2-1 Backup rule

The 3-2-1 method recommends that you have 3 copies of your data, on 2 different mediums, with at least 1 being remote/offsite.

3 Copies of Data simply means your working copy and (at least) two copies elsewhere.

2 Different Mediums to increase redundancy. If all your hard drives are from the same company, manufactured at the same time, and rated for the same number of hours, then when they do fail, they might all fail within a short window of time. Spreading that risk over different manufacturers and different physical media (Tape, HDD, SSD, or others) minimizes the risk of a catastrophic failure.

1 Remote Copy in case there is a fire, earthquake, flood or theft. Keeping one copy somewhere offsite prevents having all your eggs in one basket! It does, however, introduce security issues when your data are no longer stored directly under your control.
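As a rough illustration, the three checks above can be written down in a few lines of Python. This is only a sketch: the `Copy` class, the `check_321` function and the example inventory (including the guess that the laptop holds its working copy on an SSD) are hypothetical names and values made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str      # free-form label for this copy (hypothetical)
    medium: str    # e.g. "ssd", "hdd", "tape"
    offsite: bool  # True if stored away from the working copy


def check_321(copies):
    """Report which parts of the 3-2-1 rule a list of copies satisfies."""
    return {
        "three_copies": len(copies) >= 3,
        "two_mediums": len({c.medium for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
    }


# Assumed example inventory: a working copy plus two local backup drives.
copies = [
    Copy("laptop", "ssd", False),
    Copy("daily drive on the router", "hdd", False),
    Copy("monthly air-gapped drive", "hdd", False),
]

for rule, ok in check_321(copies).items():
    print(f"{rule}: {'ok' if ok else 'MISSING'}")
```

With this inventory the first two checks pass and the offsite check fails, which is the usual weak spot of a purely local setup.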

NAS vs. DAS

As our main backup, we have a simple 4TB portable external hard drive (~$100) connected to the router doing daily backups of the computers. There is a second, ‘air gapped’ 4TB portable external hard drive which we plug in monthly to back up the computers. That means we have data in 3 places of various ages: the computer, the networked daily backup and the ‘air gapped’ monthly backup, on (at least two) different mediums.

But we also have loads of photos, videos, old documents and emails which are not on the computers. These are mostly “cold-storage” archives which we access as needed.

As part of the 3 copies and 2 mediums recommendation, we ended up with a plethora of portable hard drives. These are certainly a mismatched set of different manufacturers and ages. We purchase new ones when we outgrow the old. Like a hermit crab, we migrate the data to the new drive or split it between two.

The data is mirrored between two external hard drives (see the stickers to keep us organized) and a 3rd copy elsewhere in the world. The downside is that every time we change a file on one hard drive, we need to make the same change in 3 places, which is a pain. These drives tend to be used for read-only or append-only archives and not for daily work.

To help get better organized and to curb the mounting cost of buying more and more external, portable hard drives, we looked into a NAS (Network Attached Storage) device. This would get plugged into the current network and act as a basic file server that also backs up and replicates the data.

An entry-level 4-bay NAS* is around $500, plus 4x 4TB hard drives (~$80 each) is another $320. That’s a total of $820 to get an additional 12TB of storage in a RAID5 configuration. To get a similar 12TB of mirrored storage using only 4TB external hard drives, you need to buy 24TB worth (3 drives * 2 to mirror the data = 6 drives, at $100 each = $600). You can see how, as storage grows, there is an inflection point where a hardware NAS box becomes cheaper and more manageable than a stack of external drives. Be warned, NASes come with their own headaches too. They are mini-computers, which means you are becoming a network administrator and having to deal with ports, updates, configurations, etc. There is something to be said for the simplicity of plug it in and forget it.

* A proper NAS also can act as a media server, but we’re just looking for a simple, “dumb” file server.

A DAS (Direct Attached Storage) looks the same as a NAS, but there is no “computer”; it is just a box of hard drives that you need to plug into a computer. Recently, we picked up a QNAP TR-004, a 4-bay DAS, for ~$220 and put in 2x 4TB hard drives ($160) in RAID1, for a total cost of ~$380. Had we bought just 2 external portable 4TB hard drives, that would have cost us only $200. We’re in a worse position financially for now, but the next $160 we spend on 2 more 4TB drives jumps us from a 4TB backup in RAID1 to 12TB in RAID5. The total spend would be ~$540 for 4x 4TB HDDs and the DAS. The equivalent 6x 4TB external drives would cost around $600 and need a lot of cable juggling!

We’re not planning for today, we’re planning for tomorrow with the ease of upgrading and expanding.

Having our own DAS solves two big problems: it gives us room to grow as we need with four drive bays, and it is RAID, so there is no need to even think about plugging in both drives and copying updates everywhere! We can use this as more of a working hard drive.

What this doesn’t solve (yet) is the 1 remote copy.


Of course, in one of those “it won’t happen to me” moments, when we started to migrate our 6-7 external hard drives to the new DAS system, the computer hosting the connection crashed. When it rebooted, neither the external DAS nor the portable hard drive would mount. We thought the data on the drives was corrupted and, because the DAS is in RAID 1, that both drives had been corrupted in the same way at the same instant. Luckily, after poking around, we took out the first drive, breaking the RAID 1, and managed to get the remaining drive to mount. When we put the first drive back in, the system treated it as a new hard drive and started rebuilding it to sync with the remaining working drive. This was a good reminder that redundant drives in RAID protect you from hardware failure, not from data corruption. The 3rd copy (preferably remote) becomes much more important in situations like these.
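Because RAID faithfully mirrors corrupted data, one way to notice corruption is to compare checksums between independent copies. The following is a minimal sketch of that idea and not something our DAS does for us: the function names are hypothetical, and it assumes both copies are mounted as ordinary directories.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(reference, candidate):
    """List files that are missing from, or differ in, the candidate copy."""
    missing = sorted(set(reference) - set(candidate))
    changed = sorted(
        path for path, checksum in reference.items()
        if path in candidate and candidate[path] != checksum
    )
    return {"missing": missing, "changed": changed}
```

A typical use would be to build a manifest of the DAS and of the 3rd copy, then call `compare_manifests` on the two. Anything listed under "changed" in a read-only archive deserves a closer look before the next sync overwrites the good copy.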

Room for Improvement

The remote storage is the point that continues to be the most difficult. Services like AWS, Dropbox, Box, Backblaze and more incur monthly fees per device. With 3-4 devices at $7 a month, we’re at roughly $250 to $340 a year. While that might not be a big deal, year over year it begins to add up compared to the cost of running the backup hardware locally.

The benefit of the DAS is that if it is shared on the network via a desktop computer that is always connected, those external drives are considered part of the computer for Backblaze. That means we could get the benefits of a NAS (which is really just shared drives on a locally networked computer) along with Time Machine backups, and only pay for remotely backing up one computer while actually backing up everything in the office.
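The difference is easy to put in numbers. This sketch uses the $7 per device per month figure from above and assumes the single always-on computer would be billed at that same rate; actual pricing varies by service and plan, and the function name is made up for the example.

```python
PRICE_PER_DEVICE_PER_MONTH = 7  # USD, the figure used in this post


def annual_cost(devices, monthly_price=PRICE_PER_DEVICE_PER_MONTH):
    """Yearly cost of backing up `devices` machines on a per-device plan."""
    return devices * monthly_price * 12


for devices in (1, 3, 4):
    print(f"{devices} device(s): ${annual_cost(devices)} per year")
```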

There are other options too. We also recently upgraded some of our networking hardware. It supports Virtual LANs (Local Area Networks) which means that if we had two of these routers at separate locations, we could continue to connect to remote servers as if they were local. We’ll continue to explore this option to see if we can get ‘local backups’ in a remote location. With disk encryption we could even partner with another small company and host their hard drive(s) and they host ours as “backup buddies”.

Conclusion

Clarus the Dogcow. Moof!

Remember LOCKSS (Lots of Cows Copies Keep Stuff Safe): the more copies you have, the more likely the data will survive. Increasing the variety of storage mediums, locations and redundancy is key.

At a previous job, someone in the ops team once said “You don’t have a backup until you restore it”. That’s absolutely true! It’s the elephant in the room. Sure, we go through the steps to make sure we have our important data on redundant external hard disks and remotely saved, but until we actually try to restore from a backup we aren’t 100% certain it will work.
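One low-effort way to act on that advice is a periodic restore drill: pull a random handful of files out of the backup into a scratch folder and compare them byte for byte with the originals. The sketch below is one possible shape for such a drill, not a tool we have in production. The function name and parameters are hypothetical, and it assumes the sampled files have not changed at the source since the backup was taken (true for read-only archives, not for daily work).

```python
import filecmp
import random
import shutil
import tempfile
from pathlib import Path


def restore_drill(source_root, backup_root, sample_size=5, seed=None):
    """Restore a random sample of backed-up files and compare with the source.

    Returns the relative paths of files that failed the comparison.
    """
    source_root, backup_root = Path(source_root), Path(backup_root)
    files = sorted(
        p.relative_to(backup_root)
        for p in backup_root.rglob("*")
        if p.is_file()
    )
    sample = random.Random(seed).sample(files, min(sample_size, len(files)))

    failures = []
    with tempfile.TemporaryDirectory() as scratch:
        for rel in sample:
            restored = Path(scratch) / rel
            restored.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(backup_root / rel, restored)  # the "restore" step
            original = source_root / rel
            # shallow=False forces a byte-by-byte comparison of contents.
            if not original.is_file() or not filecmp.cmp(original, restored, shallow=False):
                failures.append(rel.as_posix())
    return failures
```

An empty list means every sampled file came back intact. It does not prove the whole backup is good, but it turns "we think it works" into something we have actually tried.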

🐈‍⬛💾📦 Schrödinger’s backup box: until you restore, you simultaneously have a backup and don’t.