On setting up an automated cloud backup solution
The Problem
In an effort to
control everything in my life
subdue my fear of losing the things I love
dampen my anxiety that things are changing
- be a more secure and responsible person
I started looking for ways to remotely back up all the junk I collect across my devices.
Currently I maintain system snapshots locally using btrfs, but I do not have any form of cloud backup of my snapshots.
I also do not have any backup of any of my locally stored data.
In addition, my data is stored in RAID 0, distributed across a dozen HDDs and SSDs (both internal and external).
As I said, secure and responsible 😬
Requirements - Backup Solution
- Local encryption
  - I need any and all cloud uploads to happen only after local encryption.
- Compression
  - Cloud storage is expensive, and I'd like to spend more money on speciality coffee than on cloud solutions.
  - The backup solution needs to support multiple compression algorithms, so I can pick one based on the size reduction (%) it achieves.
- Deduplication/Incremental backups
  - I do not want to make full-fledged backups each time.
  - I'd like to make incremental backups that build on previous backups.
  - It'd be a plus if the backup uses a Merkle-tree-like structure to ensure the order and veracity of the backups.
- Cloud storage support
  - I'll likely rely on a niche cloud storage solution (for reasons discussed below).
  - The backup solution will need to support whatever cloud solution I end up with.
- Nifty CLI support
  - I'm likely to schedule the backup solution as an anacron job for automated backups.
  - The backup solution CLI should be handy enough that I should be able to do this without much custom scripting.
- Supports multiple clients
  - I'll be backing up from both my laptop and desktop.
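To make the de-duplication and "Merkle-tree-like" requirements a bit more concrete, here is a minimal sketch of the idea. It is a toy model, not how any particular backup tool works: `ChunkStore`, `CHUNK_SIZE` and the fixed-size chunking are all assumptions made for illustration (real tools typically use content-defined chunking), and the snapshot IDs form a simple hash chain rather than a full Merkle tree.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks so the demo is easy to follow (assumption)


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ChunkStore:
    """Toy content-addressed store: each chunk is saved once, keyed by its hash."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}
        self.snapshots: list[dict] = []

    def backup(self, data: bytes) -> dict:
        chunk_ids = []
        new_chunks = 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            cid = sha256_hex(chunk)
            if cid not in self.chunks:  # only unseen chunks cost storage
                self.chunks[cid] = chunk
                new_chunks += 1
            chunk_ids.append(cid)
        parent = self.snapshots[-1]["id"] if self.snapshots else ""
        # The snapshot ID commits to its chunk list and to the previous snapshot.
        snap_id = sha256_hex((parent + "".join(chunk_ids)).encode())
        snapshot = {"id": snap_id, "parent": parent,
                    "chunks": chunk_ids, "new_chunks": new_chunks}
        self.snapshots.append(snapshot)
        return snapshot

    def restore(self, snapshot: dict) -> bytes:
        return b"".join(self.chunks[cid] for cid in snapshot["chunks"])

    def verify(self) -> bool:
        """Recompute every snapshot ID in order; any reordering or edit breaks the chain."""
        parent = ""
        for snap in self.snapshots:
            expected = sha256_hex((parent + "".join(snap["chunks"])).encode())
            if snap["parent"] != parent or snap["id"] != expected:
                return False
            parent = snap["id"]
        return True
```

Backing up `b"aaaabbbbcccc"` and then `b"aaaabbbbdddd"` with this store writes three chunks the first time and only one the second time, which is the incremental behaviour I'm after.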
Requirements - Cloud storage solution
My requirements boiled down to
- Cheap (ideally free for <10 GiB of storage).
- Does not have bandwidth costs for backup restores.
- Reliable history in terms of data handling (as mentioned, despite encryption, I'd prefer my data be in the hands of someone I at least marginally trust).
Why not use Google Drive/OneDrive/Dropbox etc.?
There are a few reasons for this -
- Drive solutions focus more on online/app data access, and not so much on bulk data backup/restores.
- Most drive solutions have bandwidth limitations, with larger downloads being throttled unless streamed/fetched through a proprietary app (for example, videos downloaded/streamed from Google Drive tend to be throttled, but videos streamed from Google Photos do not).
But most of all, I have an inherent distrust of the companies behind these solutions.
Backblaze
The cloud solution I settled on was Backblaze.
- Founders are privacy-focused.
  - Beyond legal notices/disclosures there have been no accounts of file parsing.
  - There have been no notices of file-specific data being shared with third parties (even as aggregate data).
  - Account-specific data is still collected.
- Fairly reliable all things considered.
  - Has multiple backup restoration options, including having a physical drive mailed to you.
- Has amazing support by all accounts.
  - They're very active and responsive on r/backblaze for example.
- Is cheap
  - Free for first 10 GB.
  - $5/TB/month after that.
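For a quick feel of what that pricing means, here is a back-of-the-envelope sketch. It only uses the two figures quoted above (10 GB free, $5/TB/month); treating 1 TB as 1000 GB and ignoring any other charges are my assumptions, and prices may well have changed since.

```python
FREE_GB = 10               # free allowance quoted above
PRICE_PER_TB_MONTH = 5.0   # USD, as quoted above
GB_PER_TB = 1000           # assumption: decimal units


def monthly_cost_usd(stored_gb: float) -> float:
    """Estimated monthly storage cost: only data beyond the free tier is billed."""
    billable_gb = max(0.0, stored_gb - FREE_GB)
    return billable_gb / GB_PER_TB * PRICE_PER_TB_MONTH
```

For example, `monthly_cost_usd(510)` bills 500 GB, i.e. half a terabyte, which comes to $2.50 a month.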
Backup Solutions
There were a few options I discarded out of the gate for not having the expected de-duplication/encryption/compression support.
Of the ones I considered, these are the ones that stood out -
- Borg
  - Pros
    - Encryption/Compression/De-duplication support.
    - Has a handy GUI - Vorta
    - Is hailed as the holy grail of backups and comes with stellar recommendations.
    - Seems to have the fastest de-duplication algorithm.
  - Cons
    - It does not have a very good way to back up to custom cloud solutions.
      - I'd have to rely on creating a local backup which I rsync to a cloud service.
- Duplicity
  - Pros
    - Encryption/Compression/De-duplication support.
    - Decent CLI support.
    - Has much better repository support compared to Borg.
      - Supports virtually every cloud backup mechanism under the sun.
  - Cons
    - Is named Duplicity which is an inherently untrustworthy name (?)
    - Weak de-duplication
      - A chain of dependent backups starts with a full backup followed by a number of incremental ones, and ends when another full backup is uploaded.
      - Deleting one backup will render useless all the subsequent backups on the same chain.
      - Periodic full backups are required in order to make previous backups disposable.
- Duplicacy
  - Pros
    - Encryption/Compression/De-duplication support.
    - Supports multiple cloud repositories.
    - GUI support.
    - Lock-free concurrent access.
  - Cons
    - It runs as a local server which I severely dislike.
    - Not free as in beer.
      - It has a free trial which expires after 30 days.
    - Concurrency assumes a single global backup.
      - De-duplication happens across client backups.
      - i.e. different clients do not get separate backups if the file hashes match.
- Restic
  - Pros
    - Encryption/Compression/De-duplication support.
    - Supports multiple cloud repositories.
    - Has a handy CLI that I can alias as needed.
    - Assumes backups from different clients need to be separate.
      - This is ideally what I want.
      - Since hashing is client specific in Restic, de-duplication does not happen across clients.
  - Cons
    - Concurrency
      - It sets up file locks that prevent concurrent access from different clients.
      - This means you cannot have concurrent access to the same repository from 2 clients.
      - Cloud solutions do not support locking (usually) so it does the hacky package manager solution, i.e. it relies on the existence of a lock file.
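The "lock file" trick mentioned in the last point can be sketched in a few lines. This is a generic local-filesystem illustration, not Restic's actual implementation: `acquire_lock` and `release_lock` are hypothetical helpers, and the atomic "create only if missing" step relies on `O_EXCL`, a guarantee that object stores do not necessarily provide (which is part of why the approach feels hacky there).

```python
import os
import tempfile


def acquire_lock(lock_path: str) -> bool:
    """Try to create the lock file atomically; return False if it already exists."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # someone else holds the lock
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the owner to help debug stale locks
    return True


def release_lock(lock_path: str) -> None:
    os.remove(lock_path)


# Small demo: the second attempt fails until the first holder releases the lock.
with tempfile.TemporaryDirectory() as repo_dir:
    lock = os.path.join(repo_dir, "repo.lock")
    first = acquire_lock(lock)
    second = acquire_lock(lock)
    release_lock(lock)
    third = acquire_lock(lock)
```

The obvious weakness is a crashed client: its lock file stays behind and has to be cleaned up by hand or by some staleness rule.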
A rough comparison of backup solutions -
I ended up choosing Restic since it -
- Supports Backblaze B2.
- Supports encryption, compression, de-duplication, incremental backups and easy snapshot pruning.
- Isolates clients for concurrent access, which I prefer over Duplicacy's method of global de-duplication.
- Has an easy to use CLI.
Really the only issue I have with it is that concurrent access from multiple clients is prohibited.
This means if my anacron job triggers on both my laptop and desktop at the same time, then one of them will fail.
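One way to soften this would be to retry the job after a randomised delay, so the two machines stop colliding. The sketch below is a generic wrapper under that assumption: `run_with_retry` is a hypothetical helper, `job` is any callable returning an exit code, and the delays are arbitrary defaults. Newer Restic releases also appear to offer a `--retry-lock` option, which may make a wrapper like this unnecessary.

```python
import random
import time
from typing import Callable


def run_with_retry(job: Callable[[], int], attempts: int = 5,
                   base_delay_s: float = 60.0,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Run job() until it returns 0 or attempts run out; return the last exit code."""
    code = 1
    for attempt in range(attempts):
        code = job()
        if code == 0:
            return 0
        if attempt < attempts - 1:
            # Exponential backoff plus jitter, so two machines do not retry in lockstep.
            delay = base_delay_s * (2 ** attempt) + random.uniform(0, base_delay_s)
            sleep(delay)
    return code
```

In practice `job` could be something like `lambda: subprocess.run(["restic", "backup", path]).returncode`. Note that this retries on any non-zero exit code, not just on lock conflicts, so a genuinely broken backup would also be retried a few times before giving up.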
Putting it together
- I have a Restic repository set up on Backblaze B2.
- I have an anacron job that runs daily (assuming it hasn't run in at least 24 hours).
- Job success/failure gets emailed to me and is auto-sorted into a specific folder.
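A minimal sketch of what such a job could look like is below. It is not the exact script in use: the bucket name, paths, email addresses, retention numbers and the local SMTP server are all placeholder assumptions, and the B2 credentials are assumed to already be exported as `B2_ACCOUNT_ID` and `B2_ACCOUNT_KEY`. The fixed subject prefix is what a mail filter could key on for the auto-sorting.

```python
import os
import smtplib
import socket
import subprocess
from collections.abc import Mapping
from email.message import EmailMessage

# Placeholders (assumptions), not a real setup.
REPO = "b2:example-backup-bucket:laptop"  # restic's B2 form is b2:<bucket>:<path>
PASSWORD_FILE = "/root/.restic-password"
BACKUP_PATHS = ["/home/me/Documents", "/home/me/Pictures"]
MAIL_FROM = "backup@localhost"
MAIL_TO = "me@example.com"


def restic_env(base: Mapping[str, str]) -> dict[str, str]:
    """Copy the environment and point restic at the repository and password file."""
    env = dict(base)
    env["RESTIC_REPOSITORY"] = REPO
    env["RESTIC_PASSWORD_FILE"] = PASSWORD_FILE
    return env


def backup_command(paths: list[str]) -> list[str]:
    return ["restic", "backup", *paths]


def forget_command(keep_daily: int = 7, keep_weekly: int = 4) -> list[str]:
    # Drop snapshots outside the retention policy and prune unreferenced data.
    return ["restic", "forget", "--keep-daily", str(keep_daily),
            "--keep-weekly", str(keep_weekly), "--prune"]


def build_report(exit_code: int, output: str, host: str) -> EmailMessage:
    status = "SUCCESS" if exit_code == 0 else "FAILURE"
    msg = EmailMessage()
    msg["Subject"] = f"[restic-backup] {status} on {host}"
    msg["From"] = MAIL_FROM
    msg["To"] = MAIL_TO
    msg.set_content(output or "(no output)")
    return msg


def main() -> int:
    env = restic_env(os.environ)
    exit_code, log = 0, []
    for cmd in (backup_command(BACKUP_PATHS), forget_command()):
        result = subprocess.run(cmd, env=env, capture_output=True, text=True)
        log.append(result.stdout + result.stderr)
        exit_code = result.returncode
        if exit_code != 0:
            break  # do not prune if the backup itself failed
    with smtplib.SMTP("localhost") as smtp:  # assumes a local mail server
        smtp.send_message(build_report(exit_code, "\n".join(log), socket.gethostname()))
    return exit_code
```

Calling `raise SystemExit(main())` at the bottom of the file turns this into a script, and an anacrontab line along the lines of `1 10 restic-backup /usr/local/bin/backup.py` (period in days, delay in minutes, job name, command; the path is hypothetical) would run it once a day.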
I finally have a working cloud backup solution! 🎉
This page is more stream of consciousness and less blog.
Reach out to my email for any discussions or offers.
Last updated - Tue May 07 2024