TIMO ZIMMERMANN

balancing software engineering & infosec

Artist vs Backup

posted on March 31, 2021, 5:32 p.m. in life, hardware, business, soho

Over the last year or so I have seen my wife get a little bit angry at her Twitter timeline every other week or so. Specifically she was upset about people in the artist community losing or nearly losing data when their system died, drives got corrupted, laptops stolen or a good old PEBCAK. Backing up data well is still not something many people do, apparently. Some do not know how, some cannot afford proposed solutions and others are not aware that computers are garbage that will blow up when you need them the most, destroying all your work. So, let us fix this.

This advice is generally applicable for everybody who wants to safely back up their data; the biggest difference when talking about artists is that the amount of data to back up is significantly bigger than what you usually find when someone is only doing their taxes on the computer. Data size might vary from a few gigabytes to terabytes if someone is doing some large scale 3D modeling. Solutions to back up data are still mostly the same, except at extremely large scales (which we will discuss separately).

When backing up data, there are a few things to keep in mind. Generally, follow the rule of threes: Have three copies of your data. One on the system you are working on or a drive attached to it. One with an online storage provider. One on another system or preferably storage not connected to your system.

The one on or connected to your system is obvious: You want to access your files easily. An online provider has very high durability guarantees when it comes to keeping your data healthy, but might charge you for retrieval – this should be your last resort to recover your files in case of an emergency. The place for your third copy will also be important, but more on that later.

If you use an external hard drive, you want to make sure it is not permanently connected to your system. If you catch some nasty ransomware encrypting all your files, have some electric trouble or are drunk and format everything because you regret becoming an artist and think it is a great idea to become a gardener, a permanently connected drive might be compromised as well. You should also keep in mind that hard drives die, even when not powered on. Data is fragile, and so are hard drives. The same is true for SSDs or any other form of storage medium like optical discs (aka CD-ROMs or Blu-rays).

You might also want to use multiple tools to back up your data. What if one of them has a bug and compromises files under certain conditions? Software is like anything else related to a computer: Unreliable, full of bugs and generally you should only trust it as much as a drunken friend butt-dialing you at 3am telling you about a business opportunity.

Properties of a backup

Before we dive into the "how" let us talk about the "what" a little bit. There are many misconceptions about what backups actually are, and as a result the choice of backup solution is often problematic. We will keep this as short as possible and focus on the most important aspects.

Per definition, a backup is "a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event". What properties does a backup need to have to be a good one though?

First up, a backup should be durable. A storage provider guaranteeing a durability of 100% would mean files are never lost (this is a lie, do not trust anyone who sells a 100% solution). 90% would mean 1 in 10 files might be lost. Storage providers often calculate in 99.xxx% where each x is a 9, up to 9 or 10 places after the decimal point. At that level you would expect to lose a single file only once in a million years or more. Many services allowing you to upload files do not give you any guarantee on durability, because that's not their focus of business, and keeping files healthy is costly. As a result, files might be lost at any point in time.
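To make the durability numbers a bit more tangible, here is a small sketch of the arithmetic. It assumes the durability figure is the chance that a single file survives one year, which is how providers typically phrase it, but check the fine print of yours. The function names are made up for this example.

```python
from fractions import Fraction


def annual_loss_probability(nines: int) -> Fraction:
    """Chance of losing one file in a year for a durability with `nines` nines.

    Assumption: 1 nine = 90%, 3 nines = 99.9%, 11 nines = 99.999999999%.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return Fraction(1, 10 ** nines)


def expected_losses_per_year(file_count: int, nines: int) -> Fraction:
    """Average number of files you would expect to lose per year."""
    if file_count < 1:
        raise ValueError("file_count must be at least 1")
    return file_count * annual_loss_probability(nines)


def years_per_lost_file(file_count: int, nines: int) -> Fraction:
    """Average number of years until one file is lost."""
    return 1 / expected_losses_per_year(file_count, nines)


if __name__ == "__main__":
    # Hypothetical portfolio of 10,000 files at different durability levels.
    for nines in (1, 3, 11):
        years = years_per_lost_file(10_000, nines)
        print(f"{nines} nines: one lost file every {float(years):,.3f} years")
```

Under that assumption, 10,000 files stored at eleven nines work out to one expected loss every ten million years, while the same files at 90% would lose a thousand files a year.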

Besides durability, a backup needs to be available. If you need a file and the provider you stored your data with is down for multiple days, you might not have a good time. Uptime guarantees are, like many other guarantees, a bit of a gamble. There are usually some clauses in the terms of service which allow providers to fail their goal without consequences. But having an uptime guarantee your provider at least tries to reach is way better than a free or new service which is shocked when 20 people sign in at the same time and all their servers are on fire.

Long term viability is something to keep in mind as well. Free services are always in danger of running out of money, being acquired or cutting down on their free tier. You want your backups durable and available; this is a long term game. You might need them in a few years. If the provider does not exist anymore, well, you suddenly do not have any. You want to be a paying customer. And you want to go with a provider which actually has something that deserves to be called a business model. It does not mean that none of the above can happen, but it reduces the chances greatly.

The last thing to keep in mind is that automatic synchronisation does not mean you have a backup. If you accidentally delete a file locally and the delete is synced to a remote system, your file is still gone. Some providers keep deleted files for 30 days. Providers designed for backup or storage purposes allow you to extend this by a certain time or store data forever. Version history (and not sync) is especially interesting considering that ransomware encrypting your files and asking for a small donation is on the rise.
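The difference between sync and version history can be shown with a toy model. This is only an illustration with made-up names, not how any real provider implements it: files are entries in a dictionary, `mirror_sync` copies the current state including deletions, and `VersionedBackup` keeps every version it has ever seen.

```python
from __future__ import annotations


def mirror_sync(local: dict[str, bytes]) -> dict[str, bytes]:
    """Plain sync: the remote becomes an exact copy of local, deletions included."""
    return dict(local)


class VersionedBackup:
    """Toy backup that never forgets a version it has stored."""

    def __init__(self) -> None:
        self._history: dict[str, list[bytes]] = {}

    def snapshot(self, local: dict[str, bytes]) -> None:
        """Store new or changed files; files missing locally are left untouched."""
        for name, content in local.items():
            versions = self._history.setdefault(name, [])
            if not versions or versions[-1] != content:
                versions.append(content)

    def restore(self, name: str, version: int = -1) -> bytes:
        """Return a stored version of a file (latest by default)."""
        return self._history[name][version]


if __name__ == "__main__":
    local = {"commission.psd": b"finished artwork"}
    backup = VersionedBackup()
    backup.snapshot(local)

    # Ransomware (or a bad save) overwrites the file, then it gets deleted.
    local["commission.psd"] = b"encrypted garbage"
    backup.snapshot(local)
    del local["commission.psd"]
    remote = mirror_sync(local)
    backup.snapshot(local)

    print("still in sync target:", "commission.psd" in remote)
    print("first version in backup:", backup.restore("commission.psd", 0))
```

The synced copy faithfully mirrors the deletion, while the versioned backup can still hand back the first, healthy version.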

I was asked to be very specific here: Just because you can upload data to a random service does not mean you have a backup. Discord, for example, does not meet all of the above criteria and is not a safe solution for durable backups. (This message was brought to you by my wife.)

Software

Let us start with software. Microsoft and Apple figured out that backing up data is important and people are not doing it because it is too troublesome. So they set out to include software to do exactly this in their operating system and make it as easy as possible. Microsoft ships File History and Apple Time Machine. You can back up your whole computer or only certain files and you can treat this as your first copy: An always connected drive constantly backing up data. If you accidentally delete a file, you can quickly recover it. Those two integrated solutions are pretty decent for continuous backups, but fall a bit behind when you want to create your second local copy.

Your second copy can be done as easily as attaching a USB drive every Sunday and manually copying all your files over. This sounds pretty low tech, but it gets the job done. (Automating this through software will be a lot more comfortable.) For long term archival you can simply rotate disks, you can create two duplicates if you fear them dying on you or, if you want to be really fancy and safe, you can start using some form of network attached storage (short: NAS).
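If you are comfortable running a small script, the Sunday copy can be automated with a few lines of Python. This is a minimal sketch and not a replacement for a proper backup tool: the function name and the `backup-YYYY-MM-DD` folder naming are my own choices, and it makes a full copy every time instead of only copying what changed.

```python
from __future__ import annotations

import shutil
from datetime import date
from pathlib import Path


def copy_to_backup_drive(source: Path, drive: Path, today: date | None = None) -> Path:
    """Copy `source` into a new dated folder on `drive` and return that folder.

    Raises FileNotFoundError if the drive is not attached and
    FileExistsError if a backup for the same day already exists.
    """
    if not drive.is_dir():
        raise FileNotFoundError(f"backup drive not found: {drive}")
    target = drive / f"backup-{(today or date.today()).isoformat()}"
    # copytree refuses to overwrite an existing folder, so an earlier
    # backup can never be silently replaced.
    shutil.copytree(source, target)
    return target
```

A call could look like `copy_to_backup_drive(Path.home() / "Art", Path("/Volumes/Backup"))`, where both paths are placeholders for wherever your work and your drive actually live. Keeping one folder per day means older copies stay around until you delete them yourself.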

For your third backup you can, again, use the manual process to upload your data to a storage provider. Or leverage software from a company specialised in selling online backup storage like BackBlaze or Crashplan. Please make sure to check for extended or forever version history if you go with a provider like BackBlaze. Or you can use the same software that can easily take care of local backups, Arq. Arq is vendor agnostic, well established and easy to set up.

Hardware

For hardware you can basically leverage whatever can hold enough data. If you are looking for cheap external hard drives you can go to any online shop or local store and buy something with a USB interface. Western Digital is a decent choice and 18TB for less than $400 is a good deal. You can store a lot of data on there.

Go for 3.5" hard disks. They are slower than SSDs, but also a lot cheaper. For most backup scenarios the actual speed will not matter a lot, but the price difference is significant. If you want to get slightly better disks you can go for server class disks like Seagate IronWolf or the Seagate IronWolf Pro, which even offer free data recovery for up to two years if your disk dies. You will have to get a separate enclosure to put the disk in. (Those disks are designed for NAS usage, so they deal with vibrations a bit better when lots of them are in close proximity. More on that in a second.)

There are lots of discussions online about which vendor has the best disks in which class and series. Do not try to figure this part out. Most of them will do well, some will die, some will live what feels like forever. It is a gamble that starts to matter at scale, but not when you get a single one or even five. No matter if you pick a spinning hard drive, SSDs or NVMes. No matter if they are connected to a system or lying on a shelf. They can die, they will die and they will eat all your data. That is why we have multiple copies.

Let's say you have way too much data for a single disk, or want a bit more redundancy in case one dies. One solution for this would be getting a NAS (network attached storage). Basically a NAS is a separate computer you put in your network - some can be connected directly to your computer - which has more than one disk and whose primary task is storing data. You can usually do a lot more with them, but keeping your data safe is what we are after. With a NAS you would add two or more disks and configure them to duplicate data. Depending on how many disks you have and how they are configured, one or two disks can die on you and your data is still there. You swap out the dead disks and a few days later the NAS is in a good state again. (When buying disks to mirror data you might want to make sure to go with different production series or even different brands. This might not be the optimal setup, but it protects you from manufacturing defects causing multiple disks to fail at the same time.)

Qnap or Synology sell NAS systems in which you put a few disks like the Seagate IronWolf and you follow a guided setup in your browser. With ten minutes of reading the manual you should be good to configure them. Keep in mind that those are sometimes a little bit loud, so you might not want to put them on your desk.

The more geeky and flexible option would be using an old computer and an operating system like FreeNAS or Unraid. This will require a lot more time and expertise than buying a Qnap or Synology box and might also force you to learn a lot of new things to maintain the system. If you are not interested in this kind of thing I would strongly advise against going down this route. If you want to build your own NAS, you can search eBay or other second hand places for older Dell Precision workstations or one from HP or IBM. They usually have workstation / server hardware like a Xeon CPU and ECC memory and they are built to be reliable. As they are a bit older you should be able to get them for something around $200 to $300.

An additional way to level up your data safety game would be adding a UPS to your setup. If you have unreliable power, a UPS will keep your system going for a bit and make sure spikes do not hit your computer or NAS. It is not a silver bullet: a UPS can also fail and will sometimes bring your computer down. Depending on where you live and how (un)reliable your power grid is, it might still be a good idea.

Online Services

Your offsite backup will most likely be with some online hoster. BackBlaze and Crashplan (mentioned above) are a one stop solution. Sign up, install their client, select what to back up and you are done. The value they provide is really good.

If you decide you do not trust those two and want a bit more control you can configure a third party hoster in Arq. Wasabi is a cheap and okay-ish provider; Hetzner is a bit more expensive but in my experience provides a lot better upload and download performance. If you want some extra functionality you might want to look at Hetzner's Storage Share, which is hosted Nextcloud. (This would also allow you to share files with clients if this is something you do.)

Where to go from here?

I am aware this is a lot of information. But we got all the basics covered. Now the most important disclaimer: None of this will guarantee you that everything goes well and you will never lose data. But we can minimise risk and doing something is always better than doing nothing.

My personal recommendation is:

BackBlaze did an extremely good job in the past, and is reliable and cheap. An easy choice for an online provider and no need to work with command line tools - if I ever ditched my current setup this would be my choice.

If you do not want to get a NAS, substitute it with an external drive you attach once a day or week for backups. Just make sure the disk is not permanently connected to your system.

If you are operating on a really tight budget, go with BackBlaze for your offsite storage. We are talking about $66 a year, or $6 a month. It is an unlimited plan, so this will serve you well for a very long time. This is by far the best deal you will get and it is an off-site backup. No matter what happens to your hardware, your data can be recovered. This does not address deleting files and the 30 day retention period, but a budget option always comes with tradeoffs. (Uploading data to BackBlaze can take some time though, depending on the file size and your Internet connection.)

The next step would be getting a USB drive/external drive to leverage your operating system's backup tool. Depending on the size this will cost you $200 to $400, but it also reduces the risk of data loss by a good bit.

The last step would be backing up to something which is not permanently attached to your computer, like a USB drive you connect once a week or a NAS. This is an additional safeguard which, in my opinion, moves the needle the least if you have the other two already set up, but it can save your extended lower back. There are so many considerations going into choosing a solution for this that it is hard to recommend something specific, but one combination that proved to be easy enough to set up and maintain is a Qnap NAS with Arq for backups.

The most important thing: No matter what you choose to do, make sure you test restoring files. A backup is worth nothing if restoring does not work or if data was corrupted at some point.
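One way to test a restore is to recover your files into an empty folder and compare them against the originals. The sketch below does this with SHA-256 checksums; the function names are made up for this example, and it only tells you something useful if the originals have not changed since the backup was taken.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files do not fill up memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original: Path, restored: Path) -> list[str]:
    """List every file that is missing from, or different in, the restored folder."""
    problems: list[str] = []
    for source_file in sorted(original.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(original)
        restored_file = restored / relative
        if not restored_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif file_digest(source_file) != file_digest(restored_file):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

An empty list means every original file came back byte for byte. Anything else is worth looking into before you actually need the backup.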

Please back up

Not having a backup means you already lost your data, you just do not know it yet. So please: Back up your data. Pretty please. I would prefer my wife not groaning sadly at her phone while we watch a movie.

Also please be cautious when following online discussion. People rightfully point out that USB can fail you, that not having ECC memory means data corruption can happen, and other things. They are correct. The chance that something like this hits you is so much smaller than the chance of you accidentally deleting a file, though.

If you are still lost, if you have any questions, if you feel overwhelmed by all of this or if you are not sure what to fit into your budget, I would encourage you to send me a mail at timo@screamingatmyscreen.com. I would rather exchange a few mails with you and guide you to a solution than hear squeaky complaints about another artist having lost their work. (This is a serious offer.)