The case for off-cloud backups

With modern cloud solutions offering multi-zone and multi-region replication and backups, is there a need for additional backup solutions, and if so, what are the options? I will make the case for backing up to a different cloud provider, and talk about which features can increase the integrity of your data.

What exactly is an off-cloud backup?

We can think about an off-cloud backup in the same way we previously talked about off-premise backups. The goal is to be safeguarded from catastrophic incidents that affect the integrity of our backups, like fire, natural disasters, unfaithful servants and the like. For a backup to be considered off-premise it would have to be stored in a way that any incident that could possibly affect the original backup could not affect the off-premise copy of the backup. This was usually accomplished by storing a separate copy in a geographically different location, and ensuring that personnel with access to the primary site lacked permissions to the secondary site.

An off-cloud backup should likewise be shielded from any incident that may befall the original backup. This does not mean that it cannot be stored in the cloud, but it should then be stored with a different cloud provider. It can also be downloaded from the cloud and stored on-premise.

Why do we need off-cloud backups?

Most cloud providers offer first-class redundancy as part of their services. You will typically have the option to store your data across several data centres within the region you have chosen (e.g. west-europe), often called multi-zone redundancy. You can also activate multi-region redundancy, which will create a backup in a different region.

These two features combined should provide excellent disaster recovery. Every zone in a region is built to be independent of the others, with some geographical distance between the data centres. A fire at one data centre will not put your backup at risk, and even natural disasters will normally be limited to only affect one of the zones. If we also allow for multi-region redundancy, the data will often be stored in another country, further shielding us from unexpected political changes, riots, nuclear incidents and other scenarios that could impact all data centres in a country simultaneously.

There are, however, still risks that have not been mitigated with this strategy. What if the cloud provider goes bankrupt, has a massive breach of security, goes down for an extended period of time, or has an employee that takes targeted action against your account? These situations are of course extremely improbable, but should they occur, they might threaten the existence of your business. Having a strategy to mitigate this risk might be necessary to ensure that the total risk of your business is not too high.

How to create off-cloud backups?

This will largely depend on the setup of your systems, and the kind of services you currently use with your cloud provider. It is important to understand that while most cloud providers offer similar services, they are usually different enough that you cannot easily copy and paste your configurations and data. Most likely, you will end up writing some custom code to create a backup that functions optimally in your environment. In doing this, there are a number of considerations to be made.

Snapshot vs continuous backup

Is it possible to snapshot the state of your system to create a backup? If this is possible, you will most likely benefit from creating such a backup and storing it to disk at your off-cloud location. For reasonably sized databases, storing a snapshot or transaction logs to disk is easy and cheap.

If this is not possible, you will need a system that actively backs up your production data to the off-cloud backup as it is being created. If your data is in some kind of blob storage, you would typically be able to react to events from the storage, and trigger a function that copies the blob to the off-cloud location. This will ensure that copying is only done as needed, but if the frequency of new documents is high enough, the sheer volume of events might necessitate some kind of queuing.
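The event-and-queue idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the two clouds are stood in for by plain dictionaries, and the names run_copier, primary, off_cloud and events are hypothetical, not part of any provider's API. A real setup would replace the dictionaries with calls to the storage SDKs of both providers.

```python
import queue
import threading

def run_copier(events, source, backup):
    """Drain blob-created events and copy each blob to the backup store."""
    while True:
        key = events.get()
        if key is None:  # sentinel: no more events will arrive
            events.task_done()
            break
        backup[key] = source[key]  # copy the bytes unchanged
        events.task_done()

# Stand-ins for the two clouds (assumption): dicts mapping blob name -> bytes.
primary = {}
off_cloud = {}

# A bounded queue makes producers wait when the copier falls behind.
events = queue.Queue(maxsize=1000)

worker = threading.Thread(target=run_copier, args=(events, primary, off_cloud))
worker.start()

# Simulate the storage service emitting one event per newly created blob.
for name, data in [("a.txt", b"alpha"), ("b.txt", b"beta")]:
    primary[name] = data
    events.put(name)

events.put(None)  # tell the copier to stop
worker.join()
```

The bounded queue is the design choice worth noting: when events arrive faster than they can be copied, the producer blocks instead of memory growing without limit.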

Handling deletions

Should a file be deleted in the backup if it is deleted in production? Clearly not, we want to guard against this exact situation. But what if there are naturally a lot of deletions in your production environment? You can't justify infinite storage of all data in backup. This will be a trade-off between the cost you can accept and the expected time to discover any issue with your primary storage. If you are confident that you need no more than 7 days to be made aware of accidental deletions, you do not need more retention than this.

Generally, you should mark files as deleted, but not perform any deletion in the off-cloud backup. This way, you will have a state that is most likely correct should you need to restore, but if there are any erroneous deletions, you can easily fix this. If your backup cloud solution offers protections against deletions, such as retention policies, this is a perfect tool. They can ensure that deletion is impossible for as long as the policy applies.
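As a rough sketch of marking files as deleted and only cleaning up after the retention window, consider the Python below. Everything in it is an assumption made for illustration: the backup is represented by an in-memory index, the function names are hypothetical, and the 7-day window is simply the example figure from above.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)  # assumed time needed to notice a bad deletion

def mark_deleted(index, key, when):
    """Record a production deletion without removing the backup copy."""
    index[key]["deleted_at"] = when

def live_keys(index):
    """Keys to restore by default: everything not marked as deleted."""
    return sorted(k for k, meta in index.items() if meta["deleted_at"] is None)

def purge_expired(index, now, retention=RETENTION):
    """Remove only entries whose deletion marker is older than the retention."""
    expired = sorted(
        k for k, meta in index.items()
        if meta["deleted_at"] is not None and now - meta["deleted_at"] > retention
    )
    for k in expired:
        del index[k]
    return expired

now = datetime(2024, 1, 10, tzinfo=timezone.utc)
index = {name: {"deleted_at": None} for name in ("a", "b", "c")}
mark_deleted(index, "b", now - timedelta(days=8))  # outside the window
mark_deleted(index, "c", now - timedelta(days=1))  # still recoverable
purged = purge_expired(index, now)
```

After this runs, "b" has been purged, "c" is still in the backup but hidden from a default restore, and "a" is untouched. An erroneous deletion of "c" could be undone by clearing its marker.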

Ransomware-protection

If your production environment is hit with ransomware, it is critical to ensure that this does not propagate to your backups. More advanced ransomware actively seeks out backups to ensure that you have no choice but to pay up and pray that the criminals will honour their promises and decrypt your files.

We have to ensure that ransomware cannot spread to the off-cloud backup. While we do not expect ransomware to be able to "jump" to our secondary cloud, a file containing the ransomware could be copied as part of the backup system. We must ensure that the information passed to the secondary cloud is only handled as binary information, and stored. This is no guarantee against bugs in the platform that might trigger the malware regardless. As the nature of ransomware is changing fast, I have no more advice, other than making sure you are keeping up with any current recommendations to prevent ransomware.
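To illustrate what "only handled as binary information" might look like, here is a minimal Python sketch. It assumes the source and destination can be treated as byte streams (io.BytesIO stands in for both clouds), and the function name copy_opaque is hypothetical. The SHA-256 checksum is my own addition for verifying the copy, not something the approach above requires, and none of this is a defence against ransomware by itself.

```python
import hashlib
import io

def copy_opaque(src, dst, chunk_size=1024 * 1024):
    """Copy a byte stream without parsing, decompressing or executing it.

    Returns the SHA-256 hex digest of the bytes that were copied.
    """
    digest = hashlib.sha256()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        digest.update(chunk)
        dst.write(chunk)
    return digest.hexdigest()

# Stand-ins for a blob in the primary cloud and its off-cloud copy.
payload = b"\x00\x01 arbitrary bytes, possibly malicious \xff"
source_blob = io.BytesIO(payload)
backup_blob = io.BytesIO()
checksum = copy_opaque(source_blob, backup_blob)
```

The point is that the copier never inspects the content: no file type detection, no unpacking, no previews. The digest can be stored next to the backup and compared again at restore time.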

Recovery-speed

Storing large amounts of data does not come for free, and a way to reduce cost is to let your storage provider store information in a way that is slow to retrieve. Amazon offers Glacier, which is very cheap because it is intended for archival data that is rarely retrieved. Other providers have similar low-cost solutions. Retrieval times can vary, but for large volumes the data transfer time will most likely dominate the time needed to restore. It is important to have done this calculation in advance, so you know how long you will be out of service should you need the backups. If the volume stored is sufficiently large, having the backup sent to you by truck or plane might be a reasonable alternative to online transfer.
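The calculation itself is simple enough to sketch in Python. The figures below are made up for illustration: 50 TB of backups, a 1 Gbit/s link of which 80% is assumed usable in practice, and a 12 hour wait before the archive tier makes the data available. The function name restore_hours and its parameters are hypothetical.

```python
def restore_hours(volume_tb, bandwidth_gbps, retrieval_delay_hours=0.0,
                  efficiency=0.8):
    """Estimate hours to restore: retrieval delay plus transfer time.

    volume_tb uses decimal terabytes (10**12 bytes); efficiency is the
    assumed fraction of the nominal bandwidth that is actually achieved.
    """
    bits = volume_tb * 1e12 * 8
    seconds = bits / (bandwidth_gbps * 1e9 * efficiency)
    return retrieval_delay_hours + seconds / 3600

# Assumed example: 50 TB, 1 Gbit/s link, 12 hours before data is readable.
estimate = restore_hours(50, 1, retrieval_delay_hours=12)
print(f"Estimated restore time: {estimate:.1f} hours")
```

With these assumptions the estimate is roughly 151 hours, a little over six days, of which only 12 hours is the retrieval delay. Plugging in your own volume and bandwidth shows quickly whether shipping physical media is worth considering.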

Final words

An off-cloud backup is meant as a last resort against improbable data loss that would be catastrophic to the business. The cost/benefit analysis to justify such an investment will likely not be positive if the off-cloud backup alone is not sufficient to restore your business. However, should you find that this is a viable option for you, there is one more benefit that you might find further down the road. This can be the start of a multi-cloud strategy, where your services are served by more than one cloud provider, offering true redundancy for your users.