In a previous blog (HERE) I discussed why backups are needed in AWS, along with RPO, RTO and other TLAs. This blog compares some of the backup options available for your infrastructure in AWS.
Roll your own
AWS has really good options for managing your environment your way, whether that’s scripts using the CLI, other software using the APIs, or Lambda functions. Managing your backup environment is no different. Lambda functions can look for tags on your instances or volumes and create snapshots of whatever they find. Tie these functions to CloudWatch Events rules to run them on the required schedules.
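As a rough illustration of the tag-driven approach, here is a minimal sketch of what such a Lambda function might look like. The tag key and value (Backup=true), the CreatedBy tag and the function names are my own assumptions, not anything AWS prescribes. The `ec2` argument is expected to behave like a boto3 EC2 client, and error handling and snapshot clean-up are left out.

```python
from datetime import datetime, timezone

# Hypothetical tag; use whatever key/value your own tagging scheme defines.
BACKUP_TAG_KEY = "Backup"
BACKUP_TAG_VALUE = "true"


def snapshot_tagged_volumes(ec2, now=None):
    """Snapshot every EBS volume carrying the backup tag; return the snapshot IDs.

    `ec2` is expected to behave like a boto3 EC2 client.
    """
    now = now or datetime.now(timezone.utc)
    description = f"Scheduled backup {now:%Y-%m-%d %H:%M} UTC"
    kwargs = {
        "Filters": [{"Name": f"tag:{BACKUP_TAG_KEY}", "Values": [BACKUP_TAG_VALUE]}]
    }
    snapshot_ids = []
    while True:
        page = ec2.describe_volumes(**kwargs)
        for volume in page["Volumes"]:
            snapshot = ec2.create_snapshot(
                VolumeId=volume["VolumeId"],
                Description=description,
                # Tag the snapshot so a later clean-up job can find it (assumed tag).
                TagSpecifications=[{
                    "ResourceType": "snapshot",
                    "Tags": [{"Key": "CreatedBy", "Value": "roll-your-own-backup"}],
                }],
            )
            snapshot_ids.append(snapshot["SnapshotId"])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    return snapshot_ids


def lambda_handler(event, context):
    # Imported here so the function above can be exercised without boto3 installed.
    import boto3
    return {"snapshots": snapshot_tagged_volumes(boto3.client("ec2"))}
```

Passing the client in as a parameter keeps the logic easy to test with a stub. Retention (deleting old snapshots) would need a second function along the same lines.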
There are two big benefits of a “roll your own” system: you can get it to do exactly what you want … depending on your programming skills, and the only cost is your time (plus snapshot storage).
There are some drawbacks though. You probably won’t have a nice dashboard to show the status. You’ll need to ensure you have appropriate monitoring and reporting configured yourself. Finally, support for your backups is you.
Amazon Data Lifecycle Manager
In mid-2018, AWS released Data Lifecycle Manager (DLM). You can find it under EC2 in the AWS Console. DLM provides basic EBS volume backups and manages the associated snapshots. It’s really easy to configure: give it a policy name, a tag to use, a schedule name and a schedule, and away you go. Simple, right? Well, yes, but it is somewhat limited and there are some gotchas. Firstly, in the console the tag needs to exist before you can create the policy (with the CLI, it doesn’t). Secondly, it only runs daily or at fractions of a day: every 2, 3, 4, 6, 12 or 24 hours. If you want weekly or monthly backups, sorry.
There is another gotcha: DLM only checks the tags on volumes. It does not back up instances, it backs up volumes, which makes sense when you remember that snapshots work off volumes. The catch is that when you use the console to select a tag, it also displays tags that are attached to instances. Don’t be caught out. DLM works on volume tags only.
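For anyone who prefers the API to the console, here is a hedged sketch of the same policy built in Python. The helper names, the schedule name and the description text are my own. The interval list is the one quoted above, so check the current AWS documentation before relying on it. The `dlm` argument is expected to behave like a boto3 DLM client, and the role ARN is whatever IAM role you have set up for DLM.

```python
# Intervals (in hours) as listed in this post; AWS may have changed them since.
ALLOWED_INTERVALS_HOURS = (2, 3, 4, 6, 12, 24)


def build_dlm_policy_details(tag_key, tag_value, interval_hours,
                             start_time_utc, retain_count):
    """Build the PolicyDetails structure for a volume snapshot policy."""
    if interval_hours not in ALLOWED_INTERVALS_HOURS:
        raise ValueError(f"interval must be one of {ALLOWED_INTERVALS_HOURS}")
    return {
        # DLM targets volumes here, so the tag must be on the volume.
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": tag_key, "Value": tag_value}],
        "Schedules": [{
            "Name": f"every-{interval_hours}h",
            "CreateRule": {
                "Interval": interval_hours,
                "IntervalUnit": "HOURS",
                "Times": [start_time_utc],  # "HH:MM" in UTC
            },
            "RetainRule": {"Count": retain_count},
            "CopyTags": True,
        }],
    }


def create_policy(dlm, role_arn, details):
    """Create the policy; `dlm` is expected to behave like a boto3 DLM client."""
    response = dlm.create_lifecycle_policy(
        ExecutionRoleArn=role_arn,
        Description="Scheduled EBS volume snapshots",
        State="ENABLED",
        PolicyDetails=details,
    )
    return response["PolicyId"]
```

For example, `build_dlm_policy_details("Backup", "true", 12, "03:00", 14)` describes a snapshot every 12 hours starting at 03:00 UTC, keeping the last 14.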
While there is no built-in alerting with DLM, it does emit CloudWatch events with the source aws.dlm. Creating an event rule with a simple SNS target will get you alerting. Similarly, there is no monitoring of snapshots, but you can roll your own with CloudWatch Events. An event pattern along these lines matches DLM policy state changes:
{ "source": [ "aws.dlm" ], "detail-type": [ "DLM Policy State Change" ] }
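To show how that pattern could be wired up to SNS, here is a minimal sketch. The rule name and target ID are made up for illustration, and `events` is expected to behave like a boto3 CloudWatch Events client.

```python
import json

# The event pattern from the post.
DLM_EVENT_PATTERN = {
    "source": ["aws.dlm"],
    "detail-type": ["DLM Policy State Change"],
}


def create_dlm_alert_rule(events, sns_topic_arn, rule_name="dlm-policy-state-change"):
    """Create an event rule for DLM state changes and point it at an SNS topic.

    `events` is expected to behave like a boto3 CloudWatch Events client.
    The rule name and target ID are hypothetical.
    """
    rule = events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(DLM_EVENT_PATTERN),
        State="ENABLED",
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "dlm-alert-sns", "Arn": sns_topic_arn}],
    )
    return rule["RuleArn"]
```

Note that the SNS topic’s access policy also has to allow events.amazonaws.com to publish to it, otherwise the rule fires but nothing arrives.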
The big plus for DLM is that it’s supported by AWS. If you need help, call AWS.
DLM doesn’t do restores. To restore data you are on your own, and need to use either the console or the CLI.
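If you do end up scripting the restore yourself, the basic steps look something like the sketch below. This is only an outline under my own assumptions: the function name is made up, `ec2` is expected to behave like a boto3 EC2 client, and you need to already know the instance, its Availability Zone and the device name the volume should use.

```python
def restore_volume(ec2, snapshot_id, instance_id, availability_zone, device):
    """Create a volume from a snapshot and attach it to an instance.

    `ec2` is expected to behave like a boto3 EC2 client. The volume has to be
    created in the same Availability Zone as the instance it will attach to.
    """
    volume = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
    )
    volume_id = volume["VolumeId"]
    # Block until the new volume is ready before trying to attach it.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id
```

A call such as `restore_volume(ec2, "snap-0abc", "i-0def", "ap-southeast-2a", "/dev/sdf")` (all IDs hypothetical) covers one volume. An instance with several volumes needs one call per volume, each with the right device name.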
My view on DLM (given AWS Backup isn’t GA yet)? If your environment just requires daily backups, go for it. It’s simple to set up and costs nothing to use, plus it’s supported by AWS. Also, the summary page gives nice info on when and how often the backups will run and what the retention means.
AWS Backup
The newest product on the block from Amazon is AWS Backup. At the time of writing, this is only in a few regions and is still getting updates. AWS Backup is much more fully featured than DLM. Where DLM sits under the EC2 panel, AWS Backup is a service in its own right. AWS Backup can protect: DynamoDB, EBS, EFS, RDS & Storage Gateway. Note: Between finishing this post and getting a peer review, AWS Backup has been pushed out to more regions. Don’t blink when dealing with AWS.
You start off by creating a Vault and a Backup Plan. Within the Plan, you configure the schedule, lifecycle rules and backup vault.
Note:
Currently, only Amazon EFS file system backups can be transitioned to cold storage. The cold storage expression is ignored for the backups of Amazon Elastic Block Store (Amazon EBS), Amazon Relational Database Service (Amazon RDS), Amazon DynamoDB, and AWS Storage Gateway.
After creating a Backup Plan, you assign resources to it. The best way is to use tagging; the other option is by resource ID. Note: as with DLM, you tag the EBS volumes, not the EC2 instances.
The thing that really sets AWS Backup apart is the ability to back up EFS file systems and Storage Gateway. While there are other ways to back up EFS, none of them are really easy. AWS Backup turns that on its head, plus it includes lifecycle options to move the backups to cold storage.
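The plan and tag-based selection can also be created through the API. Below is a minimal sketch; the helper names and rule naming are my own, `backup` is expected to behave like a boto3 AWS Backup client, and the vault and IAM role are assumed to exist already. The 90-day check reflects my understanding of the AWS documentation (retention must be at least 90 days longer than the cold storage transition), so verify it against the current docs.

```python
def build_backup_plan(plan_name, vault_name, cron_expression,
                      delete_after_days, cold_after_days=None):
    """Build a one-rule BackupPlan document."""
    lifecycle = {"DeleteAfterDays": delete_after_days}
    if cold_after_days is not None:
        # Assumption from the AWS docs: cold storage needs 90+ days of retention.
        if delete_after_days < cold_after_days + 90:
            raise ValueError("retention must be 90+ days longer than cold transition")
        lifecycle["MoveToColdStorageAfterDays"] = cold_after_days
    return {
        "BackupPlanName": plan_name,
        "Rules": [{
            "RuleName": f"{plan_name}-rule",
            "TargetBackupVaultName": vault_name,
            "ScheduleExpression": cron_expression,  # e.g. "cron(0 3 * * ? *)"
            "Lifecycle": lifecycle,
        }],
    }


def build_tag_selection(selection_name, iam_role_arn, tag_key, tag_value):
    """Build a BackupSelection that picks up resources by tag."""
    return {
        "SelectionName": selection_name,
        "IamRoleArn": iam_role_arn,
        "ListOfTags": [{
            "ConditionType": "STRINGEQUALS",
            "ConditionKey": tag_key,
            "ConditionValue": tag_value,
        }],
    }


def create_plan_with_selection(backup, plan, selection):
    """`backup` is expected to behave like a boto3 AWS Backup client."""
    plan_id = backup.create_backup_plan(BackupPlan=plan)["BackupPlanId"]
    backup.create_backup_selection(BackupPlanId=plan_id, BackupSelection=selection)
    return plan_id
```

Keeping the document builders separate from the API call makes it easy to review or version the plan before anything is created.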
Third Party Tools
There are several third-party tools for backing up your resources in AWS. Some, like Avamar Virtual Edition from DellEMC, back up your data; others, like Cloud Protection Manager (CPM) from Veeam N2WS, back up your infrastructure via snapshots or similar. The options from AWS and roll-your-own with CLI/API are all infrastructure backups/snapshots, so I’ll focus on that.
CPM is basically a wrapper around what AWS provides with its APIs, but it does some cool and tricky stuff to give you so much more. While the basic scheduling isn’t that much different to DLM or AWS Backup, there are some advanced features such as blackout windows. They’re nice, and I have used them with customers, but in most cases the standard scheduling is what will be used. Where CPM really shines is that it does backup AND restore of instances. Technically, AWS only allows snapshots of EBS volumes, but CPM treats the instance as a whole. That means it keeps track of an instance’s VPC, subnet, IP address, security groups, etc. So, when you need to restore, it’s just a push of a button. Or, you can go into the advanced features and restore to a different subnet, with a new IP, new security groups, etc. And that’s not to mention DR restores to different accounts, DR backups to different accounts, or copying your snapshots to S3!
CPM also takes snapshots of RDS, Aurora clusters, Redshift and DynamoDB.
Wrap up
There are plenty of options when it comes to protecting your data and infrastructure within AWS. Some come at a premium; some are free, with the only cost being the storage for the data you back up. Whether you use DLM, AWS Backup or CPM, you’ll be able to recover your volumes. The big question to ask is how quickly you need them recovered (RTO). With CPM, I have literally recovered an EC2 instance with several EBS volumes and TBs of data in seconds! With solutions that only focus on backup, to recover you’ll need to launch an instance from the snapshot, recover and attach all the volumes (with the correct device names) and make sure you know which security groups it was part of, etc. This might be fine for test & dev, but I’m not sure it’s the best solution for a production environment. Ultimately, until someone else has a solution to back up EFS, I can see a hybrid approach being the best option: use CPM or some other non-AWS backup tool for your mission-critical workloads, with AWS Backup for test/dev (if you can’t afford the CPM licenses) and EFS.
The above covers just the infrastructure. You’ll still need the app/DB team to check your data. In an ideal situation, you’ll also have some traditional backup software like Networker, Avamar or Veeam to protect your data for long-term retention, but that’s another blog post.
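If you go the backup-only route, it helps to record what you would need for a manual rebuild while the instance is still healthy. The sketch below is my own illustration of that idea, not how CPM or any AWS service does it. It pulls the network details, security groups and device-to-volume mapping out of a `describe_instances` response, with `ec2` expected to behave like a boto3 EC2 client.

```python
def restore_record(instance):
    """Summarise what a manual restore needs from one describe_instances entry."""
    return {
        "instance_id": instance["InstanceId"],
        "vpc_id": instance.get("VpcId"),
        "subnet_id": instance.get("SubnetId"),
        "private_ip": instance.get("PrivateIpAddress"),
        "security_group_ids": [
            group["GroupId"] for group in instance.get("SecurityGroups", [])
        ],
        # Device name -> volume ID, so volumes can be re-attached in the right place.
        "volumes": {
            mapping["DeviceName"]: mapping["Ebs"]["VolumeId"]
            for mapping in instance.get("BlockDeviceMappings", [])
            if "Ebs" in mapping
        },
    }


def collect_restore_records(ec2, instance_ids):
    """`ec2` is expected to behave like a boto3 EC2 client."""
    response = ec2.describe_instances(InstanceIds=instance_ids)
    return [
        restore_record(instance)
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]
```

Storing these records somewhere outside the instance (S3, for example) alongside the snapshots gives you the device names and security groups when you need them.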