Tune in to our informative webinar to explore in more detail:

  • What disasters you should be planning for
  • What Disaster Recovery is (and what it isn't)
  • Key stages in protecting your business
  • How to get started quickly with a cloud-based data protection strategy

 

WHAT IS BACKUP?

Essentially, a Backup gets your data back. In many ways, a Backup is an insurance policy against something that may happen in the future, without defining exactly what that might be.

 Backups should:

  • protect against events such as disk corruption, human error and ransomware
  • be easy to locate
  • be reliable, allowing you to restore what you need, when you need it
  • provide granular recovery - whole VMs, VM disks and individual folders and files
  • typically be done daily

GFS Retention Policy

When configuring Backup, you apply a schedule and a retention policy. One of the most common policies, GFS (Grandfather, Father, Son), is a tiered retention scheme based on a number of cycles, designed to retain backups for several years while minimising the underlying storage space required.

  • weekly Backups are known as Sons
  • monthly Backups are known as Fathers
  • yearly Backups are known as Grandfathers
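The tiers above can be sketched in a few lines of Python. This is a minimal illustration, not any backup product's implementation: it assumes, hypothetically, that the weekly Son is the Sunday backup, the monthly Father is the backup taken on the 1st of the month and the yearly Grandfather is the 1 January backup, and that ordinary daily backups are pruned separately. The function names `gfs_tier` and `retained` are made up for this example.

```python
from datetime import date


def gfs_tier(d: date) -> str:
    """Classify a daily backup date into a GFS tier (illustrative rules only)."""
    if d.month == 1 and d.day == 1:
        return "grandfather"  # yearly backup
    if d.day == 1:
        return "father"       # monthly backup
    if d.weekday() == 6:      # Monday is 0, so 6 is Sunday
        return "son"          # weekly backup
    return "daily"            # not kept by the GFS scheme itself


def retained(backups: list[date], sons: int, fathers: int, grandfathers: int) -> list[date]:
    """Keep only the newest N backups of each GFS tier, returned oldest first."""
    limits = {"son": sons, "father": fathers, "grandfather": grandfathers}
    counts = {tier: 0 for tier in limits}
    kept = []
    for d in sorted(backups, reverse=True):  # walk from newest to oldest
        tier = gfs_tier(d)
        if tier in limits and counts[tier] < limits[tier]:
            counts[tier] += 1
            kept.append(d)
    return sorted(kept)
```

For example, two years of nightly backups (around 730 of them) reduce to at most 18 restore points with `retained(days, sons=4, fathers=12, grandfathers=2)`, which is where the storage saving comes from.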

Operational/Regulatory Requirements

Regulation, Compliance and Governance retention policies, and the availability and control of historic data, are key considerations and can also dictate the geographical region in which Backups can reside.

Legal, medical and financial data, for example, needs to be kept for several years and may not be allowed to cross specific geographical boundaries.

WHAT IS DISASTER RECOVERY?

Disaster Recovery (DR) provides a recovery strategy for when disaster strikes: by keeping a secondary copy of your VMs in a separate geographical location, you can fail over production systems and get the business back up and running very quickly, minimising downtime.

Quick Recovery Times and Minimal Data Loss

This ties in with business processes that allow productivity to continue in the event of a disaster such as a power outage, a communications failure or a natural event that damages or destroys a production site. The aim is to minimise data loss and to recover as quickly as possible, so that end users can continue as normal and customers can still interact with the business.

Planned Events

DR is not just for unforeseen circumstances; it can also be used for planned events. For example, if data centre maintenance could impact production, you could simply recover to the second site to minimise the downtime caused by that maintenance window.

Consistent Replication

DR relies on consistent replication throughout each day. The closer the most recent recovery point is to the moment of failover, the less data will be lost.

BACKUP AND DR – AN RTO AND RPO EXAMPLE

Recovery Time Objective (RTO) - How long will it take to get data back?

This measures downtime: how long it takes, from the moment of the incident, until normal operations are restored.

Recovery Point Objective (RPO) – How much data will I lose once recovered?

This concerns data loss: how much data loss is acceptable, measured as the time between the last usable recovery point and the incident.

This example of a typical disaster illustrates the different results achieved in terms of downtime and data loss for both Backup and DR.

Your systems are backed up every night at 7pm. You are also running a DR system with continuous replication.

Disaster Strikes! At 6am, a power outage hits the building, taking down your data centre.

How do you recover?

DRaaS Scenario:

6:01am: you press “Failover” on the DR solution.

6:05am: The systems are back up in the cloud; the recovery time (RTO) is 4 minutes from invoking failover, or 5 minutes from the power outage.

The data replicated to the recovery site is consistent up until 5:58am – just before the power went out; your Recovery Point (RPO) is 2 minutes.

Backup Scenario:

6:01am: You go to find your Backups.

The next few hours are spent finding, fixing and restarting your infrastructure (assuming the power is back on).

5pm: A few major systems are back online; the RTO on those systems is 11 hours.

The systems are restored to the last backup, taken at 7pm yesterday, so the RPO on those few systems is 11 hours (from 7pm to the 6am outage).

The remaining systems take longer still, so their RTOs are even higher.

Relying on backup alone can therefore result in much higher RTOs and longer downtime, as well as far greater data loss.

Based on a daily backup schedule, you could suffer up to 24 hours' worth of data loss.

With a good DR solution, recovery should simply be a case of invoking a failover, taking minutes and resulting in little downtime. In this example, only two minutes' worth of data was lost, because a recovery point was available from two minutes before the power outage.
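The arithmetic behind the two scenarios can be written out in a few lines of Python. In this sketch, downtime is counted from the outage until service is restored, and data loss from the last usable recovery point until the outage, so the DR case comes out at 5 minutes of downtime (the 4 minutes quoted above is counted from 6:01am, when failover is invoked). The calendar date and the helper names `recovery_metrics` and `last_daily_backup` are illustrative assumptions.

```python
from datetime import datetime, timedelta


def last_daily_backup(before: datetime, at_hour: int) -> datetime:
    """Most recent run of a nightly backup scheduled at `at_hour`:00, at or before `before`."""
    run = before.replace(hour=at_hour, minute=0, second=0, microsecond=0)
    if run > before:
        run -= timedelta(days=1)  # today's run hasn't happened yet, so use yesterday's
    return run


def recovery_metrics(outage: datetime, recovery_point: datetime,
                     restored: datetime) -> tuple[timedelta, timedelta]:
    """Return (downtime, data_loss): the quantities RTO and RPO set targets for."""
    downtime = restored - outage
    data_loss = outage - recovery_point
    return downtime, data_loss


outage = datetime(2024, 5, 14, 6, 0)  # hypothetical date; 6am power outage

# DRaaS: replicated data consistent to 5:58am, systems up in the cloud at 6:05am
dr = recovery_metrics(outage, datetime(2024, 5, 14, 5, 58), datetime(2024, 5, 14, 6, 5))

# Backup: restore from last night's 7pm backup, major systems back at 5pm
backup = recovery_metrics(outage, last_daily_backup(outage, 19), datetime(2024, 5, 14, 17, 0))

print(f"DR:     downtime {dr[0]}, data loss {dr[1]}")          # downtime 0:05:00, data loss 0:02:00
print(f"Backup: downtime {backup[0]}, data loss {backup[1]}")  # downtime 11:00:00, data loss 11:00:00
```

Moving the outage to 6:59pm gives 23 hours 59 minutes of data loss against the same nightly schedule, which is the "up to 24 hours" worst case described above.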

CLOUD-BASED BACKUP AND DISASTER RECOVERY - PREPARATION

Step 1 - Understand Your Systems and Data

It's impossible to predict the exact nature of a disaster, so it is best to take a holistic approach and engage with the relevant stakeholders, including those from other business units such as your network and security teams. The idea isn't necessarily to categorise every system, but if you cast a wide enough net you’ll capture the ones that are business critical.

Factors to consider for business critical workloads:

  • How frequently is the system accessed?
  • How often does the data in the system change?
  • How valuable is the data – can it be recreated?
  • Is it interdependent with any other systems?
  • Is the availability or data governed by any compliance requirements?

Step 2 - Understand Your RTO and RPO Needs

Understanding how both RTO and RPO apply to each of your systems should enable you to begin categorising them based on criticality, considering:

  • which ones are essential to productivity
  • the impact of losing data

Access Frequency

How often a system needs to be used is likely to govern how long you can afford to be down, as RTO is all about how long it takes to recover either from a backup or by failing over to a secondary site.

e.g.

  • a monthly invoice system, used only at the end of every month, could go a long time without too much of an impact on the business - so you can accept a higher RTO
  • a public-facing website serving frequent customer orders is likely to be expensive to the business whenever it is down, and an outage may damage the business's reputation amongst its customers. In this case you should be looking to minimise the RTO as much as possible.

Data Update Frequency

Data update frequency drives RPO. If the system’s data is deemed valuable and is updated frequently, then you will want as low an RPO as possible in order to avoid critical data loss.

However, if a system’s data is static and can be easily recreated, then you may accept a higher RPO. For example, a file server may be accessed frequently but hold data that rarely changes, so a snapshot from the previous day may be sufficient.

The opposite is likely to be the case for a sales ordering system that is updated frequently. Resorting to a snapshot taken the previous day is likely to result in significant loss of critical order, sales fulfilment and accounting data.

Step 3 - Rank Your Systems in Order of Importance

Begin categorising your systems, starting with the critical systems that are key to your business being productive. These need to be protected with a proper Disaster Recovery solution that replicates continuously or frequently, so that a disaster doesn't result in a prolonged and unknown period of downtime, and so that loss of critical data is minimised.

At the other end of the scale are systems that are not critical to the business. Simple backups may be sufficient for these, with the understanding that days or weeks may pass before they are back online.

The systems in between are less critical but still important, so it’s sensible to protect these with a DR solution, though they may be able to wait their turn to be recovered in the event of a disaster.

A final tier, such as test systems, may include databases which are entirely transient, and so could be lost and recreated with little downside to the business.
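Steps 2 and 3 can be combined into a simple rule of thumb that maps each system's tolerable RTO and RPO to a protection tier. In this Python sketch the thresholds (1 hour, 15 minutes, 24 hours), the tier numbers and the example systems are all hypothetical, chosen only to mirror the examples above; your own values will come from the stakeholder discussions in Step 1. The 24-hour RPO boundary reflects the point made earlier that a nightly backup can lose up to a day of data.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class System:
    name: str
    rto: timedelta           # longest downtime the business can tolerate
    rpo: timedelta           # largest data loss the business can tolerate
    transient: bool = False  # data can simply be recreated (e.g. test systems)


def protection_tier(s: System) -> int:
    """Map RTO/RPO tolerances to a protection tier (1 = most critical); thresholds are hypothetical."""
    if s.transient:
        return 4  # can be lost and recreated with little downside
    if s.rto <= timedelta(hours=1) or s.rpo <= timedelta(minutes=15):
        return 1  # continuous or frequent DR replication, recovered first
    if s.rto <= timedelta(hours=24) or s.rpo < timedelta(hours=24):
        return 2  # DR-protected, but can wait its turn after tier 1
    return 3      # nightly backup is enough: up to 24 h data loss, days of downtime


def rank(systems: list[System]) -> list[System]:
    """Order systems by tier, then by how quickly each must be back."""
    return sorted(systems, key=lambda s: (protection_tier(s), s.rto))


systems = [
    System("monthly-invoicing", rto=timedelta(days=7), rpo=timedelta(hours=24)),
    System("file-server", rto=timedelta(hours=8), rpo=timedelta(hours=24)),
    System("test-db", rto=timedelta(days=14), rpo=timedelta(days=7), transient=True),
    System("sales-ordering", rto=timedelta(hours=4), rpo=timedelta(minutes=5)),
    System("public-website", rto=timedelta(minutes=15), rpo=timedelta(minutes=1)),
]
for s in rank(systems):
    print(protection_tier(s), s.name)
```

Sorting on RTO within a tier gives a first cut of the recovery order; the dependencies between systems still need to be taken into account.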

CLOUD BASED BACKUP AND DR SOLUTIONS – BEST PRACTICES

The 3-2-1 rule

This is simple to implement but effective, which is why it is commonly adopted today.

  • 3 copies of the same data
  • 2 held locally, but on different devices
  • 1 copy held offsite (for archiving or for when local copies are not available)

The offsite copy also keeps data isolated from the production site and from any problems affecting it.
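The rule is easy to check mechanically. Below is a minimal Python sketch, assuming each copy is described only by the device it sits on and whether it is offsite, and counting the production data itself as one of the three copies, as is common. The `Copy` type and the device names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    device: str    # where this copy of the data lives (hypothetical names)
    offsite: bool  # True if held away from the production site


def meets_3_2_1(copies: list[Copy]) -> bool:
    """Check a set of copies of the same data against the 3-2-1 rule."""
    local_devices = {c.device for c in copies if not c.offsite}
    return (
        len(copies) >= 3                    # 3 copies of the same data
        and len(local_devices) >= 2         # 2 local copies, on different devices
        and any(c.offsite for c in copies)  # 1 copy offsite
    )


production = Copy("primary-san", offsite=False)
local_backup = Copy("backup-nas", offsite=False)
cloud_backup = Copy("cloud-repository", offsite=True)

print(meets_3_2_1([production, local_backup, cloud_backup]))  # True
print(meets_3_2_1([production, local_backup]))                # False: only two copies, none offsite
```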

Cloud based backups

Cloud avoids many of the issues associated with traditional tape backup, such as:

  • slow access
  • locating and loading the correct tape
  • trying to find the relevant piece of data
  • reliability (tape has a lot of moving parts so needs to be stored adequately in order to avoid corruption and degradation)

For those without a second location far enough away from the first, the cloud can be the perfect location for the off-site backup copy, simple to configure and use, with the same range of features as if you were working with local backups:

  • Same levels of security that exist on your production premises - eg encryption in flight and at rest
  • Backup technology which scales easily as your production environment grows and your backup needs change
  • Access from anywhere as long as there is internet connectivity
  • Geographical diversity
  • Known operational expenses instead of traditional up-front costs for infrastructure and for operational and support staff

Disaster Recovery as a Service (DRaaS)

As with Backup, businesses without a second site can benefit from Cloud-based DR, known as DRaaS. For those that maintain a second site purely for DR, adopting a Cloud-based DRaaS approach could be more cost effective, as it removes the need for a dedicated standby environment.

What to consider:

  • How you replicate your data - will it be a continuous or snapshot approach that will give you the RPOs required for your workloads?
  • Ensure you choose an Enterprise-grade Cloud environment to recover to: if the recovered VMs do not perform as expected after a failover, or end users cannot connect to them, productivity will suffer
  • DR should be simple to configure and to maintain, intuitive to invoke and should also be reliable – a big part of this is automation, i.e. easy failover and failback when the issue is resolved.
  • Avoid the burden of having to use multiple tools, multiple consoles and having to run numerous scripts as part of the Disaster Recovery process.
  • When your VMs are running in the recovery site you should expect to be able to manage and monitor them in the same way as VMs in production and you should also benefit from the same level of security features.
  • Ultimately the end user experience should be no different to running VMs in your own environment.
  • DRaaS should have a visible and predictable pricing model based mainly on operational expenses, with recovery resources charged only when they are used in an emergency, i.e. when a failover is actually invoked.

BUILDING A CLOUD-BASED DATA PROTECTION STRATEGY – CHESS AND ILAND EXPERTISE

Chess’ primary business strategy is to help you transition from on-premises to Cloud-first solutions through a simple and accessible Technology-as-a-Service portfolio. Over the last 25 years, Chess has developed strategic partnerships with some of the industry's leading vendors, including iland, to offer world-class enterprise portfolios with affordable and predictable monthly costs.

Chess are an award-winning business holding some of the industry's highest accreditations.  They have over 250 certified technical professionals within the group and their services are trusted by 30,000 customers.

Iland have over 22 years’ experience in delivering IT services, 11 of those specialising in Cloud-based Disaster Recovery. They have gained industry recognition, including Gartner and Forrester accolades, based on extensive analysis of the options available in the market, customer feedback, tenure and management.

Iland are currently a Leader in the Gartner Magic Quadrant for DRaaS. Their datacentres are powered by VMware and Cisco, and they were the first VMware partners to achieve Premier Partnership status. They currently hold two VMware Advisory Council seats and receive significant recognition from their technology partners; for example, they are Veeam’s current Cloud Partner of the Year and also Zerto Cloud Partner of the Year.

iland currently have eight datacentres in operation across the US, EMEA and APAC to provide global reach to their customers, and are adding data centres in other locations. All datacentres are either Tier III or Tier IV and are built on best-of-breed technology. They are also carrier neutral, so they offer flexibility around connectivity.

iland Secure Cloud-based Backup with Veeam Cloud Connect

Features:

  • Easy, straightforward Cloud backup - Store your VMware and Hyper-V backups in a global iland cloud location
  • Cloud-based backups for your local virtual guests and data globally.
  • Store an up-to-date copy or secondary copy of virtualised applications
  • Restore files and virtual disks back to your local environment as needed.
  • Monitor and manage a Veeam Backup solution through a Secure Cloud Console.

iland Secure DRaaS with Veeam

This cost-effective, snapshot-based Disaster Recovery solution allows VMs to be replicated to any global iland cloud location and spun up in case of disaster.

For existing Veeam users, this is a simple upgrade from their existing backup, and it uses the familiar Veeam console.

Key features:

  • Complete DRaaS with an easy upgrade from Veeam backup
  • Full and partial failover
  • Easy configuration – all done in the existing on-premises Veeam Availability Suite console
  • End to end encryption – all data is encrypted in flight without affecting compression ratios
  • Self service testing – on demand, through existing Veeam console
  • Global cloud datacentre locations
  • 100% infrastructure SLAs

Iland Secure DRaaS with Zerto

Integrated with Zerto, this Cloud-based Disaster Recovery as a Service replicates data at the hypervisor level, using any tier of storage, so you can keep an up-to-date copy of your virtualised applications in iland’s Cloud, recovering and testing as needed on a self-service basis, with a near-zero RPO.

Key features:

  • Virtual protection groups – allow you to group and tier VMs for protection, and ensure consistent recovery points across all of the VMs in a group
  • Partial failover - by grouping VMs you can perform partial failovers of each group, reducing costs and simplifying failback
  • Speedy replication - built-in network compression enables more recent recovery points
  • Failover assistance – the failover wizard automates and guides the failover process, issues alerts and reports on the failover. The report can then be downloaded and distributed for compliance and auditing purposes.
  • DNS failover – Disaster Recovery management is simplified by the integration of DNS management into the Cloud console, so DNS can be managed from the same place as your resources and disaster recovery workflow.

Onboarding Challenges

Key areas to consider:

  • What are your specific backup and disaster recovery requirements?
  • Is your environment ready to replicate to a cloud provider?
  • Do you have enough bandwidth and connectivity to properly replicate your workloads to a cloud provider?
  • How are your current applications ranked in order of importance?
  • How is networking in your virtual environment currently set up?

Chess and iland expertise help address these questions, designing a solution and providing full onboarding support, with an allocated project manager and deployment engineer.

CATALYST

Iland have developed Catalyst, a customisable, lightweight, free analysis tool that calculates, from data gathered inside your environment, how to rightsize your cloud environment.

With Catalyst, you can avoid costly over-provisioning or detrimental under-provisioning by knowing exactly what is needed for cloud-based DRaaS, Backup and IaaS solutions.

In this example, a logical group of 8 VMs (Prod/Web) has a defined backup policy of 7 daily, 6 weekly and 6 monthly backups.

 

Catalyst calculates the exact size of the storage repository needed for the specific backup policy, taking into account:

  • Full backup size when compressed
  • Incremental backup size
  • Scratch working space etc

The storage required can be recalculated in real time if the backup policy changes.
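To show the kind of arithmetic involved (this is not Catalyst's actual model, which is not described here), below is a rough Python estimate for the 7 daily, 6 weekly, 6 monthly policy. It assumes a chain of one compressed full plus incrementals for the daily points, that every weekly and monthly point is stored as a separate full, and a flat allowance for scratch space; the source size, compression ratio, change rate and 10% scratch figure are all hypothetical inputs.

```python
def repository_gb(source_gb: float, compression_ratio: float, daily_change: float,
                  dailies: int, weeklies: int, monthlies: int,
                  scratch: float = 0.10) -> float:
    """Rough repository size for a GFS-style policy (illustrative assumptions only)."""
    full = source_gb / compression_ratio        # one compressed full backup
    incremental = full * daily_change           # one compressed daily incremental
    chain = full + (dailies - 1) * incremental  # daily restore points: 1 full + incrementals
    gfs = (weeklies + monthlies) * full         # each weekly/monthly point kept as a full
    return (chain + gfs) * (1 + scratch)        # plus scratch/working space


# Hypothetical Prod/Web group: 8 VMs totalling 2,000 GB, 2:1 compression, 5% daily change
print(round(repository_gb(2000, 2.0, 0.05, dailies=7, weeklies=6, monthlies=6)))  # 14630
print(round(repository_gb(2000, 2.0, 0.05, dailies=7, weeklies=6, monthlies=3)))  # 11330
```

Changing the policy, for example dropping to three monthly points, and re-running the function mirrors the recalculation described above. Because the weekly and monthly fulls dominate the total, storage that deduplicates or shares blocks between fulls could need considerably less.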

IaaS/Disaster Recovery (Column 2)

For DRaaS and IaaS, the second column shown gives important CPU and RAM consumption figures for the selected VMs. Iland’s Pay-as-you-Go model offers the ability to predict costs, eg in the event of a Disaster Recovery failover. For those who prefer a reservation model, this gives a very good starting point when building out a virtual datacentre as you can align the reservation to the expected resource consumption of those grouped virtual machines.

Bandwidth (Column 3)

Catalyst tests bandwidth speeds and latency related to the chosen iland data centre. Based on the returned figures, the bandwidth column gives estimated seed time and incremental time for the backup policy chosen.
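Seed and incremental times follow from the data size and the usable throughput. The Python sketch below is an assumption-laden estimate, not Catalyst's method: it scales the measured link speed by a hypothetical 80% efficiency factor and, where a TCP window size is given, caps each stream at the window divided by the round-trip time, which is how latency limits a single connection. The 1,000 GB full and 50 GB incremental are hypothetical figures.

```python
from typing import Optional


def effective_mbps(link_mbps: float, rtt_ms: float,
                   window_bytes: Optional[int] = None, streams: int = 1) -> float:
    """Usable throughput: a single TCP stream can't exceed window / round-trip time."""
    if window_bytes is None:
        return link_mbps  # no window limit assumed
    per_stream_mbps = window_bytes * 8 / (rtt_ms / 1000) / 1_000_000
    return min(link_mbps, per_stream_mbps * streams)


def transfer_hours(data_gb: float, mbps: float, efficiency: float = 0.8) -> float:
    """Hours to move `data_gb` (decimal GB) at `mbps`, allowing for protocol overhead."""
    bits = data_gb * 1_000_000_000 * 8
    return bits / (mbps * 1_000_000 * efficiency) / 3600


link = effective_mbps(100, rtt_ms=20)  # hypothetical measured 100 Mbps link, 20 ms latency
print(f"seed: {transfer_hours(1000, link):.1f} h")        # seed: 27.8 h
print(f"incremental: {transfer_hours(50, link):.1f} h")   # incremental: 1.4 h
print(f"{effective_mbps(100, 20, window_bytes=65_536):.1f} Mbps")  # 26.2 Mbps for one 64 KiB-window stream
```

The last line shows why latency matters: a single stream with a small window uses only about a quarter of the link, so replication tools typically rely on larger windows or multiple streams.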

Catalyst’s success lies in removing the guesswork, and the need for a lengthy assessment, of the underlying workloads that may need to be protected.

Compliance

Data is your most important asset. It’s important that you're able to hold your Cloud provider accountable for compliance, encryption and security. This minimises risk and potential financial and brand exposure.

GDPR has recently come into effect and many companies are still struggling to get to grips with it. iland’s certifications include BS 10012, a new specification covering personal information management, and iland is one of the first UK organisations to have achieved this certification.

iland is one of only two companies to hold a Gold Star rating from the Cloud Security Alliance (the other being Microsoft).

iland are totally committed to compliance and have a dedicated compliance team to support customers, assisting with:

  • Legal agreements
  • Audit support
  • Configuration questions

Secure Cloud Console

Iland Integrated Compliance Reporting

Iland’s Secure Cloud Console has been built with a baseline level of compliance and security baked in.

Features include:

  • Global, multi-context change control
  • Automated compliance reporting
  • Integrated iland DRaaS & backup management
  • APIs and SDKs provide programming & configuration options

Networking

Moving a service physically from one region to another represents a networking challenge, especially if you are required to change internal networking or hostnames.

Features:

  • The ability to manage common services easily such as VPN, NAT, firewall, load balancers, routing and DNS via a common interface.
  • The ability to build out an exact replica of your internal networking scheme in the cloud, so that all failed over applications can work seamlessly.
  • Stretched Layer 2 functionality for partially failed-over applications.

Ensuring VMs are recovered in the correct order

You need to ensure the workloads you are protecting are recovered and booted in the order you need, based on prioritisation and the interaction between workloads.

E.g., you may need the domain controller to be the first VM recovered and operational and you’ll also need the ability to validate the success or failure of that failover and to report on it at a granular level.
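Recovery order is a dependency problem: a VM should boot only after everything it depends on is up. Below is a minimal sketch using Python's standard `graphlib` module (Python 3.9+) to turn a hypothetical dependency map into boot "waves", where VMs in the same wave can start in parallel. The VM names are made up, and this is not how any particular DR product orders recovery.

```python
from graphlib import TopologicalSorter


def boot_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group VMs into waves; every VM boots after all of the VMs it depends on."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all of their dependencies have booted
        waves.append(ready)
        sorter.done(*ready)                 # in practice, only after validating each boot
    return waves


# Hypothetical estate: the domain controller must come up first
depends_on = {
    "sql01": {"dc01"},
    "app01": {"dc01", "sql01"},
    "web01": {"app01"},
    "web02": {"app01"},
}
print(boot_waves(depends_on))
# [['dc01'], ['sql01'], ['app01'], ['web01', 'web02']]
```

Marking a wave as done only after each VM has been validated is what allows a failed boot to stop the sequence and be reported.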

Orchestration Challenges

iland cover this very well and have also taken it a step further with their own Runbook Orchestration layer. A runbook is a collection of recovery groups: it lets you categorise and group workloads and define the order in which they are recovered, so you can build multiple runbooks around your various business failover scenarios.

You can schedule the execution of a runbook so that DR is tested automatically at an interval of your choosing, e.g. once a quarter, giving you the option of a hands-off approach to regular DR testing across your various failover scenarios. Each test confirms:

  • when the failover was done
  • whether it succeeded or failed
  • how long it took in terms of RTO
  • the recovery site and details (resource pools and networks)
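The runbook idea (ordered recovery groups plus a report of when, whether, how long and where) can be modelled in a short Python sketch. The `recover` callback, the site name and the `RunbookReport` fields are hypothetical stand-ins, not iland's Runbook Orchestration API; the clock is injectable so that a scheduled test can be simulated.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class RunbookReport:
    started: datetime
    succeeded: bool
    duration: timedelta  # achieved recovery time for this run
    site: str
    failed_vms: list[str] = field(default_factory=list)


def run_runbook(groups: list[list[str]], recover: Callable[[str, str], bool],
                site: str, clock: Callable[[], datetime] = datetime.now) -> RunbookReport:
    """Recover each group in order; stop if a group fails, since later groups depend on it."""
    started = clock()
    failed: list[str] = []
    for group in groups:
        failed += [vm for vm in group if not recover(vm, site)]
        if failed:
            break
    return RunbookReport(started, not failed, clock() - started, site, failed)


# Simulated quarterly test: every VM recovers, and the fake clock advances 12 minutes
ticks = iter([datetime(2024, 4, 1, 2, 0), datetime(2024, 4, 1, 2, 12)])
report = run_runbook([["dc01"], ["sql01", "app01"]], recover=lambda vm, site: True,
                     site="recovery-site-a", clock=lambda: next(ticks))
print(report.succeeded, report.duration)  # True 0:12:00
```

Stopping at the first failed group keeps dependent workloads from booting against a missing domain controller or database, and the report still records what failed.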

This contributes to confidence in the recovery process and in having the documentation to feed into the auditing and compliance process.

Management

As a customer, we have onboarded, we have the compliance needed, we have configured networking and we are replicating successfully.

How do we monitor all of this? How do we manage our recovered VMs? How do we know what our costs will be?

Iland’s Secure Cloud Console gives you visibility and management similar to your own VMware production environments so this is a big benefit for those with VMware user expertise as it should make transition to the cloud that much easier.

Iland take an API-based approach to bring everything into a single pane of glass, including:

  • Billing
  • Security
  • Compliance
  • VR
  • NSX Edge-based networking

The centralised view allows a very granular view of historic, current and predicted costs all split down between each resource type. The same applies to performance, so you can monitor your VMs in real time as well as have access to a year’s worth of granular performance statistics.

You have complete control over your recovered VMs including the ability to create alerts – e.g. if a recovered VM is resource constrained, so that issue can be addressed.

The Cloud Console itself has a full suite of APIs and predefined SDKs that you can use to integrate console workflows into your own tools and automation if you choose to do so.

Iland Secure Cloud Backup – Overview

The Console gives you full visibility of your cloud storage repository and how much you’ve used along with the status of the Cloud based backup jobs.

You can also extend the Cloud repository (seeing the associated cost before committing) and view current and predicted billing based on a simple consumption-based model.

iland Secure DRaaS

DRaaS is built into the continuity section of the Cloud Console, giving full real-time visibility of the most important metric: RPO.

Failovers can be invoked from the Console, e.g. if your production site is down and unavailable. After every failover (test and live), a recovery report is available.

Recovering VMs to an Isolated Network

This opens up some really useful additional use cases. For example, recovered VMs can be used to test and rehearse operating system and other patching, giving you more confidence when applying patches in live, so you know they will succeed and how long they are likely to take.

You can also leverage compliance reporting to perform non-intrusive penetration testing against the recovered VMs, knowing that they are an exact copy of your live VMs.

Using recovered VMs as testing and training labs may even remove the need for dedicated test VMs in your environment, freeing up valuable resources.