The Architecture of Assured Recovery (AR)
This page describes the underlying architecture of CA XOsoft's automatic disaster recovery replica environment testing solution, CA XOsoft™ Assured Recovery.
Why is Disaster Recovery Testing Necessary?
Over time, things change. Hardware components are replaced, software is upgraded, networks are reconfigured, data sizes grow. Even though your disaster recovery systems were thoroughly tested when installed, the dynamic nature of the IT environment makes it absolutely critical that the recovery environment continue to be tested on a regular basis.
The problem: testing is very disruptive.
Ideally, the best test would actually run the application on your standby replica server, and perhaps even fail over one or more test users, in a way that impacts neither the availability of the production server nor the protection that the DR system is designed to provide.
That, in a nutshell, is what CA XOsoft Assured Recovery does.
What CA XOsoft Assured Recovery Does and Why It is Unique
Until the appearance of the CA XOsoft Assured Recovery technology, there were only two possible approaches to testing a disaster recovery or high availability backup server without any impact on the production server or its protection: (1) don't test or (2) use a proprietary method that effectively simulates a test of the standby replica server, perhaps by making a replica copy of it and testing it in an isolated environment.
The second option is obviously better than the first, but it is not good enough. Testing anything other than the actual server means that you have not really tested the solution on which your disaster response depends.
The unique advantage of CA XOsoft Assured Recovery is that it allows the specific application running on the actual server to be tested. The test can be as simple as starting up the application to ensure that all the data is consistent, or as complex as, for example, bringing up Microsoft Exchange, failing over a single test user to the replica server, using the same failover technology that would be used in a real failover, testing the system by sending and receiving several emails, and then failing the test user back to the production server. Keep in mind that all of this is done while users are still working and replication is still running.
This is a true test of the replica system, the kind of test that ordinarily could be done only by stopping replication long enough for the test, and then resynchronizing the data. With AR, however, replication and high availability protection are always active and functioning. At no point in the testing process is DR protection compromised.
The Technologies that Underlie CA XOsoft Assured Recovery
CA XOsoft Assured Recovery is an innovative integration of several different technologies to provide a fundamentally new capability, namely, completely non-disruptive and automated testing of a DR replica server environment.
The technologies that underlie CA XOsoft Assured Recovery fall into four categories: replication, application management, continuous data protection, and VSS.
The key enabling technology within replication is that of spooling, which is simply the ability to store changes captured during replication in a file for some period of time before applying them. This capability is used by CA XOsoft Assured Recovery to pause the application of changes received by the replica while testing is being carried out. Spooling is, of course, done on the replica so that, should disaster strike during testing, the accumulated changes will not be lost.
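As an illustration of the spooling idea only, here is a minimal Python sketch. The class and method names (`ReplicaSpool`, `receive`, `pause`, `resume`) are hypothetical and are not part of the CA XOsoft product, and an in-memory queue stands in for the spool file described above.

```python
from collections import deque


class ReplicaSpool:
    """Hypothetical model of a replica that can pause applying changes."""

    def __init__(self):
        self.data = {}        # applied replica state: path -> content
        self.spool = deque()  # changes received but not yet applied
        self.paused = False

    def receive(self, path, content):
        """Accept a replicated change; apply it now or hold it in the spool."""
        if self.paused:
            self.spool.append((path, content))
        else:
            self.data[path] = content

    def pause(self):
        """Stop applying changes; anything received is kept in the spool."""
        self.paused = True

    def resume(self):
        """Apply spooled changes in arrival order, then apply normally again."""
        while self.spool:
            path, content = self.spool.popleft()
            self.data[path] = content
        self.paused = False
```

Because the changes are held on the replica side in this model, nothing the production server has already sent is lost while application of changes is paused.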
CA XOsoft Assured Recovery builds on the application management capabilities that have been developed for CA XOsoft High Availability (formerly CA XOsoft WANSyncHA). This is a very complex set of capabilities that forms the core of the high availability functions of CA XOsoft Replication (formerly CA XOsoft WANSync). Just as in the case of HA monitoring and failover, CA XOsoft Assured Recovery must start the application being tested on the replica, verify that all the correct services are running and that data files are properly mounted and in a consistent state, and then stop the application once testing is complete. Maintaining the proper order of operations and a high level of integration is critical to ensure that the process occurs in a stable fashion.
CA XOsoft's continuous data protection is a fundamentally important component of the CA XOsoft Assured Recovery architecture. In effect, at the moment that we begin spooling changes, the agent that underlies CA XOsoft's rewind technology is used to capture all changes that occur to the data on the replica server during testing. Once testing is complete, the server is rewound to precisely the state that existed when replication was paused so that accumulated changes in the spool can be applied as if nothing happened. If the production server failed while testing was in progress and a failover was triggered, failover would begin at the point immediately following application of all changes that were received.
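The rewind idea can be sketched as an undo journal: record the previous value before each change made during testing, then replay those records in reverse. This is only a conceptual sketch; `RewindJournal` and its methods are hypothetical names, a dictionary stands in for the replica's data, and the actual rewind agent is not described at this level of detail here.

```python
_MISSING = object()  # marks a key that did not exist before the change


class RewindJournal:
    """Hypothetical undo journal over a dict standing in for replica data."""

    def __init__(self, data):
        self.data = data
        self.undo = []  # (key, previous value), in the order changes occurred

    def write(self, key, value):
        """Record the old value (or its absence), then apply the change."""
        self.undo.append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def delete(self, key):
        """Record the old value, then remove the key if it exists."""
        if key in self.data:
            self.undo.append((key, self.data[key]))
            del self.data[key]

    def rewind(self):
        """Undo recorded changes newest-first, restoring the starting state."""
        while self.undo:
            key, previous = self.undo.pop()
            if previous is _MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = previous
```

Undoing in reverse order matters: if the same item is changed several times during a test, only a newest-first replay ends on the value it had when testing began.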
Finally, on Windows 2003, CA XOsoft Assured Recovery can be configured to employ Microsoft's VSS technology to take a snapshot of the verified application dataset. Other backup and snapshot technologies can also be integrated easily.
The CA XOsoft Assured Recovery Process
The best way to understand how CA XOsoft Assured Recovery works is simply to consider the step-by-step operational flow. Presented below is a high-level view of the fully automated version of testing with default tests. Testing may be customized or performed interactively as well.
- Initiate CA XOsoft Assured Recovery
AR can be initiated manually by pressing the AR button on the management GUI or by CA XOsoft Replication itself according to a schedule that can be set when the scenario is created.
- Suspend application of journals on AR replica
The first step is to suspend application of changes received by the replica server and to accumulate them in a spool file on the replica.
- Initiate rewind agent on AR replica
The CA XOsoft engine on the AR replica sets up and runs the rewind agent, which tracks all changes made to the application data during testing in order to undo them at the end.
- Start the application
The application (Exchange, SQL Server or Oracle) is started automatically.
- Test the application
The application is verified, by default, using the same tests as are used to monitor the application in HA, including verifying that all services have correctly started and that all databases have been successfully mounted.
- Perform actions-on-success while application is running
If all tests have been successful, a user-defined script may be registered to run at this point and perform any desired actions that require the application to still be running. These might include an online backup, or generation of a report based on queries to the application.
- Stop the application
The application is automatically stopped just as it would be in an HA scenario during a clean failover.
- Perform additional actions-on-success when application is down
If all tests have been successful, a Microsoft VSS snapshot may be performed or a user-defined script may be registered to perform any other actions desired, including invoking other backup or snapshot technologies.
- Rewind AR replica data and resume replication
Once testing, backup and any other actions registered via scripts are complete, the AR replica is restored to precisely the same state it was in when replicated changes began to be spooled and the replication process continues normally. If the production server failed while testing was in progress and a failover was triggered, it would begin at this point.
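The ordering of the steps above can be summarized in a short Python sketch. Everything in it is hypothetical: `run_ar_test`, the method names on the `replica` object, and the `RecordingReplica` stand-in are illustrative and are not the product's API. The `try`/`finally` blocks are also an assumption, added so that the sketch always stops the application, rewinds, and resumes replication even when a test fails; the text above describes only the successful path.

```python
def run_ar_test(replica, on_success_running=None, on_success_stopped=None):
    """Run one test cycle in the order described above; return True on success."""
    replica.suspend_apply()        # incoming changes now accumulate in the spool
    replica.start_rewind_agent()   # track every change made during the test
    passed = False
    try:
        replica.start_application()
        try:
            passed = replica.verify_application()
            if passed and on_success_running:
                on_success_running()   # e.g. online backup or report
        finally:
            replica.stop_application()
        if passed and on_success_stopped:
            on_success_stopped()       # e.g. snapshot of the verified dataset
    finally:
        replica.rewind()               # undo everything the test changed
        replica.resume_apply()         # apply spooled changes and continue
    return passed


class RecordingReplica:
    """Hypothetical stand-in that only records which step was called."""

    def __init__(self, healthy=True):
        self.healthy = healthy
        self.calls = []

    def __getattr__(self, name):
        # Any method not defined explicitly is treated as a step and recorded.
        def step():
            self.calls.append(name)
        return step

    def verify_application(self):
        self.calls.append("verify_application")
        return self.healthy
```

Calling `run_ar_test(RecordingReplica())` and inspecting the `calls` list shows the steps in the same order as the list above, with the rewind and the resumption of replication always last.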
This page only gives highlights of the architecture of CA XOsoft Assured Recovery.