ControlMonkey today added a disaster recovery module to its namesake software-as-a-service (SaaS) platform for automating the management of infrastructure-as-code tools based on open-source Terraform software.

Company CEO Aharon Twizer said that Automate Disaster Recovery makes it possible to reduce by 90% the amount of time required to restore cloud configurations, networking, security policies and other code used to configure services.

The Automate Disaster Recovery addition to the ControlMonkey platform automatically takes daily cloud snapshots of the entire infrastructure, enabling rollback to any prior state. A built-in “time machine” capability tracks the configurations of a wide range of cloud services, including content delivery networks, DNS servers and identity and access management (IAM) tools. DevOps teams can use the capability to roll back any cloud service configuration with a single click.
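To make the snapshot-and-rollback idea concrete, here is a minimal Python sketch of a store that keeps one configuration snapshot per day and returns the most recent one taken on or before a requested date. It is an illustration only, not ControlMonkey's implementation: the `ConfigTimeMachine` class, its method names and the dictionary-shaped configuration are all hypothetical.

```python
from bisect import bisect_right, insort
from copy import deepcopy
from datetime import date


class ConfigTimeMachine:
    """Toy store of daily configuration snapshots (hypothetical, for illustration)."""

    def __init__(self):
        self._days = []        # snapshot dates, kept sorted
        self._snapshots = {}   # date -> copy of the configuration on that day

    def take_snapshot(self, day, config):
        """Record a copy of `config` for `day`, replacing any earlier copy for that day."""
        if day not in self._snapshots:
            insort(self._days, day)
        self._snapshots[day] = deepcopy(config)

    def rollback(self, day):
        """Return a copy of the latest snapshot taken on or before `day`."""
        idx = bisect_right(self._days, day)
        if idx == 0:
            raise LookupError(f"no snapshot on or before {day}")
        return deepcopy(self._snapshots[self._days[idx - 1]])
```

Storing deep copies means later changes to the live configuration cannot alter a saved snapshot, which is the property a rollback depends on. A real system would also need to apply the restored configuration to the cloud provider, a step this sketch leaves out.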

That’s critical because, when it comes to disaster recovery, much of the downtime organizations experience can be attributed to trying to manually reconfigure the cloud services where applications have been deployed, said Twizer. Collectively, these capabilities make it possible for IT organizations to substantially reduce their recovery time objectives (RTO) and recovery point objectives (RPO), he added.

In contrast, existing approaches to disaster recovery are mainly focused on automatically redeploying, for example, databases rather than reconfiguring the cloud infrastructure upon which they depend, noted Twizer.

The ControlMonkey platform was originally developed to apply generative artificial intelligence (AI) to generate Terraform code. The overall goal is to not only reduce the amount of time application developers spend on configuring cloud infrastructure but also improve the quality of the code being used. Many of the cybersecurity issues that organizations encounter can be traced back to misconfigurations of cloud infrastructure that, for example, left a port open through which data could be exfiltrated.
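As a rough illustration of the kind of misconfiguration described above, the following Python sketch flags ingress rules that expose a port to the entire internet. The rule format (dictionaries with `port` and `cidr` keys), the `find_open_ports` function and the default allow-list of port 443 are assumptions made for this example; they do not reflect any particular cloud provider's API or ControlMonkey's checks.

```python
def find_open_ports(rules, allowed_public_ports=frozenset({443})):
    """Return ingress rules reachable from any address on a port not explicitly allowed.

    `rules` is a list of dicts with hypothetical keys "port" (int) and "cidr" (str).
    """
    world = ("0.0.0.0/0", "::/0")  # IPv4 and IPv6 "anywhere" ranges
    flagged = []
    for rule in rules:
        if rule["cidr"] in world and rule["port"] not in allowed_public_ports:
            flagged.append(rule)
    return flagged
```

Given a rule set that opens port 22 to `0.0.0.0/0` alongside a public HTTPS rule on port 443, the function would return only the port 22 rule. A check of this kind could run before infrastructure code is applied, though real policies are usually more nuanced than a single allow-list.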

Automating the generation of that code enables, for example, a platform engineering team to more easily provide self-service capabilities to application developers, noted Twizer.

It’s not clear how aggressively DevOps teams are looking to centralize the management of the provisioning of cloud infrastructure. However, many application developers have, in the name of expediency, assumed responsibility for it as part of an effort to accelerate the pace at which applications are built and deployed. Unfortunately, application developers have limited cybersecurity expertise, so misconfiguration of cloud infrastructure services is now rampant. Rather than requiring application developers to manually write better Terraform code, ControlMonkey is making a case for an AI platform that has been trained using best practices to consistently generate more reliable code. That capability also provides the foundation upon which disaster recovery processes can become more automated.

Each IT team will need to determine how high a priority it is to recover quickly from an outage, but one thing that is apparent is that these types of incidents are becoming more common. Cloud service providers may promise high availability but, inevitably, some type of unforeseen issue leads to an outage. How long it takes to recover from that outage all too often comes down to how much custom code there is to reconfigure.

