Understanding How Infrastructure-as-Code Struggles at Scale and No-Code/Low-Code Is the Future
Infrastructure-as-code (IaC) is a well-known and popular approach to provisioning cloud infrastructure using the principles of application development, i.e. writing code in a programming or configuration language. Although IaC emulates application development, the way it is built and works imposes several limitations that make it challenging to operationalize at scale. In this blog we will:
- Highlight the shortcomings of IaC compared to mainstream application development and its evolution.
- Outline the changes needed to fix these issues and show how today’s IaC will evolve toward a no-code or low-code pattern.
- Walk through an example no-code/low-code implementation with DuploCloud and show how its Terraform plugin (provider) eliminates 90% of the code that would otherwise be written in native Terraform.
IaC vs. Application Code
Although people describe IaC as writing code for infrastructure, there are some fundamental differences between the two. Let us explain how:
1. Missing application-centric abstractions.
Over decades, mainstream programming has evolved from basic imperative programming in C to object-oriented languages (C++), managed code (Java/.NET), interpreted languages (Python) and now no-code/low-code (AWS Honeycode). Each step added new abstractions. IaC, on the other hand, has not seen comparable enhancements as it evolved from Bash/PowerShell to Chef/Puppet to Terraform/Ansible over the last 20 years.
The key abstractions IaC (Terraform for example) provides are modules, providers and state management. Loosely speaking, modules are like a function with input and output parameters while providers are cloud API connectors. These are still very primitive abstractions as compared to what you can do in a programming language. There is no notion of application, microservice or a security standard that you could specify in the code. Cloud resources are exposed as-is to the DevOps engineers who now have to explicitly specify each and every resource in fine-grained detail. This leads to large code bases to create even basic deployments. Let’s take a look in the context of a network topology in AWS:
Practical Example: Every new account needs to set up a network topology in AWS. We create a VPC with a given CIDR, with the subnets split across the desired number of Availability Zones (AZs). Each AZ should have one private and one public subnet, and a NAT gateway is deployed in each AZ for outbound traffic from the private subnets.
A programmer would expect a one-line function call like CreateInfrastructure(string Region, string CIDR, int AzCount). Today, however, we need to write around 1,000 lines of code!
For comparison, see these CloudFormation templates that do the same: https://github.com/widdix/aws-cf-templates/blob/master/vpc/vpc-2azs.yaml
Enhancements like Terragrunt are a step in the right direction, but still this large amount of code can be error-prone and more importantly, makes changes harder.
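To make the expected abstraction concrete, here is a minimal Python sketch of what such a one-line call could compute. It only produces a plan and makes no cloud API calls. The function name mirrors the CreateInfrastructure example above; the `Subnet` type, the returned dictionary layout, and the assumption that AZ names are the region plus a letter suffix are hypothetical and purely illustrative.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Subnet:
    az: str    # Availability Zone name (assumed to be region + letter)
    kind: str  # "public" or "private"
    cidr: str


def create_infrastructure(region: str, cidr: str, az_count: int) -> dict:
    """Plan a VPC with one public and one private subnet per AZ."""
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    vpc = ipaddress.ip_network(cidr)
    # Number of extra prefix bits needed to fit 2 subnets per AZ.
    extra_bits = (2 * az_count - 1).bit_length()
    blocks = list(vpc.subnets(prefixlen_diff=extra_bits))

    subnets = []
    for i in range(az_count):
        az = f"{region}{chr(ord('a') + i)}"
        subnets.append(Subnet(az, "public", str(blocks[2 * i])))
        subnets.append(Subnet(az, "private", str(blocks[2 * i + 1])))

    # One NAT gateway per AZ, placed in that AZ's public subnet.
    nat_gateways = [s.az for s in subnets if s.kind == "public"]
    return {"vpc": str(vpc), "subnets": subnets, "nat_gateways": nat_gateways}


plan = create_infrastructure("us-east-1", "10.0.0.0/16", 2)
for subnet in plan["subnets"]:
    print(subnet)
```

The point is not the subnet arithmetic itself but that the caller states intent (region, CIDR, AZ count) and the low-level resources are derived from it.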
2. IaC neither provides built-in best practices nor allows the user to add them.
We may want to write a custom function that would perform a set of best practice validations before creation of a Virtual Machine or a network compliance checker library that can be invoked to validate all security group rules. None of this is possible in IaC.
IaC does not have user-defined functions like a normal programming language.
The 2020 Cloud Threat Report released by Palo Alto Networks identified around 200,000 potential vulnerabilities in existing infrastructure-as-code templates [https://en.wikipedia.org/wiki/Infrastructure_as_code]. To address this problem, customers run governance tools on top of provisioned infrastructure, or static analysis on the IaC itself, to catch these issues and make sure that certain controls are met (see Cloud Custodian and Turbot).
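As an illustration of the kind of network compliance checker described above, the following Python sketch validates a set of security group ingress rules before anything is provisioned. The `IngressRule` type, the `SENSITIVE_PORTS` set and the single rule it enforces are assumptions made for this example, not a complete or authoritative policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    port: int
    source_cidr: str


# Illustrative assumption: administrative ports that should never be public.
SENSITIVE_PORTS = {22, 3389}  # SSH, RDP


def validate_rules(rules):
    """Return a list of human-readable violations (empty if compliant)."""
    violations = []
    for rule in rules:
        if rule.source_cidr == "0.0.0.0/0" and rule.port in SENSITIVE_PORTS:
            violations.append(f"port {rule.port} is open to the internet")
    return violations


rules = [
    IngressRule(443, "0.0.0.0/0"),  # public HTTPS: allowed
    IngressRule(22, "0.0.0.0/0"),   # public SSH: flagged
    IngressRule(22, "10.0.0.0/8"),  # internal SSH: allowed
]
print(validate_rules(rules))
```

In a general-purpose language this check is a reusable function that every deployment path can call; with plain IaC templates the same rule typically lives in a separate scanning tool.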
3. IaC is not an orchestrator: Infrastructure is not a one time build but rather a continuous operation. Every time a small change needs to be added, one needs to go through a lot of existing code, send a code review request, handle comments, test code and finally make the change. Making changes in IaC is akin to going to your database and changing the specific rows and columns and knowing exactly what to change. In order to build a fully automated infrastructure, multiple functionalities and tools have to be stitched together. Figure 2 shows an example taxonomy of DevOps functions on AWS that need to be orchestrated.
Imagine the amount of code a human has to write to build a working solution that implements this taxonomy using IaC, which lacks even basic abstractions and a notion of user-defined functions.
Hence, beyond a programming language like IaC, one needs a long running platform that remembers the complete context of a system and allows incremental changes. Today, in most organizations, DevOps engineers form this orchestration/platform layer.
4. Hard to find both development and ops skills in the same person: IaC has created a unique skill set requirement that combines programming and operations. It aims to merge the developer role and the operations role into a single individual. In practice, a large majority of IT teams do not have a programming background, and programming is hard to learn. Developers, on the other hand, can write code but seldom understand the building blocks of an infrastructure, such as a VPC, VNET or access policy. We ultimately end up in a situation where either the infrastructure is not secure or the code is complex, repetitive and lacking in modular structure.
How To Enhance IaC
Now that we have seen some of the limitations of IaC, let’s look into how to fix some of these for basic automation use cases. We can’t fundamentally change IaC into a full-fledged language, so we will discuss solutions that work with and around IaC.
1. Adding Application-Centric Abstractions: We need an abstraction or policy model that appeals to the broadest set of use cases, is not overly restrictive, and is extensible to rapidly evolving cloud services, the security landscape and consumer needs. We take inspiration from two of the most popular policy models:
- Infrastructure-as-a-service (a.k.a. cloud): Cloud brought us constructs like Virtual Networks, which abstracted away the nuances of VRFs, VLANs and routing protocols. Similarly, EBS volumes abstracted away the nuances of storage arrays and NAS boxes. The concept of micro-segmentation using security groups applied to Virtual Machines hides the nuances of port ACLs.
- Kubernetes: Here we were introduced to application-level abstractions like Deployments and ingress controllers, which hide the underlying details of load balancers, service discovery, service healing and high availability.
One abstraction that makes a lot of sense in the cloud is the notion of a project or tenant: a group of resources that work together to implement a set of microservices and are handled by a single team. Another is a set of rules or policies that should always be met during deployment.
2. Automated Code Generation: Since IaC is simply a declarative infrastructure description, there is no reason why a lot of low level code can’t be auto-generated using a higher level intent or application blueprint as input. The inputs can be:
- Application blueprint that shows the list of resources or even services that need to work together
- Compliance controls that need to be met, like PCI, HIPAA, SOC2
Based on these inputs, one should be able to auto-generate low level IaC.
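A minimal sketch of this idea in Python, assuming a hypothetical blueprint format and a deliberately simplistic mapping from compliance controls to settings. It emits Terraform's JSON configuration syntax for `aws_ebs_volume` resources. The blueprint keys, the `ENCRYPTION_CONTROLS` set and the rule that any listed control turns on encryption are illustrative assumptions, not a statement of what those standards actually require.

```python
import json

# Illustrative assumption: any of these controls implies encryption at rest.
ENCRYPTION_CONTROLS = {"PCI", "HIPAA", "SOC2"}


def generate_terraform(blueprint: dict, controls: set) -> str:
    """Render a high-level blueprint as Terraform JSON (.tf.json) text."""
    encrypt = bool(ENCRYPTION_CONTROLS & set(controls))
    volumes = {}
    for vol in blueprint["volumes"]:
        resource_name = f'{blueprint["name"]}_{vol["name"]}'
        volumes[resource_name] = {
            "availability_zone": vol["az"],
            "size": vol["size_gb"],
            "encrypted": encrypt,
            "tags": {"Project": blueprint["name"]},
        }
    return json.dumps({"resource": {"aws_ebs_volume": volumes}}, indent=2)


blueprint = {
    "name": "orders",
    "volumes": [{"name": "data", "size_gb": 100, "az": "us-east-1a"}],
}
print(generate_terraform(blueprint, {"PCI"}))
```

The developer supplies only the blueprint and the list of controls; the low-level attributes that a control implies are filled in by the generator rather than written by hand.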
3. Add Orchestrator Bot: IaC cannot do active orchestration or lifecycle management of resources. To support these use cases, we need to add an orchestration layer that executes the final changes required in the infrastructure layer and constantly drives the system to goal state. Imagine an intelligent bot that can implicitly stitch together the various DevOps functions from building a network infrastructure, provisioning of resources, deployment of application, setting up logging, monitoring and alerting and providing a CI/CD framework. All compliance controls would be implicitly built into the workflow.
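The phrase "constantly drives the system to goal state" can be sketched as a reconciliation pass: compare the desired state with the actual state and derive the actions needed to close the gap. The Python below is an in-memory illustration only. The state dictionaries, the function names and the action tuples are hypothetical, and a real orchestrator would call cloud APIs and run this pass repeatedly.

```python
def plan_changes(desired: dict, actual: dict) -> list:
    """Compute the actions needed to move `actual` toward `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


def reconcile(desired: dict, actual: dict) -> dict:
    """Apply one reconciliation pass and return the new actual state."""
    new_state = dict(actual)
    for action, name in plan_changes(desired, actual):
        if action == "delete":
            del new_state[name]
        else:  # "create" and "update" both converge on the desired spec
            new_state[name] = desired[name]
    return new_state


desired = {"web": {"replicas": 3}, "db": {"size": "large"}}
actual = {"web": {"replicas": 2}, "cache": {}}
print(plan_changes(desired, actual))
actual = reconcile(desired, actual)
print(plan_changes(desired, actual))  # nothing left to do once converged
```

Because the orchestrator holds the desired state, a small change becomes an edit to that state rather than a hand-written diff against existing templates.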
4. Adding Developer Self-Service with Guardrails: Developers want to focus on building applications and know what basic infrastructure components they need. IT wants to focus on operations and security and to control the changes made at the infrastructure level. Since the code can be auto-generated, we should be able to give developers high-level controls through which they can request resources like VMs and load balancers without compromising any security controls. Since developers don’t want to write Terraform code or learn the security details, a no-code approach can satisfy the use case for both developers and DevOps teams. For developers, the bot provides no-code self-service and secure infrastructure setup, while for operators it provides a no-code/low-code interface to program the automation stack based on their guidelines and company policies.
DuploCloud No-Code Automation Platform
At DuploCloud we have built a no-code automation platform that fixes some of the limitations of IaC while auto-generating the low-level IaC output for teams that need it. It is an orchestrator that receives application requirements as high-level specifications and creates the underlying infrastructure automatically, while meeting all guidelines of the well-architected framework and the desired compliance standard. The platform implements all the functions outlined in Figure 2 and handles all compliance controls out of the box. Lower-level infrastructure-as-code is auto-generated.
“Using the UI, one can take the no-code approach to build complex infrastructures. For larger setups, DuploCloud’s Terraform provider offers higher-level abstractions that reduce manual code by over 90%.”
Watch the demo video below to see an example of building a complex infrastructure using DuploCloud that would otherwise have taken tens of thousands of lines of IaC.
- Infrastructure-as-code technologies for building infrastructure automation, like Terraform and CloudFormation, have a very specific and limited use case.
- IaC is not really equivalent to writing code as we know it from application development. Many key abstractions and capabilities needed to manage infrastructure at scale are missing from these technologies.
- Building a modern cloud infrastructure using Terraform is like building a complex web application in C. As the public cloud continues to evolve at a rapid pace, with hundreds of new services added each year, and companies adopt the cloud with hundreds of running workloads, the current automation techniques don’t scale.
The only way to build and manage secure infrastructure with agility is to add a layer of orchestration on top and use it to provision and change infrastructure resources. This layer needs to understand the high-level intent and the policies that need to be met.
Ultimately, we are going to move to no-code or low-code approaches to handle the scale and policy requirements. We are seeing the same change in regular application development, and IaC is even more primitive when it comes to complex tasks and orchestration.
As we have seen with many other technologies, it is not a question of “IF” but of “HOW FAST” people will adopt it!
We would love to hear your thoughts on future trends. Feel free to drop us a note at https://www.duplocloud.com/get-started.html