7 Testing the infrastructure code
In this chapter, we continue building our AWS infrastructure, a container-based solution running in AWS Elastic Container Service (AWS ECS).
In the previous chapter, we set up a web application running in a container, in a container cluster using AWS ECS. We divided the solution into two separate files:
- my-container-infra.py - this is the main program for describing our infrastructure
- containers.py - this contains support functions to describe container-based infrastructure
Before adding new features to our solution, we are going to take a step back and look at what we can do to organise and test the building blocks in our infrastructure-as-code solution. Testing, in particular, is something we ideally want to keep in mind right from the start.
We will look at how to incorporate testing into what we already have and then continue to add new infrastructure and testing as we move ahead.
Warning! The built-in testing support libraries in AWS CDK expect you to know AWS CloudFormation. Some familiarity with CloudFormation is recommended if you use these.
7.1 Different kinds of testing
What do we mean by testing? There are various aspects to consider, which include:
- That we get the infrastructure we expect to have
- That the solution and its infrastructure adhere to any security and compliance policies in place
- That the solution itself works as expected when deployed with the infrastructure
The testing aspect we will focus on in this chapter is the first one: that we get the infrastructure we expect to have. This includes which resources will be provisioned, that these resources have the expected properties, and that the relations between resources are as expected. It also includes that we do not introduce unexpected changes.
We will treat these tests essentially as unit tests, so they will run in our (local) development environment, and they will run in the order of seconds.
7.1.1 Who writes the tests?
It depends a bit on how the infrastructure-as-code work is organised, but the people who build and maintain (re-usable) infrastructure building blocks should write the tests for those building blocks.
That may mean every developer, specific platform developers, or other groups of people. For YAML/JSON-based infrastructure using CloudFormation, there is limited support for testing and validation of the infrastructure. Frankly, the need may sometimes be limited: since you just declare what you want to have, there may not be much logic to test. However, once enough logic and conditions are included in the CloudFormation YAML/JSON, it can get quite messy.
If you use programming languages and AWS CDK, you get a more imperative layer that generates the declarative model. This can make the intent clearer, but it can also make it more complex to understand exactly what infrastructure you will get.
7.1.2 Get started with writing tests
Enough preparation talk now, let us get into practical work!
For testing, you can use whichever testing framework you want with the testing support provided in AWS CDK. The examples we will build here will use pytest though.
Since we have cheated and not practiced test-driven development (TDD) right from the start, we will build some tests for the existing infrastructure we have defined before moving on to new infrastructure.
To get started, we need to add pytest to our project.
uv add pytest
This will add the pytest package to our project. Right now we have a very simple project structure, so we will also keep that simplicity with the test files. We will create the test files in the same root directory as the other files for now.
7.2 Infrastructure recap
Let us first recap what we have built so far in the two source files in our project:
my-container-infra.py
import os

import aws_cdk as cdk
from aws_cdk import (
    aws_ec2 as ec2,
)

import containers

app = cdk.App()
env = cdk.Environment(
    account=os.getenv("CDK_DEFAULT_ACCOUNT"), region=os.getenv("CDK_DEFAULT_REGION")
)
stack = cdk.Stack(app, "my-container-infra", env=env)

vpc = ec2.Vpc.from_lookup(stack, "vpc", is_default=True)

cluster = containers.add_cluster(stack, "my-test-cluster", vpc)

taskconfig: containers.TaskConfig = {
    "cpu": 512,
    "memory_limit_mib": 1024,
    "family": "webapp",
}
containerconfig: containers.ContainerConfig = {
    "image": "public.ecr.aws/aws-containers/hello-app-runner:latest",
}
taskdef = containers.add_task_definition_with_container(
    stack, f"taskdef-{taskconfig['family']}", taskconfig, containerconfig
)
containers.add_service(
    stack, f"service-{taskconfig['family']}", cluster, taskdef, 8000, 0, True
)

app.synth()
containers.py
from typing import Literal, TypedDict  # noqa

import constructs as cons
from aws_cdk import (
    aws_ec2 as ec2,
    aws_ecs as ecs,
    aws_logs as logs,
)


class TaskConfig(TypedDict):
    cpu: Literal[256, 512, 1024, 2048, 4096]
    memory_limit_mib: int
    family: str


class ContainerConfig(TypedDict):
    image: str


def add_task_definition_with_container(
    scope: cons.Construct,
    id: str,
    task_config: TaskConfig,
    container_config: ContainerConfig,
) -> ecs.FargateTaskDefinition:
    taskdef = ecs.FargateTaskDefinition(
        scope,
        id,
        cpu=task_config["cpu"],
        memory_limit_mib=task_config["memory_limit_mib"],
        family=task_config["family"],
    )

    logdriver = ecs.LogDrivers.aws_logs(
        stream_prefix=taskdef.family,
        log_retention=logs.RetentionDays.ONE_DAY,
    )
    image = ecs.ContainerImage.from_registry(container_config["image"])
    image_id = f"container-{_extract_image_name(container_config['image'])}"
    taskdef.add_container(image_id, image=image, logging=logdriver)
    return taskdef


def add_service(
    scope: cons.Construct,
    id: str,
    cluster: ecs.Cluster,
    taskdef: ecs.FargateTaskDefinition,
    port: int,
    desired_count: int,
    assign_public_ip: bool = False,
    service_name: str = None,
) -> ecs.FargateService:
    name = service_name if service_name else ""
    sg = ec2.SecurityGroup(
        scope,
        f"{id}-security-group",
        description=f"security group for service {name}",
        vpc=cluster.vpc,
    )
    sg.add_ingress_rule(ec2.Peer.any_ipv4(), ec2.Port.tcp(port))
    service = ecs.FargateService(
        scope,
        id,
        cluster=cluster,
        task_definition=taskdef,
        desired_count=desired_count,
        service_name=service_name,
        security_groups=[sg],
        circuit_breaker=ecs.DeploymentCircuitBreaker(
            rollback=True,
        ),
        assign_public_ip=assign_public_ip,
    )
    return service


def add_cluster(scope: cons.Construct, id: str, vpc: ec2.IVpc) -> ecs.Cluster:
    return ecs.Cluster(scope, id, vpc=vpc)


def _extract_image_name(image_ref):
    name_with_tag = image_ref.split("/")[-1]
    name = name_with_tag.split(":")[0]
    return name
We will start with the functions we have defined in containers.py: add_cluster, add_service, and add_task_definition_with_container. We will use the assertions module provided with AWS CDK for our testing, and use pytest to define the tests.
7.3 Let us write the first test
To build our first tests, let us create a new file called containers_test.py and add our test code there. We will start with a single test for the add_cluster function and look at how that is built up:
import aws_cdk as cdk
from aws_cdk import (
    aws_ec2 as ec2,
    assertions,
)

import containers


def test_ecs_cluster_defined_with_existing_vpc():
    stack = cdk.Stack()
    vpc = ec2.Vpc(stack, "vpc")
    cluster = containers.add_cluster(stack, "my-test-cluster", vpc=vpc)

    template = assertions.Template.from_stack(stack)
    template.resource_count_is("AWS::ECS::Cluster", 1)
    assert cluster.vpc is vpc
We include the assertions sub-module from AWS CDK, which has features to generate CloudFormation templates from different sources, and then perform tests on these templates.
To use this, we need to create a stack, so we import that as well. Since our add_cluster function requires some kind of construct, an identifier, and a reference to a Vpc construct, we will create and provide those. A stack can be created without an AWS CDK App, or even without an identifier, so we will just create an empty stack.
The actual test code calls the add_cluster function and picks up the resulting cluster object. What do we expect the result to be?
We expect that an ECS cluster has been added to the stack we supply, and the provided VPC parameter is included with the cluster.
So testing this, we check two things:
- There is a CloudFormation AWS::ECS::Cluster resource in the stack
- The returned cluster object contains a reference to the provided Vpc object.
Note here that in CloudFormation, the ECS cluster (AWS::ECS::Cluster) does not have a reference to a VPC. This is something we can see if we look in the AWS CloudFormation documentation for the AWS::ECS::Cluster resource. The reference is purely something that the AWS CDK itself has added, for use later with other constructs. Beyond the presence of an AWS::ECS::Cluster, we currently do not care about any details, so it suffices to check that the cluster resource is in the stack, and that there is exactly one of it.
The AWS CDK Cluster object should have a reference to a Vpc, though, and it should be the one we provided to the add_cluster function. So we simply test that with the assert keyword.
We can run the test with the command uv run pytest and see what we get:
❯ uv run pytest
========================================================== test session starts ==========================================================
platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0
rootdir: /Users/eriklz/Documents/Dev/elz_repos/hands-on-iac-awscdk-code/python/step6-testing-the-infra-code
configfile: pyproject.toml
plugins: typeguard-2.13.3
collected 1 item
containers_test.py . [100%]
=========================================================== 1 passed in 2.22s ===========================================================
Success! If this had been a proper test-driven development cycle, we would have gone through a different feedback loop here. For now, we are mainly catching up a bit.
Our first test case involves both a check that goes down into the generated CloudFormation, and another test which checks the state of a higher level construct. These are both valid types of tests to do, and to what extent you do explicit CloudFormation tests depends on the use case. If you build your own high-level construct from direct CloudFormation resources, it makes sense to do a lot of lower-level testing. If you are combining higher-level constructs, then there might not be the same need.
7.4 Task definition testing
Our next test target is the add_task_definition_with_container function, which should create an ECS Fargate task definition and associate a container image from a public registry (Amazon ECR Public in our case) with it. From that description, we can think about what we could test.
- The function signature says it returns a FargateTaskDefinition. We can look for ways that the returned task definition says it will be used with Fargate in its interface.
- The task definition should have the family, cpu and memory settings we have provided
- We can also check that the underlying CloudFormation AWS::ECS::TaskDefinition has been created and has the expected properties.
- We also need to check that the container we provide has been added to the task definition. We can look at whether the returned FargateTaskDefinition object lets us check that.
- We can also check the underlying CloudFormation for an appropriate AWS::ECS::ContainerDefinition as well.
We can do testing on the higher level constructs the AWS CDK provides, or we can do more low-level testing on the generated CloudFormation.
Initially, when I first saw the assertions support provided with the AWS CDK, my mind was very much set on testing a lot of CloudFormation details. But I have changed my mind there. If I am building higher-level constructs from other high-level constructs in AWS CDK, the need to check the generated CloudFormation explicitly is limited. If you build your own constructs that use resources mapping directly to CloudFormation resources, then it is very useful to check the generated CloudFormation. In other cases, that is only partially true. Look at what you can check from the construct interfaces first, and if that is not sufficient, then go to the CloudFormation-oriented tests.
I want to consider CloudFormation an implementation detail of the AWS CDK, and preferably not think about it if I can. It cannot be avoided currently though, and in practice you will have to deal with it sometimes.
7.4.1 Test for Fargate Task Definition
Let us take a first stab at the tests for add_task_definition_with_container() and check that we have a task definition that is Fargate compatible.
def test_ecs_fargate_task_definition_defined():
    stack = cdk.Stack()
    cpuval = 512
    memval = 1024
    familyval = "test"
    taskcfg: containers.TaskConfig = {
        "cpu": cpuval,
        "memory_limit_mib": memval,
        "family": familyval,
    }
    image = "public.ecr.aws/aws-containers/hello-app-runner:latest"
    containercfg: containers.ContainerConfig = {
        "image": image,
    }
    taskdef = containers.add_task_definition_with_container(
        stack, f"taskdef-{taskcfg['family']}", taskcfg, containercfg
    )

    assert taskdef.is_fargate_compatible
    assert taskdef in stack.node.children

    template = assertions.Template.from_stack(stack)
    template.resource_count_is("AWS::ECS::TaskDefinition", 1)
    template.has_resource_properties(
        "AWS::ECS::TaskDefinition",
        {
            "RequiresCompatibilities": ["FARGATE"],
            "Cpu": str(cpuval),
            "Memory": str(memval),
            "Family": familyval,
        },
    )
Again, in this test, we use both higher-level tests and some low-level CloudFormation tests. We can check directly that the returned task definition is Fargate compatible and we can check that it has been added to the stack without resorting to checking the CloudFormation.
We can also, as before, check at the CloudFormation level that one task definition has been added. The TaskDefinition interface does not allow us to check the cpu, memory limit, and family values, though, so in this case we need to dive into the actual CloudFormation. The Template.has_resource_properties() function is quite useful for that: we specify only the properties we expect to find in the resource, and any other properties are ignored. Note that the properties we are looking for are in the AWS::ECS::TaskDefinition resource. We specify the CloudFormation name of the resource, and the interesting CloudFormation properties and their values as a Python dict.
So we added a check for the cpu, memory and family settings to verify that those are in place.
Note: If you look at the test code, you see that the Cpu and Memory values are converted to strings. In the CloudFormation documentation examples, these values are numbers. However, according to the CloudFormation specification, the values are strings. The AWS CDK generates the direct CloudFormation resources from the specification. So if there is a discrepancy, the AWS CDK is likely handling it correctly.
7.4.2 Test for container definition
Let us add another test to check that the container definition is added to the task definition. Our function creates a task definition with a single container. The TaskDefinition construct can provide a reference to the default container definition, so it makes sense to check that this is in place - that there is at least some container definition.
Also in this test, we will do it both the simple way, using existing interfaces, and the more complex way, using CloudFormation tests. The main reason for including the CloudFormation tests in this case is to illustrate the nested matching capabilities; I would not recommend them if there are simpler options available. We will still check the AWS::ECS::TaskDefinition, but we will look at a different part of the structure.
In the previous test case, we could just enter the properties we wanted to match with. Here, we will go deeper into the CloudFormation resource, so we have to be a bit more explicit about the type of matching to do.
def test_container_definition_added_to_task_definition():
    stack = cdk.Stack()
    cpuval = 512
    memval = 1024
    familyval = "test"
    taskcfg: containers.TaskConfig = {
        "cpu": cpuval,
        "memory_limit_mib": memval,
        "family": familyval,
    }
    image_name = "public.ecr.aws/aws-containers/hello-app-runner:latest"
    containercfg: containers.ContainerConfig = {"image": image_name}

    taskdef = containers.add_task_definition_with_container(
        stack, "test-taskdef", taskcfg, containercfg
    )

    template = assertions.Template.from_stack(stack)
    containerdef: ecs.ContainerDefinition = taskdef.default_container  # type: ignore

    assert containerdef is not None
    assert containerdef.image_name == image_name
    template.has_resource_properties(
        "AWS::ECS::TaskDefinition",
        {
            "ContainerDefinitions": assertions.Match.array_with(
                [assertions.Match.object_like({"Image": image_name})]
            )
        },
    )
The functions in the Match class provide different matching features. Match.object_like() is what we already did implicitly at the top level in the previous test. Match.array_with() allows us to check that there is an element in an array that matches what we are looking for. Together, these help us check that there is a container definition inside the task definition, and that it refers to the image we provided.
7.5 Test the service
The last function to test here is add_service(). This is a function that ties our previously defined resources together and adds something we will spin up in a cluster and actually run. We provide a port that should be available to access our service on, and we provide a desired count for the container to run, plus tie all the pieces together.
Based on this information and what we have implemented, we can create a test like this:
def test_fargate_service_created_with_only_mandatory_properties():
    stack = cdk.Stack()
    vpc = ec2.Vpc(stack, "vpc")
    cluster = containers.add_cluster(stack, "test-cluster", vpc=vpc)
    cpuval = 512
    memval = 1024
    familyval = "test"
    taskcfg: containers.TaskConfig = {
        "cpu": cpuval,
        "memory_limit_mib": memval,
        "family": familyval,
    }
    image_name = "public.ecr.aws/aws-containers/hello-app-runner:latest"
    containercfg: containers.ContainerConfig = {"image": image_name}

    taskdef = containers.add_task_definition_with_container(
        stack, "test-taskdef", taskcfg, containercfg
    )

    port = 80
    desired_count = 1

    service = containers.add_service(
        stack, "test-service", cluster, taskdef, port, desired_count
    )

    sg_capture = assertions.Capture()
    template = assertions.Template.from_stack(stack)

    assert service.cluster == cluster
    assert service.task_definition == taskdef

    template.resource_count_is("AWS::ECS::Service", 1)
    template.has_resource_properties(
        "AWS::ECS::Service",
        {
            "DesiredCount": desired_count,
            "LaunchType": "FARGATE",
            "NetworkConfiguration": assertions.Match.object_like(
                {
                    "AwsvpcConfiguration": assertions.Match.object_like(
                        {
                            "AssignPublicIp": "DISABLED",
                            "SecurityGroups": assertions.Match.array_with([sg_capture]),
                        }
                    )
                }
            ),
        },
    )
    template.has_resource_properties(
        "AWS::EC2::SecurityGroup",
        {
            "SecurityGroupIngress": assertions.Match.array_with(
                [
                    assertions.Match.object_like(
                        {"CidrIp": "0.0.0.0/0", "FromPort": port, "IpProtocol": "tcp"}
                    )
                ]
            )
        },
    )
A new feature added here is the ability to capture values from the generated CloudFormation. This will literally be whatever is at the location where the capture object has been placed. We will just use that as a placeholder for now.
If you look at the test built here, you may spot some concerns and issues with our infrastructure design. While the test passes, there are some design issues here.
7.6 Does our design suck? Wrapping up
There are several issues one may spot when testing our infrastructure design, some of which include:
- Running things on Fargate is implicit in the interface. Should it be?
- Specifying a port number to access the service through add_service() is right now fine for a single container instance only. For multiple containers (desiredCount > 1) there would need to be a load balancer.
- The design opens up for traffic from everywhere, regardless of whether it uses public or private IP addresses.
- Do we have the right abstraction level for this? If our test cases become too complicated, maybe we need to find a different approach.
- No configuration or tweaking of container setup.
Also, in particular with the last test, there is a lot of setup work. This is something to address as well.
We skipped some complexities when we set up a container-based environment in a cluster for the initial solution. This was a conscious choice then, but it is easy to forget such decisions later. Adding tests is a way both to validate that we get the infrastructure we want, and to validate that our design choices for how we build that infrastructure are sound for our use cases.
We will work with the tests and also change the design somewhat, based on what our end goals are.