7 Testing the infrastructure code

In this chapter, we are continuing with building our AWS infrastructure, which is a container-based solution running in AWS Elastic Container Service (AWS ECS).

In the previous chapter, we set up a web application running in a container, in a container cluster using AWS ECS. While building the solution, we divided it into two separate files:

my-container-infra.py - this is the main program for describing our infrastructure
containers.py - this contains support functions to describe container-based infrastructure

Before adding new features to our solution, we are going to take a step back and look at what we can do to organise and test the building blocks in our infrastructure-as-code solution. In particular, testing is something we ideally want to keep in mind right from the start.

We will look at how to incorporate testing into what we already have and then continue to add new infrastructure and testing as we move ahead.

Warning! The built-in testing support libraries in AWS CDK expect you to know AWS CloudFormation. Some familiarity with CloudFormation is recommended if you use these.

7.1 Different kinds of testing

What do we mean by testing? There are various aspects to consider, which include:

That we get the infrastructure we expect to have
That the solution and its infrastructure adhere to any security and compliance policies in place
That the solution itself works as expected when deployed with the infrastructure

The aspect we will focus on in this chapter is the first one: that we get the infrastructure we expect to have. This covers the actual resources that will be provisioned, the properties of those resources, and the relations between them. It also covers that we do not introduce unexpected changes.

We will essentially treat these tests as unit tests, so they will run in our (local) development environment and take on the order of seconds.
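One way to guard against unexpected changes is a snapshot-style check: store the generated template once and compare later runs against it. The following is a minimal sketch under some assumptions. The helper matches_snapshot and the file name are hypothetical, and the small template dict stands in for the real one, which could be obtained with assertions.Template.from_stack(stack).to_json().

```python
import json
import tempfile
from pathlib import Path


def matches_snapshot(template: dict, snapshot_file: Path) -> bool:
    """Compare a template with a stored snapshot, creating it on first run."""
    # Serialise with sorted keys so that key order does not cause false alarms.
    current = json.dumps(template, indent=2, sort_keys=True)
    if not snapshot_file.exists():
        snapshot_file.write_text(current)
        return True
    return snapshot_file.read_text() == current


# Hypothetical template fragment, standing in for the real generated template.
template = {"Resources": {"Cluster": {"Type": "AWS::ECS::Cluster"}}}

with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "my-container-infra.snapshot.json"
    first_run = matches_snapshot(template, snapshot)  # writes the snapshot
    unchanged = matches_snapshot(template, snapshot)  # same template again
    template["Resources"]["Extra"] = {"Type": "AWS::SQS::Queue"}
    changed = matches_snapshot(template, snapshot)  # an unexpected resource
```

A check like this says nothing about whether the infrastructure is right, only that it has not changed since the snapshot was taken, so it complements the more targeted tests rather than replacing them.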

7.1.1 Who writes the tests?

It depends a bit on how you have the infrastructure-as-code work organised, but the people that build and maintain (re-usable) infrastructure building blocks should write tests for those building blocks.

That may mean every developer, specific platform developers, or other groups of people. For YAML/JSON-based infrastructure using CloudFormation, there is limited support for testing and validation of the infrastructure. Frankly, the need may sometimes be limited as well: since you just declare what you want to have, there may not be much logic to test. However, once enough logic and conditions end up in the CloudFormation YAML/JSON, it can get quite messy.

If you use programming languages and AWS CDK, you get a more imperative layer that generates the declarative model. This can make it easier to express what is intended, but it can also make it harder to understand exactly what infrastructure you will get.

7.1.2 Get started with writing tests

Enough preparation talk now, let us get into practical work!

For testing, you can use whichever testing framework you want with the testing support provided in AWS CDK. The examples we will build here will use pytest though.

Since we have cheated and not practiced test-driven development (TDD) right from the start, we will build some tests for the existing infrastructure we have defined, before moving further with new infrastructure.

To get started, we need to add pytest to our project.

uv add pytest

This will add the pytest package to our project. Right now we have a very simple project structure, so we will also keep that simplicity with the test files. We will create the test files in the same root directory as the other files for now.

7.2 Infrastructure recap

Let us first recap what we have built so far in the two source files in our project:

my-container-infra.py

import os
import aws_cdk as cdk
from aws_cdk import (
    aws_ec2 as ec2,
)
import containers

app = cdk.App()
env = cdk.Environment(
    account=os.getenv("CDK_DEFAULT_ACCOUNT"), region=os.getenv("CDK_DEFAULT_REGION")
)
stack = cdk.Stack(app, "my-container-infra", env=env)

vpc = ec2.Vpc.from_lookup(stack, "vpc", is_default=True)

cluster = containers.add_cluster(stack, "my-test-cluster", vpc)

taskconfig: containers.TaskConfig = {
    "cpu": 512,
    "memory_limit_mib": 1024,
    "family": "webapp",
}
containerconfig: containers.ContainerConfig = {
    "image": "public.ecr.aws/aws-containers/hello-app-runner:latest",
}
taskdef = containers.add_task_definition_with_container(
    stack, f"taskdef-{taskconfig['family']}", taskconfig, containerconfig
)

containers.add_service(
    stack, f"service-{taskconfig['family']}", cluster, taskdef, 8000, 0, True
)

app.synth()

containers.py

from typing import Literal, TypedDict  # noqa
import constructs as cons
from aws_cdk import (
    aws_ec2 as ec2,
    aws_ecs as ecs,
    aws_logs as logs,
)


class TaskConfig(TypedDict):
    cpu: Literal[256, 512, 1024, 2048, 4096]
    memory_limit_mib: int
    family: str


class ContainerConfig(TypedDict):
    image: str


def add_task_definition_with_container(
    scope: cons.Construct,
    id: str,
    task_config: TaskConfig,
    container_config: ContainerConfig,
) -> ecs.FargateTaskDefinition:
    taskdef = ecs.FargateTaskDefinition(
        scope,
        id,
        cpu=task_config["cpu"],
        memory_limit_mib=task_config["memory_limit_mib"],
        family=task_config["family"],
    )

    logdriver = ecs.LogDrivers.aws_logs(
        stream_prefix=taskdef.family,
        log_retention=logs.RetentionDays.ONE_DAY,
    )
    image = ecs.ContainerImage.from_registry(container_config["image"])
    image_id = f"container-{_extract_image_name(container_config['image'])}"
    taskdef.add_container(image_id, image=image, logging=logdriver)

    return taskdef


def add_service(
    scope: cons.Construct,
    id: str,
    cluster: ecs.Cluster,
    taskdef: ecs.FargateTaskDefinition,
    port: int,
    desired_count: int,
    assign_public_ip: bool = False,
    service_name: str | None = None,
) -> ecs.FargateService:
    name = service_name if service_name else ""
    sg = ec2.SecurityGroup(
        scope,
        f"{id}-security-group",
        description=f"security group for service {name}",
        vpc=cluster.vpc,
    )
    sg.add_ingress_rule(ec2.Peer.any_ipv4(), ec2.Port.tcp(port))

    service = ecs.FargateService(
        scope,
        id,
        cluster=cluster,
        task_definition=taskdef,
        desired_count=desired_count,
        service_name=service_name,
        security_groups=[sg],
        circuit_breaker=ecs.DeploymentCircuitBreaker(
            rollback=True,
        ),
        assign_public_ip=assign_public_ip,
    )
    return service


def add_cluster(scope: cons.Construct, id: str, vpc: ec2.IVpc) -> ecs.Cluster:
    return ecs.Cluster(scope, id, vpc=vpc)


def _extract_image_name(image_ref):
    name_with_tag = image_ref.split("/")[-1]
    name = name_with_tag.split(":")[0]
    return name

We will start with the functions we have defined in containers.py: add_cluster, add_service, and add_task_definition_with_container. We will use the assertions module provided with AWS CDK for our testing, and use pytest to define the tests.

7.3 Let us write the first test

To build our first tests, let us create a new file called containers_test.py and add our test code there. We will start with a single test for the add_cluster function and look at how that is built up:

import aws_cdk as cdk
from aws_cdk import (
    aws_ec2 as ec2,
    assertions,
)
import containers

def test_ecs_cluster_defined_with_existing_vpc():
    stack = cdk.Stack()
    vpc = ec2.Vpc(stack, "vpc")
    cluster = containers.add_cluster(stack, "my-test-cluster", vpc=vpc)

    template = assertions.Template.from_stack(stack)
    template.resource_count_is("AWS::ECS::Cluster", 1)
    assert cluster.vpc is vpc

We include the assertions sub-module from AWS CDK, which has features to generate CloudFormation templates from different sources, and then perform tests on these templates.

To use this, we need to create a stack, so we import that as well. Since our add_cluster function requires some kind of construct, an identifier and a reference to a Vpc construct, we will create and provide those. A stack can be created without an AWS CDK App, or even an identifier, so we will just create an empty stack.

The actual test code is to call the add_cluster function and pick up the resulting cluster object. What are we expecting the result will be?

We expect that an ECS cluster has been added to the stack we supply, and the provided VPC parameter is included with the cluster.

So testing this, we check two things:

There is a CloudFormation AWS::ECS::Cluster resource in the stack
The returned cluster object contains a reference to the provided Vpc object.

Note here that in CloudFormation, the ECS cluster (AWS::ECS::Cluster) does not have a reference to a VPC. This is something we can see if we look in the AWS CloudFormation documentation for the AWS::ECS::Cluster resource. This is purely something that the AWS CDK itself has added, for use later with other constructs. Besides checking that there is an AWS::ECS::Cluster in the stack, we currently do not care about any further details. So for us it suffices to check that the cluster resource is in the stack, and that there is exactly one of them.
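To make it more concrete what a resource count check looks at, here is a small sketch that counts resources by type in a template represented as a plain dict. The function count_resources and the template fragment are hypothetical and hand-written for illustration; this is not how the AWS CDK implements resource_count_is().

```python
def count_resources(template: dict, resource_type: str) -> int:
    """Count the resources of a given CloudFormation type in a template dict."""
    resources = template.get("Resources", {})
    return sum(1 for res in resources.values() if res.get("Type") == resource_type)


# Hypothetical, hand-written fragment; the logical IDs are made up.
template = {
    "Resources": {
        "mytestclusterABC123": {"Type": "AWS::ECS::Cluster"},
        "vpcDEF456": {
            "Type": "AWS::EC2::VPC",
            "Properties": {"CidrBlock": "10.0.0.0/16"},
        },
    }
}

cluster_count = count_resources(template, "AWS::ECS::Cluster")
```

Note that the cluster entry in this fragment has no properties pointing at the VPC, which is the reason the VPC relation is checked on the construct and not in the template.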

The AWS CDK Cluster object should have a reference to a Vpc, though, and it should be the one we provide to the add_cluster function. We test this directly with the assert keyword.

We can run the test with the command uv run pytest and see what we get:

❯ uv run pytest
========================================================== test session starts ==========================================================
platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0
rootdir: /Users/eriklz/Documents/Dev/elz_repos/hands-on-iac-awscdk-code/python/step6-testing-the-infra-code
configfile: pyproject.toml
plugins: typeguard-2.13.3
collected 1 item

containers_test.py .                                                                                                              [100%]

=========================================================== 1 passed in 2.22s ===========================================================

Success! If this had been a proper test-driven development cycle, we would have gone through a different feedback loop here, though. For now, we are mainly catching up a bit.

Our first test case involves both a check that goes down into the generated CloudFormation, and another test which checks the state of a higher level construct. These are both valid types of tests to do, and to what extent you do explicit CloudFormation tests depends on the use case. If you build your own high-level construct from direct CloudFormation resources, it makes sense to do a lot of lower-level testing. If you are combining higher-level constructs, then there might not be the same need.

7.4 Task definition testing

Our next test target is the add_task_definition_with_container function, which should create an ECS Fargate task definition and associate a container image from a public container registry with it. From that description, we can think about what we could test.

The function signature says it returns a FargateTaskDefinition. We can look for ways that the returned task definition says it will be used with Fargate in its interface.
The task definition should have the family, cpu and memory settings we have provided
We can also check that the underlying CloudFormation AWS::ECS::TaskDefinition has been created and has the expected properties.
We also need to check that the container we provide has been added to the task definition. We can see whether the returned FargateTaskDefinition object lets us check that.
We can also check the underlying CloudFormation for an appropriate container definition in the ContainerDefinitions property of the AWS::ECS::TaskDefinition.

We can do testing on the higher level constructs the AWS CDK provides, or we can do more low-level testing on the generated CloudFormation.

When I first saw the assertions support functions provided for the AWS CDK, my mind was very much set on testing a lot of CloudFormation details. But I have changed my mind there. If I am building higher-level constructs from other high-level constructs in AWS CDK, the need to explicitly check the generated CloudFormation is somewhat limited. If you build your own constructs which use resources that map directly to CloudFormation resources, then it is very useful to check the generated CloudFormation. In other cases, that is only partially true. Look at what you can check from the construct interfaces first, and if that is not sufficient, then go to the CloudFormation-oriented tests.

I want to consider CloudFormation an implementation detail of the AWS CDK, and preferably not think about it if I can. It cannot be avoided currently though, and in practice you will have to deal with it sometimes.

7.4.1 Test for Fargate Task Definition

Let us take a first stab at the tests for add_task_definition_with_container() and check that we have a task definition that is Fargate compatible.

def test_ecs_fargate_task_definition_defined():
    stack = cdk.Stack()
    cpuval = 512
    memval = 1024
    familyval = "test"
    taskcfg: containers.TaskConfig = {
        "cpu": cpuval,
        "memory_limit_mib": memval,
        "family": familyval
    }
    image = "public.ecr.aws/aws-containers/hello-app-runner:latest"
    containercfg: containers.ContainerConfig = {
        "image": image
    }
    taskdef = containers.add_task_definition_with_container(
        stack, f"taskdef-{taskcfg['family']}", taskcfg, containercfg
    )

    assert taskdef.is_fargate_compatible
    assert taskdef in stack.node.children

    template = assertions.Template.from_stack(stack)
    template.resource_count_is("AWS::ECS::TaskDefinition", 1)
    template.has_resource_properties(
        "AWS::ECS::TaskDefinition",
        {
            "RequiresCompatibilities": ["FARGATE"],
            "Cpu": str(cpuval),
            "Memory": str(memval),
            "Family": familyval
        }
    )

Again, in this test, we use both higher-level tests and some low-level CloudFormation tests. We can check directly that the returned task definition is Fargate compatible and we can check that it has been added to the stack without resorting to checking the CloudFormation.

As before, we can also check at the CloudFormation level that one task definition has been added. The returned construct does expose the family (containers.py already reads taskdef.family), but in this test we check the cpu, memory limit and family values together, as they end up in the actual CloudFormation. The Template.has_resource_properties() function is quite useful for that. We can specify the properties we expect to find in the resource, and only the properties we are interested in. The other properties we do not need to care about. Note that the properties that we are looking for are in the AWS::ECS::TaskDefinition resource. We specify the CloudFormation name of the resource, and all the interesting CloudFormation properties and their values as a Python dict.

So we added a check for the cpu, memory and family settings to verify that those are in place.

Note: If you look at the test code, you see that the Cpu and Memory values are converted to strings. In the CloudFormation documentation examples, these values are numbers. However, according to the CloudFormation specification, the values are strings. The AWS CDK generates the direct CloudFormation resources from the specification. So if there is a discrepancy, the AWS CDK is likely handling it correctly.

7.4.2 Test for container definition

Let us add another test to check that the container definition is added to the task definition. Our function creates a task definition with a single container. The TaskDefinition construct can provide a reference to the default container definition, so it makes sense to check that this is in place - there is at least some container definition in place.

Also in this test, we are going to do it both the simple way, using existing interfaces, and the more complex way, using CloudFormation tests. The main reason for including the CloudFormation tests in this case is to illustrate the nested matching capabilities. I would not recommend it if there are simpler options to use. In this case, we will still check the AWS::ECS::TaskDefinition, but we will look at a different part of the structure.

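In the previous test case, we could just enter the properties we wanted to match with. Here, we will go deeper into the CloudFormation resource, so we have to be a bit more explicit about the type of matching to do. Note that the test below refers to ecs.ContainerDefinition in a type annotation, so aws_ecs also needs to be imported as ecs at the top of containers_test.py.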

def test_container_definition_added_to_task_definition():
    stack = cdk.Stack()
    cpuval = 512
    memval = 1024
    familyval = "test"
    taskcfg: containers.TaskConfig = {
        "cpu": cpuval,
        "memory_limit_mib": memval,
        "family": familyval,
    }
    image_name = "public.ecr.aws/aws-containers/hello-app-runner:latest"
    containercfg: containers.ContainerConfig = {"image": image_name}

    taskdef = containers.add_task_definition_with_container(
        stack, "test-taskdef", taskcfg, containercfg
    )

    template = assertions.Template.from_stack(stack)
    containerdef: ecs.ContainerDefinition = taskdef.default_container  # type: ignore

    assert containerdef is not None
    assert containerdef.image_name == image_name

    template.has_resource_properties(
        "AWS::ECS::TaskDefinition",
        {
            "ContainerDefinitions": assertions.Match.array_with(
                [assertions.Match.object_like({"Image": image_name})]
            )
        },
    )

The functions in the Match class provide different kinds of matching. Match.object_like() is the same as what we did implicitly at the top level in the previous test. Match.array_with() allows us to check that there is an element in an array that matches what we are looking for. These functions help us check that there is a container definition inside the task definition, and that it refers to the image we provided.
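To get a feel for what this kind of nested, partial matching means, here is a simplified pure-Python model of it. This is only a sketch of the idea and not the AWS CDK implementation: the functions matches, object_like and array_with below are local stand-ins that borrow the names, and the properties dict is a hypothetical fragment.

```python
def matches(pattern, actual) -> bool:
    """Apply a matcher if the pattern is one, otherwise compare for equality."""
    return pattern(actual) if callable(pattern) else pattern == actual


def object_like(pattern: dict):
    """Build a matcher: every key in the pattern must be present and match."""

    def matcher(actual) -> bool:
        if not isinstance(actual, dict):
            return False
        for key, expected in pattern.items():
            if key not in actual:
                return False
            # In this sketch, plain nested dicts are matched partially as well.
            if isinstance(expected, dict):
                expected = object_like(expected)
            if not matches(expected, actual[key]):
                return False
        return True

    return matcher


def array_with(patterns: list):
    """Build a matcher: each pattern must match some element, in order."""

    def matcher(actual) -> bool:
        if not isinstance(actual, list):
            return False
        remaining = iter(actual)
        # any() consumes the iterator up to the first hit, which keeps the order.
        return all(
            any(matches(pattern, item) for item in remaining)
            for pattern in patterns
        )

    return matcher


image = "public.ecr.aws/aws-containers/hello-app-runner:latest"
# Hypothetical task definition properties, with extra keys that we ignore.
properties = {
    "Family": "test",
    "ContainerDefinitions": [
        {"Name": "sidecar", "Image": "example/sidecar:1"},
        {"Name": "container-hello-app-runner", "Image": image, "Essential": True},
    ],
}

expected = object_like(
    {"ContainerDefinitions": array_with([object_like({"Image": image})])}
)
found = expected(properties)
```

The pattern only names the image, yet it matches a container definition that carries several other keys and sits next to another container. That is the property that keeps tests like the one above from breaking whenever unrelated details of the template change.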

7.5 Test the service

The last function to test here is add_service(). This is a function that ties our previously defined resources together and adds something we will spin up in a cluster and actually run. We provide a port that should be available to access our service on, and we provide a desired count for the container to run, plus tie all the pieces together.

Based on this information and what we have implemented, we can create a test like this:

def test_fargate_service_created_with_only_mandatory_properties():
    stack = cdk.Stack()
    vpc = ec2.Vpc(stack, "vpc")
    cluster = containers.add_cluster(stack, "test-cluster", vpc=vpc)
    cpuval = 512
    memval = 1024
    familyval = "test"
    taskcfg: containers.TaskConfig = {
        "cpu": cpuval,
        "memory_limit_mib": memval,
        "family": familyval,
    }
    image_name = "public.ecr.aws/aws-containers/hello-app-runner:latest"
    containercfg: containers.ContainerConfig = {"image": image_name}

    taskdef = containers.add_task_definition_with_container(
        stack, "test-taskdef", taskcfg, containercfg
    )

    port = 80
    desired_count = 1

    service = containers.add_service(
        stack, "test-service", cluster, taskdef, port, desired_count
    )

    sg_capture = assertions.Capture()
    template = assertions.Template.from_stack(stack)

    assert service.cluster == cluster
    assert service.task_definition == taskdef

    template.resource_count_is("AWS::ECS::Service", 1)
    template.has_resource_properties(
        "AWS::ECS::Service",
        {
            "DesiredCount": desired_count,
            "LaunchType": "FARGATE",
            "NetworkConfiguration": assertions.Match.object_like(
                {
                    "AwsvpcConfiguration": assertions.Match.object_like(
                        {
                            "AssignPublicIp": "DISABLED",
                            "SecurityGroups": assertions.Match.array_with([sg_capture]),
                        }
                    )
                }
            ),
        },
    )

    template.has_resource_properties(
        "AWS::EC2::SecurityGroup",
        {
            "SecurityGroupIngress": assertions.Match.array_with(
                [
                    assertions.Match.object_like(
                        {"CidrIp": "0.0.0.0/0", "FromPort": port, "IpProtocol": "tcp"}
                    )
                ]
            )
        },
    )

A new feature added here is the ability to capture values from the generated CloudFormation. This will literally be whatever is at the location where the capture object has been placed. We will just use that as a placeholder for now.
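A captured value can also be put to use. The sketch below assumes that the captured security group entry is an Fn::GetAtt reference, which is an assumption about the generated template and not something shown above. It works on plain dicts: the template fragment and its logical ID are hypothetical, and the captured dict stands in for what sg_capture.as_object() would return. The helper resolve_get_att is hypothetical as well.

```python
def resolve_get_att(template: dict, reference: dict) -> dict:
    """Return the resource that an {"Fn::GetAtt": [logical_id, attr]} points to."""
    logical_id, _attribute = reference["Fn::GetAtt"]
    return template["Resources"][logical_id]


# Hypothetical fragment; the logical ID is made up for illustration.
template = {
    "Resources": {
        "testservicesecuritygroup1234": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {"GroupDescription": "security group for service "},
        }
    }
}
# Stand-in for the value that sg_capture.as_object() would return.
captured = {"Fn::GetAtt": ["testservicesecuritygroup1234", "GroupId"]}

resource = resolve_get_att(template, captured)
```

With something like this, a test could verify that the security group attached to the service is the same resource that carries the ingress rule, instead of only checking that some security group with that rule exists in the stack.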

If you look at the test built here, you may spot some concerns with our infrastructure design. The test passes, but it also exposes some design issues.

7.6 Does our design suck? Wrapping up

There are several issues one may spot when testing our infrastructure design, some of which include:

Running things on Fargate is implicit in the interface. Should it be?
Specifying a port number to access the service through add_service() is right now fine for a single container instance only. For multiple containers (desired_count > 1) there would need to be a load balancer.
The design opens up for traffic from everywhere, regardless of whether it uses public or private IP addresses.
Do we have the right abstraction level for this? If our test cases become too complicated, maybe we need to find a different approach.
No configuration or tweaking of container setup.

Also, in particular with the last test there is a lot of setup work. This is something to address as well.
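One possible way to reduce the repeated setup is a small factory function with sensible defaults, so that each test only spells out the values it cares about. This is a sketch of one option, not necessarily the approach we will end up with. The helper make_task_config is hypothetical, and TaskConfig is repeated here only to keep the example self-contained.

```python
from typing import Literal, TypedDict


class TaskConfig(TypedDict):
    # Same shape as TaskConfig in containers.py, repeated to keep this runnable.
    cpu: Literal[256, 512, 1024, 2048, 4096]
    memory_limit_mib: int
    family: str


def make_task_config(**overrides) -> TaskConfig:
    """Hypothetical test helper: default task settings with optional overrides."""
    config: TaskConfig = {"cpu": 512, "memory_limit_mib": 1024, "family": "test"}
    config.update(overrides)  # type: ignore[typeddict-item]
    return config


default_cfg = make_task_config()
custom_cfg = make_task_config(family="webapp", cpu=256)
```

The same idea could be applied to the stack, VPC and cluster setup, for example with pytest fixtures, so that the body of each test is mostly about the behaviour being checked.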

We skipped some complexities by setting up a container-based environment in a cluster when we did the initial solution. This was a conscious choice then. It is easy to forget about some of these decisions later. Adding tests is a way both to validate that we get what we want, but also that our design choices for how we build our infrastructure are sound, for our use cases.

We will work with the tests and also change the design somewhat, based on what our end goals are.