Storing Consumed Events into AWS RDS
Part 1 – Objectives, Audience, Tools & Prerequisites
Part 3 – Using JavaScript Client To Produce and Consume Events
Part 4 – AWS Lambda Function To Produce Kafka Events
Part 5 – AWS API Gateway For Accepting Events
Part 6 – Consuming Events Using MSK Connect – S3 Bucket Sink
Part 7 – Consuming Events Using AWS ECS
Part 8 – Storing Consumed Events Into AWS RDS
Introduction
In the previous Part 7, we consumed Kafka events with a consumer deployed as an AWS ECS Task with the Fargate launch type. But the consumption logic was very limited: it only logged a message with the event payload, a message that we could see in the CloudWatch logs.
In the current Part 8, we extend the consumer to communicate with a database and store each event as a row in a relational (SQL) database. The database will be an AWS RDS instance with the Postgres engine.
We will learn
- How to use a Docker image for a Postgres DB when working locally
- How to integrate the JavaScript code with the database
- How to define an AWS RDS instance with a Postgres engine
Important: If you follow along with the steps suggested in this tutorial, you will probably be charged for the resources created in AWS. Make sure that, at the end of the learning experience, you take the necessary actions to destroy these resources, so that you are not charged further. The command that we use, at the end of the process, to destroy the resources created with this demo, is:
terraform destroy
Let’s start
Architectural Objective
We build on what we have already designed in the previous parts. Here is our architectural objective and the new part we are going to add in the current post.
As you can see from the picture above, we are going to create an AWS RDS instance in our AWS region. This instance will be accessible by our JavaScript consumer, which is deployed in our AWS ECS Service.
Local Development
But first, let’s make it work locally.
Local Database Instance
We need to install Postgres locally. This is quite easy because we are using Docker.
Inside the folder live_listening_event_consumer we already have the file docker-compose.development.yml, which sets up our local development environment. We enhance the file to look like the following:
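The full listing is not reproduced here, but an abridged sketch of the relevant additions looks like the following (the Kafka-related services and settings from the previous parts are omitted, and the exact healthcheck command is an assumption; the line numbers in the walkthrough below refer to the full file):

volumes:
  db_data:

services:
  live_listening_event_consumer:
    # ... build and Kafka-related settings as in the previous parts ...
    depends_on:
      db:
        condition: service_healthy
    environment:
      # ... Kafka-related variables as in the previous parts ...
      DB_USERNAME: postgres
      DB_PASSWORD: postgres
      DB_HOST: db
      DB_PORT: 5432
      DB_DATABASE: analytics_development

  db:
    build:
      context: .
      dockerfile: Dockerfile.development.postgres
    environment:
      POSTGRES_PASSWORD: postgres
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
    volumes:
      - db_data:/var/lib/postgresql/data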
Here are the main points of this file for this part of the tutorial:
- lines 1 – 2: We declare a volume named db_data. We use it later on in the db service specification to make sure that whenever the db service starts it uses a named Docker volume, rather than an unnamed one. This is a good practice to keep our Docker volumes tidy.
- line 9: We make the live_listening_event_consumer service depend on the db service. This means that whenever we create a live_listening_event_consumer service, the db service will be created too (if not already created). Note that the dependency requires the db service to be healthy.
- lines 16 – 20: We declare the environment variables and their values, so that the consumer is able to connect to the database. We will see how these environment variables are used inside the JavaScript consumer code later on.
- lines 58 – 70: We are adding a new Docker Compose service with the name db.
  - lines 59 – 61: We build the image locally using the Docker file Dockerfile.development.postgres, which we will talk about immediately after.
  - We specify the password for the postgres user to be postgres. We are in local development, so we don't really care to use a strong password.
  - lines 64 – 68: We declare the healthcheck properties that will make sure Docker Compose knows when our database is ready to accept connections and, hence, is considered healthy.
  - lines 69 – 70: We declare a Docker named volume to make sure we persist the data in-between restarts of the container, using an easy-to-deal-with name. We like this as a practice. If we don't specify the volume, the container creates a local anonymous volume which is difficult to locate and work with.
Docker File To Build DB Image
Our docker-compose.development.yml file refers to a new locally-located file named Dockerfile.development.postgres. This is the file that is used to build the db service image. Here are its contents:
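The original listing is a short one; a sketch consistent with the description that follows would be:

FROM postgres:15.1

COPY initialize_db.sh .
RUN chmod u+x ./initialize_db.sh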
It is a very simple file that builds an image on top of the postgres:15.1 image stored in Docker Hub. This version of Postgres is currently supported by Amazon RDS, and we chose it to build our local Postgres image.
Not only that, we also copy over the local file initialize_db.sh. This file, which is explained right after, will be used to initialize the db. It needs to be executable; that's why on line 4 we use the chmod command to make it user-executable.
Initialize Database Script
Inside the folder live_listening_event_consumer we create the file initialize_db.sh with the following content:
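The original script is not reproduced here; below is a sketch of what it looks like. The connection defaults and the exact column definitions are assumptions (inferred from the select output shown later in this post); when run against AWS RDS, the connection details are overridden via environment variables and psql picks the password up from PGPASSWORD:

#!/bin/bash

# Connection settings default to the local Docker container and can be
# overridden via environment variables (as we do later from the EC2 client machine).
DB_HOST="${DB_HOST:-localhost}"
DB_PORT="${DB_PORT:-5432}"
DB_USERNAME="${DB_USERNAME:-postgres}"
DB_DATABASE="${DB_DATABASE:-analytics_development}"

# Create the database ...
psql -U "$DB_USERNAME" -h "$DB_HOST" -p "$DB_PORT" -c "CREATE DATABASE $DB_DATABASE;"

# ... and the table our consumer will insert records into.
psql -U "$DB_USERNAME" -h "$DB_HOST" -p "$DB_PORT" -d "$DB_DATABASE" -c "
  CREATE TABLE broadcasts (
    id BIGSERIAL PRIMARY KEY,
    broadcast_id TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
  );
"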
This is a Bash shell script that invokes psql, the Postgres CLI. It uses it first to create the database and then to create the table that our consumer will use to store records.
This file is copied over to the image created for our db service.
Build DB Image
Everything is set up for our local development db service. Let's build its image.
We make sure we are inside the folder live_listening_event_consumer and we execute the following command:
$ docker-compose -f docker-compose.development.yml build db
If everything goes well, the image should be ready to be used.
Run Local Database Server
With the image built, we can now start our local database server. This is the command to do it:
$ docker-compose -f docker-compose.development.yml up db -d
Initialize Local Database
With the local database server up and running, we can now initialize the database and the table inside it.
$ docker-compose -f docker-compose.development.yml exec db ./initialize_db.sh
We see the messages
CREATE DATABASE
CREATE TABLE
which confirm the successful creation of the database and the table.
We can also execute the following command to confirm the structure of the table created:
$ docker-compose -f docker-compose.development.yml exec db psql -U postgres -d analytics_development -c '\d+ broadcasts;'
This will print something like this:
Consumer To Create Record In Database
Our local database has the table which is ready to accept records. Let’s amend our JavaScript consumer code to create a record in the database for each event that it consumes.
Install pg Package
Before we are able to communicate with the database from our JavaScript code, we will need a JavaScript package that offers an API to Postgres. This is the pg package.
We add it to our package dependencies with the following command:
$ yarn add pg
Amend JavaScript Consumer Code
Here is the new version of our JavaScript consumer, the file app.js inside the folder live_listening_event_consumer.
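The full listing is not reproduced here; the following abridged sketch shows only the database-related parts (the Kafka consumer wiring from Part 7 is unchanged and only hinted at in comments; the line numbers in the walkthrough below refer to the full file):

const pg = require('pg');

const { Pool } = pg;

// Database access details, passed in as environment variables.
const dbUsername = process.env.DB_USERNAME;
const dbPassword = process.env.DB_PASSWORD;
const dbHost = process.env.DB_HOST;
const dbPort = process.env.DB_PORT;
const dbDatabase = process.env.DB_DATABASE;

// A pool of connections that can be reused across queries.
const pool = new Pool({
  user: dbUsername,
  password: dbPassword,
  host: dbHost,
  port: dbPort,
  database: dbDatabase,
});

// Inserts one row into the "broadcasts" table. The value is passed via the
// $1 placeholder rather than being interpolated into the SQL string.
const insertRecordIntoDb = async (broadcastId) => {
  const sqlStatement = `
    INSERT INTO broadcasts (broadcast_id)
    VALUES ($1)
  `;
  const values = [broadcastId];
  await pool.query(sqlStatement, values);
};

// ... Kafka consumer setup as in Part 7. Inside the message handler we now call:
//
//   await insertRecordIntoDb(broadcastId);
//   console.debug(`broadcast record created with id: ${broadcastId}`);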
- line 2: We import the pg package.
- line 4: We get access to the Pool object.
- lines 6 – 10: We initialize some local variables with access details to the database. We will pass these details as values of environment variables. When in local development, these environment variables are specified in the docker-compose.development.yml file. When in production, they are going to be part of the ECS Task Definition, which we will detail later.
- lines 12 – 20: We initialize the connection pool. The pg package offers the ability to create a pool of connections that can be reused across queries.
- lines 22 – 30: We define the function insertRecordIntoDb(). This is an async function and takes the broadcastId as its input argument. The code is very easy to understand.
  - lines 23 – 27: We build an SQL statement. This is a standard INSERT INTO statement that will insert data into the table broadcasts. Note that we don't interpolate the value broadcastId inside the string that we build (the sqlStatement), but we define a placeholder $1 that will accept the value from the array values.
  - line 29: We execute the query (this is an async call) by passing the string of the SQL statement and the values that will be used by the pool executor.
- line 56: We just call the insertRecordIntoDb() function to create a record with the new broadcastId.
Build Image
Since we have amended some files that affect the Docker image of the consumer, we have to rebuild it.
$ docker-compose -f docker-compose.development.yml build live_listening_event_consumer
Run Consumer Locally
We have our database server running and our consumer with up-to-date code to insert rows into the database. It’s about time to test that everything works locally.
Bring Everything Up
Let’s bring everything up
$ docker-compose -f docker-compose.development.yml up -d
Create Kafka Topic
Let’s create the Kafka topic
$ docker-compose -f docker-compose.development.yml exec kafka kafka-topics --create --topic live-listening-events --if-not-exists --bootstrap-server localhost:9092
We see the message
Created topic live-listening-events.
which confirms that the topic has been created successfully.
Start Console Producer
We need a way to publish events so that we can see them being consumed by our JavaScript consumer. One way we can do that is by starting the Kafka Console Producer.
Here is the command
$ docker-compose -f docker-compose.development.yml exec kafka kafka-console-producer --topic live-listening-events --bootstrap-server localhost:9092
This gives us a prompt (>) at which we can type in messages that will be published to our Kafka cluster.
But, before we do, let’s go to the JavaScript consumer side.
Watch JavaScript Consumer Logs
Let's keep an eye on the JavaScript consumer logs. We start a new terminal, because the current one is occupied by the console producer, and we cd to the live_listening_event_consumer folder. Then we issue the following command to start watching the consumer logs:
$ docker-compose -f docker-compose.development.yml logs --follow live_listening_event_consumer
This will take hold of our terminal, printing the logs from our consumer.
Publish a Message
We go back to the producer terminal and we publish the following message:
>1
>
Witness Success Message in Consumer Logs
We then look at the consumer logs and we see something like this:
live_listening_event_consumer-live_listening_event_consumer-1 | { partition: 0, offset: '0', value: '1' }
live_listening_event_consumer-live_listening_event_consumer-1 | broadcast record created with id: 1
That is the first indication of success. We can see the two output lines corresponding to the two console.debug() statements in our app.js code.
Confirm Row Created in the Database Table
Finally, we can confirm that the row has been created in the database table with the following command. Note that we need a new terminal window inside the live_listening_event_consumer folder to be able to execute this:
$ docker-compose -f docker-compose.development.yml exec db psql -U postgres -d analytics_development -c 'select * from broadcasts'
This will print:
id | broadcast_id | created_at
----+--------------+-------------------------------
1 | 1 | 2023-01-22 06:46:52.541197+00
(1 row)
which confirms that the consumer has successfully created the row in the database.
Produce More
We produce more messages at the producer console and we execute the command with the select * from broadcasts statement to confirm that the messages are consumed and rows are created in the database.
Everything works locally as expected! Well Done!
Deploy to AWS Account
We have tested everything locally and it works. Now it is time to start thinking about how to deploy this to our AWS account.
Terraform Files
As we saw in the Architecture Objective diagram, the new block is the AWS RDS instance. In order to provision it using Terraform we will need a new Terraform file, plus some changes to the existing ones:
- New file: analytics_db.tf
- Changed files: variables.tf, live_listening_event_consumer_ecs.tf and client_machine.tf
We will start with the new file first.
AWS RDS with Postgres Engine
The file analytics_db.tf defines the resources to create the AWS RDS instance with a Postgres engine:
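A sketch of the file, consistent with the walkthrough below, is shown here. The instance class, storage values, availability zone and the names of the VPC subnet and security group resources referenced from main.tf are assumptions, and the line numbers in the walkthrough refer to the original full file:

resource "aws_db_subnet_group" "analytics" {
  name = "analytics"
  # Assumed resource names; they refer to the VPC subnets defined in main.tf.
  subnet_ids = [aws_subnet.public1.id, aws_subnet.public2.id]
}

resource "aws_db_instance" "analytics" {
  identifier = "msk-demo-analytics"

  instance_class = "db.t3.micro" # assumption: something small

  allocated_storage     = 20 # GiB, assumption: the minimum
  max_allocated_storage = 0
  storage_type          = "gp2" # assumption

  engine         = "postgres"
  engine_version = "15.1"

  username = var.db_analytics.username
  password = var.db_analytics.password
  port     = var.db_analytics.port

  publicly_accessible = false

  allow_major_version_upgrade = false
  auto_minor_version_upgrade  = true

  availability_zone = "eu-west-1a" # assumption: a single AZ, no Multi-AZ deployment

  db_subnet_group_name   = aws_db_subnet_group.analytics.name
  vpc_security_group_ids = [aws_security_group.analytics_db.id] # assumed security group name

  delete_automated_backups = true

  deletion_protection = false

  enabled_cloudwatch_logs_exports = ["postgresql", "upgrade"]

  skip_final_snapshot = true

  monitoring_interval = 0 # Enhanced Monitoring disabled

  copy_tags_to_snapshot = true

  tags = {
    Name = "msk-demo-analytics"
  }
}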
This file actually creates two resources:
- An aws_db_subnet_group with the name analytics. This resource is necessary when creating the next one, the aws_db_instance, because we need to specify which VPC and subnets will have access to the db instance. As you can see, the definition is quite simple. We just need to specify the subnet_ids by referring to the VPC subnets we have already defined in the main.tf file.
- An aws_db_instance with the name analytics. This is the main resource for the database server.
Let’s spend some more time in explaining the db instance part of the file:
- line 13: We build an identifier. This is an optional attribute but it is quite useful to help us distinguish this db instance from others.
- line 14: This is the compute and RAM power of our database server instance. We chose something small, since our demo is quite primitive and doesn't require big compute resources.
- lines 16 – 18: We specify the amount and type of storage that we will need. Again, this is the absolute minimum.
- lines 20 – 21: We specify that we want to use a Postgres engine. We also specify the version.
- lines 22 – 23: We specify the username and password. See how these are going to be set up from the values of the corresponding Terraform variables. These variables are introduced by this part of the tutorial series. We will have a look at them later on.
- line 24: We specify the port we want the database server to listen to. Again, the value is going to be set according to the value of the corresponding Terraform variable.
- line 25: We explicitly say that this database server will not be publicly available. We will make sure that we have access only from our EC2 client machine, but not from our local development machine nor from any other client machine outside our VPC.
- lines 27-28: We specify that we don’t want AWS to automatically apply major upgrades but we are ok if it applies minor upgrades.
- line 29: We specify a single availability zone, since we are not building a Multi-AZ deployment for our database server.
- lines 31 – 32: We specify which VPC and subnets will have access to our database server.
- line 34: We specify that we want any automated backups to be removed as soon as we remove/destroy the database server.
- line 36: We remove any deletion protection so that we can use Terraform to destroy the database server.
- line 38: We enable exporting of logs to AWS CloudWatch. We will export any Postgres logs, which are the logs that are created by Postgres server, as well as any logs related to upgrades that AWS will automatically apply.
- line 40: We skip the final snapshot that AWS might create when a database server is destroyed.
- line 42: We disable the Enhanced Monitoring settings. If you want, you can set a positive value specifying how often you want monitoring metrics to be produced.
- lines 44 – 49: We tag our db instance and we make sure that our tags are forwarded to any snapshot that might be created.
New Variables
The resource aws_db_instance requires some variables. We declare them at the end of the variables.tf file. This is how we do it:
variable "db_analytics" {
type = object({
username = string
password = string
port = number
name = string
})
nullable = false
sensitive = true
}
And here is the whole file as it is now:
As you can see, the variable db_analytics is of object type and it requires four properties to be set:
- the username to connect to the db instance
- the password to connect to the db instance
- the port to connect to the db instance
- the name of the database to create and work with
Note that we declare this variable as sensitive, hence Terraform will not print its values while executing the plan.
Note also that we don't pass the value of this variable via the terraform.tfvars file. We are going to pass the values on the command line when we call terraform apply, as we will see later on.
ECS Task
Our ECS Task definition file, live_listening_event_consumer_ecs.tf, needs some amendments in order to pass the environment variables our consumer needs.
We need to add the following block of name/value pairs inside the environment array of the aws_ecs_task_definition resource:
{
name = "DB_USERNAME"
value = "${aws_db_instance.analytics.username}"
},
{
name = "DB_HOST"
value = "${aws_db_instance.analytics.address}"
},
{
name = "DB_PORT"
value = "${tostring(aws_db_instance.analytics.port)}"
},
{
name = "DB_DATABASE"
value = "${var.db_analytics.name}"
},
{
name = "DB_PASSWORD"
value = "${var.db_analytics.password}"
}
We use the aws_db_instance output attributes for the DB_USERNAME, the DB_HOST and the DB_PORT. We use the variables for the DB_DATABASE and the DB_PASSWORD.
The full version of our live_listening_event_consumer_ecs.tf file is now:
EC2 Client Machine
The last file that we need to amend is client_machine.tf, with the resource aws_instance.client.
First, we take the connection block from the remote-exec provisioner level and move it to the resource level. This allows us to use the same connection settings for other blocks that require it, as sketched below.
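As a rough sketch (the private key reference is an assumption), the resource now looks something like this:

resource "aws_instance" "client" {
  # ... AMI, instance type, networking and other settings from the previous parts ...

  # The connection block lives at the resource level so that both the "file"
  # and the "remote-exec" provisioners below can reuse it.
  connection {
    type        = "ssh"
    user        = "ec2-user"
    host        = self.public_dns
    private_key = file(var.ssh_private_key_path) # assumed variable name
  }

  # provisioner "file" { ... }        (shown right below)
  # provisioner "remote-exec" { ... } (shown right below)
}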
Then we create a new block, a provisioner of type file:
provisioner "file" {
source = "live_listening_event_consumer/initialize_db.sh"
destination = "/home/ec2-user/initialize_db.sh"
}
This is going to copy the file defined in the source to the destination. This is how we are going to have the initialize_db.sh file inside our EC2 instance as soon as the EC2 instance is ready. Having that file inside this client machine will allow us to initialize our database and table in our newly created AWS RDS instance. This happens with the amendment we do in the remote-exec provisioner.
In the remote-exec provisioner, which is going to be executed as soon as the EC2 client machine is ready, we add the following 5 commands:
"sudo amazon-linux-extras enable postgresql14",
"sudo yum clean metadata",
"sudo yum -y install postgresql",
"chmod u+x /home/ec2-user/initialize_db.sh",
"DB_HOST=${aws_db_instance.analytics.address} DB_PORT=${aws_db_instance.analytics.port} DB_USERNAME=${aws_db_instance.analytics.username} DB_DATABASE=${var.db_analytics.name} PGPASSWORD=${var.db_analytics.password} /home/ec2-user/initialize_db.sh"
- The first command makes sure that we will be able to install the Postgres version 14 client tools.
- The next two commands actually do the installation.
- The fourth command makes sure that the initialize_db.sh file has execution rights for the user ec2-user, which is the user the remote execution runs as.
- The final command invokes the initialize_db.sh script, passing the necessary environment variables and their values so that the script knows how to connect to the newly created database instance and create the database and the table.
The full version of the client_machine.tf file is now this:
Terraform Apply
With all the files in place, we are ready to apply the Terraform configuration that we have prepared.
The command to do this is the following:
$ terraform apply -var 'db_analytics={username: "postgres", password: "12345678", port: "5432", name: "analytics_development"}'
Note how we pass values for the variable that does not have its value set inside the terraform.tfvars file. We use the -var option.
We type yes to let Terraform know that we want the plan to be applied and we sit back until everything finishes successfully.
At the end we will see the message:
Apply complete! Resources: 49 added, 0 changed, 0 destroyed.
Double Check Application Results
Before we test the implementation, we can first do a quick visual check that the resource of the database server has been successfully created.
We go to the AWS Management console and in particular to the RDS service dashboard. We should see one instance running:
If we click on the link msk-demo-analytics, we will go into the details:
There we can confirm details such as the Class, the Region & AZ, and the Port. One very important piece of information is the Endpoint. We will need it later to connect to the database server and confirm the number of rows inside the database table broadcasts.
Test Implementation
After the visual inspection of the results, we can now proceed to test the actual implementation. The test will involve us sending an event over the REST endpoint and confirming the row created inside the database table broadcasts in our new AWS RDS instance.
We Send An Event
We are now ready to use the curl command from our local machine to send an event over the API Gateway.
From the API Gateway console we get the endpoint we have to call:
Then we call the curl command from our local machine as follows:
$ curl -v -X POST 'https://0l1y4sg082.execute-api.eu-west-1.amazonaws.com/v1/events' -H 'Accept: application/json' -H 'Content-type: application/json' -d '{"broadcastId": "123"}'
On successful completion of the curl command we see this:
{"status":"200"}
We Check Entry In Database Table
If everything has gone successfully, then we should have a new row inside our database table broadcasts. But, in order to confirm that, we need to connect to our database using the psql command-line interface.
First, we have to connect to our EC2 client machine.
From the EC2 dashboard, we locate the EC2 machine and its public DNS name.
We then use that to connect with ssh from our local machine:
$ ssh -i ~/.ssh/me-ireland.pem ec2-user@ec2-34-247-194-193.eu-west-1.compute.amazonaws.com
Then we have to take the endpoint URL for our database server and use it as an argument to a psql call. We get this name from the RDS db instance dashboard, as we saw earlier.
We already know the username and the password. Hence, we are able to issue the following command to get the contents of the broadcasts table:
$ psql -U postgres -h msk-demo-analytics.cpz5pn90qypo.eu-west-1.rds.amazonaws.com -d analytics_development -c 'select * from broadcasts;'
The above command will ask you for the password (which is 12345678) and will print this:
id | broadcast_id | created_at
----+----------------------------------+-------------------------------
1 | This is event for broadcast: 123 | 2023-01-22 13:01:10.143396+00
(1 row)
Perfect! And if we use the curl command from our local machine to send another event, then we will be able to issue the above psql command again to see a second row stored in the table.
Destroy To Avoid Further AWS Costs
To avoid being charged more than you want, make sure that you call Terraform to destroy all the 49 resources that we have created:
$ terraform destroy -var 'db_analytics={username: "postgres", password: "12345678", port: "5432", name: "analytics_development"}'
Closing Note
That is the end of Part 8 and the end of the tutorial series Practical Terraform & AWS. It has been a long journey. We hope that this tutorial series has been a good introduction to Terraform and AWS for you.
Contact Me
If you want to contact me to ask questions and provide feedback and comments, you are more than welcome to do it. My name is Panos and my email is panos@mixlr.com.