Mastering Real-World DevOps with a Multi-Microservice E-Commerce Project

Updated February 11, 2025

"Hello, I'm Kiran Pawar, a passionate Cloud and Devops Engineer with a strong background in cloud automation, configuration, and deployment. My journey in the world of technology has been a thrilling adventure, where I've had the privilege to work with cutting-edge tools and practices.

🚀 As a DevOps Engineer:

I specialize in automating, configuring, and deploying instances in cloud environments and data centers. My expertise extends to DevOps, GitOps, CI/CD pipeline management, HashiCorp Terraform, and containerization. I'm proficient in AWS and Linux/Unix administration, ensuring robust infrastructure and application performance.

🔧 My Tech Stack:

Front-end skills: HTML, CSS, SCSS, Tailwind CSS, Bootstrap, React, Material-UI, JavaScript
DevOps toolbox: Git, OWASP, Nexus, Trivy, GitHub, GitLab, Terraform, Ansible, Docker, Kubernetes, Helm, Jenkins, Prometheus, Grafana, Argo CD, AWS EKS.

🌐 My Cloud Expertise:

I have hands-on experience managing AWS services, including EC2, S3, EBS, VPC, ELB, RDS, IAM, Route53, and more.

🔒 Networking and Security:

My skills include managing networking concepts such as TCP/IP protocols, security policies, and subnet interfacing. I have a strong understanding of infrastructure and networking, covering topics like firewalls, IP addressing, DNS, and more.

💡 What Sets Me Apart:

I bring a positive attitude, a strong work ethic, and a collaborative spirit to every project. I'm a self-starter, a fast learner, and an effective team player with strong interpersonal skills. In addition to my DevOps skills, I've developed shell scripts (Bash) for automating tasks and have proficiency in Python scripting. My ability to communicate and manage projects, along with a track record of resolving client issues, adds value to every team I work with. If you're looking for a DevOps engineer who is also well-versed in front-end technologies, feel free to connect with me. Let's explore new possibilities and create exceptional technical solutions together!"

Part of series: DevOps CI/CD Projects

In today’s fast-paced IT industry, practical experience is the key to mastering DevOps. This course is specially curated to provide hands-on experience in real-time DevOps implementation using a highly popular E-Commerce demo project open-sourced by OpenTelemetry. This project is widely recognized as one of the best real-world applications for learning DevOps, and I personally believe it offers the most practical insights.

To make learning even more immersive, this project follows a multi-microservice architecture in which the microservices are written in different programming languages. This will help you understand and tackle the real-world challenges that arise when working with diverse tech stacks in a production environment.

What You Will Learn:

✅ Cloud Infrastructure Setup – Learn how to configure and deploy a cloud environment for DevOps implementation.
✅ Understanding the Project & SDLC – Gain in-depth knowledge of software development lifecycles in microservices-based architectures.
✅ Containerization with Docker – Learn how to package and manage applications efficiently using Docker.
✅ Docker Compose Setup – Manage multi-container applications with Docker Compose.
✅ Kubernetes for Orchestration – Deploy and manage containers at scale using Kubernetes.
✅ Infrastructure as Code (IaC) with Terraform – Automate and manage cloud infrastructure effortlessly.
✅ Resume & Interview Preparation – Learn how to add this project to your resume and confidently present it in job interviews.

By the end of this course, you will have end-to-end experience in implementing DevOps for a real-world project, making you job-ready and confident in handling DevOps.

Project Overview

Detailed documentation, along with an architecture diagram of the project, is available at the link below.

https://opentelemetry.io/docs/demo/

Project Architecture

The project architecture diagram and its explanation are available at the link below.

https://opentelemetry.io/docs/demo/architecture/

Overview of microservices used in the project

https://opentelemetry.io/docs/demo/services/

AWS IAM User Creation

Step-by-Step IAM User Creation

Step 1: Log in to AWS Console

Go to AWS Management Console
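Sign in with your root account (needed only if no IAM administrator exists yet; AWS recommends using the root account as little as possible)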
Navigate to IAM Dashboard

Step 2: Open IAM Users Section

Click on Users in the left navigation pane
Click Add User

Step 3: Enter User Details

Provide a User Name (e.g., devops-user)
Select AWS Credential Type:
- Access Key – Programmatic Access (for CLI, SDKs, API access)
- Password – AWS Management Console Access (for GUI access)
Click Next

Step 4: Assign Permissions

Choose how to set permissions:
- Attach Existing Policies Directly (e.g., AdministratorAccess, PowerUserAccess, ReadOnlyAccess)
- Add to a Group (recommended for better management)
- Copy from Another User
Click Next

Step 5: Add Tags (Optional)

Assign metadata like Department: DevOps, Project: E-Commerce
Click Next

Step 6: Review and Create User

Review all details carefully
Click Create User

Step 7: Download Credentials

Download the Access Key ID & Secret Access Key (if programmatic access is enabled)
Save these credentials securely (they won’t be shown again)
Click Close

AWS EC2 Instance Setup

Step 1: Log in to AWS Console

Go to AWS Management Console
Sign in with your AWS account credentials
Navigate to EC2 Dashboard

Step 2: Launch a New EC2 Instance

Click Launch Instance
Enter an instance name

Step 3: Choose an Amazon Machine Image (AMI)

Select an OS (the later installation steps in this guide assume Ubuntu)
Click Select

Step 4: Choose an Instance Type

Select a suitable instance type:
- t3.xlarge+ (For heavier workloads)
Click Next

Step 5: Configure Instance Details

Keep the default VPC & Subnet (unless customizing networking)
Enable Auto-assign Public IP (for internet access)
Click Next

Step 6: Add Storage

The default storage is 8 GB; increase it to 30 GB, because the project pulls a large number of container images.
Click Next

Step 7: Configure Security Group

Create a new security group or use an existing one
Allow SSH (port 22) for Linux instances

Step 8: Generate or Select a Key Pair

Select Create a new key pair
Choose RSA format and download the .pem file
Store the key securely (You will need it to access your instance)
Click Launch Instance

Step 9: Connect to Your EC2 Instance

Navigate to EC2 Dashboard → Instances
Select the instance and click Connect

For Linux Instances:

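Use SSH from a terminal. The default user is ubuntu on Ubuntu AMIs (assumed in the rest of this guide) and ec2-user on Amazon Linux:

  ssh -i /path/to/your-key.pem ubuntu@your-instance-public-ip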

Common Challenges & How to Overcome Them

1. Key Pair Issues

If you lose your .pem file, you cannot recover it
Create a new key pair and manually update the instance’s key (requires console access)

2. SSH Connection Problems

Ensure Port 22 is open in the security group
Use chmod 400 your-key.pem to set the correct permissions for the key file
Verify the correct public IP address is used

3. EC2 Instance Not Accessible via Internet

Ensure Auto-assign Public IP was enabled
Check Security Group & NACL rules
Consider using Elastic IP for a static IP address

4. High Costs Due to Running Instances

Stop or terminate unused instances
Use AWS Billing Dashboard to monitor usage
Set up AWS Budgets & Alerts

Docker Installation on Ubuntu EC2

In this section, we install Docker on an Ubuntu EC2 instance.

Add Docker's official GPG key:

sudo apt-get update -y
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

Add the repository to Apt sources:

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

Install Docker

sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y

Add ubuntu user to docker group

sudo usermod -aG docker $USER
newgrp docker
sudo systemctl daemon-reload

Verify Docker Installation

docker run hello-world

Note: If you are planning to install Docker on another Linux distribution or on another operating system such as Windows, please follow the official documentation.

Kubectl Installation on Ubuntu EC2

Download kubectl

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"

Install Kubectl

sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

Verify Kubectl

kubectl version --client

Note: If you are planning to install kubectl on another Linux distribution or on another operating system such as Windows, please follow the official documentation.

Install Terraform on Ubuntu EC2

Add Hashicorp repos

sudo apt-get update && sudo apt-get install -y gnupg software-properties-common

wget -O- https://apt.releases.hashicorp.com/gpg | \
gpg --dearmor | \
sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg > /dev/null

echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] \
https://apt.releases.hashicorp.com $(lsb_release -cs) main" | \
sudo tee /etc/apt/sources.list.d/hashicorp.list

sudo apt update -y

Install Terraform

sudo apt-get install terraform

Verify Terraform Installation

terraform -help

Run the project Locally

Clone the repository, change into the cloned directory, and start the stack with docker compose up -d.

git clone https://github.com/imkiran13/open-telemetry-microservices-app.git
cd open-telemetry-microservices-app

Run the project as docker compose

docker compose up -d

Expose project on EC2 instance

In this section, we open port 8080 in the inbound rules of the security group so that traffic can reach the application running on the EC2 instance.

Step 1: Log in to AWS Console

Go to AWS Management Console
Navigate to EC2 Dashboard
Click on Security Groups under Network & Security

Step 2: Select the Security Group

Identify and select the Security Group associated with your EC2 instance
Click on the Inbound rules tab
Click Edit inbound rules

Step 3: Add a New Inbound Rule

Click Add Rule
Select Type (Choose a predefined rule like HTTP, SSH, or Custom)
Select Protocol (Automatically populated based on Type)
Enter Port Range (e.g., 80 for HTTP, 22 for SSH, 8080 for custom apps)
Select Source:
- My IP (restrict access to your IP)
- Custom (specify a range like 192.168.1.0/24)
- Anywhere (0.0.0.0/0) (allows access from any IP, not recommended for production)
Click Save rules

Step 4: Verify the Changes

Ensure the rule appears in the Inbound Rules list
Test connectivity by accessing the instance on the exposed port
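
The same inbound rule can also be kept as code. The following Terraform sketch is not part of the project repository; it assumes a recent AWS provider, and the variable names security_group_id and allowed_cidr are hypothetical placeholders for your own security group ID and source range.

```hcl
# Hypothetical inputs: supply your own security group ID and source CIDR.
variable "security_group_id" {
  description = "ID of the security group attached to the EC2 instance"
  type        = string
}

variable "allowed_cidr" {
  description = "Source range allowed to reach the app, e.g. your own IP as x.x.x.x/32"
  type        = string
}

# Opens TCP port 8080 for the given source range only.
resource "aws_vpc_security_group_ingress_rule" "app_8080" {
  security_group_id = var.security_group_id
  description       = "Demo application on port 8080"
  cidr_ipv4         = var.allowed_cidr
  from_port         = 8080
  to_port           = 8080
  ip_protocol       = "tcp"
}
```

Setting allowed_cidr to a single /32 address corresponds to the My IP option in the console, while 0.0.0.0/0 corresponds to Anywhere.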

Access the project at the URL below
```
  http://instance-public-ip:8080
```

Docker Compose Overview in Simple Words

1. Services
- Services define containers in the Docker Compose setup.
- Each service represents a different part of an application (e.g., database, backend, frontend).

What Goes Into Services?

image: The Docker image to use.
build: How to build the image from a Dockerfile when no prebuilt image is used.
ports: Publish container ports on the host.
environment: Define environment variables.
depends_on: Set the startup order between services.

2. Networks

Networks allow containers to communicate with each other.
Containers in the same network can talk using service names.

What Goes Into Networks?

driver: Defines the network type (e.g., bridge, overlay).
attachable: Allows standalone containers outside the Compose project to join the network.

3. Volumes

Volumes store persistent data outside the container.
Even if a container stops, the data remains.

What Goes Into Volumes?

Named Volumes: Docker-managed storage that can be shared between containers.
Bind Mounts: Map a host directory to a container path.

Summary

Services define containers and how they run.
Networks enable communication between containers.
Volumes store data that persists beyond container restarts.

With Docker Compose, you can easily manage multi-container applications! 🚀

Docker vs Kubernetes

1. Containers Are Ephemeral

Docker runs containers, but containers are temporary and can stop anytime.
If a container crashes, Docker restarts it only when a restart policy has been set, and it cannot move it to another host.

How Kubernetes Helps

Kubernetes monitors containers and restarts them if they fail.
It ensures that the required number of containers always stay running.

2. Manual Scaling and Recovery

In Docker, you manually create and remove containers when demand changes.
Without a restart policy, a crashed container must be restarted manually.

How Kubernetes Helps

Kubernetes automatically scales containers up or down based on demand.
If a container crashes, Kubernetes heals by replacing it automatically.

3. Service Discovery

Docker containers get dynamically assigned IPs, making it hard to connect services by address.
If a container restarts, its IP might change, breaking communication.

How Kubernetes Helps

Kubernetes assigns a stable DNS name to each Service.
Containers can communicate using service names instead of IPs.

Summary

Docker runs containers, but they are temporary and need manual management.
Kubernetes automates scaling, healing, and service discovery.
With Kubernetes, applications are more reliable, scalable, and self-healing. 🚀

Docker Compose vs Kubernetes

Docker Compose

Management of Multi-Container Applications

Docker Compose is ideal for simple, single-host applications.
It allows you to define and run multi-container applications with a single docker-compose.yml file.
Docker Compose is best suited for development environments and smaller-scale use cases.

Kubernetes

Container Orchestration

Kubernetes is designed for complex, multi-host environments and large-scale applications.
It automates the deployment, scaling, and management of containers across a cluster of machines.

Terraform Infra Provisioning

Terraform code

The complete Terraform files for creating an EKS cluster inside an AWS VPC are available in the eks-install folder of this repo, including the remote backend and state locking implementation.

eks-install: Folder that holds the complete Terraform HCL files.
backend: Folder that holds the HCL files for creating the S3 bucket and DynamoDB table.
modules: Terraform modules for VPC and EKS.
main.tf: Main file that invokes the modules to create EKS in the VPC.
variables.tf: Variables for main.tf.
outputs.tf: Output values you want to see after Terraform runs, for example the VPC ID.

Useful plugins to write HCL files

Terraform
YAML
GitHub Copilot

Documentation

For additional help, the official Terraform documentation for the AWS provider is the best place to look.

Terraform Lifecycle

Terraform follows a well-defined lifecycle for managing infrastructure as code. The lifecycle consists of several stages that ensure resources are created, updated, or destroyed in a controlled manner.

1. Initialization (`terraform init`)

Prepares the working directory for Terraform operations.
Downloads required provider plugins and modules.
Configures the backend for storing state files.

2. Planning (`terraform plan`)

Analyzes the existing state and the desired configuration.
Shows a preview of what actions Terraform will take.
Helps in reviewing changes before applying them.

3. Application (`terraform apply`)

Executes the planned changes to create, update, or destroy resources.
Stores the updated state in the backend.
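
To make the init stage more concrete: the providers it downloads are the ones declared in the configuration. A minimal sketch follows; the version constraints are assumptions chosen for illustration, not values taken from the project repository.

```hcl
# terraform init reads this block, downloads an AWS provider release that
# satisfies the constraint, and records the selected version in
# .terraform.lock.hcl.
terraform {
  required_version = ">= 1.5.0" # assumed minimum Terraform CLI version

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed major version
    }
  }
}
```

Pinning the provider this way keeps plan and apply results reproducible across machines, because every run resolves to a compatible provider release.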

Connect Terraform with AWS to Create AWS Resources

1. Install AWS CLI

On Linux:

Download and install AWS CLI:

  curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
  unzip awscliv2.zip
  sudo ./aws/install

Verify installation:
```
  aws --version
```

On Windows:

Download the installer from AWS CLI Official Site
Run the installer and follow the setup instructions.
Verify installation:
```
  aws --version
```

On macOS:

Install using Homebrew:
```
  brew install awscli
```
Verify installation:
```
  aws --version
```

2. Configure AWS CLI

Run the following command to configure AWS credentials:
```
  aws configure
```
Provide the following details when prompted:
- AWS Access Key ID
- AWS Secret Access Key
- Default region name (e.g., us-east-1)
- Default output format (e.g., json or text)

NOTE: This command stores the access keys in ~/.aws/credentials and the region and output format in ~/.aws/config. Terraform's AWS provider reads those credentials to interact with AWS.

Terraform Statefile: The Brain of Terraform

What is a Terraform Statefile?

Terraform maintains a statefile (terraform.tfstate) that records information about managed infrastructure.
It acts as the single source of truth for Terraform and helps track resource mappings between Terraform configurations and real-world infrastructure.

Why is the Statefile Important?

Tracking Resources: Terraform uses the statefile to understand which resources it manages.
Performance Optimization: The statefile caches resource attributes, so Terraform does not have to rediscover every resource from the cloud provider on each run.
Dependency Management: It helps Terraform determine resource dependencies and execution order.

Managing Terraform Statefile using S3 Bucket and DynamoDB

Problem Statement

Terraform stores its statefile locally by default, which can lead to several issues:

Collaboration Challenges: When multiple users work on the same Terraform project, local statefiles create inconsistencies.
Risk of Data Loss: If the local statefile is deleted or corrupted, the infrastructure mapping is lost.
Concurrency Issues: Multiple users running Terraform commands simultaneously can cause conflicts and unintended changes.
Security Concerns: Local statefiles may contain sensitive information, making them vulnerable to unauthorized access.

To overcome these challenges, Terraform provides remote state management using an Amazon S3 bucket for storage and DynamoDB for state locking.

Managing Terraform Statefile using S3 and DynamoDB

1. Create an S3 Bucket for State Storage

Ensure the bucket is in a region close to your infrastructure for better performance.
Enable versioning to retain state history and rollback when needed.

2. Create a DynamoDB Table for State Locking

Define a table with a primary key (LockID).
Ensure that DynamoDB has provisioned throughput or is configured to use on-demand capacity.

3. Configure the Terraform Backend

Define the S3 backend in Terraform with the following:
- bucket (S3 bucket name)
- key (statefile path in S3)
- region (AWS region)
- dynamodb_table (DynamoDB table for locking)

4. Apply the Terraform Configuration

Run terraform init to initialize the backend.
Terraform will store the state in S3 and use DynamoDB to prevent race conditions.
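
The four settings listed in step 3 could be written as the following backend block. This is a sketch rather than the repository's exact file: the bucket, table, and region match the backend resources explained later in this article, while the key path is an assumption.

```hcl
terraform {
  backend "s3" {
    bucket         = "demo-terraform-eks-state-bucket" # S3 bucket that stores the state
    key            = "terraform.tfstate"               # assumed object path inside the bucket
    region         = "ap-south-1"
    dynamodb_table = "terraform-eks-state-locks" # table used for state locking
    encrypt        = true                        # store the state object encrypted at rest
  }
}
```

Backend blocks cannot reference variables, so the values are written as literals. The bucket and table must already exist before terraform init is run against this configuration, which is why the backend folder is applied first.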

By following this setup, Terraform state management becomes more secure, reliable, and scalable for team collaboration.

S3 Bucket for Remote backend

Problem with Local Statefile

By default, Terraform stores its statefile (terraform.tfstate) locally on the machine where Terraform is executed. This approach has several issues:

Risk of Data Loss: If the local machine crashes or the file is accidentally deleted, the statefile is lost.
Collaboration Challenges: When multiple users work on the same Terraform project, local statefiles can lead to inconsistencies and conflicts.
Concurrency Issues: If two users apply changes simultaneously, the statefile may become corrupted.
Security Risks: Local statefiles may contain sensitive data (e.g., resource IDs, credentials) that could be exposed.

How S3 Bucket Solves These Problems

Centralized Storage: The statefile is stored in a remote S3 bucket, making it accessible to all team members.
Versioning Support: S3 enables versioning, allowing rollback to previous state versions if needed.
State Consistency: When the S3 backend is combined with a DynamoDB lock table, only one Terraform run at a time can modify the state, which avoids corruption.
Security & Backup: S3 supports encryption at rest, and bucket versioning keeps earlier copies of the statefile for recovery.
Scalability: S3 can handle large statefiles efficiently without performance issues.

DynamoDB for State Locking

Problem with No State Locking

When multiple users or processes run Terraform at the same time, several issues can arise:

Concurrency Conflicts: If two users apply changes simultaneously, they might overwrite each other's changes, leading to inconsistencies.
State Corruption: Without locking, partial updates can occur, resulting in a corrupted statefile.
Unreliable Infrastructure Changes: Race conditions can cause Terraform to misinterpret infrastructure state, leading to unintended modifications.

How DynamoDB Solves These Problems

State Locking: Before an operation that can write state, Terraform writes a lock item to the DynamoDB table; a second run that finds the item fails with a lock error instead of modifying the state concurrently.
Ensures Consistency: Only one process can modify the statefile at a time, ensuring consistency and preventing corruption.
Automatic Lock Release: When Terraform execution completes, the lock entry is removed, allowing the next process to proceed.
High Availability: As a managed AWS service, DynamoDB provides reliable and scalable state locking.
Cost Efficiency: With PAY_PER_REQUEST billing, DynamoDB incurs minimal cost for Terraform state locking.

Code explanation for S3 bucket and DynamoDB terraform file

1. AWS Provider Configuration

provider "aws" {
  region = "ap-south-1"
}

This block defines the AWS provider that Terraform will use.
The region attribute specifies the AWS region where resources will be created (in this case, ap-south-1).

2. Creating an S3 Bucket for Terraform State Storage

resource "aws_s3_bucket" "terraform_state" {
  bucket = "demo-terraform-eks-state-bucket"

  lifecycle {
    prevent_destroy = false
  }
}

This block creates an S3 bucket named demo-terraform-eks-state-bucket.
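The lifecycle rule with prevent_destroy = false (the default) lets terraform destroy remove the bucket, which suits a demo. For a long-lived state bucket, setting it to true guards against accidental deletion.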

3. Enabling Versioning for the S3 Bucket

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

This block enables versioning on the S3 bucket to retain previous versions of the Terraform statefile.
Helps with rollback and recovery in case of accidental changes.

4. Enabling Server-Side Encryption for the S3 Bucket

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

This block enables encryption using the AES256 algorithm to protect statefile data at rest.
Ensures that Terraform statefiles are stored securely in the S3 bucket.
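
Because statefiles can contain sensitive values, a state bucket is often also closed to public access. The block below is a suggested addition, not something the repository is known to include; it reuses the aws_s3_bucket.terraform_state resource defined above.

```hcl
# Blocks every form of public access to the state bucket.
resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true # reject requests that add public ACLs
  block_public_policy     = true # reject bucket policies that grant public access
  ignore_public_acls      = true # ignore any public ACLs that already exist
  restrict_public_buckets = true # limit access to AWS principals and authorized users
}
```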

5. Creating a DynamoDB Table for State Locking

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-eks-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

This block creates a DynamoDB table named terraform-eks-state-locks.
The table is used for state locking, preventing concurrent modifications to the statefile.
billing_mode = "PAY_PER_REQUEST" ensures cost-effective pricing based on actual usage.
The primary key (hash_key) is LockID, ensuring unique entries for locks. The S3 backend expects this attribute to be named exactly LockID and to be of type string (S).

Benefits of the Terraform Modular Approach

What is Terraform Modular Approach?

Terraform's modular approach allows infrastructure to be broken down into reusable and maintainable components called modules. Instead of writing everything in a single configuration file, modules help in structuring code for better scalability and reusability.

Advantages of Using Modules in Terraform

1. Reusability

Modules can be reused across multiple projects, reducing redundant code.
Example: A VPC module can be used for different environments (dev, staging, production) without rewriting the VPC configuration each time.

2. Maintainability

Simplifies management by keeping infrastructure code organized.
Example: An EKS module can be separately managed, updated, and versioned without affecting other resources.

3. Scalability

Easier to scale infrastructure as modules allow independent provisioning.
Example: Scaling an EKS cluster without modifying the entire Terraform configuration.

4. Team Collaboration

Different teams can work on different modules independently.
Example: The networking team can manage the VPC module, while the DevOps team configures the EKS module.

5. Consistency and Best Practices

Modules enforce standardized infrastructure patterns.
Example: A VPC module ensures that all environments follow the same networking architecture.
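
As an illustration of reuse, a root configuration could call the VPC module as sketched below. The input names match the variables used by the VPC module explained in the next section; the source path and all values are assumptions for illustration.

```hcl
# Hypothetical root-module call; adjust the path and values to your setup.
module "vpc" {
  source = "./modules/vpc" # assumed location of the VPC module

  cluster_name         = "my-eks-cluster"
  vpc_cidr             = "10.0.0.0/16"
  availability_zones   = ["ap-south-1a", "ap-south-1b"]
  private_subnet_cidrs = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnet_cidrs  = ["10.0.3.0/24", "10.0.4.0/24"]
}
```

A second environment would repeat the same block with a different module name and different values, without copying any of the resource definitions.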

Code explanation for VPC module

1. Creating a VPC

resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name                                           = "${var.cluster_name}-vpc"
    "kubernetes.io/cluster/${var.cluster_name}"    = "shared"
  }
}

Creates a VPC (Virtual Private Cloud) using the CIDR block specified in var.vpc_cidr.
Enables DNS support and DNS hostnames for instances in the VPC.
Adds tags for Kubernetes cluster integration and identification.

2. Creating Private Subnets

resource "aws_subnet" "private" {
  count             = length(var.private_subnet_cidrs)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidrs[count.index]
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name                                           = "${var.cluster_name}-private-${count.index + 1}"
    "kubernetes.io/cluster/${var.cluster_name}"    = "shared"
    "kubernetes.io/role/internal-elb"              = "1"
  }
}

Creates multiple private subnets based on private_subnet_cidrs.
Each subnet is assigned an availability zone from availability_zones.
Tags define Kubernetes cluster association and internal load balancer role.

3. Creating Public Subnets

resource "aws_subnet" "public" {
  count             = length(var.public_subnet_cidrs)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.public_subnet_cidrs[count.index]
  availability_zone = var.availability_zones[count.index]

  map_public_ip_on_launch = true

  tags = {
    Name                                           = "${var.cluster_name}-public-${count.index + 1}"
    "kubernetes.io/cluster/${var.cluster_name}"    = "shared"
    "kubernetes.io/role/elb"                       = "1"
  }
}

Creates multiple public subnets based on public_subnet_cidrs.
Assigns public IP addresses to instances on launch.
Tags specify Kubernetes cluster association and public load balancer role.
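
The VPC and subnet resources above read five input variables. A variables.tf for the module could declare them as follows; the types are inferred from how the variables are used, and the descriptions are assumptions.

```hcl
# Sketch of the VPC module's input variables.
variable "cluster_name" {
  description = "Name of the EKS cluster, used in resource names and tags"
  type        = string
}

variable "vpc_cidr" {
  description = "CIDR block for the VPC, e.g. 10.0.0.0/16"
  type        = string
}

variable "availability_zones" {
  description = "Availability zones, matched to the subnet lists by index"
  type        = list(string)
}

variable "private_subnet_cidrs" {
  description = "CIDR blocks for the private subnets"
  type        = list(string)
}

variable "public_subnet_cidrs" {
  description = "CIDR blocks for the public subnets"
  type        = list(string)
}
```

Because each subnet looks up var.availability_zones[count.index], the availability zone list must contain at least as many entries as the longer of the two CIDR lists.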

4. Creating an Internet Gateway

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.cluster_name}-igw"
  }
}

Creates an Internet Gateway to enable internet access for public subnets.

5. Creating NAT Gateway and Elastic IPs

resource "aws_eip" "nat" {
  count  = length(var.public_subnet_cidrs)
  domain = "vpc"

  tags = {
    Name = "${var.cluster_name}-nat-${count.index + 1}"
  }
}

resource "aws_nat_gateway" "main" {
  count         = length(var.public_subnet_cidrs)
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = {
    Name = "${var.cluster_name}-nat-${count.index + 1}"
  }
}

Elastic IPs (EIPs) are created for the NAT gateways.
NAT gateways give instances in private subnets outbound internet access without exposing them to inbound connections from the internet.
Each NAT gateway is associated with a public subnet.

6. Creating Route Tables

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "${var.cluster_name}-public"
  }
}

Public route table routes internet traffic (0.0.0.0/0) to the Internet Gateway.

resource "aws_route_table" "private" {
  count  = length(var.private_subnet_cidrs)
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[count.index].id
  }

  tags = {
    Name = "${var.cluster_name}-private-${count.index + 1}"
  }
}

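Each private route table routes internet-bound traffic through the NAT gateway with the same index, so the configuration needs at least as many public subnets as private subnets.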

7. Associating Route Tables with Subnets

resource "aws_route_table_association" "private" {
  count          = length(var.private_subnet_cidrs)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

Associates each private subnet with a private route table.

resource "aws_route_table_association" "public" {
  count          = length(var.public_subnet_cidrs)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

Associates all public subnets with the public route table.
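
For other modules to use this network, the VPC module has to export some of its resource IDs. The output names below are hypothetical; the values refer to the resources defined above.

```hcl
# Sketch of the VPC module's outputs.
output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

output "private_subnet_ids" {
  description = "IDs of the private subnets, in the order of private_subnet_cidrs"
  value       = aws_subnet.private[*].id
}

output "public_subnet_ids" {
  description = "IDs of the public subnets, in the order of public_subnet_cidrs"
  value       = aws_subnet.public[*].id
}
```

A root module could then pass module.vpc.private_subnet_ids to the subnet_ids input of the EKS module.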

Code explanation for EKS module

1. Creating IAM Role for EKS Cluster

resource "aws_iam_role" "cluster" {
  name = "${var.cluster_name}-cluster-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "eks.amazonaws.com"
      }
    }]
  })
}

Creates an IAM role for the EKS cluster.
Allows Amazon EKS to assume this role for managing cluster operations.

2. Attaching IAM Policy to Cluster Role

resource "aws_iam_role_policy_attachment" "cluster_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.cluster.name
}

Attaches AmazonEKSClusterPolicy to the EKS cluster role.
Grants necessary permissions for EKS to manage the cluster.

3. Creating the EKS Cluster

resource "aws_eks_cluster" "main" {
  name     = var.cluster_name
  version  = var.cluster_version
  role_arn = aws_iam_role.cluster.arn

  vpc_config {
    subnet_ids = var.subnet_ids
  }

  depends_on = [
    aws_iam_role_policy_attachment.cluster_policy
  ]
}

Creates an EKS cluster with the specified name and version.
Uses the IAM role created earlier for authorization.
Associates the cluster with VPC subnets for networking.
Depends on the IAM policy attachment to ensure permissions are set.

4. Creating IAM Role for Node Group

resource "aws_iam_role" "node" {
  name = "${var.cluster_name}-node-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })
}

Creates an IAM role for the EKS worker nodes.
Allows EC2 instances to assume this role.

5. Attaching IAM Policies to Node Role

resource "aws_iam_role_policy_attachment" "node_policy" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  ])

  policy_arn = each.value
  role       = aws_iam_role.node.name
}

Attaches three AWS managed policies to the worker node role:
1. AmazonEKSWorkerNodePolicy - Grants permissions to join and interact with the EKS cluster.
2. AmazonEKS_CNI_Policy - Allows the container network interface to operate properly.
3. AmazonEC2ContainerRegistryReadOnly - Grants access to pull container images from ECR.

6. Creating EKS Node Group

resource "aws_eks_node_group" "main" {
  for_each = var.node_groups

  cluster_name    = aws_eks_cluster.main.name
  node_group_name = each.key
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.subnet_ids

  instance_types = each.value.instance_types
  capacity_type  = each.value.capacity_type

  scaling_config {
    desired_size = each.value.scaling_config.desired_size
    max_size     = each.value.scaling_config.max_size
    min_size     = each.value.scaling_config.min_size
  }

  depends_on = [
    aws_iam_role_policy_attachment.node_policy
  ]
}

Creates an EKS Node Group to run worker nodes within the cluster.
Assigns node IAM role for permissions.
Uses subnets for network placement.
Defines instance types and capacity type (On-Demand or Spot instances).
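Sets the node group's size with desired_size, min_size, and max_size; changing the node count automatically within those bounds additionally requires an autoscaler such as Cluster Autoscaler.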
Depends on the IAM policy attachment to ensure permissions are applied.

Creating infrastructure on AWS

Log in to the AWS console using the devops-user IAM credentials. Ensure you have configured the AWS CLI with the access key and secret key on your local machine.

Open Visual Studio Code and clone the repository https://github.com/imkiran13/open-telemetry-microservices-app.git. Go to the terraform/backend folder of the cloned repository and run the commands below to create the S3 bucket and DynamoDB table first.

Create S3 backend and DynamoDB table

terraform init
terraform validate
terraform plan
terraform apply --auto-approve
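
Since terraform apply --auto-approve skips the interactive review, one option is to save the plan to a file and apply exactly that file. Below is a minimal sketch of that sequence. The function name tf_deploy, the plan file name tfplan, and the TERRAFORM override variable are illustrative assumptions and are not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: run init -> validate -> plan -> apply with a saved plan file.
# TERRAFORM can be overridden to preview the commands without running them.
TERRAFORM="${TERRAFORM:-terraform}"

tf_deploy() {
  # $1: directory that holds the .tf files, e.g. terraform/backend
  local dir="$1"
  "$TERRAFORM" -chdir="$dir" init &&
  "$TERRAFORM" -chdir="$dir" validate &&
  "$TERRAFORM" -chdir="$dir" plan -out=tfplan &&
  "$TERRAFORM" -chdir="$dir" apply tfplan
}
```

From the repository root, tf_deploy terraform/backend would create the backend resources. Applying a saved plan means Terraform executes only what the plan step showed.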

Verify the S3 bucket and DynamoDB table from the AWS console

Go to the terraform folder of the cloned repository and run the commands below

Create EKS Cluster and VPC

terraform init
terraform validate
terraform plan
terraform apply --auto-approve

Verify the EKS Cluster and VPC from the AWS console

How to connect to one or many Kubernetes clusters from the command line?

Connect to the devops-demo instance

Verify that docker, kubectl, eksctl, aws-cli, unzip, and helm are installed on it

Configure AWS credentials using the aws configure command

# Install Helm
$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
$ chmod 700 get_helm.sh
$ ./get_helm.sh

Update the kubeconfig file with the EKS cluster we created

aws eks update-kubeconfig --region ap-south-1 --name my-eks-cluster

Check configs

 kubectl config view

Check current context

 kubectl config current-context

Check available nodes

 kubectl get nodes
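
When the kubeconfig holds more than one cluster, kubectl config get-contexts lists them and kubectl config use-context switches between them. By default (unless --alias is passed), aws eks update-kubeconfig names each context after the cluster ARN. The sketch below builds that name and switches to it. The function names, the KUBECTL override variable, and the account ID 111122223333 in the usage note are placeholders for illustration.

```shell
#!/usr/bin/env bash
# Sketch: switch kubectl between EKS clusters registered in kubeconfig.
KUBECTL="${KUBECTL:-kubectl}"

# Build the default context name written by `aws eks update-kubeconfig`
# (the cluster ARN).
eks_context_name() {
  local region="$1" account_id="$2" cluster="$3"
  printf 'arn:aws:eks:%s:%s:cluster/%s\n' "$region" "$account_id" "$cluster"
}

# Make the given EKS cluster the current kubectl context.
use_eks_cluster() {
  "$KUBECTL" config use-context "$(eks_context_name "$@")"
}
```

For example, use_eks_cluster ap-south-1 111122223333 my-eks-cluster selects the cluster created above, assuming that account ID is replaced with your own.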

Kubernetes implementation overview

Service Account

Deployment

Service

Deploy Application to kubernetes

Load balancer service

Difference between LoadBalancer service and Ingress

Deploy ingress and ingress controller

Custom Domain mapping

Service Account

A Service Account is a special type of account used by applications, workloads, or services to interact with APIs and resources securely without human intervention. The concept is common across Kubernetes, AWS, GCP, and other cloud platforms. In Kubernetes:

Pods use a service account to access the API server securely.
A default service account is created automatically in every namespace.
Custom service accounts can be created and linked to pods.
Permissions are granted to service accounts through Role-Based Access Control (RBAC).
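
To make the RBAC link concrete, here is a minimal sketch that creates a custom service account, grants it read-only access to pods, and then asks the API server whether the permission took effect. The names demo-sa and pod-reader, the function names, and the KUBECTL override variable are hypothetical and do not come from the project manifests.

```shell
#!/usr/bin/env bash
# Sketch: custom service account with a read-only Role bound to it.
KUBECTL="${KUBECTL:-kubectl}"

# RBAC refers to a service account by this username.
sa_username() {
  local namespace="$1" name="$2"
  printf 'system:serviceaccount:%s:%s\n' "$namespace" "$name"
}

# Create the account, a Role that can read pods, and a RoleBinding joining them.
create_pod_reader_sa() {
  local namespace="$1" name="$2"
  "$KUBECTL" -n "$namespace" create serviceaccount "$name" &&
  "$KUBECTL" -n "$namespace" create role pod-reader \
    --verb=get,list,watch --resource=pods &&
  "$KUBECTL" -n "$namespace" create rolebinding "${name}-pod-reader" \
    --role=pod-reader --serviceaccount="${namespace}:${name}"
}

# Ask the API server whether the service account may list pods.
can_list_pods() {
  local namespace="$1" name="$2"
  "$KUBECTL" -n "$namespace" auth can-i list pods \
    --as="$(sa_username "$namespace" "$name")"
}
```

Running create_pod_reader_sa default demo-sa followed by can_list_pods default demo-sa should print yes once the binding exists. A pod picks up the account through spec.serviceAccountName in its manifest.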

Deployment

A Deployment in Kubernetes is used to manage and scale applications by defining the desired state of pods and ensuring their availability. It provides self-healing, rolling updates, and rollback capabilities.

Key Features

Self-Healing: Automatically restarts failed pods.
Rolling Updates: Gradually updates pods without downtime.
Rollback: Reverts to a previous working version if needed.
Scaling: Adjusts the number of pod replicas dynamically.
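
The scaling, rolling update, and rollback features map onto three kubectl commands. The wrappers below are a small sketch of them. The function names and the KUBECTL override variable are assumptions for illustration, and the deployment name is passed in as an argument because the names used in complete-deploy.yaml are not reproduced here.

```shell
#!/usr/bin/env bash
# Sketch: day-to-day Deployment operations.
KUBECTL="${KUBECTL:-kubectl}"

# Scaling: set the number of pod replicas.
scale_deployment() {
  "$KUBECTL" scale "deployment/$1" --replicas="$2"
}

# Rolling update: block until the rollout finishes or the timeout expires.
watch_rollout() {
  "$KUBECTL" rollout status "deployment/$1" --timeout=120s
}

# Rollback: return to the previous revision.
rollback_deployment() {
  "$KUBECTL" rollout undo "deployment/$1"
}
```

A typical sequence after changing an image would be watch_rollout <deployment-name>, followed by rollback_deployment <deployment-name> if the rollout does not become healthy.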

Service

A Service in Kubernetes is an abstraction that exposes a set of pods as a network-accessible endpoint. Services enable communication between different parts of an application, ensuring that requests reach the correct pod, even if pods are dynamically created or deleted.

Types of Kubernetes Services

ClusterIP (Default) → Internal communication within the cluster.
NodePort → Exposes the service on a static port on each node.
LoadBalancer → Uses a cloud provider’s external load balancer.
ExternalName → Maps a service to an external DNS name.

Deploy Kubernetes Manifest

Go inside the kubernetes folder of the cloned repository https://github.com/imkiran13/open-telemetry-microservices-app.git

Apply the manifest file to create all deployments and services at once

kubectl apply -f complete-deploy.yaml

Verify pods and services running

kubectl get pods
kubectl get svc

Edit the frontendproxy Service and change its type to LoadBalancer

kubectl edit service opentelemetry-demo-frontendproxy

Wait a few minutes (about 5) for the load balancer to be provisioned, then check the Service

kubectl get svc
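
As a non-interactive alternative to kubectl edit, the service type can be changed with kubectl patch, and the load balancer hostname can be read with a jsonpath query. This is a sketch only. The function names and the KUBECTL override variable are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: change a Service type without opening an editor.
KUBECTL="${KUBECTL:-kubectl}"

# JSON patch body that sets spec.type.
service_type_patch() {
  printf '{"spec":{"type":"%s"}}\n' "$1"
}

# $1: service name, $2: ClusterIP | NodePort | LoadBalancer
set_service_type() {
  "$KUBECTL" patch service "$1" -p "$(service_type_patch "$2")"
}

# Print the DNS name of the provisioned load balancer (empty until it is ready).
lb_hostname() {
  "$KUBECTL" get service "$1" \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
}
```

For this project that would be set_service_type opentelemetry-demo-frontendproxy LoadBalancer, then lb_hostname opentelemetry-demo-frontendproxy once provisioning has finished.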

Accessing project using LoadBalancer Service

a3f548cbafdbd4de3a05155920993c64-1814125010.ap-south-1.elb.amazonaws.com:8080

Disadvantages of LoadBalancer service

While LoadBalancer services provide external access to applications, they come with several disadvantages:

1. Cloud-Dependent (Not for On-Premises)

LoadBalancer services rely on a cloud provider integration (AWS, GCP, Azure) to provision the load balancer.
On bare-metal or on-prem Kubernetes the service gets no external address unless a third-party solution like MetalLB is used.

2. Expensive in Cloud Environments

Each LoadBalancer service creates a new cloud load balancer, which increases infrastructure costs.
Example:
- AWS creates an ELB (Elastic Load Balancer).
- GCP creates a Cloud Load Balancer.
Running multiple services results in multiple LoadBalancers, leading to high expenses.

📌 Cost-Effective Alternative: Use an Ingress Controller instead of multiple LoadBalancers.

3. Limited Traffic Routing & Control

By itself, a LoadBalancer service forwards traffic at Layer 4 (TCP/UDP).
No path-based or host-based routing.
No support in the Service resource itself for SSL termination, authentication, or rate limiting; such features depend on provider-specific configuration.

📌 Better Alternative: Use an Ingress Controller (e.g., NGINX, Traefik) for advanced routing.

4. Slow Provisioning & Cloud Dependency

Creating a LoadBalancer depends on cloud provider APIs, which can take time.
If the cloud provider faces issues, the service may become unavailable.

📌 Mitigation: Implement readiness probes and multi-region failover.

5. Not Ideal for Multi-Service Applications

If an application has multiple microservices, each exposed service requires a separate LoadBalancer.
This results in higher costs and complexity.

📌 Better Alternative: Use Ingress to expose multiple services with a single LoadBalancer.

6. No Fine-Grained Security Controls

Exposes services directly to the internet, making them vulnerable to attacks.
Requires manual security configurations such as:
- Network Policies
- Web Application Firewalls (WAF)
- Restricting access using Security Groups

📌 Security Best Practice: Use a reverse proxy (Ingress) with authentication.

Conclusion: When to Avoid LoadBalancer?

❌ If running on-premises Kubernetes → Use MetalLB instead.
❌ If exposing multiple services → Use Ingress with one LoadBalancer.
❌ If needing advanced routing (path-based, HTTPS, rate-limiting) → Use an Ingress Controller.

Ingress vs. Ingress Controller in Kubernetes

Feature	Ingress	Ingress Controller
Definition	A Kubernetes resource that defines rules for routing external traffic to services	A software component that implements Ingress rules and processes traffic
Function	Specifies rules for host-based and path-based routing	Listens for Ingress resources and routes traffic accordingly
Layer Support	Layer 7 (HTTP/HTTPS)	Layer 7 (HTTP/HTTPS)
Traffic Handling	Only defines routing rules	Actively manages traffic based on rules
HTTPS/TLS Support	Requires an Ingress Controller to terminate TLS	Provides TLS termination, authentication, rate-limiting
Implementation	Defined as a YAML resource in Kubernetes	Runs as a separate Pod/Deployment in the cluster
Dependency	Requires an Ingress Controller to function	Works independently and processes Ingress rules
Example	NGINX Ingress resource (YAML file)	NGINX, Traefik, HAProxy, AWS ALB Ingress Controller

Understanding the Relationship

Ingress is just a set of rules; it doesn’t handle traffic by itself.
Ingress Controller is the actual software that enforces Ingress rules and routes traffic.
Without an Ingress Controller, an Ingress resource won't work.

Comparison: LoadBalancer vs. Ingress in Kubernetes

Feature	LoadBalancer	Ingress Controller
Layer Support	Layer 4 (TCP/UDP)	Layer 7 (HTTP/HTTPS)
Traffic Routing	Basic, directs traffic to a single service	Advanced, supports path-based and host-based routing
Cost	Expensive (each service gets a separate cloud LB)	Cost-effective (one LB for multiple services)
Security	No built-in authentication or TLS termination	Supports TLS termination, authentication, and security features
Multi-Service Support	Requires separate LoadBalancer for each service	One LoadBalancer for multiple services
Cloud Dependency	Works only on cloud platforms (AWS, GCP, Azure)	Works on both cloud and on-premises (Minikube, Bare Metal, etc.)
SSL/HTTPS Support	Not built-in, requires manual configuration	Built-in TLS/SSL support with Cert-Manager
Rate Limiting & Load Balancing	Not supported natively	Supports rate limiting and custom load-balancing algorithms
Flexibility	Static public IP for each service	Dynamic routing with hostnames and subdomains
Performance	Directly exposes service, reducing one layer of latency	Adds an extra layer (Ingress Controller), but optimizes routing

When to Use LoadBalancer?

✅ When you need direct access to a single service.
✅ When using TCP/UDP services (e.g., databases, game servers).
✅ When running on cloud platforms (AWS, GCP, Azure) and don't mind extra costs.

When to Use Ingress Controller?

✅ When you need cost-effective exposure of multiple services.
✅ When you want path-based and host-based routing.
✅ When you require HTTPS/TLS termination.
✅ When deploying on-prem Kubernetes (Minikube, Bare Metal).

Example Scenarios

Scenario	Best Option
Exposing a single backend API	LoadBalancer
Running a microservices-based app	Ingress
Hosting multiple websites on Kubernetes	Ingress
Deploying a database externally	LoadBalancer
Cost-effective public exposure of services	Ingress

Set up the AWS ALB Ingress Controller as an EKS Add-on.

📌 What is OIDC in EKS?

AWS EKS uses OIDC (OpenID Connect) to authenticate Kubernetes workloads with AWS IAM.
This allows Kubernetes pods to assume IAM roles without needing long-term credentials.

🔹 Step 1: Setup OIDC Connector for EKS

To enable IAM authentication for AWS services in Kubernetes, you need an OIDC (OpenID Connect) provider.

1️⃣ Export Your EKS Cluster Name

export cluster_name=my-eks-cluster

2️⃣ Get the OIDC ID of Your Cluster

oidc_id=$(aws eks describe-cluster --name $cluster_name --query "cluster.identity.oidc.issuer" --output text | cut -d '/' -f 5)

3️⃣ Check If an IAM OIDC Provider Already Exists

aws iam list-open-id-connect-providers | grep $oidc_id | cut -d "/" -f4

If you see output, OIDC is already configured.
If not, continue with the next step.

4️⃣ Associate IAM OIDC Provider with EKS

eksctl utils associate-iam-oidc-provider --cluster $cluster_name --approve

✅ Now, your EKS cluster is associated with an IAM OIDC provider.
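
The cut -d '/' -f 5 step works because the issuer URL has the OIDC ID as its fifth slash-separated field. The sketch below isolates that parsing and the existence check so they can be tried on sample text. The function names and the sample ID in the usage note are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: the text processing behind the OIDC check.

# Extract the OIDC ID from an issuer URL such as
# https://oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>
oidc_id_from_issuer() {
  printf '%s\n' "$1" | cut -d '/' -f 5
}

# Read a provider list on stdin; succeed if it mentions the given OIDC ID.
oidc_provider_exists() {
  grep -qF -- "$1"
}
```

With these, the two steps can be combined as: aws iam list-open-id-connect-providers | oidc_provider_exists "$oidc_id" || eksctl utils associate-iam-oidc-provider --cluster $cluster_name --approve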

🔹 Step 2: Create IAM Role and Policy for ALB Ingress Controller

The AWS ALB Controller needs an IAM policy to manage ALBs and route traffic.

1️⃣ Download the AWS Load Balancer Controller IAM Policy

curl -O https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.5.4/docs/install/iam_policy.json

2️⃣ Create the IAM Policy

aws iam create-policy \
    --policy-name AWSLoadBalancerControllerIAMPolicy \
    --policy-document file://iam_policy.json

This creates a policy named AWSLoadBalancerControllerIAMPolicy with necessary permissions.

3️⃣ Create an IAM Role and Service Account for ALB Controller

eksctl create iamserviceaccount \
  --cluster=$cluster_name \
  --namespace=kube-system \
  --name=aws-load-balancer-controller \
  --role-name AmazonEKSLoadBalancerControllerRole \
  --attach-policy-arn=arn:aws:iam::<your-aws-account-id>:policy/AWSLoadBalancerControllerIAMPolicy \
  --approve  \
  --override-existing-serviceaccounts

Replace <your-aws-account-id> with your AWS account ID.
This command creates an IAM role and links it to a Kubernetes service account.
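
Rather than typing the account ID by hand, it can be looked up with aws sts get-caller-identity and used to build the policy ARN. A minimal sketch follows. The function names and the AWS_CLI override variable are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: derive the policy ARN from the current AWS identity.
AWS_CLI="${AWS_CLI:-aws}"

# Print the 12-digit account ID of the configured credentials.
current_account_id() {
  "$AWS_CLI" sts get-caller-identity --query Account --output text
}

# Build the ARN of a customer managed IAM policy.
policy_arn() {
  local account_id="$1" policy_name="$2"
  printf 'arn:aws:iam::%s:policy/%s\n' "$account_id" "$policy_name"
}
```

The value for --attach-policy-arn then becomes "$(policy_arn "$(current_account_id)" AWSLoadBalancerControllerIAMPolicy)".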

🔹 Step 3: Deploy ALB Ingress Controller using Helm

The ALB Ingress Controller is deployed via Helm.

1️⃣ Add the Helm Repository for EKS

helm repo add eks https://aws.github.io/eks-charts

2️⃣ Update Helm Repository

helm repo update

3️⃣ Install the AWS Load Balancer Controller

helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=$cluster_name \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller \
  --set region=<your-region> \
  --set vpcId=<your-vpc-id>

Replace <your-region> with your AWS region (e.g., ap-south-1, the region used for the cluster above).
Replace <your-vpc-id> with your VPC ID.

✅ ALB Ingress Controller is now installed in Kubernetes.
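
The VPC ID and region do not have to be copied from the console. Both can be read with the AWS CLI, as in the sketch below. The function names and the AWS_CLI override variable are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: look up the values for the Helm --set flags.
AWS_CLI="${AWS_CLI:-aws}"

# Print the ID of the VPC that the EKS cluster runs in.
cluster_vpc_id() {
  "$AWS_CLI" eks describe-cluster --name "$1" \
    --query 'cluster.resourcesVpcConfig.vpcId' --output text
}

# Print the default region of the active AWS CLI profile.
configured_region() {
  "$AWS_CLI" configure get region
}
```

These can be passed to Helm as --set region="$(configured_region)" and --set vpcId="$(cluster_vpc_id "$cluster_name")".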

🔹 Step 4: Verify ALB Ingress Controller Deployment

Check if the ALB Ingress Controller is running:

kubectl get deployment -n kube-system aws-load-balancer-controller
kubectl get pods -n kube-system

If everything is correct, the output should show 1 or more replicas running.

🔹 Step 5: Troubleshooting - LoadBalancer Address Not Found

If you don’t see the LoadBalancer address while running:

kubectl get ingress -n <namespace>

A common cause is that the controller's IAM policy is missing a permission it needs for ALB operations.

1️⃣ Verify IAM Policy Has Required Permissions

Run this command to check if elasticloadbalancing:DescribeListenerAttributes is present:

aws iam get-policy-version \
    --policy-arn arn:aws:iam::<your-aws-account-id>:policy/AWSLoadBalancerControllerIAMPolicy \
    --version-id $(aws iam get-policy --policy-arn arn:aws:iam::<your-aws-account-id>:policy/AWSLoadBalancerControllerIAMPolicy --query 'Policy.DefaultVersionId' --output text)
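
The command above prints the whole policy document, so the permission still has to be found in the output. A small filter like the sketch below can do that. The function name policy_has_action is an assumption for illustration. Note that it only matches the action written out literally and would not detect a wildcard such as elasticloadbalancing:Describe*.

```shell
#!/usr/bin/env bash
# Sketch: check a policy document (read from stdin) for a literal action name.
policy_has_action() {
  # Match the action as a quoted JSON string, e.g. "service:ActionName"
  grep -qF -- "\"$1\""
}
```

Piping the get-policy-version output into policy_has_action elasticloadbalancing:DescribeListenerAttributes && echo present || echo missing gives a quick yes/no answer.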

2️⃣ If Permission is Missing, Update IAM Policy

First, download the current IAM policy:

aws iam get-policy-version \
    --policy-arn arn:aws:iam::<your-aws-account-id>:policy/AWSLoadBalancerControllerIAMPolicy \
    --version-id $(aws iam get-policy --policy-arn arn:aws:iam::<your-aws-account-id>:policy/AWSLoadBalancerControllerIAMPolicy --query 'Policy.DefaultVersionId' --output text) \
    --query 'PolicyVersion.Document' --output json > policy.json

Edit policy.json and add the following statement to the existing "Statement" array:

{
  "Effect": "Allow",
  "Action": "elasticloadbalancing:DescribeListenerAttributes",
  "Resource": "*"
}

3️⃣ Update the IAM Policy

aws iam create-policy-version \
    --policy-arn arn:aws:iam::<your-aws-account-id>:policy/AWSLoadBalancerControllerIAMPolicy \
    --policy-document file://policy.json \
    --set-as-default

✅ Now, your IAM policy should allow ALB to work properly.

🔹 Step 6: Deploy an Application Using ALB Ingress

Change the service type of the opentelemetry-demo-frontendproxy service from LoadBalancer to NodePort

kubectl edit svc opentelemetry-demo-frontendproxy

After the change, the load balancer is removed.

Now, deploy an application and expose it using an Ingress resource.

Go to the kubernetes/frontendproxy directory of the cloned repository

Create the file ingress.yaml

vim ingress.yaml

Example: ALB Ingress YAML

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend-proxy
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/group.name: my-ingress-group
spec:
  ingressClassName: alb
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: opentelemetry-demo-frontendproxy
            port:
              number: 8080

Apply the Ingress Resource

kubectl apply -f ingress.yaml

Check Ingress and ALB Details

kubectl get ingress frontend-proxy -o wide

You should now see an AWS ALB hostname, which you can use to access the service.

Copy the DNS name and try to access the application

We can't access the application directly through the ALB DNS name, because the Ingress rule only matches the host example.com. We need to map that hostname to the load balancer's IP address.

Register the hostname on your local machine (Windows in this example)

Open the hosts file in the following folder on your Windows machine

C:\Windows\System32\drivers\etc

Look up an IP address for your load balancer's DNS name and register it against the domain example.com in the hosts file

Access the application using example.com
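
For a quick check that does not require editing the hosts file, the request can be sent straight to the ALB while presenting example.com in the Host header. The sketch below shows that, along with the hosts-file line format. The function names and the CURL override variable are assumptions for illustration. ALB IP addresses can change over time, so a hosts-file mapping is only suitable for temporary testing.

```shell
#!/usr/bin/env bash
# Sketch: test host-based routing on the ALB.
CURL="${CURL:-curl}"

# Format one hosts-file line: "<ip> <hostname>"
hosts_entry() {
  printf '%s %s\n' "$1" "$2"
}

# Fetch response headers from the ALB while presenting the Ingress host name.
# $1: ALB DNS name, $2: host configured in the Ingress rule
curl_with_host() {
  local alb_dns="$1" host="$2"
  "$CURL" -sI -H "Host: ${host}" "http://${alb_dns}/"
}
```

For example, curl_with_host <alb-dns-name> example.com should return the application's response headers once the ALB targets are healthy.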

✅ Summary

1️⃣ Configured OIDC for AWS IAM authentication.
2️⃣ Created IAM Policy and Role for ALB Ingress Controller.
3️⃣ Installed ALB Controller using Helm.
4️⃣ Verified deployment and troubleshot IAM permission issues.
5️⃣ Deployed an ALB-backed Ingress to expose an application.

Custom Domain Configuration for the Project

Buy a custom domain from BigRock/GoDaddy

I have purchased cloudmantra.xyz from Bigrock.in

Create Hosted Zone in Route53

Create public hosted zone

Create record

Update nameservers and ingress hostname

Add the Route 53 nameservers to your domain at the registrar

Now go to the ingress.yaml file and update the host name to cloudmantra.xyz

Apply ingress.yaml configuration

kubectl apply -f ingress.yaml

Go to Load Balancer > HTTP:80 listener > Rules

Check using the command below

curl --resolve cloudmantra.xyz:80:<ip-address> cloudmantra.xyz


