Amazon EKS Cluster Management

A fully automated Kubernetes cluster with observability, persistent storage, and RBAC on AWS

Role

DevOps Engineer

Project Name

Workload Deployment on Amazon EKS

Technologies

Terraform, Amazon EKS, Kubernetes, ArgoCD, Helm, Prometheus, Grafana, AWS IAM


Project Overview

Designed and implemented an Amazon Elastic Kubernetes Service (EKS) cluster with full observability, persistent storage, and role-based access control (RBAC). The infrastructure was provisioned with Terraform, following Infrastructure as Code (IaC) principles. On top of it, the solution adopted GitOps practices with ArgoCD for continuous deployment, set up monitoring with Prometheus and Grafana, provided persistent storage through the EBS CSI driver, and configured fine-grained access control with RBAC.
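In an ArgoCD GitOps setup, each workload is typically described by an Application resource that points at a path in a Git repository. The sketch below builds one as a Python dict and prints it as JSON. The repository URL, path, application name, and the choice of automated sync with pruning and self-healing are illustrative assumptions, not details taken from this project.

```python
import json


def argocd_application(name, repo_url, path, namespace, revision="HEAD"):
    """Build an Argo CD Application that keeps `namespace` in sync with a Git path."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        # Argo CD looks for Application resources in its own namespace
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "targetRevision": revision, "path": path},
            "destination": {
                # In-cluster API server address, i.e. the cluster Argo CD runs in
                "server": "https://kubernetes.default.svc",
                "namespace": namespace,
            },
            "syncPolicy": {
                # prune deletes resources removed from Git; selfHeal reverts manual drift
                "automated": {"prune": True, "selfHeal": True},
                "syncOptions": ["CreateNamespace=true"],
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical repository URL and path, for illustration only
    app = argocd_application(
        "demo-app", "https://github.com/example/gitops-repo.git", "apps/demo", "dev"
    )
    print(json.dumps(app, indent=2))
```

The printed JSON can be applied with kubectl apply -f, since kubectl accepts JSON as well as YAML manifests. After that, ArgoCD reconciles the namespace against whatever is committed at the given path.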

Key Components

Infrastructure Provisioning with Terraform

The EKS cluster infrastructure was defined as code using Terraform modules. This approach:

Automates creation of the EKS cluster with a pinned Kubernetes version (1.30)
Configures worker nodes as t3.medium instances in private subnets
Integrates with the existing VPC infrastructure
Enables reproducible infrastructure deployments
Scales worker nodes automatically based on demand, up to a maximum of 3 instances
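The module-based setup above can be sketched in Terraform's JSON syntax (a .tf.json file), generated here from Python. This assumes the community terraform-aws-modules/eks/aws module at version 20.x, whose inputs use the names shown. The cluster name, VPC ID, subnet IDs, and the min/desired node counts are hypothetical placeholders.

```python
import json


def eks_module_config(cluster_name, vpc_id, private_subnet_ids,
                      k8s_version="1.30", max_nodes=3):
    """Return a Terraform JSON (.tf.json) document that calls the community EKS module."""
    return {
        "module": {
            "eks": {
                "source": "terraform-aws-modules/eks/aws",
                # Assumption: v20.x of the module, whose inputs use these names
                "version": "~> 20.0",
                "cluster_name": cluster_name,
                "cluster_version": k8s_version,
                "vpc_id": vpc_id,  # existing VPC, not created here
                # Private subnets for the nodes (and, by default, the control plane ENIs)
                "subnet_ids": private_subnet_ids,
                "eks_managed_node_groups": {
                    "default": {
                        "instance_types": ["t3.medium"],
                        "min_size": 1,
                        "desired_size": min(2, max_nodes),  # never above the ceiling
                        "max_size": max_nodes,
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical VPC and subnet IDs
    cfg = eks_module_config("demo-eks", "vpc-0123456789abcdef0",
                            ["subnet-aaaa1111", "subnet-bbbb2222"])
    # Save the output as main.tf.json, then run terraform init and terraform plan
    print(json.dumps(cfg, indent=2))
```

On its own, max_size only sets the node group's ceiling. Demand-driven scaling within that range is usually handled by a component such as Cluster Autoscaler or Karpenter.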

Cluster Configuration

After provisioning, the EKS cluster was configured for local access by:

Setting up kubectl context to interact with the cluster
Establishing secure communication between local environment and EKS API
Allowing the cluster to be managed with kubectl immediately after provisioning
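Local access of this kind is commonly set up with the AWS CLI's eks update-kubeconfig command, which adds a context for the cluster to the local kubeconfig. The sketch below only assembles and prints the commands; the cluster name, region, and the follow-up kubectl get nodes check are illustrative assumptions.

```python
import shlex


def update_kubeconfig_cmd(cluster_name, region, alias=None):
    """Build the AWS CLI call that writes or merges a kubeconfig context for an EKS cluster."""
    cmd = ["aws", "eks", "update-kubeconfig", "--region", region, "--name", cluster_name]
    if alias:
        cmd += ["--alias", alias]  # optional friendlier context name
    return cmd


if __name__ == "__main__":
    # Hypothetical cluster name and region
    steps = [
        update_kubeconfig_cmd("demo-eks", "us-east-1"),
        ["kubectl", "get", "nodes"],  # quick check that the API server is reachable
    ]
    for cmd in steps:
        print(shlex.join(cmd))
```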

Tooling Installation

Essential Kubernetes tools were installed to streamline operations:

eksctl: CLI tool for creating and managing EKS clusters
Helm: Package manager for Kubernetes applications
Together, these tools simplify cluster management and application deployment.
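A small preflight check can confirm the tools are installed before running any cluster operations. This sketch assumes kubectl should be checked alongside eksctl and Helm, and uses each CLI's version subcommand; the lookup function is injectable so the check can be exercised without the tools present.

```python
import shutil
import subprocess

# Version subcommands each CLI provides
VERSION_ARGS = {
    "eksctl": ["version"],
    "helm": ["version", "--short"],
    "kubectl": ["version", "--client"],
}


def missing_tools(tools=VERSION_ARGS, which=shutil.which):
    """Return the names of tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing:", ", ".join(missing))
    for tool, args in VERSION_ARGS.items():
        if tool not in missing:
            # Print each installed tool's version; do not fail the script on errors
            subprocess.run([tool, *args], check=False)
```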

Persistent Storage

The AWS EBS CSI Driver was implemented to:

Enable stateful applications with persistent volumes
Automatically provision EBS volumes for Kubernetes pods
Handle dynamic volume provisioning and lifecycle management
Support various storage classes with different performance characteristics
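With the EBS CSI driver installed, dynamic provisioning is driven by a StorageClass whose provisioner is ebs.csi.aws.com, plus a PersistentVolumeClaim that references it. The sketch below emits both as a Kubernetes List in JSON. The gp3 volume type, the class name ebs-gp3, and the 10Gi claim are illustrative assumptions.

```python
import json


def gp3_storage_class(name="ebs-gp3"):
    """StorageClass that asks the EBS CSI driver for gp3 volumes."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "ebs.csi.aws.com",  # the EBS CSI driver
        "parameters": {"type": "gp3"},
        # Delay volume creation until a pod is scheduled, so it lands in that pod's AZ
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowVolumeExpansion": True,
        "reclaimPolicy": "Delete",  # delete the EBS volume when the claim is deleted
    }


def pvc(name, size, storage_class="ebs-gp3", namespace="default"):
    """PersistentVolumeClaim that triggers dynamic provisioning from `storage_class`."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],  # an EBS volume attaches to one node at a time
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    manifests = {"apiVersion": "v1", "kind": "List",
                 "items": [gp3_storage_class(), pvc("data", "10Gi")]}
    print(json.dumps(manifests, indent=2))  # pipe into kubectl apply -f -
```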

Monitoring Stack

A comprehensive observability solution was deployed:

Prometheus: Collected and stored cluster and application metrics
Grafana: Provided visualization dashboards for metrics analysis
Enabled real-time monitoring of cluster health and performance
Supported capacity planning and troubleshooting
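One common way to deploy Prometheus and Grafana together is the kube-prometheus-stack Helm chart from the prometheus-community repository, which bundles both. The project text does not name the chart, so treat this as an assumption; the release and namespace names are also placeholders. The sketch prints the Helm commands rather than running them.

```python
import shlex

REPO_NAME = "prometheus-community"
REPO_URL = "https://prometheus-community.github.io/helm-charts"


def monitoring_install_cmds(release="monitoring", namespace="monitoring"):
    """Helm commands that add the chart repo and install or upgrade the monitoring stack."""
    return [
        ["helm", "repo", "add", REPO_NAME, REPO_URL],
        ["helm", "repo", "update"],
        # upgrade --install is idempotent: it installs on the first run and upgrades afterwards
        ["helm", "upgrade", "--install", release, f"{REPO_NAME}/kube-prometheus-stack",
         "--namespace", namespace, "--create-namespace"],
    ]


if __name__ == "__main__":
    for cmd in monitoring_install_cmds():
        print(shlex.join(cmd))
```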

Technical Challenges & Solutions

Terraform State Management

Remote state management was implemented to:

Store Terraform state securely in S3 with encryption
Enable team collaboration through state locking via DynamoDB
Prevent state corruption during concurrent operations
Maintain a history of infrastructure changes
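The remote state described above maps onto Terraform's s3 backend with encryption and a DynamoDB lock table. Below is a sketch that emits the backend block in Terraform JSON syntax. The bucket, key, region, and table names are hypothetical, and the lock table is assumed to already exist with a string partition key named LockID.

```python
import json


def s3_backend(bucket, key, region, lock_table):
    """Terraform JSON for an encrypted S3 state backend with DynamoDB locking."""
    return {
        "terraform": {
            "backend": {
                "s3": {
                    "bucket": bucket,
                    "key": key,                    # path of the state object in the bucket
                    "region": region,
                    "encrypt": True,               # server-side encryption of the state file
                    "dynamodb_table": lock_table,  # lock table that prevents concurrent writes
                }
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical names; save the output as backend.tf.json, then run terraform init
    print(json.dumps(s3_backend("example-tf-state", "eks/terraform.tfstate",
                                "us-east-1", "example-tf-locks"), indent=2))
```

Newer Terraform releases also offer S3-native locking (use_lockfile) and mark dynamodb_table as deprecated, so the locking option may differ depending on the Terraform version in use.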

Migrating the EBS CSI Driver from IRSA to Pod Identity

The move to EKS Pod Identity associations addressed:

Replacing IAM Roles for Service Accounts (IRSA) for the EBS CSI driver with the newer EKS Pod Identity mechanism
Secure credential management for storage operations
Fine-grained IAM permissions for storage operations
Automated migration from legacy authentication methods
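With Pod Identity, the IAM role trusts the pods.eks.amazonaws.com service principal instead of a cluster OIDC provider, and an association binds the role to the driver's service account. The Terraform JSON below sketches that wiring. It assumes the EKS Pod Identity Agent add-on is installed, the driver runs as the add-on's default ebs-csi-controller-sa service account in kube-system, and the AWS managed AmazonEBSCSIDriverPolicy is sufficient. The role name is hypothetical.

```python
import json

EBS_CSI_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"


def pod_identity_trust_policy():
    """Trust policy that lets EKS Pod Identity assume the role on behalf of pods."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "pods.eks.amazonaws.com"},
            "Action": ["sts:AssumeRole", "sts:TagSession"],
        }],
    }


def ebs_csi_pod_identity_tf(cluster_name):
    """Terraform JSON: IAM role, policy attachment, and Pod Identity association."""
    return {
        "resource": {
            "aws_iam_role": {
                "ebs_csi": {
                    "name": f"{cluster_name}-ebs-csi",  # hypothetical role name
                    "assume_role_policy": json.dumps(pod_identity_trust_policy()),
                }
            },
            "aws_iam_role_policy_attachment": {
                "ebs_csi": {
                    "role": "${aws_iam_role.ebs_csi.name}",
                    "policy_arn": EBS_CSI_POLICY_ARN,
                }
            },
            "aws_eks_pod_identity_association": {
                "ebs_csi": {
                    "cluster_name": cluster_name,
                    "namespace": "kube-system",
                    "service_account": "ebs-csi-controller-sa",  # add-on default
                    "role_arn": "${aws_iam_role.ebs_csi.arn}",
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(ebs_csi_pod_identity_tf("demo-eks"), indent=2))
```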

RBAC Implementation

Role-based access control was configured to:

Map IAM users to Kubernetes RBAC roles
Provide namespace isolation between development and production
Implement principle of least privilege for cluster access
Enable self-service access within defined boundaries
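A namespace-scoped Role bound to a Kubernetes group is one way to express this isolation, with IAM users mapped into that group. The sketch assumes the mapping is done through an EKS access entry (the project text does not say whether access entries or the aws-auth ConfigMap were used). The group name, role name, namespace, IAM ARN, and permitted resources and verbs are all illustrative.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def namespace_role(namespace, name="developer"):
    """Role limited to common workload resources inside one namespace."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [{
            "apiGroups": ["", "apps"],  # core API group plus apps (for deployments)
            "resources": ["pods", "pods/log", "services", "configmaps", "deployments"],
            "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"],
        }],
    }


def group_binding(namespace, group, role="developer"):
    """Bind the namespace Role to a Kubernetes group."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{group}-{role}", "namespace": namespace},
        "subjects": [{"kind": "Group", "name": group, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "Role", "name": role, "apiGroup": RBAC_GROUP},
    }


def access_entry_tf(cluster_name, principal_arn, group):
    """Terraform JSON mapping an IAM principal into `group` via an EKS access entry."""
    return {"resource": {"aws_eks_access_entry": {group: {
        "cluster_name": cluster_name,
        "principal_arn": principal_arn,
        "kubernetes_groups": [group],
    }}}}


if __name__ == "__main__":
    # Hypothetical namespace, group, and IAM user
    print(json.dumps({"apiVersion": "v1", "kind": "List",
                      "items": [namespace_role("dev"), group_binding("dev", "dev-team")]},
                     indent=2))
    print(json.dumps(access_entry_tf("demo-eks",
                                     "arn:aws:iam::111122223333:user/alice",
                                     "dev-team"), indent=2))
```

Because the Role and RoleBinding live in the dev namespace, members of dev-team get no permissions in production unless a separate binding grants them.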

Monitoring Configuration

Application monitoring was enhanced by:

Configuring Prometheus to automatically discover and scrape metrics endpoints
Enabling Spring Boot Actuator metrics collection
Providing application-level visibility alongside infrastructure metrics
Supporting custom metric collection for business-specific monitoring
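If Prometheus is run by the Prometheus Operator (as in kube-prometheus-stack), automatic discovery of Spring Boot apps is usually expressed as a ServiceMonitor that scrapes the /actuator/prometheus endpoint. That endpoint is available when the app includes the Micrometer Prometheus registry and exposes it through management.endpoints.web.exposure.include. The sketch assumes an operator-based install, a Service labeled app with a port named http, and a release label that matches the Helm release name. The app and namespace names are hypothetical.

```python
import json


def actuator_service_monitor(app, namespace, port_name="http", release="monitoring"):
    """ServiceMonitor that scrapes Spring Boot Actuator's Prometheus endpoint."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {
            "name": app,
            "namespace": namespace,
            # kube-prometheus-stack's default selector picks up this release label
            "labels": {"release": release},
        },
        "spec": {
            "selector": {"matchLabels": {"app": app}},  # which Services to scrape
            "namespaceSelector": {"matchNames": [namespace]},
            "endpoints": [{
                "port": port_name,               # named Service port, not a number
                "path": "/actuator/prometheus",  # Spring Boot's Prometheus endpoint
                "interval": "30s",
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(actuator_service_monitor("orders-api", "prod"), indent=2))
```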

Architecture Diagram

[Diagram: detailed EKS cluster architecture overview]

Key Achievements

Provisioned EKS infrastructure using Terraform with reusable modules
Successfully implemented persistent storage with EBS CSI driver using pod identity associations
Established comprehensive monitoring with Prometheus and Grafana, including email alerting
Implemented fine-grained RBAC controls for multi-team access
Configured alerting for critical cluster metrics
Added health probes to improve application reliability
Enforced default container resource requests and limits with LimitRange
Implemented GitOps workflow using ArgoCD for declarative continuous deployment
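The health probes and LimitRange mentioned above can be sketched as follows. All CPU and memory values, probe timings, the namespace, and the Spring Boot health paths /actuator/health/readiness and /actuator/health/liveness are illustrative assumptions rather than this project's actual settings.

```python
import json


def default_limit_range(namespace):
    """LimitRange giving containers default requests/limits and a per-container cap."""
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "default-limits", "namespace": namespace},
        "spec": {"limits": [{
            "type": "Container",
            "defaultRequest": {"cpu": "100m", "memory": "128Mi"},  # used when no request is set
            "default": {"cpu": "500m", "memory": "512Mi"},         # used when no limit is set
            "max": {"cpu": "1", "memory": "1Gi"},                  # upper bound per container
        }]},
    }


def http_probes(port, readiness_path, liveness_path):
    """Readiness and liveness probes to merge into a container spec."""
    return {
        # Readiness failures remove the pod from Service endpoints
        "readinessProbe": {"httpGet": {"path": readiness_path, "port": port},
                           "initialDelaySeconds": 10, "periodSeconds": 10},
        # Liveness failures make the kubelet restart the container
        "livenessProbe": {"httpGet": {"path": liveness_path, "port": port},
                          "initialDelaySeconds": 30, "periodSeconds": 20,
                          "failureThreshold": 3},
    }


if __name__ == "__main__":
    print(json.dumps(default_limit_range("dev"), indent=2))
    print(json.dumps(http_probes(8080, "/actuator/health/readiness",
                                 "/actuator/health/liveness"), indent=2))
```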
