Amazon EKS Cluster Management

A fully automated Kubernetes cluster with observability, persistent storage, and RBAC on AWS

[Figure: EKS Architecture Diagram]

Project Overview

Designed and implemented an Amazon Elastic Kubernetes Service (EKS) cluster with full observability, persistent storage, and role-based access control (RBAC). The infrastructure was provisioned with Terraform, following Infrastructure as Code (IaC) principles. The solution uses ArgoCD for GitOps-based continuous deployment, Prometheus and Grafana for monitoring, the EBS CSI driver for persistent storage, and RBAC for fine-grained access control.

Key Components

Infrastructure Provisioning with Terraform

The EKS cluster infrastructure was defined as code using Terraform modules. This approach:

  • Automates the creation of the EKS cluster with specified Kubernetes version (1.30)
  • Configures worker nodes using t3.medium instances in private subnets
  • Integrates with existing VPC infrastructure
  • Enables reproducible infrastructure deployments
  • Scales worker nodes automatically up to 3 instances based on demand
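The settings listed above can be sketched as a Terraform JSON document generated from Python. This is a minimal sketch, not the project's actual configuration: it assumes the community terraform-aws-modules/eks/aws module at a 20.x release, and the VPC ID, subnet IDs, cluster name, and node group name are hypothetical placeholders.

```python
import json

# Hypothetical placeholders: substitute the IDs of the existing VPC.
VPC_ID = "vpc-EXAMPLE"
PRIVATE_SUBNET_IDS = ["subnet-EXAMPLE-a", "subnet-EXAMPLE-b"]


def eks_module_config(cluster_name: str) -> dict:
    """Build a Terraform JSON (.tf.json) document that calls an EKS module."""
    return {
        "module": {
            "eks": {
                # Assumption: community module pinned to a 20.x release.
                "source": "terraform-aws-modules/eks/aws",
                "version": "~> 20.0",
                "cluster_name": cluster_name,
                "cluster_version": "1.30",
                "vpc_id": VPC_ID,
                "subnet_ids": PRIVATE_SUBNET_IDS,
                "eks_managed_node_groups": {
                    # "default" is an illustrative node group name.
                    "default": {
                        "instance_types": ["t3.medium"],
                        "min_size": 1,
                        "max_size": 3,  # ceiling of 3 worker nodes
                        "desired_size": 1,
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Write the output to main.tf.json, then run terraform init / plan.
    print(json.dumps(eks_module_config("demo-eks"), indent=2))
```

Note that max_size only sets the upper bound of the node group. Scaling on demand additionally needs an autoscaler such as Cluster Autoscaler or Karpenter running against the cluster.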

Cluster Configuration

After provisioning, the EKS cluster was configured for local access by:

  • Setting up kubectl context to interact with the cluster
  • Establishing secure communication between local environment and EKS API
  • Enabling immediate cluster management capabilities
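One common way to set up the kubectl context is the AWS CLI's update-kubeconfig subcommand. The sketch below only builds the command lines; the cluster name and region are hypothetical, and the source does not state which method was actually used.

```python
import shlex


def update_kubeconfig_command(cluster_name: str, region: str) -> list[str]:
    """Return the AWS CLI invocation that adds the cluster to ~/.kube/config."""
    return [
        "aws", "eks", "update-kubeconfig",
        "--region", region,
        "--name", cluster_name,
    ]


# A quick check that the new context can reach the EKS API server.
VERIFY_COMMAND = ["kubectl", "get", "nodes"]


if __name__ == "__main__":
    # Hypothetical cluster name and region, for illustration only.
    print(shlex.join(update_kubeconfig_command("demo-eks", "us-east-1")))
    print(shlex.join(VERIFY_COMMAND))
```

To execute the commands instead of printing them, each list can be passed to subprocess.run with check=True.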

Tooling Installation

Essential Kubernetes tools were installed to streamline operations:

  • eksctl: CLI tool for creating and managing EKS clusters
  • Helm: Package manager for Kubernetes applications

Together, these tools simplify cluster management and application deployment.

Persistent Storage

The AWS EBS CSI Driver was implemented to:

  • Enable stateful applications with persistent volumes
  • Automatically provision EBS volumes for Kubernetes pods
  • Handle dynamic volume provisioning and lifecycle management
  • Support various storage classes with different performance characteristics
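Dynamic provisioning with the driver can be illustrated by a StorageClass and a PersistentVolumeClaim. This is a sketch under assumptions: the object names, the gp3 volume type, and the 10Gi size are illustrative and not taken from the project. The manifests are emitted as JSON, which kubectl apply accepts.

```python
import json


def ebs_storage_class(name: str = "ebs-gp3") -> dict:
    """StorageClass that provisions EBS volumes through the CSI driver."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "ebs.csi.aws.com",
        "parameters": {"type": "gp3"},  # assumed volume type
        # Delay volume creation until a pod is scheduled, so the volume
        # is created in the same Availability Zone as the node.
        "volumeBindingMode": "WaitForFirstConsumer",
        "reclaimPolicy": "Delete",
        "allowVolumeExpansion": True,
    }


def volume_claim(name: str, storage_class: str, size: str = "10Gi") -> dict:
    """PersistentVolumeClaim that triggers dynamic provisioning."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    sc = ebs_storage_class()
    pvc = volume_claim("app-data", sc["metadata"]["name"])
    for manifest in (sc, pvc):
        print(json.dumps(manifest, indent=2))
```

Additional storage classes with different performance characteristics would differ mainly in the parameters block.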

Monitoring Stack

A comprehensive observability solution was deployed:

  • Prometheus: Collects and stores cluster and application metrics
  • Grafana: Provides visualization dashboards for metrics analysis

Together, they enable real-time monitoring of cluster health and performance, and support capacity planning and troubleshooting.
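A stack like this is often installed with Helm. The sketch below assumes the community prometheus-community/prometheus and grafana/grafana charts and a namespace called monitoring; the source does not say which charts, release names, or namespace the project used.

```python
import shlex

NAMESPACE = "monitoring"  # assumed namespace

# Chart repositories (assumption: the community-maintained charts).
REPOS = {
    "prometheus-community": "https://prometheus-community.github.io/helm-charts",
    "grafana": "https://grafana.github.io/helm-charts",
}

# Release name -> chart reference (illustrative release names).
RELEASES = {
    "prometheus": "prometheus-community/prometheus",
    "grafana": "grafana/grafana",
}


def helm_commands() -> list[list[str]]:
    """Return the helm invocations that install both charts, in order."""
    commands = [["helm", "repo", "add", name, url] for name, url in REPOS.items()]
    commands.append(["helm", "repo", "update"])
    for release, chart in RELEASES.items():
        commands.append([
            "helm", "upgrade", "--install", release, chart,
            "--namespace", NAMESPACE, "--create-namespace",
        ])
    return commands


if __name__ == "__main__":
    for command in helm_commands():
        print(shlex.join(command))
```

Using upgrade --install keeps the commands idempotent, so the same script can be rerun after changing chart values.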

Technical Challenges & Solutions

Terraform State Management

Remote state management was implemented to:

  • Store Terraform state securely in S3 with encryption
  • Enable team collaboration through state locking via DynamoDB
  • Prevent state corruption during concurrent operations
  • Maintain history of infrastructure changes
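The backend settings described above could look roughly like the following Terraform JSON, generated here from Python. The bucket, key, region, and table names are hypothetical placeholders.

```python
import json


def s3_backend(bucket: str, key: str, region: str, lock_table: str) -> dict:
    """Build a Terraform JSON document that configures an S3 remote backend."""
    return {
        "terraform": {
            "backend": {
                "s3": {
                    "bucket": bucket,
                    "key": key,            # object path of the state file
                    "region": region,
                    "encrypt": True,       # server-side encryption of state
                    # The DynamoDB table used for state locking needs a
                    # string partition key named LockID.
                    "dynamodb_table": lock_table,
                }
            }
        }
    }


if __name__ == "__main__":
    # All four values below are illustrative, not the project's real names.
    config = s3_backend(
        bucket="example-terraform-state",
        key="eks/terraform.tfstate",
        region="us-east-1",
        lock_table="example-terraform-locks",
    )
    print(json.dumps(config, indent=2))
```

A history of state changes is not part of the backend block itself; it typically comes from enabling versioning on the S3 bucket.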

IRSA for EBS CSI Driver

The transition to pod identity associations addressed:

  • The shift away from IAM Roles for Service Accounts (IRSA), which tools such as eksctl now flag as deprecated for the EBS CSI driver add-on
  • Secure credential management for storage operations
  • Fine-grained IAM permissions for storage operations
  • Automated migration from legacy authentication methods
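With EKS Pod Identity, the driver's IAM role trusts the pods.eks.amazonaws.com service principal instead of the cluster's OIDC provider. The sketch below builds that trust policy and the fields of a pod identity association. The cluster name and role ARN are hypothetical, and the service account name is the driver's usual default, assumed here rather than confirmed by the source.

```python
import json

# AWS managed policy commonly attached to the driver's role.
EBS_CSI_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"


def pod_identity_trust_policy() -> dict:
    """Trust policy that lets EKS Pod Identity assume the role for a pod."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "pods.eks.amazonaws.com"},
                "Action": ["sts:AssumeRole", "sts:TagSession"],
            }
        ],
    }


def ebs_csi_association(cluster_name: str, role_arn: str) -> dict:
    """Fields that link the driver's service account to the IAM role."""
    return {
        "clusterName": cluster_name,
        "namespace": "kube-system",
        "serviceAccount": "ebs-csi-controller-sa",  # assumed default name
        "roleArn": role_arn,
    }


if __name__ == "__main__":
    print(json.dumps(pod_identity_trust_policy(), indent=2))
    # Hypothetical cluster name and role ARN, for illustration only.
    print(json.dumps(
        ebs_csi_association("demo-eks", "arn:aws:iam::111122223333:role/ebs-csi"),
        indent=2,
    ))
```

Pod identity associations only take effect when the EKS Pod Identity Agent add-on is running on the worker nodes.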

RBAC Implementation

Role-based access control was configured to:

  • Map IAM users to Kubernetes RBAC roles
  • Provide namespace isolation between development and production
  • Implement principle of least privilege for cluster access
  • Enable self-service access within defined boundaries
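Namespace isolation with least privilege is typically expressed as a Role plus a RoleBinding per namespace. The following sketch grants a group read-only access to one namespace; the group name, role name, and resource list are illustrative assumptions. How IAM users end up in that Kubernetes group (for example via the aws-auth ConfigMap or EKS access entries) is not specified in the source.

```python
import json

RBAC_API_GROUP = "rbac.authorization.k8s.io"


def read_only_role(namespace: str, name: str = "dev-read-only") -> dict:
    """Role that allows read-only access to common workloads in one namespace."""
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": ["", "apps"],
                "resources": ["pods", "services", "deployments"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }


def bind_group(role: dict, group: str) -> dict:
    """RoleBinding that grants the role to a Kubernetes group."""
    role_name = role["metadata"]["name"]
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {
            "name": f"{role_name}-{group}",
            "namespace": role["metadata"]["namespace"],
        },
        "subjects": [{"kind": "Group", "name": group, "apiGroup": RBAC_API_GROUP}],
        "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_API_GROUP},
    }


if __name__ == "__main__":
    role = read_only_role("development")
    binding = bind_group(role, "dev-team")  # hypothetical group name
    for manifest in (role, binding):
        print(json.dumps(manifest, indent=2))
```

Because both objects are namespaced, members of the group get no access to the production namespace unless a separate binding is created there.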

Monitoring Configuration

Application monitoring was enhanced by:

  • Configuring Prometheus to automatically discover and scrape metrics endpoints
  • Enabling Spring Boot Actuator metrics collection
  • Providing application-level visibility alongside infrastructure metrics
  • Supporting custom metric collection for business-specific monitoring
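Automatic discovery is commonly done with Kubernetes service discovery plus annotation-based relabeling. The sketch below shows such a scrape job as a Python dictionary, together with pod annotations pointing at Spring Boot Actuator's Prometheus endpoint. The job name, the prometheus.io/* annotation convention, and port 8080 are assumptions, not details from the project.

```python
import json

# Scrape job that discovers pods and keeps only the annotated ones.
SCRAPE_JOB = {
    "job_name": "kubernetes-pods",
    "kubernetes_sd_configs": [{"role": "pod"}],
    "relabel_configs": [
        {   # Keep only pods annotated with prometheus.io/scrape: "true".
            "source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"],
            "action": "keep",
            "regex": "true",
        },
        {   # Use prometheus.io/path as the metrics path when it is set.
            "source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_path"],
            "action": "replace",
            "target_label": "__metrics_path__",
            "regex": "(.+)",
        },
    ],
}

# Annotations for a Spring Boot pod that exposes Micrometer metrics.
SPRING_BOOT_ANNOTATIONS = {
    "prometheus.io/scrape": "true",
    "prometheus.io/path": "/actuator/prometheus",
    "prometheus.io/port": "8080",  # assumed application port
}


def should_scrape(annotations: dict) -> bool:
    """Mirror the 'keep' rule above: scrape only explicitly opted-in pods."""
    return annotations.get("prometheus.io/scrape") == "true"


if __name__ == "__main__":
    print(json.dumps({"scrape_configs": [SCRAPE_JOB]}, indent=2))
    print(should_scrape(SPRING_BOOT_ANNOTATIONS))
```

On the application side, the /actuator/prometheus endpoint is available once the Micrometer Prometheus registry is on the classpath and the endpoint is exposed in the Actuator configuration.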

Architecture Diagram

[Figure: Detailed EKS Architecture. Caption: EKS Cluster Architecture Overview]

Key Achievements

  • Provisioned EKS infrastructure using Terraform with reusable modules
  • Implemented persistent storage with the EBS CSI driver using pod identity associations
  • Established monitoring with Prometheus and Grafana, including email alerting
  • Implemented fine-grained RBAC controls for multi-team access
  • Configured alerting for critical cluster metrics
  • Developed health probes for application reliability
  • Enforced container resource limits with LimitRange
  • Implemented GitOps workflow using ArgoCD for declarative continuous deployment
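Two of the items above lend themselves to a short illustration: the LimitRange and the ArgoCD application. The sketch below builds both manifests as Python dictionaries. The repository URL, path, object names, namespaces, and resource values are hypothetical and not taken from the project.

```python
import json


def default_limit_range(namespace: str) -> dict:
    """LimitRange that applies default requests and limits to containers."""
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "default-limits", "namespace": namespace},
        "spec": {
            "limits": [
                {
                    "type": "Container",
                    # Illustrative values; tune per workload.
                    "default": {"cpu": "500m", "memory": "512Mi"},
                    "defaultRequest": {"cpu": "250m", "memory": "256Mi"},
                }
            ]
        },
    }


def argocd_application(name: str, repo_url: str, path: str, namespace: str) -> dict:
    """ArgoCD Application that keeps a namespace in sync with a Git path."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "targetRevision": "HEAD", "path": path},
            "destination": {
                "server": "https://kubernetes.default.svc",  # in-cluster API
                "namespace": namespace,
            },
            # Automated sync: delete removed resources and revert drift.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


if __name__ == "__main__":
    app = argocd_application(
        name="demo-app",
        repo_url="https://example.com/org/demo-manifests.git",  # hypothetical
        path="k8s/overlays/dev",
        namespace="development",
    )
    for manifest in (default_limit_range("development"), app):
        print(json.dumps(manifest, indent=2))
```

With automated sync enabled, a merged commit to the manifest repository is the only step needed to roll out a change, which is the core of the GitOps workflow.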

Work Samples

EKS
