Foundation Models and the JARK Stack on EKS
This technical demonstration explores deploying generative AI models on Amazon Elastic Kubernetes Service (EKS) using HashiCorp Terraform and the JARK stack (JupyterHub, Argo Workflows, Ray, and Kubernetes). The session addresses the fundamental challenges of running generative AI workloads at scale: infrastructure auto-scaling, distributed training across GPU nodes, cost optimization, and performance management. The presenter positions EKS as the optimal platform for the complete generative AI lifecycle, citing its managed Kubernetes control plane, GPU-optimized AMIs, integration with AWS deep learning containers, and support for specialized networking such as Elastic Fabric Adapter for high-performance inter-node communication. The architecture centers on foundation models, which can be adapted to many tasks with minimal data and compute compared to training from scratch; this represents a significant efficiency gain over traditional machine learning approaches, where each specific task typically requires its own model.
Hands-On Implementation: Fine-Tuning Stable Diffusion
The practical demonstration walks through fine-tuning the Stable Diffusion text-to-image model with the DreamBooth technique on EKS infrastructure provisioned entirely through Terraform. The architecture separates workloads across two managed node groups: a core node group hosting infrastructure services such as the AWS Load Balancer Controller and CSI drivers, and a GPU node group running the actual training and inference workloads. Data scientists experiment with model fine-tuning in JupyterHub notebooks running on the GPU nodes, and the resulting models are pushed to Hugging Face for versioning. The implementation uses Hugging Face's Accelerate library to simplify distributed training and the Diffusers library to work with diffusion models. For inference, the solution deploys a Ray cluster via Ray Serve custom resource definitions, pulling the fine-tuned model from Hugging Face and serving predictions at scale. The entire workflow, from infrastructure provisioning to model deployment, is managed as code through Terraform, demonstrating infrastructure-as-code principles applied to AI/ML workloads.
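The serving path just described can be sketched as a KubeRay RayService manifest assembled in plain Python. This is a minimal, stdlib-only sketch, not the session's actual templates: the field names follow KubeRay's `ray.io/v1` RayService API as best understood, and the Hugging Face repo name (`my-org/dreambooth-sd`), the serve application module (`serve_app:entrypoint`), and the `NodeGroupType` label are all hypothetical placeholders.

```python
import json

# Hypothetical Hugging Face repo holding the fine-tuned DreamBooth weights.
MODEL_ID = "my-org/dreambooth-sd"  # assumption, not from the session


def ray_service_manifest(model_id: str) -> dict:
    """Build a minimal RayService manifest (KubeRay ray.io/v1 API, field
    names assumed) that serves a fine-tuned Stable Diffusion model."""
    # serveConfigV2 is a YAML string describing the Ray Serve application.
    serve_config = "\n".join([
        "applications:",
        "  - name: stable-diffusion",
        "    import_path: serve_app:entrypoint",  # hypothetical serve module
        "    runtime_env:",
        "      pip: [diffusers, transformers, accelerate]",
        "      env_vars:",
        f"        MODEL_ID: {model_id}",  # the app pulls this repo from Hugging Face
    ])
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayService",
        "metadata": {"name": "stable-diffusion"},
        "spec": {
            "serveConfigV2": serve_config,
            "rayClusterConfig": {
                "headGroupSpec": {"rayStartParams": {"dashboard-host": "0.0.0.0"}},
                "workerGroupSpecs": [{
                    "groupName": "gpu-workers",
                    "replicas": 1,
                    # Schedule serving replicas onto the GPU managed node group
                    # via a (hypothetical) node label.
                    "template": {"spec": {"nodeSelector": {"NodeGroupType": "gpu"}}},
                }],
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(ray_service_manifest(MODEL_ID), indent=2))
```

Keeping the serve configuration in the custom resource, rather than in an ad hoc script, is what lets the same Terraform/GitOps pipeline that provisions the cluster also manage the model-serving layer.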
Deployment Architecture and Resource Management
The reference architecture implements a production-ready generative AI platform on EKS with careful attention to compute, storage, and networking requirements. Compute scaling is handled by either Karpenter (a flexible, high-performance node autoscaler) or the Kubernetes Cluster Autoscaler, both of which can provision GPU instances in under a minute in response to workload demand. Storage relies on the EFS CSI driver for shared file systems, with FSx for Lustre available for high-performance workloads and NVMe instance store volumes configurable through EKS managed node group pre-bootstrap user data commands. Networking optimizations include EC2 placement groups for low-latency inter-node communication and the AWS Neuron Kubernetes device plugin for managing Inferentia and Trainium accelerator nodes. The demonstration uses the Data on EKS open-source project, which provides Terraform modules for deploying the complete stack: VPC, EKS cluster, managed node groups, and all necessary Kubernetes operators and controllers. The session concludes with a successful fine-tuning run that generates custom images from text prompts, validating the end-to-end workflow.
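The GPU autoscaling piece can be illustrated with a Karpenter NodePool built the same way, as a plain dict. This is a sketch under assumptions: field names follow Karpenter's `karpenter.sh/v1` NodePool API (earlier releases used a `v1alpha5` Provisioner instead), and the instance families, capacity limit, and taint are illustrative choices, not values from the session.

```python
import json


def gpu_node_pool(instance_families=("g5", "p4d")) -> dict:
    """Build a minimal Karpenter NodePool manifest (karpenter.sh/v1 API,
    field names assumed) that provisions GPU nodes on demand and taints
    them so only GPU workloads land there."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": "gpu"},
        "spec": {
            "template": {
                "spec": {
                    "requirements": [
                        # Restrict provisioning to GPU instance families.
                        {"key": "karpenter.k8s.aws/instance-family",
                         "operator": "In",
                         "values": list(instance_families)},
                        {"key": "karpenter.sh/capacity-type",
                         "operator": "In",
                         "values": ["on-demand"]},
                    ],
                    # Keep non-GPU pods off these expensive nodes.
                    "taints": [{"key": "nvidia.com/gpu", "effect": "NoSchedule"}],
                }
            },
            # Cap total GPU capacity to bound cost (illustrative limit).
            "limits": {"nvidia.com/gpu": 8},
        },
    }


if __name__ == "__main__":
    print(json.dumps(gpu_node_pool(), indent=2))
```

With a pool like this in place, a pending training or inference pod that requests `nvidia.com/gpu` (and tolerates the taint) is what triggers Karpenter to launch a matching GPU instance, which is how the architecture avoids paying for idle accelerators.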