Truth in IT

HashiCorp: Deploying Generative AI Models on Amazon EKS with Terraform

HashiCorp
03/26/2026
TL;DR

  • HashiCorp demonstrates deploying generative AI models on Amazon EKS using Terraform and the JARK stack (JupyterHub, Argo, Ray, Kubernetes) to address scaling, performance, and cost challenges of GPU-based AI workloads
  • The architecture uses EKS managed node groups with GPU-optimized AMIs, Karpenter or Cluster Autoscaler for dynamic scaling, and specialized networking like Elastic Fabric Adapter for high-performance distributed training
  • The hands-on demo fine-tunes the Stable Diffusion text-to-image model using DreamBooth on EKS, with data scientists working in JupyterHub notebooks and models versioned in Hugging Face
  • Inference is served through Ray Serve running on the EKS cluster, pulling fine-tuned models from Hugging Face and scaling dynamically based on demand
  • The entire infrastructure—VPC, EKS cluster, node groups, and Kubernetes operators—is provisioned as code using Terraform modules from the open-source Data on EKS project

Foundation Models and the JARK Stack on EKS

This technical demonstration explores deploying generative AI models on Amazon Elastic Kubernetes Service (EKS) using HashiCorp Terraform and the JARK stack (JupyterHub, Argo Workflows, Ray, and Kubernetes). The session addresses the fundamental challenges of running generative AI workloads at scale, including infrastructure auto-scaling, distributed training across GPU nodes, cost optimization, and performance management. The presenter positions EKS as the optimal platform for the complete generative AI lifecycle due to its managed Kubernetes capabilities, support for GPU-optimized AMIs, integration with AWS deep learning containers, and compatibility with specialized networking like Elastic Fabric Adapter for high-performance inter-node communication. The architecture leverages foundation models that can be adapted for multiple tasks using minimal data and compute compared to training from scratch, representing a significant efficiency gain over traditional machine learning approaches that require separate models for each specific task.
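The split between a core node group for controllers and a GPU node group for training and inference can be sketched in Terraform. This is a minimal illustration using the community terraform-aws-modules/eks module, not the exact Data on EKS blueprint; the cluster name, instance types, and VPC references are placeholder assumptions:

```hcl
# Sketch: an EKS cluster with a core node group for infrastructure
# services and a GPU node group for AI workloads, mirroring the
# architecture described above. Values are illustrative.
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "jark-demo"            # hypothetical name
  cluster_version = "1.29"
  vpc_id          = module.vpc.vpc_id      # assumes a companion VPC module
  subnet_ids      = module.vpc.private_subnets

  eks_managed_node_groups = {
    core = {
      # Hosts the AWS Load Balancer Controller, CSI drivers, etc.
      instance_types = ["m5.xlarge"]
      min_size       = 2
      max_size       = 4
      desired_size   = 2
    }
    gpu = {
      # GPU-optimized AMI for training and inference pods.
      instance_types = ["g5.2xlarge"]
      ami_type       = "AL2_x86_64_GPU"
      min_size       = 0
      max_size       = 4
      desired_size   = 1
      labels         = { workload = "gpu" }
      taints = [{
        key    = "nvidia.com/gpu"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }
  }
}
```

The taint on the GPU group keeps general-purpose pods off the expensive nodes; only workloads that tolerate it (training jobs, Ray workers) land there.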

Hands-On Implementation: Fine-Tuning Stable Diffusion

The practical demonstration walks through fine-tuning the Stable Diffusion text-to-image model using the DreamBooth technique on EKS infrastructure provisioned entirely through Terraform. The architecture separates workloads across two managed node groups: a core node group hosting infrastructure services like the AWS Load Balancer Controller and CSI drivers, and a GPU node group running the actual training and inference workloads. Data scientists access JupyterHub notebooks running on GPU nodes to experiment with model fine-tuning, with the resulting models pushed to Hugging Face for versioning. The implementation uses Hugging Face's Accelerate library to optimize distributed training and the Diffusers library for working with diffusion models. For inference, the solution deploys a Ray cluster using Ray Serve custom resource definitions to pull the fine-tuned model from Hugging Face and serve predictions at scale. The entire workflow—from infrastructure provisioning to model deployment—is managed as code through Terraform, demonstrating infrastructure-as-code principles applied to AI/ML workloads.
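The JupyterHub entry point for data scientists can likewise be managed from Terraform via the Helm provider. A hedged sketch using the Zero to JupyterHub chart; the scheduling values shown (`nodeSelector`, `extraTolerations`) are assumptions about a typical GPU setup, not the session's exact configuration:

```hcl
# Sketch: installing JupyterHub (the "J" in JARK) with Terraform's Helm
# provider so single-user notebooks schedule onto the GPU node group.
resource "helm_release" "jupyterhub" {
  name             = "jupyterhub"
  repository       = "https://hub.jupyter.org/helm-chart/"
  chart            = "jupyterhub"
  namespace        = "jupyterhub"
  create_namespace = true

  values = [yamlencode({
    singleuser = {
      # Pin notebooks to GPU nodes and tolerate their taint
      # (label/taint values assume the node group sketch above).
      nodeSelector = { workload = "gpu" }
      extraTolerations = [{
        key      = "nvidia.com/gpu"
        operator = "Exists"
        effect   = "NoSchedule"
      }]
    }
  })]
}
```

From a notebook scheduled this way, a data scientist can run the DreamBooth fine-tuning with Accelerate and Diffusers, then push the resulting weights to Hugging Face for the Ray Serve inference layer to pull.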

Deployment Architecture and Resource Management

The reference architecture implements a production-ready generative AI platform on EKS with careful attention to compute, storage, and networking requirements. Compute scaling is handled through either Karpenter (a flexible, high-performance cluster autoscaler) or the Kubernetes Cluster Autoscaler, both supporting sub-minute provisioning of GPU instances in response to workload demands. Storage leverages the EFS CSI driver for shared file systems and supports FSx for Lustre for high-performance workloads, with customizable NVMe instance store volume provisioning through EKS managed node group preboot commands. Networking optimizations include EC2 placement groups for low-latency inter-node communication and the AWS Neuron K8s device plugin for managing Inferentia and Trainium accelerator nodes. The demonstration uses the Data on EKS open-source project, which provides Terraform modules for deploying the complete stack including VPC, EKS cluster, managed node groups, and all necessary Kubernetes operators and controllers. The session concludes with a successful fine-tuning run that generates custom images from text prompts, validating the end-to-end workflow.
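Karpenter's scaling behavior is itself declared as configuration that Terraform can apply. An illustrative NodePool manifest via the kubernetes provider; the instance-category requirement and the GPU limit are assumptions for the sketch, not values from the demo:

```hcl
# Sketch: a Karpenter NodePool that provisions GPU instances on demand
# and caps total GPU capacity for cost control.
resource "kubernetes_manifest" "gpu_nodepool" {
  manifest = {
    apiVersion = "karpenter.sh/v1beta1"
    kind       = "NodePool"
    metadata   = { name = "gpu" }
    spec = {
      template = {
        spec = {
          nodeClassRef = { name = "default" }   # assumes an EC2NodeClass named "default"
          requirements = [
            {
              key      = "karpenter.k8s.aws/instance-category"
              operator = "In"
              values   = ["g", "p"]             # GPU instance families
            },
            {
              key      = "karpenter.sh/capacity-type"
              operator = "In"
              values   = ["on-demand"]
            },
          ]
          taints = [{
            key    = "nvidia.com/gpu"
            effect = "NoSchedule"
          }]
        }
      }
      # Hard ceiling on GPUs Karpenter may provision (illustrative value).
      limits = { "nvidia.com/gpu" = 8 }
    }
  }
}
```

When a pending pod requests `nvidia.com/gpu`, Karpenter launches a right-sized instance matching these requirements, typically in under a minute, and consolidates it away when the workload finishes.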

Chapters

0:00 - Introduction and Generative AI Overview
1:42 - Foundation Models vs Traditional ML
3:04 - Challenges of Running Gen AI on Kubernetes
4:47 - Why Amazon EKS for Generative AI
9:30 - The JARK Stack Architecture
11:30 - Solution Architecture Deep Dive
15:18 - Stable Diffusion Model and Hugging Face
17:29 - Prerequisites and Setup
18:20 - Deploying Infrastructure with Terraform
23:36 - Running the JupyterHub Notebook
25:02 - Results and Cleanup

Key Quotes

1:42 "Foundation models can also be customized to perform domain-specific functions that are differentiating to their business, using only a small fraction of data and compute required to train a model from scratch."
5:29 "Karpenter is a flexible, high-performance cluster autoscaler that helps improve application availability and cluster efficiency. Karpenter launches right-size compute resources, for example, Amazon EC2 instances, in response to changing application load in under a minute."
9:30 "One emerging stack on Kubernetes is JupyterHub, Argo workflows, Ray, and Kubernetes, also known as the JARK stack. You can run this entire stack on Amazon EKS."
11:52 "Ray is used to distribute the training of generative models across multiple nodes, which accelerates the training process and allows for handling of larger data sets."
25:05 "We did everything end to end on EKS using Terraform and we can clearly see the power of Terraform."
Categories:
  • Cybersecurity » Application Security
  • Data Management » DevOps
  • Cybersecurity » Cloud Security
  • Data Protection
Tags:
  • AI & Machine Learning
  • Cloud Security
  • DevSecOps
  • Technical Deep Dive
  • Demo
  • Generative AI deployment
  • Amazon EKS
  • HashiCorp Terraform
  • JARK stack
  • Stable Diffusion
  • GPU autoscaling
  • Foundation models
  • Infrastructure as code
  • Kubernetes for ML
  • Distributed training
