Truth in IT

HashiCorp: Deploying Generative AI Models on Amazon EKS with Terraform

HashiCorp
03/26/2026

Transcript


TL;DR

  • HashiCorp demonstrates deploying generative AI models on Amazon EKS using Terraform and the JARK stack (JupyterHub, Argo Workflows, Ray, Kubernetes) to address the scaling, performance, and cost challenges of GPU-based AI workloads
  • The architecture uses EKS managed node groups with GPU-optimized AMIs, Karpenter or Cluster Autoscaler for dynamic scaling, and specialized networking like Elastic Fabric Adapter for high-performance distributed training
  • The hands-on demo fine-tunes the Stable Diffusion text-to-image model using DreamBooth on EKS, with data scientists working in JupyterHub notebooks and models versioned in Hugging Face
  • Inference is served through Ray Serve running on the EKS cluster, pulling fine-tuned models from Hugging Face and scaling dynamically based on demand
  • The entire infrastructure (VPC, EKS cluster, node groups, and Kubernetes operators) is provisioned as code using Terraform modules from the open-source Data on EKS project

Foundation Models and the JARK Stack on EKS

This technical demonstration explores deploying generative AI models on Amazon Elastic Kubernetes Service (EKS) using HashiCorp Terraform and the JARK stack (JupyterHub, Argo Workflows, Ray, and Kubernetes). The session addresses the fundamental challenges of running generative AI workloads at scale, including infrastructure auto-scaling, distributed training across GPU nodes, cost optimization, and performance management. The presenter positions EKS as the optimal platform for the complete generative AI lifecycle due to its managed Kubernetes capabilities, support for GPU-optimized AMIs, integration with AWS deep learning containers, and compatibility with specialized networking like Elastic Fabric Adapter for high-performance inter-node communication. The architecture leverages foundation models that can be adapted for multiple tasks using minimal data and compute compared to training from scratch, representing a significant efficiency gain over traditional machine learning approaches that require separate models for each specific task.

Hands-On Implementation: Fine-Tuning Stable Diffusion

The practical demonstration walks through fine-tuning the Stable Diffusion text-to-image model using the DreamBooth technique on EKS infrastructure provisioned entirely through Terraform. The architecture separates workloads across two managed node groups: a core node group hosting infrastructure services such as the AWS Load Balancer Controller and CSI drivers, and a GPU node group running the training and inference workloads. Data scientists access JupyterHub notebooks running on GPU nodes to experiment with model fine-tuning, and the resulting models are pushed to Hugging Face for versioning. The implementation uses Hugging Face's Accelerate library to optimize distributed training and the Diffusers library for working with diffusion models. For inference, the solution deploys a Ray cluster using Ray Serve custom resource definitions; the cluster pulls the fine-tuned model from Hugging Face and serves predictions at scale. The entire workflow, from infrastructure provisioning to model deployment, is managed as code through Terraform, demonstrating infrastructure-as-code principles applied to AI/ML workloads.
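To make the split between the core and GPU node groups concrete, here is a minimal Python sketch that builds a Kubernetes Pod manifest pinned to GPU nodes. It is an illustration only, not taken from the session: the `workload-type` node label, the taint key used in the toleration, the pod name, and the image reference are all hypothetical placeholders. The `nvidia.com/gpu` resource name is the one conventionally exposed by the NVIDIA device plugin, which is assumed to be installed on the GPU nodes.

```python
import json


def gpu_pod_manifest(name, image, gpus=1, node_group_label="gpu"):
    """Build a Pod manifest that should only schedule onto GPU nodes.

    Assumptions (not from the session): GPU nodes carry the hypothetical
    label `workload-type=<node_group_label>` and are tainted with the key
    `nvidia.com/gpu` so that core infrastructure pods stay off them.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Steer the pod to the GPU node group via a node label.
            "nodeSelector": {"workload-type": node_group_label},
            # Tolerate the (assumed) taint that reserves GPU nodes.
            "tolerations": [
                {
                    "key": "nvidia.com/gpu",
                    "operator": "Exists",
                    "effect": "NoSchedule",
                }
            ],
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Request whole GPUs through the extended resource name.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
            "restartPolicy": "Never",
        },
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest(
        "dreambooth-train",                 # hypothetical pod name
        "example.invalid/trainer:latest",   # placeholder image reference
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be piped to `kubectl apply -f -` against a cluster whose labels and taints match these assumptions. Infrastructure pods on the core node group would simply omit the selector, toleration, and GPU limit.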

Deployment Architecture and Resource Management

The reference architecture implements a production-ready generative AI platform on EKS with careful attention to compute, storage, and networking requirements. Compute scaling is handled through either Karpenter or the Kubernetes Cluster Autoscaler; the presenter describes Karpenter as a flexible, high-performance cluster autoscaler that launches right-sized EC2 instances in under a minute in response to changing application load. Storage leverages the EFS CSI driver for shared file systems and supports FSx for Lustre for high-performance workloads, with customizable NVMe instance store volume provisioning through EKS managed node group preboot commands. Networking optimizations include EC2 placement groups for low-latency inter-node communication. Beyond GPUs, the AWS Neuron Kubernetes device plugin manages Inferentia and Trainium accelerator nodes. The demonstration uses the Data on EKS open-source project, which provides Terraform modules for deploying the complete stack including VPC, EKS cluster, managed node groups, and all necessary Kubernetes operators and controllers. The session concludes with a successful fine-tuning run that generates custom images from text prompts, validating the end-to-end workflow.
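The idea of launching right-sized compute in response to pending workloads can be illustrated with a toy Python model. This is a deliberately simplified sketch and not Karpenter's actual algorithm: it considers only GPU counts and fits all pending pods onto a single node. The instance names, GPU counts, and hourly costs in the catalog are hypothetical placeholders, not real EC2 instance types or prices.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class InstanceType:
    name: str
    gpus: int
    hourly_cost: float  # placeholder value, not a real price


# Hypothetical catalog; real instance types and prices differ.
CATALOG = (
    InstanceType("gpu-small", 1, 1.0),
    InstanceType("gpu-medium", 4, 3.5),
    InstanceType("gpu-large", 8, 6.5),
)


def pick_instance(
    pending_gpu_requests: Sequence[int],
    catalog: Sequence[InstanceType] = CATALOG,
) -> Optional[InstanceType]:
    """Return the cheapest instance type that fits all pending GPU requests.

    Returns None when nothing is pending or when no single instance type
    in the catalog is large enough.
    """
    needed = sum(pending_gpu_requests)
    if needed == 0:
        return None
    candidates = [t for t in catalog if t.gpus >= needed]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.hourly_cost)


if __name__ == "__main__":
    # Two pending pods asking for 1 and 2 GPUs need at least 3 GPUs in total.
    choice = pick_instance([1, 2])
    print(choice.name if choice else "no single instance fits")
```

A real autoscaler also weighs CPU, memory, availability zones, capacity type, and bin-packing across several nodes, so this sketch only conveys the selection step at its simplest.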

Chapters

0:00 - Introduction and Generative AI Overview
1:42 - Foundation Models vs Traditional ML
3:04 - Challenges of Running Gen AI on Kubernetes
4:47 - Why Amazon EKS for Generative AI
9:30 - The JARK Stack Architecture
11:30 - Solution Architecture Deep Dive
15:18 - Stable Diffusion Model and Hugging Face
17:29 - Prerequisites and Setup
18:20 - Deploying Infrastructure with Terraform
23:36 - Running the JupyterHub Notebook
25:02 - Results and Cleanup

Key Quotes

1:42 "Foundation models can also be customized to perform domain-specific functions that are differentiating to their business, using only a small fraction of data and compute required to train a model from scratch."
5:29 "Karpenter is a flexible, high-performance cluster autoscaler that helps improve application availability and cluster efficiency. Karpenter launches right-size compute resources, for example, Amazon EC2 instances, in response to changing application load in under a minute."
9:30 "One emerging stack on Kubernetes is JupyterHub, Argo workflows, Ray, and Kubernetes, also known as the JARK stack. You can run this entire stack on Amazon EKS."
11:52 "Ray is used to distribute the training of generative models across multiple nodes, which accelerates the training process and allows for handling of larger data sets."
25:05 "We did everything end to end on EKS using Terraform and we can clearly see the power of Terraform."
Categories:
  • » Cybersecurity » Application Security
  • » Data Management » DevOps
  • » Cybersecurity » Cloud Security
  • » Data Protection
Tags:
  • AI & Machine Learning
  • Cloud Security
  • DevSecOps
  • Technical Deep Dive
  • Demo
  • Generative AI deployment
  • Amazon EKS
  • HashiCorp Terraform
  • JARK stack
  • Stable Diffusion
  • GPU autoscaling
  • Foundation models
  • Infrastructure as code
  • Kubernetes for ML
  • Distributed training
