Truth in IT

GenAI for Infrastructure: Capabilities & Limitations

HashiCorp
04/12/2026

TL;DR

  • LLMs generate infrastructure code probabilistically by predicting the next most likely token, but Terraform/HCL is severely underrepresented in training data (32x less than Python on GitHub), leading to missing best practices and hard-coded values instead of proper resource dependencies.
  • AI-generated infrastructure code poses security risks because models can propagate vulnerabilities from training data, including open ports and potentially malicious providers if adversaries poison public repositories—making deterministic security scanning tools essential.
  • Synthesis AI (analyzing existing data to find patterns) achieves higher accuracy than generative AI (creating new content) for DevOps tasks like log analysis and root cause correlation, because the solution space is constrained to what's already in the input.
  • The future of AI for infrastructure depends on Graph RAG technology that encodes infrastructure as interconnected resource graphs rather than flat documents, enabling context-aware code generation that understands dependencies, security policies, and environment-specific configurations.
  • Current AI code assistants lack the live production context needed for enterprise-grade infrastructure generation, but combining LLMs with graph-based infrastructure knowledge could overcome limitations by providing the full environmental context models need to generate correct, secure configurations.

How Large Language Models Generate Infrastructure Code

Roxane Fischer, CEO of AnyShift.io and a former AI researcher, explains the fundamental mechanics of LLMs and their application to infrastructure as code. Neural networks learn patterns from massive training datasets, encoding information into mathematical representations that let them predict the next most likely token in a sequence. Applied to Terraform code generation, these models suggest configurations probabilistically, based on patterns learned from public repositories. However, the presentation reveals a critical limitation: infrastructure code is severely underrepresented in training data, with only about 2 million HCL files on GitHub versus tens of millions of Python files, a gap Fischer puts at a factor of 32. This data scarcity means models often miss best practices, generate hard-coded values instead of proper resource dependencies, and lack the live context of production infrastructure that would let them generate enterprise-grade configurations.
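One failure mode described above, hard-coded values where Terraform expects resource references, is easy to check for deterministically. A minimal sketch (the regex and sample HCL are illustrative assumptions, not any real tool's rules):

```python
import re

# Flag AWS-style literal IDs in generated HCL. A hard-coded "subnet-..."
# string hides the dependency that a reference like aws_subnet.main.id
# would express, so Terraform cannot order or track the resources.
HARDCODED_ID = re.compile(r'"(subnet|vpc|sg|ami)-[0-9a-f]{8,17}"')

def find_hardcoded_ids(hcl_text: str) -> list[str]:
    """Return the hard-coded AWS-style IDs found in a chunk of HCL."""
    return [m.group(0).strip('"') for m in HARDCODED_ID.finditer(hcl_text)]

generated = '''
resource "aws_instance" "web" {
  ami       = "ami-0abcdef1234567890"
  subnet_id = "subnet-0123456789abcdef0"   # should be aws_subnet.main.id
}
'''
print(find_hardcoded_ids(generated))
```

A human reviewer or CI step running a check like this catches exactly the pattern the talk says LLMs tend to emit.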

Security Risks and the Probabilistic Nature Problem

The presentation highlights serious security concerns with AI-generated infrastructure code. Because LLMs are probabilistic rather than deterministic, they can propagate vulnerabilities found in their training data, such as overly permissive security group rules with open ports (0.0.0.0/0). More concerning is the potential for adversarial attacks: if malicious actors publish modules with security flaws or malicious providers to GitHub, subsequent model retraining could incorporate these patterns, leading AI assistants to recommend compromised configurations. Fischer emphasizes that the probabilistic nature of neural networks means they never generate code with 100% certainty, making deterministic security scanning tools like Checkov and Snyk essential safeguards. The risk extends beyond simple misconfigurations to potential credential theft through malicious provider imports that models might suggest based on poisoned training data.
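The deterministic scanners Fischer recommends work by rule, not probability: the same input always yields the same finding. A simplified sketch of such a rule for the 0.0.0.0/0 case (the rule schema and field names are assumptions, not Checkov's or Snyk's actual format):

```python
# Deterministic check: flag ingress rules open to the whole internet.
# Unlike an LLM's token-by-token suggestion, this always gives the same
# answer for the same input.
def open_to_world(security_group_rules: list[dict]) -> list[dict]:
    """Return the ingress rules that admit traffic from 0.0.0.0/0."""
    return [
        rule for rule in security_group_rules
        if rule["direction"] == "ingress" and "0.0.0.0/0" in rule["cidr_blocks"]
    ]

rules = [
    {"direction": "ingress", "port": 22,  "cidr_blocks": ["0.0.0.0/0"]},   # flagged
    {"direction": "ingress", "port": 443, "cidr_blocks": ["10.0.0.0/8"]},  # internal, ok
    {"direction": "egress",  "port": 0,   "cidr_blocks": ["0.0.0.0/0"]},   # egress, ignored
]
print(open_to_world(rules))
```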

Synthesis AI vs. Generative AI: Different Tools for Different Jobs

Fischer draws a crucial distinction between two AI paradigms in DevOps. Generative AI takes minimal input and creates new content—an open-ended process with a large solution space that's prone to hallucination and inaccuracy. Synthesis AI, by contrast, takes large amounts of existing data and extracts insights from it, offering higher accuracy because the solution is contained within the input. For infrastructure operations, synthesis AI excels at log analysis and root cause analysis, finding patterns across millions of log entries or correlating customer alerts with system logs across heterogeneous data sources. This approach is already proving valuable in tools like Google Cloud Ops AI, which can identify the needle in the haystack by recognizing patterns that human operators might miss. Understanding when to use each approach is critical for effective AI adoption in infrastructure management.
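The synthesis-AI advantage is that the answer must already be in the input. A toy sketch of the log-analysis case (the normalization and log format are illustrative assumptions): strip the variable parts of each error line and count the remaining signatures, so the dominant failure pattern surfaces from the data itself.

```python
from collections import Counter

def top_error_signature(log_lines: list[str]) -> tuple[str, int]:
    """Count coarse error signatures and return the dominant one."""
    signatures = Counter()
    for line in log_lines:
        if "ERROR" in line:
            # Crude normalization: drop tokens containing digits
            # (timestamps, host IDs) so similar errors group together.
            sig = " ".join(t for t in line.split() if not any(c.isdigit() for c in t))
            signatures[sig] += 1
    return signatures.most_common(1)[0]

logs = [
    "10:01 ERROR db connection refused host 12",
    "10:02 INFO request served in 12 ms",
    "10:02 ERROR db connection refused host 31",
    "10:03 ERROR disk quota exceeded node 7",
]
print(top_error_signature(logs))  # ('ERROR db connection refused host', 2)
```

Real tools do far more, but the shape is the same: the solution space is bounded by the logs you feed in, which is why accuracy is higher than open-ended generation.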

The Future: Context-Aware AI Through Graph RAG

The presentation concludes with Fischer's vision for overcoming current limitations through context-aware AI systems. The solution lies in Retrieval Augmented Generation (RAG), specifically Graph RAG, which treats infrastructure as an interconnected graph of resources rather than as flat documents. Traditional RAG encodes company knowledge into searchable vector representations, but infrastructure requires understanding the relationships between VPCs, subnets, IAM roles, and other resources. Graph RAG constructs a knowledge graph where nodes represent resources and edges represent relationships, letting the AI query actual infrastructure topology rather than simple text similarity. The core challenges are defining meaningful relationships (how a VPC connects to its subnets differs from a tag-based connection) and traversing the graph efficiently at query time. Combined with LLMs, this context-aware approach could finally let AI generate infrastructure code with proper dependencies, security configurations, and enterprise-grade practices tailored to a specific environment.
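The Graph RAG idea above can be sketched in a few lines: a graph whose nodes are resources and whose edges are typed relationships, walked outward from a query resource to collect the facts an LLM would be prompted with. Resource names and relationship types here are illustrative assumptions, not AnyShift's actual model.

```python
from collections import deque

# node -> list of (relationship, neighbor); a tiny infrastructure graph
edges = {
    "vpc-main":     [("contains", "subnet-app"), ("contains", "subnet-db")],
    "subnet-app":   [("hosts", "ec2-web")],
    "subnet-db":    [("hosts", "rds-primary")],
    "ec2-web":      [("assumes", "iam-role-web")],
    "rds-primary":  [],
    "iam-role-web": [],
}

def context_for(resource: str) -> list[str]:
    """Breadth-first walk from a resource, collecting relationship facts
    ('ec2-web assumes iam-role-web', ...) to feed an LLM as context."""
    facts, queue, seen = [], deque([resource]), {resource}
    while queue:
        node = queue.popleft()
        for rel, neighbor in edges.get(node, []):
            facts.append(f"{node} {rel} {neighbor}")
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return facts

print(context_for("vpc-main"))
```

Retrieval by topology like this is what distinguishes Graph RAG from vector-similarity RAG: asking about `vpc-main` surfaces the IAM role two hops away, which no text-similarity match over flat documents would find.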

Chapters

0:00 - Introduction
1:39 - How AI Models Work
4:08 - Training and Encoding Information
6:53 - Code Generation with LLMs
10:01 - Limitations of LLMs for IaC
11:04 - Data Scarcity Problem
14:04 - Missing Best Practices
16:14 - Security Risks
19:14 - Generative vs Synthesis AI
22:22 - Context-Based Infrastructure
23:49 - RAG Technology
25:51 - Graph RAG for Infrastructure
29:07 - Conclusion

Key Quotes

12:33 "The issue is that with infrastructure as code, so Terraform in particular, the datasets are quite sparse. Why? Because all those amazing generative models that have been trained for the code generation parts have been mostly trained on GitHub. The issue is you don't put your infrastructure on clear on GitHub. It's very sensitive information."
13:28 "You can see that you have like more than dozens of millions of Python files on GitHub. You only have two million on HCL files, which is even less for Terraform, so it's like a 32-factor between Python to HCL, not even Terraform."
17:05 "Imagine you have 1,500 public modules on GitHub. You have some attacker that is going to actually create 200 new ones. Your next generation of models are going to be retrained on GitHub and are going to be actually trained on those new modules with those bad configurations."
17:36 "Because of this issue and the probabilistic nature of neural network and those LLMs, they will predict next code tokens based on probability, but never with 100% certainty. It's highly recommended to use deterministic tools, so, tools that will always give you the same output if you give the same input, such as like Checkov or Snyk."
20:34 "Synthesis AI, on the contrary, is where you give a lot of information, but you don't want to create anything new. You want to find something within this information. You want to synthesise it."
20:47 "Because of that, synthesis AI tends to have better accuracy and results than generative AI for now, because the solution space is way smaller. You give a lot of information into the input, but the solution is contained within, and you just need to find it."

Categories:
  • Cybersecurity » Application Security
  • Data Management » DevOps
  • Cybersecurity » Cloud Security
  • Data Protection
Tags:
  • AI & Machine Learning
  • DevSecOps
  • Cloud Security
  • Technical Deep Dive
  • Best Practices
  • Large Language Models
  • Infrastructure as Code
  • Terraform
  • AI Code Generation
  • Security Vulnerabilities
  • Retrieval Augmented Generation
  • Graph RAG
  • Synthesis AI
  • DevOps Automation
