How did VGW maintain team autonomy while centralizing Terraform management?

VGW used Terraform Cloud's projects feature to create isolated tenants within a single org, giving each team their own project that functions like a separate org. Teams can manage their own workspaces through a self-service management workspace pattern, but operate within consistent security guardrails and governance standards established by the platform team.
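One way to picture the project-as-tenant boundary is as a simple access rule over workspaces. The sketch below is illustrative only: `Workspace`, `Org`, and `shared_state` are hypothetical names, not Terraform Cloud API objects. It assumes that workspaces in the same project may read each other's state, while a cross-project read needs an explicit grant.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Workspace:
    name: str
    project: str  # the tenant (team project) that owns this workspace


@dataclass
class Org:
    # Explicit cross-project grants as (consumer workspace, producer workspace).
    # Hypothetical model; not how Terraform Cloud stores these settings.
    shared_state: set[tuple[str, str]] = field(default_factory=set)

    def can_read_state(self, consumer: Workspace, producer: Workspace) -> bool:
        """Allow same-project reads; require a grant for cross-project reads."""
        if consumer.project == producer.project:
            return True  # assumption: a tenant may read its own state
        return (consumer.name, producer.name) in self.shared_state
```

In this model a grant is one-directional: sharing a producer's state with one consumer does not let the producer read the consumer's state in return.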

What specific automation did VGW build to streamline Terraform onboarding?

The team created GitHub-based automation that provisions a complete tenant setup through a pull request: a team project, a management workspace for self-service, interactive login credentials, and three scoped service accounts (plan, apply non-prod, apply prod) stored in AWS Secrets Manager. They also built custom GitHub Actions for authentication, static analysis, and API-driven runs.
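As a rough sketch of what a single onboarding pull request has to describe, the tenant bundle can be modelled as plain data. Everything named here is an assumption made for illustration: `build_tenant`, the `-management` and `-engineers` suffixes, and the `terraform-cloud/<team>/<role>` secret path are not VGW's actual conventions.

```python
from dataclasses import dataclass

# The three scoped service accounts described in the talk.
ROLES = ("plan", "apply-nonprod", "apply-prod")


@dataclass(frozen=True)
class ServiceAccount:
    name: str
    secret_id: str  # hypothetical AWS Secrets Manager path for the token


@dataclass(frozen=True)
class Tenant:
    project: str
    management_workspace: str  # where the team self-services its workspaces
    login_team: str            # interactive UI/CLI access
    service_accounts: tuple[ServiceAccount, ...]


def build_tenant(team: str) -> Tenant:
    """Describe everything one onboarding pull request would provision."""
    accounts = tuple(
        ServiceAccount(
            name=f"{team}-{role}",
            secret_id=f"terraform-cloud/{team}/{role}",
        )
        for role in ROLES
    )
    return Tenant(
        project=team,
        management_workspace=f"{team}-management",
        login_team=f"{team}-engineers",
        service_accounts=accounts,
    )
```

Calling `build_tenant("payments")` yields one project, one management workspace, one login team, and three service accounts, which is the bundle the answer above lists.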

How does VGW's service account scoping improve security?

VGW created three service accounts with specific purposes: plan-only accounts can only create speculative runs on pull requests and read state outputs; apply non-prod accounts can plan and apply only to dev/test workspaces; and apply prod accounts work only with production workspaces. This scoping ensures appropriate guardrails are in place and prevents non-prod credentials from accessing production state.
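The scoping rules can be written down as a small permission check. This is a minimal sketch, not Terraform Cloud's permission model: `Account`, `Action`, and `is_allowed` are hypothetical names, environments are reduced to the strings `dev`, `test`, and `prod`, and it assumes the plan account is not restricted by environment, which the talk does not specify.

```python
from enum import Enum


class Account(Enum):
    PLAN = "plan"
    APPLY_NONPROD = "apply-nonprod"
    APPLY_PROD = "apply-prod"


class Action(Enum):
    SPECULATIVE_PLAN = "speculative-plan"
    READ_OUTPUTS = "read-outputs"
    APPLY = "apply"


NONPROD_ENVS = frozenset({"dev", "test"})


def is_allowed(account: Account, action: Action, environment: str) -> bool:
    """Return True if the service account may perform the action."""
    if account is Account.PLAN:
        # Plan-only: speculative runs and state outputs, never an apply.
        return action is not Action.APPLY
    if account is Account.APPLY_NONPROD:
        # Any action, but only against dev/test workspaces.
        return environment in NONPROD_ENVS
    # APPLY_PROD: any action, but only against production workspaces.
    return environment == "prod"
```

Under these rules `is_allowed(Account.APPLY_NONPROD, Action.READ_OUTPUTS, "prod")` is false, which is the property the answer highlights: non-prod credentials cannot reach production state.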

Scaling Terraform from Org Sprawl to Centralized Structure

Name: Scaling Terraform from Org Sprawl to Centralized Structure
Uploaded: 2026-04-09T16:41:23-04:00
Duration: 21 min 35 s
Description: TL;DR VGW reduced Terraform onboarding time from 8-12 weeks to under 12 minutes by consolidating from multiple team-specific orgs into a single centralized organization with project-based isolation. The company leveraged Terraform Cloud's projects fea...

HashiCorp


Transcript

but it's how VGW has evolved Terraform over roughly five to six years and how we scaled that. So, this is me, Bruce Dominguez. I'm the head of site reliability at VGW. If you've not heard of VGW, we are an Australian-based company, based in Perth, but we've got offices all around Europe and the US. We're an interactive entertainment company where we harness technology and creativity to deliver world-class free-to-play games that delight over one million North American players. Not scripted at all. I'm a previous HashiCorp Ambassador, for a couple of years. I'm also the HUG leader, so HashiCorp User Group leader, in Perth, and a fanboy for HashiCorp products for about six years, starting with Vagrant, actually, and then onto Terraform.
I'm going to start with a short backstory, so it just gives you an idea of our scope of Terraform. As I mentioned before, we've been using Terraform for about five years to manage our cloud infrastructure, but it wasn't the only tool that was used. We had CloudFormation, SDKs, we had pockets of Pulumi, and of course we had some ClickOps. We won't talk about that. We matured our usage of Terraform while using the Terraform binary, often in a Docker image with an S3 backend for state, and then we migrated over to Terraform Cloud. But the way we scaled our usage was by just adding more and more Terraform orgs back in the day. Just by the way, the whole presentation is used by Canva, so I'm just inspired by Canva, not as flashy as I was, but we'll move on.
So we started our journey in 2019. We had a competitive POC, or proof of concept, of multiple IaC tools so that we could coalesce on which tool the organisation would land on. The main opponents were Terraform and Pulumi. Unfortunately Pulumi won the battle, but eventually Terraform would win the war, which is really good, and why I'm standing here.
At the end of the year we had one team adopt Terraform on the side, that was my team at the time, just using an S3 backend for our state. Teams started to see the good work that we were doing and started adopting Terraform as well, so we started building that momentum. And in 2019, when Terraform Cloud was introduced, we had a single team become early adopters of that technology, which was my team as well, with great success. As we did that, other teams looked at the good work that we were doing and began to migrate to Terraform Cloud, off CloudFormation and, eventually, off Pulumi.
But as we grew, we were getting more and more requests to get access to Terraform Cloud, and we were running into problems pretty quickly. With every new team, as I mentioned before, we spawned a new org. Now this is because teams wanted to isolate their workloads. They wanted to isolate their runs, so they didn't have concurrency issues, but they also wanted to isolate their state, so they didn't share any sensitive outputs with other teams. And because of this, we had such an explosion of Terraform orgs. Every single team had their own org. It was a lot.
And with this organic adoption, we began to slow down the rate that we could service those requests. So with every request, we would have to renegotiate the contract, so this is before resources under management. We would have to renegotiate the contract, so we'd have to estimate the number of runs they would be using, the number of admin users they needed, and then we'd have to go through a procurement process, which generally will take around about 8 to 12 weeks, so quite a leeway. Oh dear. It was that good, right? Oh well, there we go. Don't press that button again, all right.
And as we started to scale our usage of Terraform, not only did we have to run the procurement gauntlet, so 8 to 12 weeks for each new team, we were getting more and more requests, which meant our backlog for servicing those requests was ever-growing. We also had a decentralized structure, which meant a lack of standards and governance, and inconsistencies in how we approached Terraform across the organization. And with every new org that was created, we had an overhead to track... a black screen. To track, sorry. With every new org that was created, we had the overhead to track the number of runs they were doing and how many admin users they were using, because it all had cost implications every time we had to renegotiate, because admin users were expensive.
So we needed to move faster, we needed to stop the sprawl of orgs, we needed a mechanism to allow teams to onboard easily and with less overhead, and guide the engineers into the pit of success. So there had to be a better way: a better way to reduce the friction of onboarding to Terraform without needing a purchase order. We needed to reduce the explosion of Terraform orgs and make the management of that much, much easier. We needed to continue to maintain a strong and consistent security posture within the organization and establish a way for teams to easily onboard with a streamlined pipeline for deployment, which allows much quicker adoption.
So our first step was to eliminate that 8-12 week waiting period for teams to use Terraform. So we wanted to reduce the time to Terraform. You can tell I'm an SRE, I build acronyms: TTT, time to Terraform. That should stick, hopefully, after this. So the problem with our ever-growing number of orgs was that we had so many org owners. Each team had an owner, so we would have to, as I said before, estimate for every new request or every new team how many runs they needed, their assumed usage.
But with that, we also had to go back and speak to all of the existing owners and coordinate with them around whether they had been using, or could rebalance, some of their usage for runs and admin users. So that was a headache. And because each team owned its own org, we lacked consistency in how those orgs were managed, which meant some teams would have everyone as admin owners. So if you had access, you had admin, which is not ideal, and that led to cost growth because each admin user was an additional cost. Or you had other teams with a single admin user, which led to operational issues when they were on leave or no one could get hold of them. And we had an explosion of private registries, which led to duplication of registries. So across the org, we saw single-use modules that didn't scale past a single team or a single purpose.
So we needed to keep it, keep, so we needed to make the complicated less complicated or keep it simple. So we began the creation of a single VGW org. With the single org, we could simplify the management and cost of running Terraform Cloud within VGW, as well as give engineers a far simpler path to onboard and kicking them into the pit of success. It would improve our security and governance of Terraform Cloud because instead of having multiple orgs, we would have a single org that we could concentrate on, and it would also give us a much simpler RBAC model with the flexibility to grow. But it also gave us the opportunity to have a single private registry that we could use as a dangling carrot for teams, so that we could publish new modules to that registry and act as an incentive for the teams to jump on board. But we still wanted to make sure that the teams had autonomy. We wanted to give them enough space, but not be too prescriptive.
So we gave the teams the ability to provision their own workspaces through a management workspace that they could control, but eventually just allowing them to scale and not allowing them to deviate too far. So we leveraged the projects feature from Terraform Cloud. That was introduced, I think, last year. This feature allowed us to create tenants within the single org in such a way that we could isolate or keep those isolated from other tenants. So that meant that projects were effectively their own org. And that meant teams could manage their own workspaces, but also ensure that no state was leaking outside of their project or org, unless it was explicitly permitted to do so.
So we had the structure. Now we needed to reduce that friction to adopt and make teams' lives much easier. So we leaned heavily on automation and golden modules for bootstrapping of projects and teams for the new org through orchestration. We also vended dynamic OIDC authentication credentials, basically to remove the need for long-lived credentials, so teams wouldn't have to worry about that. And we established a pattern for the right way to create workspaces, so giving them the tools for that success so that they can easily adopt and also be secure.
So we did this through GitHub. We have a central repository that manages our Terraform org. And with that, we can vend a team's project for them to use, a management workspace so that they can create their own workspaces. We don't really care about that. They can do what they want. And also an interactive login so that they can log into the UI and CLI, because they need access. And then we create three service accounts that can be used by the CI pipelines to provision infrastructure. We also then take those tokens and store those securely in AWS Secrets Manager. I wish it was Vault, but it's not. Securely in Secrets Manager, so they can be programmatically retrieved when they need to.
So what we did with those three service accounts that we created, we made sure that they had very specific scopes for their usage. So we have a plan service account that is only allowed to create speculative runs on pull requests. So only allowed, can't apply, just does plans. It can read state outputs from workspaces so that the teams can use them. We also have an apply for non-prod. So it allows you to do a plan and apply for only dev workspaces or dev/test workspaces and nothing prod. So that's scoped to that. And then we also have an apply prod, which is basically the same, but only for production workspaces. So we've isolated that from there. So apply non-prod cannot read anything that's in a prod workspace. By scoping these service accounts for a specific purpose and specific environments, we ensure that the appropriate guidelines or guardrails are in place for the teams to use. So again, kick them in the pit of success.
Next, we made improvements on the CI/CD pipeline. We simplified the process of retrieving the API tokens needed to communicate with Terraform Cloud. So we have a workflow that authenticates to AWS via OIDC, grabs an STS token that is scoped to a specific secret, and that secret is pulled from Secrets Manager. And that token, or sorry, those credentials are then used to authenticate to Terraform Cloud. So making sure that we've got a very secure pipeline there. And by vending and storing these secrets or tokens centrally, we ensure the tokens are not scattered across CI pipelines as environment variables, enhancing our security and maintainability. For those who still have CircleCI with the issues where we had to rotate the credentials, that was fun. At the beginning of last year, I think it was.
We also introduced several GitHub Actions to improve the quality of life for engineers. So like that workflow I showed earlier on, we wrapped that up into a GitHub Action. So teams don't even have to think about it anymore.
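The token retrieval workflow described above (OIDC to AWS, STS credentials, then a Secrets Manager lookup) might look roughly like the sketch below. It is not VGW's GitHub Action. The function name `fetch_tfc_token`, the session name, and the idea of passing the clients in as parameters are assumptions made so the example runs without AWS access. In a real pipeline `sts` would typically be `boto3.client("sts")` and `make_secrets_client` a wrapper around `boto3.client("secretsmanager", ...)`.

```python
from typing import Any, Callable


def fetch_tfc_token(
    sts: Any,
    make_secrets_client: Callable[..., Any],
    oidc_token: str,
    role_arn: str,
    secret_id: str,
) -> str:
    """Exchange a CI-issued OIDC token for a Terraform Cloud API token."""
    # Step 1: trade the OIDC token for short-lived AWS credentials.
    creds = sts.assume_role_with_web_identity(
        RoleArn=role_arn,
        RoleSessionName="terraform-ci",  # hypothetical session name
        WebIdentityToken=oidc_token,
    )["Credentials"]

    # Step 2: build a Secrets Manager client from those credentials only.
    secrets = make_secrets_client(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

    # Step 3: read the one secret the assumed role is scoped to.
    return secrets.get_secret_value(SecretId=secret_id)["SecretString"]
```

The returned token never needs to be stored in the CI system. One option, assuming the Terraform CLI is the consumer, is to export it for the duration of the job as `TF_TOKEN_app_terraform_io`.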
They just use the GitHub Action and off they go. We also introduced static analysis tools or tests for pull requests. So we do a fmt, a validate, TFLint, and Checkov, just to ensure that we've got good code quality and consistency. We additionally wrapped up the API-driven runs using the HashiCorp TFC workflow GitHub Action. So we've wrapped that up. And then we introduced a workflow to write the outcomes of a plan back to the PR. So you can see there. So you can see what's changing and click on View if you want to see what those changes are.
But we also put a lot of effort into improving documentation and how-to guides. So we set up the teams for success so they know what they're doing. So we did step-by-step guides on how to provision a new tenant project. We also set up guides to set up a CI/CD pipeline, so using the new GitHub Actions and workflows, and how they contribute to the central private registry that we have now, and also publishing to that as well. So making sure that we're trying to foster that culture of contributing.
It's no good if we do all this work and no one knows about it, right? So we needed to ensure that we successfully land this change by over-communicating. So we did skill shares on why we were moving to a single org and what those benefits were. We did regular updates to teams on progress of the work we were doing, sneak peeks into the new modules and automation. We also, through relationships, got trusted teams to migrate over to the new org, which became change champions and advocates for what we were doing. And we also did fortnightly showcases to the broader engineering community, so they had the awareness of what we were doing. There we go. So with all that work, we did achieve some great results across those areas around time to Terraform improved, TM, simplified structure, and made it easier as well.
So we improved the time it takes to onboard new teams into Terraform now from 12 weeks, which is actually probably a little bit more, actually, if legal get involved. And to roughly, well, less than 12 minutes. So it's just a pull request now. So teams can just get their tenants straight away. By simplifying the org, we're in a position now to scale our usage of Terraform, so no matter how many teams we have, we can scale up that usage. It gives us a much simpler management of our Terraform Cloud than across multiple orgs. So imagine having to manage 20 to 30 orgs. That is a nightmare. So now we've got one. It also allows us to have a much clearer picture on our costs and our spend. So instead of having to do reports on every single org, we now can just do it on one, which, let me tell you, is far simpler.
We've now made it easier for teams to adopt Terraform with the paved roads automation, as I mentioned earlier, which reduced the friction for engineers to start with Terraform. So that cognitive load is now gone. We've baked in security best practices so that they just get that for free. They don't have to think about it anymore. Am I doing the right thing? No, you are now, because you don't have to think about it. We published new modules only in that VGW org, so that we've got that dangling carrot. So all the shiny, brand new modules that have got all the new tech, if you want to use them, you have to migrate.
So what's next? Well, I mean, we're far from finished, right? So we've been doing this journey for about eight months now, almost nine. But we're onboarding new teams every day, and we've still got a fair chunk of existing orgs to fold in. We still have some rough edges, though, that I want to fix to address and improve the onboarding experience for teams. So we want to integrate the automation that we have done into our internal developer platform.
As we've seen with the maturity model we spoke about earlier on, I want to move us to stage three of that maturity model of scaling, so all that self-service. And also potentially look at the new features that are coming down the pipeline. Terraform Stacks is something that I'd like to look into, and what that impact is on our current workspace structure, and how we can manage that and leverage that in the future. It's good that it's pausing, because I can't press. There we go.
So I grabbed some stats end of May, I think it was, when I did the presentation. So we've got 45 projects that we've created so far, and it's growing. It would have grown by now, actually. With 345 workspaces, so still a small amount, but we're growing. We've onboarded 400 teams recently, and we've published 15 private modules, and one of those modules actually has been downloaded around 13,000 times already. It would have been way more now, after come June. And that's it. You survived. Thanks for listening to my story, guys. Thanks very much, Rick. Thank you.

TL;DR

VGW reduced Terraform onboarding time from 8-12 weeks to under 12 minutes by consolidating from multiple team-specific orgs into a single centralized organization with project-based isolation.
The company leveraged Terraform Cloud's projects feature to create isolated tenants, implemented OIDC authentication, and built GitHub-based automation for self-service provisioning with scoped service accounts.
Centralization eliminated procurement bottlenecks, standardized security practices, simplified cost management, and enabled a centralized private module registry that drove adoption across 45 projects and 400 teams.
The platform team used 'paved roads' automation, comprehensive documentation, and change champions to drive migration, publishing 15 modules with one achieving 13,000 downloads.
VGW continues evolving their Terraform platform by integrating with their internal developer platform and exploring Terraform Stacks to further optimize workspace management at scale.

The Challenge of Terraform Org Sprawl

VGW's Terraform adoption began in 2019 with organic growth that led to significant operational challenges. As teams migrated from CloudFormation and Pulumi to Terraform Cloud, the organization spawned a new Terraform org for each team to maintain isolation of workloads, runs, and state. This approach created an explosion of organizations that became increasingly difficult to manage. Each new team request required an 8-12 week procurement process to renegotiate contracts, estimate run usage, and provision admin users. The decentralized structure resulted in inconsistent standards, governance gaps, and duplication of private registry modules. Cost tracking became complex across multiple orgs, and the overhead of coordinating with numerous org owners to rebalance usage created significant friction for the platform team.

Consolidation Strategy and Implementation

To address these challenges, VGW's SRE team designed a centralized single-org architecture leveraging Terraform Cloud's projects feature to create isolated tenants within one organization. This structure allowed teams to maintain autonomy while establishing consistent security and governance patterns. The team built automation around GitHub workflows to provision new tenant projects, create scoped service accounts for CI/CD pipelines, and establish a management workspace pattern for teams to self-service their infrastructure. They implemented OIDC-based authentication to eliminate long-lived credentials and created three distinct service account types: plan-only for pull requests, apply for non-production environments, and apply for production workspaces. The solution included custom GitHub Actions, static analysis tooling, and comprehensive documentation to guide teams into the 'pit of success' with minimal friction.
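The static analysis step can be pictured as a short list of commands run against each pull request. The runner below is a minimal sketch rather than the team's GitHub Action: `CHECKS`, `run_checks`, and the injectable `run` parameter are hypothetical, and the exact flags VGW uses are not given in the talk. The four tools are the ones the speaker names: `terraform fmt`, `terraform validate`, TFLint, and Checkov.

```python
import subprocess
from typing import Callable, Sequence

# One entry per pull-request check. Flags are a common baseline, not VGW's.
CHECKS: tuple[tuple[str, ...], ...] = (
    ("terraform", "fmt", "-check", "-recursive"),
    ("terraform", "validate"),  # expects `terraform init` to have run first
    ("tflint",),
    ("checkov", "-d", "."),
)


def _run(cmd: Sequence[str]) -> int:
    """Run a command in the current directory and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_checks(run: Callable[[Sequence[str]], int] = _run) -> list[str]:
    """Run every check and return the command lines that failed."""
    failed = []
    for cmd in CHECKS:
        # Keep going after a failure so the PR author sees all problems at once.
        if run(cmd) != 0:
            failed.append(" ".join(cmd))
    return failed
```

Running all checks before reporting, instead of stopping at the first failure, is a design choice of this sketch. A pipeline that prefers fast feedback could stop at the first non-zero exit code.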

Results and Ongoing Evolution

The consolidation effort reduced team onboarding time from 8-12 weeks to under 12 minutes through a simple pull request process. By the end of May, VGW had created 45 projects with 345 workspaces, onboarded 400 teams, and published 15 private modules to their centralized registry; one module alone had been downloaded around 13,000 times. The single-org model simplified cost tracking, eliminated the need for constant contract renegotiation, and provided a clear path for scaling Terraform usage across the organization. The team continues to refine the onboarding experience, plans to integrate automation into their internal developer platform, and is exploring Terraform Stacks to further optimize their workspace structure. This transformation demonstrates how thoughtful platform engineering can turn infrastructure-as-code sprawl into a scalable, secure, and developer-friendly foundation.
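For a sense of scale, the headline figures can be turned into rough ratios. The arithmetic below assumes calendar weeks (7 x 24 x 60 minutes) and treats 12 minutes as the upper bound on the new onboarding time, so the speedup factors are lower bounds. The helper name `speedup` is hypothetical.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 calendar minutes


def speedup(weeks_before: float, minutes_after: float) -> float:
    """Ratio of the old onboarding time to the new one."""
    return weeks_before * MINUTES_PER_WEEK / minutes_after


low = speedup(8, 12)    # 8 weeks  -> 12 minutes: 6720.0
high = speedup(12, 12)  # 12 weeks -> 12 minutes: 10080.0

# Average workspaces per project from the end-of-May stats.
workspaces_per_project = 345 / 45

print(f"speedup: {low:.0f}x to {high:.0f}x")
print(f"workspaces per project: {workspaces_per_project:.1f}")
```

Even at the 8-week end of the range, a 12-minute pull request is more than three orders of magnitude faster than the procurement cycle it replaced.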

Chapters

0:00 - Introduction and VGW Background
1:15 - Terraform Journey Backstory
2:19 - Early Adoption and Org Sprawl
4:50 - Challenges of Decentralized Structure
7:03 - Vision for Centralization
8:54 - Single Org Architecture
11:12 - Automation and Golden Modules
12:42 - Service Account Scoping Strategy
13:48 - CI/CD Pipeline Improvements
16:25 - Change Management and Communication
17:21 - Results and Impact
19:23 - Future Roadmap and Stats

Key Quotes

4:20 "And with this organic adoption, we began to slow down the rate that we could service those requests. So with every request, we would have to renegotiate the contract, so this is before resources under management. We would have to renegotiate the contract, so we'd have to estimate the number of runs they would be using, the number of admin users they needed, and then we'd have to go through a procurement process, which generally will take around about 8 to 12 weeks ..."
8:54 "So we needed to keep it, keep, so we needed to make the complicated less complicated or keep it simple. So we began the creation of a single VGW org. With the single org, we could simplify the management and cost of running Terraform Cloud within VGW, as well as give engineers a far simpler path to onboard and kicking them into the pit of success."
10:20 "So we leveraged the projects feature from Terraform Cloud. That was introduced, I think, last year. This feature allowed us to create tenants within the single org in such a way that we could isolate or keep those isolated from other tenants. So that meant that projects were effectively their own org."
12:42 "So what we did with those three service accounts that we created, we made sure that they had very specific scopes for their usage. So we have a plan service account that is only allowed to create speculative runs on pull requests. So only allowed, can't apply, just does plans."
17:29 "So we improved the time it takes to onboard new teams into Terraform now from 12 weeks, which is actually probably a little bit more, actually, if legal get involved. And to roughly, well, less than 12 minutes. So it's just a pull request now. So teams can just get their tenants straight away."
19:05 "We published new modules only in that VGW org, so that we've got that dangling carrot. So all the shiny, brand new modules that have got all the new tech, if you want to use them, you have to migrate."
