Truth in IT

Topgolf's Migration from VMware to VergeOS

VergeIO
03/20/2026

Transcript


Today, I'm at the Topgolf Sim Lab, and I'm here to talk to some IT folks at Topgolf. They've recently selected VergeIO as their infrastructure software platform of choice. So I encourage you guys to ask a lot of questions. These are guys that are living it day in, day out. I don't know if we'll be able to get a lot of golfing tips, but infrastructure tips, I think, are the safe thing. So let me introduce the team here. First is Scott Forehan. Scott, come on over.
Hey, George. How you doing? Good to see you.
And then also, Jason Sova. Jason, thanks for joining us. Good to see you. All right. Well, before we jump in, give them a little bit of background on yourself, what you do at Topgolf.
Yeah. So, Scott Forehan. I joined Topgolf in 2018. I started out as an infrastructure engineer, moved into architecture, and then into leadership. My background is mostly virtualization and storage. I couldn't be more privileged to be here.
I'm Jason Sova. I'm the architect for enterprise infrastructure. I've been with the company since 2019, started as a senior infrastructure engineer, and I'm pretty much responsible for all the technology and the home office needs.
Awesome. So, Scott, at a high level, in one sentence, what does IT do for Topgolf?
Everything. Topgolf is a technology company that masquerades as sports and entertainment, honestly. Everything that we do is technology-driven.
And as I mentioned, we're in the Sim Lab here. What is that? I think you mentioned you guys were doing a hackathon or something here? A month ago or so?
A month ago. We were working on some new game designs, so we brought everybody together in the lab. So, this is our demo lab. This is our sports center for our venue business. We have these two hitting bays here where we can launch our games and test stuff out. It gives us a spot where we can really push the envelope without impacting everybody.
Yeah, this is going to get me in a lot of trouble because now all the guys are going to want this for their demo labs.
It's not ready for market yet.
All right. So, obviously, Scott, we're here because you've had some struggles with VMware, your former infrastructure software. Take me through that journey, those initial days. Did it happen as the acquisition happened? Maybe it was before? What did that look like?
It really did pick up when the acquisition happened. We were having some trouble with Broadcom changing the game every five minutes or so. Really, when we started to dig into it, the cost wasn't tenable, and everything was changing every five minutes. We were in the midst of trying to sign an ELA with VMware at the time of the acquisition. Broadcom kind of changed that a couple of times and ended up trying to push us to go from Dell a couple of times. I mean, it felt like it never really was going to get off the ground. So, we knew we had to start looking at new solutions. The team started to put together a list. I think we looked at 14 different solutions. Pretty much anybody that was in the market, even rolling their own, something like TikTok, we didn't know about Gamma. So, we started to pour through a bunch of different solutions. Obviously, we were in for a while. It's been difficult to break.
Well, that's good. Let's not do it today. I remember early on when we were talking, we were looking at stuff like a 400% price increase and things like that. So, I think that probably would be like half of your IT budget anyway, right? I mean, that kind of looks pretty cool.
Yeah, and at that point in the year, we'd also already locked in our budgets for the year. So, it became a really hard story to tell for us. And we just kind of had to bite the bullet. And thankfully, Broadcom did come to the table and give us decent pricing.
Not sure what we had before, but it's still exponentially more than what the business actually had to spend. We needed a better partner.
Jason, what were some of the must-haves? You sort of grabbed this list of all 14 different vendors, right? What were some of the things like, I've got to make sure I can do X, Y, and Z? You can say it in like the top three.
A lot of our top things were around stability. We're a fairly lean shop. We have a lot of venues. Every one of our venues is its own little isolated place. So, for me, a big thing was stability. We have a limited number of people who support a very large footprint, so we needed to be able to have everything up. We're coming from a vSAN environment, so hyper-converged was a big part of it. Not having a SAN. We weren't looking to go and purchase a bunch of storage. We wanted to keep everything isolated at the same time. And then also the ability to manage programmatically. Because we are such a small shop, we're trying to do everything possible to help. My general rule of thumb is if a finger has to touch a keyboard or a mouse, then that's a policy mistake. But we know what the robot is doing well. And we were able to accomplish that for the most part with Verge. So, right now, we have it to the point where once the base OS is installed, which we're also looking to automate down the road with something like Valorant, everything after the base OS install runs through some sort of CI/CD pipeline. From issuing certs to configuring the cluster, deploying workloads. And then also the maintenance down the road. When certs come up for renewal, it's simplified through code. That was a huge win for us. We have been trying to do that for years with our VMware environment. But the ability to know that if I set it up like this today, it's going to be like that a month from now. There was a lot of configuration drift and a lot of services.
Oh, we've got to go restart the service to be able to make this work. Whereas we just need it to work all the time.
Scott, one of the things you had said that I recall early on was, we needed each of these venues to basically be standalone. And everything got tight. So, like if people find deer, you don't want to make them mad.
Exactly. It can't affect the experience.
And players, what are the psychology you guys see?
So, with VxRail VMware, we were running, I want to say we've got four, six, nine, 12, five. So, that was kind of our base. We haven't had a single bit of downtime. But that needs to be fulfilled.
So, that's awesome.
Yeah. That's been solid with no issues since. We have our one venue we deployed eight, nine months ago. That's kind of our demo. And it's been solid. The only time it's ever gone down is because of environmental issues, a power grid problem or something like that. It's never been due to the software or even the hardware. One of the things that this has allowed us to do is move from that customized VxRail platform into consumer grade. So lead times have gone down. Hardware costs have gone down. And it's just simpler.
Sure. So, we've got questions flying in. Keep them coming. This is great. So, beyond price, what would you guys say is the number one reason you selected Verge? I was going to ask that question later, but since you asked it, we're going to go with it.
For me, it's all about partnership. With Verge and the team there, they're very original. For me, it personally feels like they're not out to sell a product. They're out to help us be successful. And that is huge for me. We're a very lean shop. We're a pretty small group. So, having that relationship and having that partnership that we can reach out and lean on when we need it. So, for us, we have the responsiveness.
Absolutely. Oh, that's awesome.
I'm up three thousand. Well, it was the same, but since I can't say that, I'm going to say it.
I'm going to say simplicity. vSphere is incredibly complex. It's a product that's been around for ages and they've kind of bolted a bunch of stuff on. Everything in Verge is built from the ground up to be as simple and humane as possible. And it just works so well at the edge. It works at the data center as well. So, I'm going to go with simplicity.
Was that a concern? Because I've seen that there are some of our, I guess, competitors that really focus on the edge. And my concern there, being in charge of IT, would be having to run one thing at the edge and then run something else, you know, enterprise class, at the data center. Is that something you were kind of thinking about as well?
Yeah, absolutely. Another functionality that was really big for us with Verge was tenancy and the sites. With our older environment, every venue had its own vCenter, it was its own cluster. So we were managing 100 independent vCenters. With Verge, we've been able to leverage tenancy and the sites. So, now we've basically set it up where we have regional tenants in the data center, and then the sites, the venues within that region, fall back to that tenant. So, it can effectively give us a single pane of glass. Well, three panes of glass, one per region.
Right.
But now, if I want to see what's happening in my East Coast venues, I have one place. Whereas before, say I had 20 venues on the East Coast, I'd have to log into 20 different vCenters to see what's going on. So, that was huge for us. And we're also using that tenancy for our backup. So, we have the cloud snapshot at the venue, and then we're using that tenant in the data center to replicate that snapshot data back. So, we have an offsite set of backups. That's all native functionality from Verge now. That's been a big simplification for us.
So, and the simplicity topic ties into the next question.
Again, just keep in mind, guys, this is really good. Talk about the learning curve. We'll get to the functional migration here. The learning curve, because you guys, I'm sure, were running VMware for over a decade, right?
So, I don't like to be grumpy, but I have guys with 25, 30 years of VMware. So, moving to Verge, there were some hiccups, just understanding the way the philosophy changed. But for the most part, it was pretty straightforward. I mean, there were a few things, but, you know, we had our CEO who was very familiar with it. So, I'd be like, I don't understand this. And he's like, oh, that's, just give me the switch. All right, you give me the VM work. And I'm like, oh, okay. All the pieces just fit together. But for the most part, very straightforward. Obviously, you have your terminology. It's really all about where to find your click.
Yeah, the enterprise. That's a learning curve, even if it's in the enterprise. So, yeah. Next version of the product or whatever.
Yeah. Support ops guides. We have a couple of venues out in the field, and we have a lot of the POC currently wrapping up and getting their deployment set up. So, we haven't really gone through and done the official training with them. They've just been kind of poking around in our lab environments while we get everything ready to go, and then we'll send out the official training. But just by giving them a lab to go through, they've figured it out on their own.
Are you using the tenancy to kind of create those labs as well?
Yeah, so we're doing nested deployments inside one of our plus three labs so they can spin them up and tear them down and break them and they don't impact the underlying hardware.
Yeah, I always love it. One of the kind of thing you probably don't, or get a little guilty of not talking about sort of the training aspect.
You can literally give a guy his own data center and he could destroy it, but it won't hurt. So, that's cool. What were some of the things, going back to kind of that selection process, what were some of the reasons you knocked out, you don't necessarily have savings, but knocked out some of the things off the list? Maybe you didn't even make it through testing. I mean, it rains, right? We looked at 14 different products. Some of them weren't even to market yet. So, there was one that was, I think it was called canonical cloud or something like that, that was effectively a white paper. That's still right. Yeah, it was all conceptual, right? They were working to put it together, but it wasn't quite there yet. And it was, you know, Azure Stack and AWS Outpost and all kinds of stuff that we were trying to see what fit. So, we looked at a ton of different products, some of which was too big for our environment, right? We looked at OpenStack. Yeah. Very simple OpenStack environment. Yeah. I mean, it's Amazon in a box and we don't need, but we need a very small subset of that that's targeted towards our extra deployments and targeted towards the things that we need to do in beta. So, we didn't want to make it impossibly big, right? So, you know, a bunch of those different products kind of fell off pretty early. Some of them, the early process, some of them were management right there. Some of them couldn't be easily automated and we knew that was a big part of what we wanted to do. So, it really varied solution to solution, but we only wanted to check every box. So, another question that came in kind of ties into this, why not just use the cloud? And I think I know the answer to that. Yeah, I'll let you go. We have to maintain the ability to run a venue without it. So, you think a lot of our venues, when we build a venue, we're usually the first real establishment to go into that issue. Right. And then we're the anchor that people build. 
So, you know, we build a venue and then for the next four years, basically that area is under construction. You get fiber cuts, you get power issues, all kinds of stuff, right? And we can't afford to have an internet outage take down our entire venue. So, we run as much as we possibly can locally in the venue, so that if we were to, heaven forbid, have that internet cut or a power cut or whatever, we can at least still maintain basic functionality. So, there's that. Then there's also cost: the cloud is pay-by-the-hour, plus your latency costs, whereas on-prem is a set cost. So, those are the two big things.
Well, and I can personally attest to the build-up you guys see, because you guys know I'm from Fort Worth. So, like, when we go in, I'm like, what did you pay for that venue? Nothing. We get wraps. And there was a blue ring, which is like a pretty decent blue ring. That's why I can't get into that venue.
There you go. Yeah.
And then, I don't know, six months afterward, the place is like a whole different area of town.
Oh, yeah. It's crazy. It happens everywhere we go.
So, okay, let's move on. A lot of questions are coming in on the migration process and stuff like that. Talk a little bit about what that was like. You've got a funny story about this. We'll save that for you. But just maybe in your testing, when you're doing migrations, how does that work?
Oh, I'm actually going through one right now. So, basically, what we do is we stand up a Verge cluster next to the current hardware stack, and then we leverage Verge's backup/DR functionality. That connects into our vCenter, and then it basically just backs up all of our VMs from vCenter, like Rubrik or any other backup product. It just takes a snapshot and sucks it in. And then when you do the restore, you have the option of restoring it back to vCenter or importing it in. So, that's effectively what I do.
I bring in the 40 or 50 VMs we have in the venue, and then I just say, restore them, but put them over here on Verge instead. The process is very straightforward and painless. Our Linux stuff, it's just move it over, change it from the VMware hardware to the VergeIO stuff. Windows is a little bit more complicated because you have drivers, but that's not a Verge thing or a VMware thing. That's strictly a Windows thing. Change your drivers. But we're doing a venue. We can do a venue and make it.
So, we go in. Now, tell me what we're doing. So, we're setting this site up.
So, we're doing one of our local venues here in Dallas, and we went out on the Monday morning and deployed the Verge cluster. Then we set up the backup, and we left for the day. We let it run through the day to back up those six or seven terabytes of data and bring them all into Verge. And then that night, they were going to do the cutover. So, we powered everything down on VxRail and powered everything up on Verge. We go through, and we do some validation testing. We're like, okay, everything looks good, and I'm going to go out to the venue in the morning just to do one last go-through, you know, make sure everything's a thousand percent good. We get out to the venue, and I'm sitting there, you know, doing my checks, and the facility manager comes over. He's like, oh, Jason, what are you doing out here today? I'm like, oh, we did your cutover last night. The cutover, yeah, we basically moved everything off of your existing hardware and moved it over to that. We did a heart transplant. And he's like, oh, you did that last night? I'm like, yeah, I'm just finishing up now. He's like, you know the board of directors is coming on site today to do a demo, right? I'm like, no. So, I mean, I think the, it was slow.
That's what we call an acid test right there.
Yeah. Nobody tells us anything. It just worked. Well, it was great, and that's the cluster that's been running.
I think the last I checked, there was like 200 and something days of uptime without so much as a day's go.
Yeah. We found out they were due there 10 minutes before they started showing up.
Yeah, that's perfect. That's like an IT, though, right? So, a question came in along that. They wanted to know, did you have to use a third-party piece of software, or is all the capability to do the migration built in?
All the functionality is built in.
And no extra charge for that, by the way. The other question was, interesting the way you say it, actually. How has the transition felt? Like, how did it make you feel, I guess? I've got no other question. Okay. Refresh. Yeah, refresh.
So, I have caught myself several times over the course of this. I was actually going through my email. So, we started this adventure in March of 2020. I got the email from Jaeger giving me the link to download the ISO. So, it's been over a year, and I have caught myself several times over the course of that year, going, well, you know, I've spent the last 25, 30 years of my career doing VMware pretty much exclusively. And I was like, wow, where was this 20 years ago? It's like, what have I been doing? Like, actually second-guessing my business based on, you know, saying. I was legitimately upset at myself.
Forgot there were emergencies. I think that's my fault. I mean, you're good. Let's see. We've got some more questions coming in. Oh, something's coming in on the Terraform stuff. So, you're talking about automation. I guess I just spoiled a little of our Terraform. Talk a little bit about what you're able to automate and things like that.
So, we have several different pieces in our current automation plan. We're using the native APIs to do the pre-configuration. So, we're setting up certificates, setting up our sites, the tenancy, all the different things on the site that we talked about previously.
And then we hand off to a Terraform workflow that handles all of our VM deployments. It builds up all of our backend services, all the gaming environments that we need within the backend. So, Terraform does the VM creation and base configuration. For our Windows machines, it brings them up and joins them to the domain, and it does the same thing for Linux. And then from there, we hand it off to an Ansible playbook that'll go through and, like, promote our domain controllers, configure our DHCP scopes, install the actual applications, Kubernetes and all that stuff. So, Terraform basically builds the foundation. And then from there, we have another layer that goes on top and sets up our apps.
So, one of the things you told me about, I guess, is that when the power's out, you've got to try to shut everything down as quickly as possible. What does that process look like for you now?
So, previously, it was basically a Confluence page. It gave you all the VMs and the particular order they had to be shut down in. It doesn't provide service to them. So, you would literally log into vCenter, and you had to shut everything down.
In the right order.
In the right order. And then once it was shut down, you would right-click on the cluster, go to VxRail, and say shut down the VxRail cluster. If VxRail Manager was working, it would shut it down. Or sometimes you would have to go in and manually put all the hosting names in the cluster down, because the VxRail Manager would lose its connectivity to vCenter, so it couldn't do that.
So, on a good day, let's say, or an average day, how long did that process take? You've got a little bit of a clock if you're on a power.
Yeah, we have an hour, usually. Our UPSs are usually good for about an hour. That sometimes comes close, but usually we get them done. But with Verge, so much. So, we were able to automate that entire process through the API.
So, right now, with one of our Verge venues, for SkyRavage, they literally go to our Jenkins master circular, which is a webpage. They click the venue name, and they get a pull-down menu. Do you want to start it up, or do you want to shut it down? You pick shut down, click go, and 10 or 15 minutes later, it's powered off.
But no interaction or anything like that.
No interaction. Everything's logged. Everything gives you a nice output, showing you everything it did. And then when you want to power it back up, you just go back into that website, change the pull-down menu from shut down to power up, and hit build. It will log into the iDRACs and power everything up. And it's actually smart enough that it'll sit and wait. So, when Verge first comes up, you've got to wait for the drives to get redundancy and do all their stuff. We actually wait. We'll ping it. We have the API. It'll say, okay, now everything's 100%, now we can bring up our workloads. Whereas with VxRail, you're bringing the hosts up, and if one of them takes a little longer and they don't come up in the right order, the vSAN would be inaccessible. And then you have to basically power everything down, power it up and try again. And then the vSAN would become available. And then you can take the most out of it and make this full and start going. But with the API functionality in Verge, we can actually check for that as we go and mitigate it as part of the process.
That's awesome. That's cool. So, I have the opposite problem, since I've been working with Verge for so long. I just assume everybody does this stuff.
I mean, we had the process automated in VxRail, but then we ran into what we said about how VxRail manager and vSAN ever moves. And that would break that last piece. So, that's the important thing. That's where it's actually turning.
Right. You couldn't trust it.
It would consistently go into a bad state some of the time. And you had to be there to validate it to make sure it was still available.
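The startup and shutdown flow described above (stop workloads in order, power off the nodes; on the way back up, power on, wait until storage reports healthy, then start workloads in order) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Topgolf's Jenkins job or the VergeOS API: `FakeCluster`, `storage_healthy`, `power_on_nodes`, and the VM names are all hypothetical stand-ins for whatever the real API, iDRAC calls, and workload list look like.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """In-memory stand-in for a cluster API (hypothetical, illustration only)."""
    polls_until_healthy: int = 3
    powered_on: bool = False
    running: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def power_on_nodes(self):
        self.powered_on = True
        self.log.append("nodes: power on")

    def power_off_nodes(self):
        self.powered_on = False
        self.log.append("nodes: power off")

    def storage_healthy(self):
        # Pretend storage redundancy needs a few polls to rebuild after power-on.
        if not self.powered_on:
            return False
        self.polls_until_healthy -= 1
        return self.polls_until_healthy <= 0

    def start_vm(self, name):
        self.running.append(name)
        self.log.append(f"vm start: {name}")

    def stop_vm(self, name):
        self.running.remove(name)
        self.log.append(f"vm stop: {name}")


def wait_until(check, timeout_s, interval_s, sleep=time.sleep):
    """Poll check() until it returns True; give up once timeout_s has elapsed."""
    waited = 0.0
    while not check():
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
    return True


def power_up(cluster, boot_order, timeout_s=600, interval_s=5, sleep=time.sleep):
    """Power on nodes, wait for healthy storage, then start VMs in order."""
    cluster.power_on_nodes()
    if not wait_until(cluster.storage_healthy, timeout_s, interval_s, sleep):
        raise TimeoutError("storage never became healthy; workloads not started")
    for vm in boot_order:
        cluster.start_vm(vm)


def shut_down(cluster, boot_order):
    """Stop VMs in reverse boot order, then power off the nodes."""
    for vm in reversed(boot_order):
        if vm in cluster.running:
            cluster.stop_vm(vm)
    cluster.power_off_nodes()


# Hypothetical workload order: dependencies first.
BOOT_ORDER = ["domain-controller", "database", "game-server"]

cluster = FakeCluster()
power_up(cluster, BOOT_ORDER, sleep=lambda _s: None)  # skip real sleeping in the demo
shut_down(cluster, BOOT_ORDER)
for entry in cluster.log:
    print(entry)
```

The point of the sketch is the gate in `power_up`: workloads are only started after the health check passes, which is the step the speakers say they could not rely on in the previous environment. In a real job, the timeout branch is where an alert or a retry would go.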
You guys obviously don't have an IT guy at the venue.
No, our facilities people at the venue are the ones that manage our technology for us. So, if you think about the guy that's, you know, painting the walls and fixing things when somebody puts a golf ball through a piece of turf, he's the same guy that's helping us rack and stack servers and troubleshoot. So, we do everything. It's all remote hands for the most part. Hardware replacements, we don't do that.
So, I'm going to go back to the questions because you guys are asking great questions, by the way. Let's see. I just. Oh, that's a great question. So, Scott, at what. You'll see. Did you go? Yeah, I think you can run.
Oh, it's pretty early. I think the only thing that wasn't fully baked at the time was the Terraform provider. And the Verge team worked with us to really flesh that out.
Yeah, that was a big part of it. We already had a Terraform process. What Verge allowed us to do with the provider was we basically just took the provider piece out from the end, brought the Verge provider in, and kept all of the same workflow. So, we didn't have to change a lot of our system. We spent a good amount of time rebuilding that provider. Once we had that built, it was just plug and play.
So, I knew this question was going to come. How do you handle backup and DR for sites? You had mentioned, I don't know what I said or not, but you had another backup solution and.
I was like, that's going to be awesome. And we have some ways to kind of get there, but it's not perfect. And we're talking about your journey to what you ended up with from a backup. Yeah, so. We leverage. It's not called for. Our entire environment. That currently runs on VMware. That's been a story for. Five years, six years, almost. What we found when we started playing around with 13 apps. It had a lot of data protection functionality. And. There wasn't. A real need for anything outside of a security side. So we just started. Thinking through what it would look like to. Replace. And there are some challenges that I think. From a replication. Perspective. We were actually able to reduce our storage. So, so we still have our three sites, you know, you had your local. Raising your leveraging. So we have the snapshot on brand venue. That's the regular payments. It's back to our data center. Second, say. And then looking at possibly. Later on. But. Our main thing is we need our, we need our local data. Then you need our offsite. You need our. So. You have fans. All the weather. So question. I'm not sure if it's a question. I'm not sure if it's a question. I'm not sure if it's a question. I think I know who that is. I think I know exactly. CBD, right? Yeah. CBD. I. Yeah. Obviously. But. Who knows what CBD is? So a couple of questions have come in on AI. And. We've learned a lot about AI and that what do you guys know we're not there yet today when you guys see intensely something our audience space or develop a few issues, and behave with respect to kind of this. I'll include a, I would. So, a couple of questions have come in on AI, we've learned a lot about AI and that what do you guys know we're not there yet today when you guys see intensely or something off. Yeah, a couple of things that were cooking. But, with the way we do things in the past we had with the X ray or the ride, we ran everything on its real cluster and presented on so we have a lot of. 
We have some ideas. Right now the plan I think is to put GPU in our first cluster at data center, and maybe offload some of what we use for completion. I kind of bring that into getting off file. But then we also have some, some interesting ideas going forward on how we could directly contribute to the player experience, leveraging. And by having the Virgin AI solution, they can cluster we can post all that locally opinion, because once you go out to the SAS solutions for AI, it gets expensive. But you got token pops here. So the idea was to have the connectivity issue right if people get really start liking the AI and it goes away for free. Yeah, you got the latency, you got the connectivity so having that being able to run locally on the cluster and have it run me. Oh, so we need to move the cluster and not have to deploy. I'm very excited for what's coming on. So, let's talk. It's great. I have, I don't know how that I don't think I've ever gotten by for my questions. So, talk about, I don't know if you guys have done this yet but system updates and patching. What did that look like? You put the button, and it goes. Yeah, and most part, yeah, it's painless. It was relatively painless in the end there with VxRail, right, they've done a good job putting composite packs together for that system. But all of that is changing, we're a little apprehensive because we're moving more into the update system specific to VMware, away from what they built with Dell. It was a lot of work to get that where we needed it to be. And it's still not exactly where we need it to be. We didn't have a problem migrating GPU, for example, until, what, three months ago? Yeah, I think it was a little while ago. With Verge, one of the first things that we did in the lab was, I mean it was sitting on a cluster that had a GPU. And we didn't understand why a company like VMware couldn't make that work, the way that Verge made it seamless. We have pre-processed. 
And I love the way that, because it's all Linux-based under the hood, so it's all just squash and pastels. But it will revert itself back to something else. Literally, check for updates, download update, install update, reboot, and then it just goes through and migrates everything and repeats your cluster and gets everything up from zero down time. There was actually I guess a hotfix release yesterday. Yeah, so I just was sitting in the lab yesterday and I was like, oh there's a new one. I didn't reach out to any of the teams that used the cluster. But it's a lab. I was like, okay, it's a lab, but QA used it today and some more developers used it. And I was like, you know what? What? Just let it do its update. Nobody noticed, nobody complained, it was completely transparent. And at the end of the day, that's what we want. You guys are going to be bored. Not as bored as we can do, but we can do cooler stuff. I don't know the answer to this, but we're curious to know. Have you done anything with any of the networking capability? Not yet. So we are looking at separating out some of our workloads into tenants using virtual wires, but that's literally as far as it's gone. Our networking at the venue is very simple. Very flat. A couple of VLANs for different workloads, but that's about it. I know I do have, all I'd like to do is just to sync up our SCN with our network back here and have them kind of sit down and have a meeting of the minds to see where we could leverage that in the future, mainly around like our international business of all franchise partner driven. So that could give us some more opportunities there for the way to support our partners. So yeah, we haven't answered no, not really, but we have a lot of interesting things that might come in handy. So another question is, what effect has it had on your staffing? I think you had mentioned your pretty good staff. Our team is five folks, and three of those are SBDCs. 
It's really freed up a lot of time for us with the automation. So another one to put us in a better spot. And it also impacts our operations, the separate and separate operations. It goes through many day to day. And they haven't had to touch the cluster. And if they do, they just need their automate tools and it's a lot of point blank, you know, and weapons. Yeah, I think. One of the things is like, you know, our previous solution was great for the scale we were. As we now move into 100 venues, and we're looking to continue to grow year over year. It all comes down to simplicity and standardization, right? We don't necessarily want to add more people every time we build; we want to be able to manage things smarter. And that's what we've been able to do. And it's not like, oh, we can reduce our headcount because we now have Verge. No, no, it's like, hey, we free up cycles. That just means we can build the next player experience. We can find something cool to do that will impact the venues. So that's the way we look at it. It's not necessarily, oh, we can save money on headcount. No, it frees up resources for efficient and really pushing the experience. Like you said, you're an IT shop masquerading as an entertainment venue, right? So it makes sense. That's exactly it. We've been trying to get back to innovation for five years now. But haven't been able to free up the cycles to do it. This will be that for us. It's awesome. All right, so a question about the Dell service came up. So I want to talk a couple things there. Let's first talk a little bit about what you replaced. We've said VxRail, but let me try it now. What you replaced it with, what that looks like from an economic standpoint. And then also, the start of the question, why did you guys choose Dell? So Dell's been a pretty big strategic partner to Topgolf. I personally have had other experiences with other partners in previous lives. Always landed back on Dell. 
Never had a problem with their server platforms. Couldn't be solved too quickly. And quite frankly, they're a good source for the price. You just got to hit that early and kill them too. They're easy to name as a step. That's something a couple of the other partners said. With VxRail, we try to come up with different models. Most of the time, they landed on a two-week box. Had probably champions in it. Huge port count, because we had to throw out our gaming program at VxRail. So we would probably, depending on the number of days, say average of 102 virtual rest stops in that cluster, going from game to game. We've optimized our stack in such a way that we've been able to simplify the gaming experience, run all of that inside of Kubernetes. And it's really made an impact on what we need to put in the venue. What's the impact on tile profiles? Can we get us more time on DPS? Will we be able to react faster to things that can be in reality? You know, we've already installed things for a year. So where we had a, on average, five-node cluster of V670 VxRail with GPU, a huge tile profile, now we're landing on three nodes of Verge with far reduced storage footprint. Primarily because the application is the same. What are we getting right now? I think last time I checked, it was something like 13 or 15 or something. It did drop? Wow. I think that's really great. Yeah, it honestly blew our minds. I don't know how to do it, and I don't know that I care so much. Let's keep doing it. So we've been able to reduce the storage footprint, we've been able to reduce chassis. The cost of the venue has eventually decreased for us, which was incredible because, you know, our previous service stack cost quite a bit of money. And right now, with, you know, the Verge licensing and the service stack, we're looking at less money. For the first seven years. That's awesome. So, I don't know the answer to your question, but I'm going to ask it. 
Did you have any software that were basically virtual appliances that you need to worry about migrating over? Yeah. Well, some of them we didn't need to take, right? So like vRealize Operations, Log Insight, security stuff, Cisco, that runs in OVA, and so OVA Appliance. Migrating it from our existing VMware environment into Verge was seamless. It was just, it took it like another Linux VM, and we're moving forward with those types of products. We're looking to leverage, they have a KVM appliance, which is basically, so we're hoping we can just, we're going to work with that vendor to get that in, or else we also have other ways around it. Basically leverage, instead of having the appliance live locally at the venue, we can put it in other places like our data center. And as long as we have that failback for using it. And then on the vRealize front, we're using basically our telemetry gate noise to back those sort of stuff. So we already have a pretty robust observability platform, and we've been able to hook into the Verge API, to hook into our observability, to pull all the stats out. That's still a work in progress. We have the basic functionality out there that we would need, you know, cut down and general SLAs for the venue business, but there's still a question on it. Yeah, so we can take a certain side note, and OAI would probably care about this, but our best attended webinar prior to this one, was one we did on observability and automation, their problem, and the maintenance point. Yeah, your previous webinar? Yeah. I've been looking at that, and right now I'm getting what you have. Yeah, same as us. We attract deep technical detail. I can tell you that, it's true. Okay, now that's great. Let's see. Let's see. 
So the hardware footprint, fewer numbers are assumed, smaller physical space itself, and performance, I don't know if you guys were necessarily pushing the OVAC, pushing our botnet, making certain style logs, so we don't do a ton of items, right? We don't have a lot of disk movement, but we did put everything on NVMe, so the performance has gone even further than we needed. We've reduced our core count, but they are faster. Yeah, we were originally, well, we were dual 24-core sockets per server, across five servers. We've now brought that down to a dual 64 across three servers. A lot of that, too, is also being mitigated by our virtual desktop, right? We moved those out, we moved the workload directly to the gate panel, as you see in the shot. A lot of our GPU workloads, we use the standard one hardware. So that's reduced a lot of it on the server side, but we do get the GPU and all that. Are we going to see these guys have a public look on their face? Because I'm smiling at something I shouldn't be smiling at, because I'm reading your questions. And the one that made me smile is a guy that just became a customer, likely see him so far, but he likes my hat. There's a whole backstory behind the hat, but we'll make sure you get a hat. All right. Anybody that's a customer that's on this webinar, email me and we'll get you a hat. So, wow. So, that's pretty funny. I've never had a hat. Looks good. And we're not going to sell it to you, we're going to give it to you because you're a customer. So there you go. Because you were saying it's a first series, so they're probably done the last. Oh, wow. So, let's see. From a, I want to go back just real quickly to the data protection stuff. You're mostly doing this with snapshots and replication now, I'm assuming? Cloud snapshots. The other thing I like about the cloud snapshots is not only does it take snapshots of the VM, but it also does the management layer. 
So, if you really threw up something in your web UI, you'd be able to start a snapshot. So, setting back and go. Right. Yeah, we backed that up. We're doing the default, you know, the midnight, the hourly, and the noon. You know, we just take our midnight and we ship that off site. So, wow, this time just flew by. A couple of kind of final thoughts. So, let's, you know, a lot of people, apparently a few Verge customers or recent Verge customers, guys are kind of going through the process, and you guys have been going through it for the last year or so. Let's give them some advice. Other than go live early. Fine. But give them some advice on how they go through this process. Yeah. The Verge guys are extremely responsive, and they want to help. They were incredibly helpful. Through our POC, even when we had a feature that wasn't quite there, that we needed to be in the platform to go launch, they were able to take that to the product team and get priority. It was really refreshing to partner with those folks as we went through the process. And then, I can't think of a couple of them. Any issues dealing with Broadcom as far as when you start shutting licenses down and stuff like that? No. I mean, we're locked into an ELA with them. So, I don't know. Maybe I don't want to say the word, but maybe they're knocking on the door at some point just to make things a little harder. But they haven't done that yet, which is great. I hope they don't, because we've had a really long partnership with them. And, you know, there's certainly other opportunities outside of the Eastern environment. But no issues from Broadcom, apart from our experience through their ELA process. We've been kind of checking the line every five minutes and not really have something to really rely on as we go through the process. I haven't been spying on any of these at all. I've been saying, okay, I guess that's good. That's all you can ask for, right? Yeah. Yeah. So, the question came in on time. 
You guys talked about how it's taking your time, and you're hoping to get back to innovation. What a, like you said, ballot thing is too hard, but like in terms of hours or something, what do you think that's, what's the best way to do it? I mean, I don't know that there's a way to do it. Yeah. I mean, it's like you said, if it's like an hour or something, what do you think that's. Saying. That's a, that's a tough one. So. Yeah. I'm not going to stop. You look good. So. I did a little bit of math. Last week on this. In the neighborhood of 20 to 30 hours a week for our operations team. And we had a little bit of a break. We had a little bit of a break there. Which is great because it allows you to focus more on the things that any of these things rather than the things that are going down. And so. For our team. I mean, through. Just the POC and after you make that statement, started to deploy. You really. You know, So. You know, to be honest, I'm going to say. 70 to 80% of their time now is focused on the things that you want to do. Rather than the things that we have to do. So a little bit. Firefighting. Yeah. Yeah. We're building up a lot of that firefighting directly into our. So we don't see something and be smart enough to say, Hey, kick off this API call into this cluster and we will fix your issue. So we're looking at. Now we have to be very, very careful about. We don't want it to. Do something that's not supposed to, but. And that's on us from our side. Yeah. Right. You want to make sure that works out well. All right. Well, hey, we've. We're about out of time. Guys. Thanks very much again. For taking time and for allowing me to kind of. Get into your lab here. We will be doing events around the country at various Topgolfs. So stay in touch with your sales team. I'm sure we'll be getting there. You at some point. We were talking about that yesterday. 
So, but for now. I'm George. Have a great day. Thank you.

TL;DR

  • Topgolf migrated from VMware to VergeOS following Broadcom's acquisition, which brought 400% price increases and constantly changing licensing terms that made the platform financially untenable
  • VergeOS enabled comprehensive automation through APIs, Terraform, and Ansible, reducing emergency shutdown procedures from hour-long manual processes to 10-15 minute automated sequences and freeing 20-30 hours per week for innovation
  • The company simplified infrastructure by moving from proprietary VxRail hardware to commodity servers, consolidating from five-node to three-node clusters while improving performance with NVMe storage
  • Topgolf achieved zero software or hardware-related downtime across distributed venues since deployment, with each location operating as an isolated, self-sufficient infrastructure environment
  • The migration reduced core counts while increasing performance, eliminated separate storage infrastructure through hyperconvergence, and delivered substantial cost savings in both hardware and licensing
  • VergeIO's responsive partnership approach and willingness to prioritize customer-driven feature development proved critical during the evaluation and deployment process

The VMware-Broadcom Acquisition Impact

Topgolf's infrastructure team faced significant challenges following Broadcom's acquisition of VMware, with proposed price increases approaching 400% and constantly shifting licensing terms. The company was in the midst of negotiating an enterprise license agreement (ELA) when the acquisition occurred, leading to multiple contract revisions and uncertainty around vendor partnerships. While Broadcom eventually offered more reasonable pricing, the costs remained substantially higher than budgeted, prompting Topgolf to evaluate 14 alternative infrastructure solutions. The team needed a platform that could support their lean IT operations across dozens of geographically distributed venues, each requiring isolated, stable infrastructure with minimal hands-on management.

Automation and Operational Efficiency Gains

VergeOS enabled Topgolf to achieve comprehensive infrastructure automation through API-driven workflows and CI/CD pipelines. The team implemented automated venue shutdown and startup procedures that reduced emergency power-down operations from manual, hour-long processes to fully automated 10-15 minute sequences executed through a simple web interface. Using native APIs for pre-configuration, Terraform for VM deployment, and Ansible for application configuration, Topgolf eliminated manual intervention across the infrastructure lifecycle. This automation extends to certificate management, cluster configuration, and workload deployment, with everything managed through code to ensure consistency across all locations. The approach has freed 20-30 hours per week for the operations team, shifting focus from firefighting to innovation.
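
The ordered shutdown described above can be illustrated with a short sketch. This is not Topgolf's pipeline or the VergeOS API: the `Workload` type, the `tier` field, and the `power_off` callable are hypothetical names introduced here, and a real job would pass in a function that calls the platform's API. The sketch only shows the idea of powering workloads off in a fixed order and stopping at the first failure so an operator can step in.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of one VM or service at a venue."""
    name: str
    tier: int  # assumption: lower tiers are powered off first


def shutdown_venue(workloads: Iterable[Workload],
                   power_off: Callable[[str], bool]) -> List[str]:
    """Power off workloads tier by tier and return the names completed.

    `power_off` is injected so the ordering logic stays independent of
    any particular platform; it should return True on success.
    """
    done: List[str] = []
    for w in sorted(workloads, key=lambda w: (w.tier, w.name)):
        if not power_off(w.name):
            # Stop immediately rather than continue in an unknown state.
            raise RuntimeError(
                f"failed to power off {w.name}; completed so far: {done}")
        done.append(w.name)
    return done
```

Injecting `power_off` keeps the sequence testable with a fake before it is ever pointed at a live cluster, which fits the caution the team voices about automation doing something it is not supposed to.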

Infrastructure Simplification and Cost Reduction

The migration from VxRail to VergeOS allowed Topgolf to transition from proprietary, customized hardware to commodity servers, significantly reducing both hardware costs and lead times. The company consolidated from five-node clusters with dual 24-core processors to three-node clusters with dual 64-core processors, while simultaneously improving performance by moving to NVMe storage. VergeOS's hyperconverged architecture eliminated the need for separate storage infrastructure, and the platform's stability has resulted in zero downtime attributable to software or hardware issues since deployment. The simplified hardware approach, combined with reduced licensing costs and improved operational efficiency, has delivered substantial cost savings while enhancing reliability across Topgolf's distributed venue network.
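
As a quick check on the node and socket figures quoted above, the per-cluster core totals can be worked out directly. The helper name is illustrative only. Taken exactly as transcribed, the numbers give more total cores on three nodes than on five, which sits oddly with the stated core-count reduction, so one of the per-socket figures may have been misheard in the recording; the sketch is arithmetic on the quoted numbers and nothing more.

```python
def total_cores(nodes: int, sockets_per_node: int, cores_per_socket: int) -> int:
    """Physical cores in a cluster of identical nodes."""
    return nodes * sockets_per_node * cores_per_socket


# Figures as quoted: five nodes of dual 24-core, then three nodes of dual 64-core.
before = total_cores(nodes=5, sockets_per_node=2, cores_per_socket=24)
after = total_cores(nodes=3, sockets_per_node=2, cores_per_socket=64)

print(before, after)  # 240 384
```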

Partnership and Platform Maturity

Topgolf's evaluation process, which began in March 2024, revealed VergeIO as a partner focused on customer success rather than product sales. The vendor's responsiveness and willingness to prioritize feature development based on customer needs proved critical during the proof-of-concept phase. When Topgolf identified missing functionality required for production deployment, the VergeIO team took the request to its product team and had it prioritized. The platform's comprehensive API functionality, cloud snapshot capabilities that protect both VMs and the management layer, and built-in disaster recovery features provided enterprise-grade functionality that exceeded the team's expectations. For infrastructure professionals with decades of VMware experience, VergeOS represented a paradigm shift in how virtualization platforms can simplify rather than complicate operations.

Chapters

0:00 - Introduction and Background
2:34 - VMware-Broadcom Acquisition Challenges
4:44 - Evaluation Criteria and Requirements
8:00 - Hardware Simplification Benefits
20:17 - Infrastructure Professional Perspective
21:20 - Automation Architecture and Workflow
22:40 - Emergency Shutdown Automation
42:58 - Performance and Resource Optimization
45:05 - Data Protection and Snapshots
46:00 - Migration Advice and Lessons Learned

Key Quotes

1:35 "Topgolf is a technology company that masquerades in sports and entertainment, honestly. Everything that we do is technology-driven."
3:01 "The cost wasn't tenable. The fact that everything was changing every five minutes. We were in the midst of trying to sign an ELA agreement with Broadcom and with VMware at the time of the acquisition. Broadcom kind of changed that a couple of times."
4:06 "We were looking at stuff like a 400% price increase and things like that. So, I think that probably would be like half of your IT budget anyway, right? ..."
5:49 "My general rule of thumb is if a finger has to touch a keyboard or a mouse, then that's a policy mistake."
8:32 "For me, it's all about privacy. With Verge and the team there, they're very original. For me, it personally feels like they're not out to sell a product. They're out to help us be successful."
20:51 "I have called myself several times over the course of that year, going, well, I have, you know, I've spent the last 30 years of my career, 25, 30 years, doing VMware pretty much exclusively. And I was like, wow, where was this 20 years ago? ..."
24:07 "With one of our Verge venues, for SkyRavage, they literally go to our Jenkins master circular, which is the webpage. They click the venue name, and they get a pull-down menu. Start it up, or do you want to shut it down? You shut down, click go, and 15, 10 minutes later, it's power."
47:46 "In the neighborhood of 20 to 30 hours a week for our operations team. Which is great because it allows you to focus more on the things that any of these things rather than the things that are going down."
Categories:
  • » Webinar Library
  • » Webinar Library » Verge.io
  • » Data Protection » Backup & Recovery
  • » Cybersecurity » Cloud Security
  • » Data Protection
Tags:
  • Cloud Security
  • Data Protection
  • Customer Story
  • Technical Deep Dive
  • Best Practices
  • VMware migration
  • Broadcom acquisition impact
  • hyperconverged infrastructure
  • infrastructure automation
  • API-driven operations
  • distributed venue management
  • disaster recovery