Transcript
Mike Matchett: Hi, Mike Matchett with Small World Big Data, and today we're talking about AI, GPUs, and how to get the most efficiency and scale out of your investments in AI. It turns out some of the friction and some of the problems at scale aren't necessarily where you expect them to be. We've got an interesting conversation with C-Gen.ai coming up. Hold on and we'll get right into it. Hey, welcome to our show, Sami.

Sami Kama, CEO: Hello. Thank you for having me.

Mike Matchett: So you founded C-Gen.ai, and you've got a long, illustrious history with some pretty high-end physics in your background. Can you give us a really quick rundown of some of the things you've been involved in that led you to looking at AI at this level?

Sami Kama, CEO: Thank you. I'm a physicist by trade, but I had a knack for computers, so that always stuck with me. I spent about 17 years working at CERN in different capacities. There I was involved in simulations, trigger farms, real-time and HPC-type computing, performance optimization, and research into emerging hardware and software technologies, as well as authoring a couple of profilers to help us understand performance. Most of what we did there was bleeding edge, so most of the time we were building everything from scratch to solve our own problems.

Mike Matchett: So you were doing a lot of large-scale observability analysis and performance optimization. But then you moved on from there to some companies we know about. Tell us about some of those.

Sami Kama, CEO: After 17 years I decided to move to industry, and at the end of 2017 I started working for Nvidia. There I was involved in the TensorFlow stack and its optimization, and I helped originate the TensorRT-TensorFlow integration. I also helped AWS a little while I was there, and then AWS made me an offer, so I joined the SageMaker team. During my SageMaker tenure I was involved in some MLPerf-level optimizations, and we broke a couple of interesting records that year. Afterwards, I noticed a performance observability problem: there weren't any AI-specific performance monitoring tools. Most of the time people were profiling a single GPU or a single node, but the issues we found only showed up if you profiled the whole job as one distributed application. So I sat down and developed the SageMaker profiler for profiling in a distributed manner. That also showed me that even when you have the performance picture, there are other inefficiencies, not only at the silicon level but in how customers expect to use these systems. It was not obvious to them that they could get more out of what they were paying for, or what the best practices were for approaching it.
Mike Matchett: You're really coming at how someone optimizes their infrastructure from this scientific background. I have a long history myself in capacity planning and performance, but really from the IT data center side of things: are there I/O bottlenecks, are there memory bottlenecks, how do we get our corporate databases running properly? I feel a little lower down in the food chain talking with you about this, but I think it's really interesting. So just to shape the problem: you've spent the last couple of decades looking at AI workloads and how they differ from other workloads when it comes to how efficiently they use their infrastructure. Where did that bring you? How did you start to think about C-Gen.ai?

Sami Kama, CEO: AI is essentially a research problem, as you said, but it's a special case: an extremely multidisciplinary one. What I observed over my career is that people try to approach it like a web orchestration problem. Instead of taking an off-the-shelf solution and reusing web-era knowledge for this application, we analyzed the problem itself and developed a solution that targets AI from the ground up.

Mike Matchett: Yeah. Just to be more specific: when we're talking about AI infrastructure and trying to use it optimally, most people are thinking GPUs. I've got to buy more GPUs, I've got to buy bigger GPUs. What are some of the issues you've noticed with people coming at this from the "it's all about the GPU" mentality? Where are they missing the subtler points?

Sami Kama, CEO: As you said, people think AI equals GPUs, but GPUs are like the engine of a car. You need an engine to move the car, but this is a race, and you need the rest of the car. For AI you need an AI supercomputer, and that involves auxiliary services: the network bottleneck and file I/O optimization you described, as well as identity management, an observability stack, storage management, and job management. On top of that, AI has two parts: training and inference. People currently approach them as if they're independent. As AI tasks they are, but from the infrastructure perspective they are not, so you don't need separate silos for them. In our solution we developed a converged version: you can use the same infrastructure for both training and inference at the same time, dynamically adapting to the demands of either side.
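To make the converged idea concrete, here is a minimal Python sketch of the kind of policy Sami describes, not C-Gen.ai's actual scheduler: training keeps priority on a shared GPU pool, and inference opportunistically borrows whatever is idle and hands it back when a training job arrives. The class and method names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ConvergedPool:
    """Toy model of one GPU pool shared by training and inference.

    Training has priority: inference only borrows GPUs that training is
    not currently using, and gives them back on demand.
    """
    total_gpus: int
    training_gpus: int = 0
    inference_gpus: int = 0

    @property
    def idle(self) -> int:
        return self.total_gpus - self.training_gpus - self.inference_gpus

    def submit_training(self, gpus: int) -> None:
        # Reclaim GPUs from inference first if there aren't enough idle ones.
        shortfall = max(0, gpus - self.idle)
        reclaimed = min(shortfall, self.inference_gpus)
        self.inference_gpus -= reclaimed          # preempt inference replicas
        if gpus > self.idle:
            raise RuntimeError("training demand exceeds cluster capacity")
        self.training_gpus += gpus

    def scale_inference(self, desired: int) -> int:
        # Inference is opportunistic: it only ever gets what training leaves idle.
        granted = min(desired, self.idle + self.inference_gpus)
        self.inference_gpus = granted
        return granted

pool = ConvergedPool(total_gpus=64)
pool.submit_training(gpus=24)             # a long-running experiment
print(pool.scale_inference(desired=48))   # -> 40: inference soaks up the idle GPUs
pool.submit_training(gpus=32)             # a big job arrives; inference is preempted
print(pool.inference_gpus, pool.idle)     # -> 8 0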
Mike Matchett: All right. If you don't mind, let's talk about those two workload halves for a second. There's the training phase, which can run for weeks or months, is very I/O-heavy, needs the whole cluster, and needs snapshots and things like that. Then there's the inference, runtime side, and people say: when I go to do inference today, I don't want to interrupt the training cluster I've spent all this money on, so I have to build a separate set of facilities to run the models. What you've just indicated is that's not necessarily the best way, and that you can use the same infrastructure. So how do you dynamically balance those competing interests? You want your training done, but your inference has to be more real time.

Sami Kama, CEO: Yes, but let's think about the problem again. Training is still a research project: we still need to run tests and experiments and develop code, so we are not fully utilizing training clusters to capacity. But we are still paying for them, because these large clusters usually come with long-term contracts. Full utilization of the cluster might amount to one month's worth of the whole year; the rest of the time, between debugging, model development, experimentation, and tuning, only a small part of the capacity is used. In our experience we've observed between 40 and 60% utilization of the cluster on average across a year. Inference and training demand are both dynamic, but they can be combined on the same infrastructure without wasting resources: make use of the existing resources when they're available, and fall back to the traditional siloed approach only when you run out of capacity in the cluster, which, given current utilization, is much less likely to happen.

Mike Matchett: So when we look at running these workloads, inference people can understand, because that's what they're more familiar with. But on the training side, you buy a bunch of DGX SuperPODs, for example; you've invested millions of dollars in that cluster, so it makes sense that people would want to optimize its utilization. But there's also this part about observability. So stepping back: is C-Gen.ai more concerned with optimizing how AI uses the cluster and getting that observability done, is it about combining the workloads, or is it about improving the financials by driving higher utilization?

Sami Kama, CEO: It's a bit of all of those. Our system dynamically observes the training and inference workloads, learns about them, and tries to maximize cluster utilization, which reduces the cost of ownership of the cluster. In the traditional way of running inference, the inference runs in some service at the big hyperscalers or the emerging neo-clouds: you give them your model and they run your inferences. But inference API margins are already thin, and you're paying an hourly utilization cost on top, which makes your margins even smaller; maybe 5 to 10% of the API revenue is left for the company. In our solution you don't pay for those extra instances, because you already paid for them in the training cluster. You're making use of instances that are already paid for, so 85 to 90% of the API revenue can become profit, which extends the runway for startups and addresses one of the largest financial costs in AI research.
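As a back-of-the-envelope illustration of that margin math, here is a small Python sketch; all the numbers (revenue, GPU-hours, hourly rate, residual overhead) are made-up assumptions chosen only to land in the 5-10% versus 85-90% ranges Sami mentions, not C-Gen.ai or customer figures.

```python
# Hypothetical figures for illustration only.
api_revenue = 100_000.0   # monthly inference API revenue, $
gpu_hours = 20_000        # GPU-hours consumed to serve that revenue
hourly_rate = 4.50        # $/GPU-hour for a rented, dedicated inference fleet

# Scenario 1: rent a separate inference fleet by the hour.
siloed_cost = gpu_hours * hourly_rate
siloed_margin = (api_revenue - siloed_cost) / api_revenue

# Scenario 2: serve inference on the idle capacity of a training cluster that
# is already paid for under a long-term contract; only a small residual
# serving overhead (ops, networking, egress -- assumed 12% here) remains.
converged_cost = 0.12 * api_revenue
converged_margin = (api_revenue - converged_cost) / api_revenue

print(f"siloed inference margin:    {siloed_margin:.0%}")     # ~10%
print(f"converged inference margin: {converged_margin:.0%}")  # ~88%
```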
Mike Matchett: Okay. So that's a huge benefit for a lot of people who are spending some really big dollar amounts on AI, which today is almost every large enterprise looking at these AI initiatives, so there's definitely an opportunity for a lot of people to look at what you're doing. Let's step back to the bigger picture, because I think we might have lost a little of the subtlety. When you talk about how C-Gen.ai puts the whole Formula 1 car together, to use the analogy, what are you wrapping your hands around, and what are some of the components that go into that high-performance AI vehicle you're building automatically for the customer out of their infrastructure?

Sami Kama, CEO: What we build for the customer starts with abstracting away the data center. The customer gets instances from wherever they want: they can go to the hyperscalers and sign long-term contracts; they can go to data centers that don't offer any managed solution and get raw GPUs; or they can purchase systems like the DGX pods you suggested and put them in their own data centers. From that point on, we come in and start configuring those instances and the auxiliary services. For AI you need what we call an AI supercomputer, so our product builds an AI supercomputer for our customers, in their account, that handles storage, identity management, compute optimization, the training and inference abstraction, and observability, so they can see whether they are really utilizing these clusters. Today the observability stack is missing from most existing managed offerings, so most people are not aware that they are overpaying for clusters they are not benefiting from. For us it's a first-class citizen: we need it for our training and inference abstraction to work, so we provide it to our customers as an added benefit. In short, the architecture builds an AI supercomputer for our customers on their infrastructure, in their account, 100% under their control. All their models, inference, and training stay under their control; we just manage the instances, the hardware failures, the observability stack, the storage, and the identity. They can use our simple point-and-click UI for most of it, but they still have direct access to the traditional way of training, like at the big companies: you have your login nodes, you can run your large training jobs, and it's 100% under your engineers' control. We're just there like a pit crew or an engineering team for the Formula 1 car, and they're in the driver's seat.
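As a rough sketch of what "an AI supercomputer built in the customer's own account" might look like from the user's side, here is a hypothetical declarative spec; every field name below is invented for illustration and is not C-Gen.ai's actual interface.

```python
# Hypothetical cluster spec -- the shape and field names are illustrative only.
ai_supercomputer_spec = {
    "provider": "aws",                    # hyperscaler, neo-cloud, or on-prem DGX pods
    "account": "customer-owned",          # everything stays in the customer's account
    "compute": {"instance_type": "p5.48xlarge", "count": 16},
    "storage": {"shared_filesystem": True, "checkpoint_store": "object-storage"},
    "identity": {"sso": True, "admins": ["ml-platform-team"]},
    "observability": {"gpu_metrics": True, "per_job_profiles": True,
                      "utilization_reports": True},
    "scheduling": {
        "mode": "converged",              # training and inference share one pool
        "training_priority": True,        # inference yields GPUs when training needs them
    },
    "access": {"login_nodes": 2, "ui": "point-and-click", "ssh": True},
}
```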
Mike Matchett: All right, I think I get this now. You come in and take the customer's infrastructure, put your hands around it, and build it into, I won't say something virtual because it's not really virtualized, but almost a cloud-service analogy: an AI supercomputer service for the company. And within that service you're automatically handling, like you said, identity, file services, the I/O, and the error handling you mentioned, which is fault tolerance, and that's critical because the larger the cluster gets, the more the fault handling under the hood matters. And I assume part of the message is making it scalable as well: once you've created the AI supercomputer, you've made it easy for a company to scale up and down if they need to?

Sami Kama, CEO: Since we abstract most of the data center and the instances away in our solution, technically you can build it anywhere and move it to any data center in the world. Our customers have the freedom to do it the way they want, and we help them migrate between different providers when a more feasible option comes along. They can scale up or down based on the observability data and their needs.

Mike Matchett: I sort of assumed earlier, when I made that statement, that large enterprises are the target for this. But who really benefits from having their own AI supercomputer?

Sami Kama, CEO: We think three types of clientele benefit from this. The first is the most obvious one: AI startups that are building their models and their APIs and trying to make it in AI. The second is data centers. Many data centers thought that just purchasing GPUs would make them an AI provider, but you need an AI supercomputer, not just GPUs. There are a lot of data centers around the world that don't provide the necessary auxiliary services in managed form, so companies that lack the expertise to manage these systems themselves go to the hyperscalers instead; the neo-clouds and GPU marketplaces arose from that lack of demand at the smaller data centers. Our product turns their data centers into viable AI supercomputers: they become competitive offerings, a managed service on their own infrastructure, which helps them sell more GPU time on long-term contracts. The third clientele is the enterprises, as you said. You've probably noticed that enterprises are cautious about using the APIs of the big providers, because even though the API providers claim they won't use your data, first, it's very hard to prove whether or not it is used, and second, even if they don't use the data itself, they can still deduce information about your business from your usage. So if you own data that is unique to your business, you should build AI into your own workflows if you don't want to hand your business know-how to the API and...

Mike Matchett: Everyone else, right? Basically you'd be helping train the world's AI models to do something else. Okay, so this is good for anybody who wants this AI supercomputer concept, whether they're using it in-house or delivering it. It just occurred to me that one last thing to talk about is the idea that if I put my training and my inference together on the same infrastructure, and those really are two different phases, the R&D upstream and then the use of those models downstream, I can also maybe shortcut the workflow for how models migrate from one to the other. Do you help orchestrate that as well?

Sami Kama, CEO: Yes. In our system we take care of most of the inference handling; all our customers need is to be able to run a localhost example of the model they want to serve. Because training and inference run in converged mode, it's very easy to build a continuous update cycle on our infrastructure: the inference data feeds into training, and the training updates feed back into the inference models. That doesn't require months of planning or updating; it can be done in a much simpler way.
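Here is a minimal, self-contained Python sketch of that continuous update cycle; the functions are illustrative stubs standing in for real data collection, fine-tuning, and deployment steps, not any actual C-Gen.ai API.

```python
import random
import time

# Illustrative stubs -- each stands in for a real pipeline stage.

def collect_inference_logs():
    """Pretend to pull recent prompts/outcomes from the serving side."""
    return [{"prompt": "...", "feedback": random.random()} for _ in range(10)]

def fine_tune(model, samples):
    """Pretend to run an incremental training job on the cluster's spare capacity."""
    return {"version": model["version"] + 1,
            "trained_on": model["trained_on"] + len(samples)}

def roll_out(model):
    """Pretend to hot-swap the serving replicas to the new model version."""
    print(f"now serving model v{model['version']} "
          f"(trained on {model['trained_on']} samples)")

model = {"version": 1, "trained_on": 0}
for _ in range(3):                            # three update cycles, for illustration
    new_samples = collect_inference_logs()    # inference data feeds training...
    model = fine_tune(model, new_samples)     # ...training produces an updated model...
    roll_out(model)                           # ...which goes straight back to inference
    time.sleep(0.1)                           # stand-in for the real cadence (e.g. nightly)
```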
Mike Matchett: And isn't that going to be the future of AI, where the models actually keep learning, get closer and closer to real-time knowledge, and feed that back to people? I think some people are really disappointed when they look and say, well, my model was built six months ago and it doesn't know anything about what's happened since. Now we have infrastructure here that promises to support a much tighter learning cycle.

Sami Kama, CEO: There's another advantage of our approach. Training instances are usually much more powerful than traditional inference ones. Since we're using the already-paid-for, highest-end instances, inference costs are also reduced significantly, and customers can do things that traditionally aren't possible and become competitive compared to other companies in the same domain.

Mike Matchett: Right, and certainly inference performance goes way up, and you get this tighter loop. We used to talk about a ten-second response time being bad because people's attention starts to wander, and every time I put in a prompt today it's about ten seconds. If we can start to bring that down to an almost real-time response, people will be a lot more productive.

Sami Kama, CEO: It's not only that. Traditional serving is usually limited to a single node, so your model needs to fit on a single node. We don't have that limitation: theoretically you can run the whole cluster as one inference node to serve a super-large model, even bigger than today's most popular models, if you really want to do that.

Mike Matchett: Which would give you superior performance there and really make it look human.

Sami Kama, CEO: Yeah. We try to solve the problem by analyzing it first, estimating how it's going to evolve in the future, and being ready when our customers need it.

Mike Matchett: All right. Well, you sound like the right guy to start putting this together, with your background at CERN for all those years and then Nvidia and AWS and some other AI companies we've all heard of. That's about all the time we have today, even though we'd like to dive into this some more before you take over the world. For people who want to take a look at this, understand it better, and do a little more research, where would you recommend they start if this has piqued their interest? Where should they go?
Sami Kama, CEO: They can go to our web page, www.c-gen.ai. We're working toward our beta release, so we have a beta sign-up form: anybody who's interested can get in touch through the beta signup, or they can use the contact form on the web page to contact us. We'd be very happy to discuss what we can do together on their platforms.

Mike Matchett: Super. AI supercomputers, supercomputing people, I guess we'll have to come up with the next term for it. All right, thank you so much, Sami, for being here today, I appreciate it, and thanks for giving us an inside look at what's coming. This really is the future of AI: AI supercomputers in every company. Glad to hear it.

Sami Kama, CEO: Thank you very much for having us, Mike.

Mike Matchett: Take care.