Transcript
In this screencast, we will show the preview of the upcoming 1LLM feature that brings the AI Inference directly into the platform. Running AI Inference workloads on-premises might easily become challenging. Both the administrators and users often struggle with the lack of unification across the process, finding themselves managing GPU servers, model weights, and inference software outside of the graphical UI. In the upcoming release of Open Nebula, we are addressing these challenges by adding an AI Inference section directly to Sunstone, a single place to define hardware profiles, curate AI models, and deploy production-ready models with just a few clicks. 1LLM is a new section inside Sunstone that brings AI Inference into the platform. No external tools, no separate infrastructure to manage. There are two perspectives in this demonstration. First, the admin, who sets the things up, then the tenant, who puts them to work. As an admin, you start by defining instance types. Each one specifies the GPU, the VRAM, the compute tier, and the model it can handle. Small for lightweight use, medium for most workloads, large for the biggest models. Every type is fully specified. H100, 10 gigs of VRAM, the exact model size range it supports. Define it once, reuse it everywhere. The second thing an admin manages is the model catalog, the library of AI models available inside the data center. Models are downloaded, versioned, and access controlled. Tenants only see what's ready. The admin decides what's available. From here, this is what a tenant experiences. They pick a ready model, pick their instance type, and deploy. OpenAbility handles the rest. It provisions the virtual machine, loads the weights, and starts the Inference engine. No manual steps, no SSH, no scripts. When it's live, the tenant gets an OpenAI-compatible API endpoint. Any app already leveraging the OpenAI SDK connects with zero code changes. And they can test it right here, inside Sunstone. A live conversation with the model. No external tooling needed. Admin sets it up, tenant puts it to work. One platform, from the bare metal, to a virtual machine. And they can test it right here, inside Sunstone. A live conversation with the model. Admin sets it up, tenant puts it to work. One platform, from the bare metal, to a live AI endpoint. And this concludes this feature preview demonstration. Thank you for watching, and see you in the next screencast.