OneLLM: Native AI Inference in OpenNebula Sunstone

Name: OneLLM: Native AI Inference in OpenNebula Sunstone
Uploaded: 2026-06-13T06:39:15-04:00
Duration: 2 min 49 s
Description: In this screencast, we’re giving you a first look at OneLLM, an upcoming feature that brings AI inference directly into Sunstone, OpenNebula’s GUI. This preview shows what we’re working on and how OneLLM will make it easier to access and run AI workloa...

Open Nebula

06/13/2026

0 (0%)

Report Like Favorite

Transcript

In this screencast, we will show the preview of the upcoming 1LLM feature that brings the AI Inference directly into the platform. Running AI Inference workloads on-premises might easily become challenging. Both the administrators and users often struggle with the lack of unification across the process, finding themselves managing GPU servers, model weights, and inference software outside of the graphical UI. In the upcoming release of Open Nebula, we are addressing these challenges by adding an AI Inference section directly to Sunstone, a single place to define hardware profiles, curate AI models, and deploy production-ready models with just a few clicks. 1LLM is a new section inside Sunstone that brings AI Inference into the platform. No external tools, no separate infrastructure to manage. There are two perspectives in this demonstration. First, the admin, who sets the things up, then the tenant, who puts them to work. As an admin, you start by defining instance types. Each one specifies the GPU, the VRAM, the compute tier, and the model it can handle. Small for lightweight use, medium for most workloads, large for the biggest models. Every type is fully specified. H100, 10 gigs of VRAM, the exact model size range it supports. Define it once, reuse it everywhere. The second thing an admin manages is the model catalog, the library of AI models available inside the data center. Models are downloaded, versioned, and access controlled. Tenants only see what's ready. The admin decides what's available. From here, this is what a tenant experiences. They pick a ready model, pick their instance type, and deploy. OpenAbility handles the rest. It provisions the virtual machine, loads the weights, and starts the Inference engine. No manual steps, no SSH, no scripts. When it's live, the tenant gets an OpenAI-compatible API endpoint. Any app already leveraging the OpenAI SDK connects with zero code changes. And they can test it right here, inside Sunstone. A live conversation with the model. No external tooling needed. Admin sets it up, tenant puts it to work. One platform, from the bare metal, to a virtual machine. And they can test it right here, inside Sunstone. A live conversation with the model. Admin sets it up, tenant puts it to work. One platform, from the bare metal, to a live AI endpoint. And this concludes this feature preview demonstration. Thank you for watching, and see you in the next screencast.

TL;DR

OneLLM brings native AI inference capabilities into OpenNebula's Sunstone GUI, eliminating the need for external tools or separate infrastructure management for on-premises AI workloads.
Administrators define reusable hardware profiles specifying GPU types, VRAM, and supported model sizes, then curate a versioned model catalog with granular access controls for tenant consumption.
Tenants deploy production-ready inference endpoints by selecting pre-configured models and instance types, with OpenNebula automatically provisioning VMs, loading weights, and exposing OpenAI-compatible APIs for zero-code integration.

Summary

This demonstration previews OneLLM, an upcoming OpenNebula feature that integrates AI inference capabilities directly into the Sunstone GUI. The feature addresses common challenges organizations face when running on-premises AI workloads by eliminating the need to manage GPU servers, model weights, and inference software outside the platform. OneLLM provides a unified interface where administrators can define hardware profiles with specific GPU and VRAM configurations, curate AI model catalogs with version control and access management, and enable tenants to deploy production-ready inference endpoints with minimal configuration. The system provisions virtual machines, loads model weights, and exposes OpenAI-compatible API endpoints that work with existing SDK integrations, allowing organizations to run AI inference workloads alongside their existing cloud and edge infrastructure without external tooling or manual intervention.

Chapters

0:00 - Introduction to OneLLM
0:52 - Admin Perspective: Instance Types
1:35 - Admin Perspective: Model Catalog
1:49 - Tenant Workflow: Deployment

Key Quotes

0:16 "Both the administrators and users often struggle with the lack of unification across the process, finding themselves managing GPU servers, model weights, and inference software outside of the graphical UI."
0:33 "We are addressing these challenges by adding an AI Inference section directly to Sunstone, a single place to define hardware profiles, curate AI models, and deploy production-ready models with just a few clicks."
2:14 "Any app already leveraging the OpenAI SDK connects with zero code changes."

FAQ

What problem does OneLLM solve for organizations running AI workloads on-premises?

OneLLM addresses the lack of unification in managing on-premises AI inference by bringing GPU server management, model weights, and inference software directly into OpenNebula's Sunstone GUI. This eliminates the need to manage these components separately outside the platform, providing a single interface for defining hardware profiles, curating AI models, and deploying production-ready endpoints.

How does OneLLM handle compatibility with existing AI applications?

OneLLM exposes OpenAI-compatible API endpoints for deployed models, allowing any application already using the OpenAI SDK to connect with zero code changes. This ensures seamless integration with existing AI workflows and tooling without requiring custom development or API adaptation.

Categories:

» Cybersecurity » Cloud Security
» Data Protection

Tags:

Show more Show less

TL;DR

Summary

Chapters

Key Quotes

FAQ

Discover DLP Memories: The ever-evolving triage agent enhancing efficiency each shift.

Safeguarding Sensitive Data in the Era of Public AI Platforms

AI Agents Revolutionizing Identity Attacks: Same Tactics, Enhanced Speed