What problem does OneAIOps solve for organizations managing cloud infrastructure?

OneAIOps addresses data sovereignty and vendor lock-in concerns by providing an open-source AI-driven observability framework that keeps sensitive operational data under organizational control. It combines OpenNebula's cloud management platform with Prometheus monitoring and machine learning algorithms to predict resource usage and optimize VM placement across hybrid environments, eliminating dependency on third-party observability providers while maintaining transparency through open-source code.

AI-Driven Observability in Cloud-Edge with OpenNebula

Name: AI-Driven Observability in Cloud-Edge with OpenNebula
Uploaded: 2026-03-28T13:25:16-04:00
Duration: 21 min 45 s

Transcript

Hello everyone, I'm Victor Palma, Cloud Engineer at OpenNebula. Thank you very much for being here and joining us. It's really a pleasure to be at this year's DevBox Pro edition sharing this presentation with all of you. In this session we introduce a new experimental AIOps framework developed at OpenNebula and its Prometheus integration for the evaluation of AI algorithms, in order to provide intelligent workload forecasting and infrastructure orchestration capabilities to automate and optimize the provisioning and deployment of edge cloud nodes. So let's start with the presentation.

So first let me introduce myself again. I'm Victor Palma, Cloud Engineer at OpenNebula. I come from Madrid and I've been working for OpenNebula for almost three years, developing and innovating in the cloud-edge world.

Well, let's move on from the introduction and start with some context. So first, what is observability? Well, I guess it's not a strange concept for most of you. It's nothing more than the ability to understand and analyze the inner workings of a system by collecting and analyzing relevant information. Or in other words, it's just transforming data input into information, into something that can be useful for us. So we can have a lot of numbers or data, but if we are not able to give them a meaning, they are of no use to us. So once we have the information, understanding and analyzing it is fundamental. This is observability. Thanks to observability, we can detect anomalies, patterns or potential risks, we can identify performance improvements, and observability can help us in decision making. As the saying goes, information is power. So we need to add observability to our infrastructure.

So let's talk now about AI. In the end, that is what we want to address in this session. Is AI useful for observability? Without a doubt, yes, totally. It's true that nowadays we want to use AI in everything. Well, you know, marketing guys are very responsible for that.
But observability is a totally natural fit for AI. So thanks to AI and data processing algorithms, we get many advantages such as enhanced data analysis, automated anomaly detection, dynamic scaling or predictive analysis. It opens up a world of possibilities.

However, while this all sounds good, there are a number of challenges and concerns that we need to be aware of. First, there are concerns with third-party solutions. Many organizations currently entrust sensitive data to external observability providers, raising concerns about data ownership and privacy. The information and the data is on the provider servers. So this in certain situations can be a risk. So I propose a solution following an open source philosophy, providing transparency, allowing organizations to scrutinize code, address customization needs and maintain control over the data. Thanks to open source, we can have control of the code and the data.

That also allows us to avoid the vendor lock-in risk. Vendor lock-in is just the lock that we have when we depend on a certain provider. Organizations might find themselves tied to a specific vendor, limiting flexibility and potentially increasing costs. It's not easy migrating between providers. For example, if you want to move all your workload from AWS to another cloud provider, each provider has different data models and APIs. So it's a very difficult thing.

So now what? How can we address the challenges we have outlined? The solution for this is the OneAIOps Framework, the open source solution for AI-driven observability. The OneAIOps framework combines OpenNebula, a cloud infrastructure management platform; Prometheus and Grafana, for infrastructure monitoring and visualization; and a set of machine learning and artificial intelligence algorithms to provide this AI-driven observability framework. So let's take a step-by-step approach. First, what is OpenNebula?
OpenNebula is a simple open source solution to build and manage enterprise clouds that combines existing virtualization technologies with advanced features for multi-tenancy, automatic provisioning and elasticity in order to offer on-demand virtualized services and applications. OpenNebula uses the concept of hybrid cloud, which means the combination of on-premise data centers, public cloud and even nodes on the edge, in order to operate with the different nodes that we have, all of them managed from the same platform, that is OpenNebula.

Here we can see some of the possibilities we have with OpenNebula. Thanks to OpenNebula, you can use the same interface to control all network and storage resources shared between different hypervisors such as VMware, KVM, LXC or even Firecracker. OpenNebula has several integrations that facilitate the creation of automated workloads and applications, such as Terraform, Kubernetes, Ansible, Docker or its own APIs. OpenNebula also has its own web portal that we call Sunstone, with which you can interact with the OpenNebula core in a simple and convenient way.

Other important features to highlight are multi-tenancy, so different users can access different resources; self-service, where every user can deploy a VM for themselves; elasticity for multi-VM services; the possibility to create multi-tier apps; high availability for VMs and OpenNebula instances; the option to create a federated solution; the ability to provision resources on the edge automatically; multi-cloud management; and the possibility to combine VMs and container workloads within the same platform.

Speaking of multi-cloud, as I said before, OpenNebula allows you to manage any infrastructure with automatic provisioning of resources from cloud providers such as Equinix, AWS or gcloud, all of them with uniform management through a homogeneous layer for users and cloud administrators.
On top of this management platform, we can deploy any application such as VMs, multi-VM services, containers, or even Kubernetes clusters on a shared environment, managing all of them from the same portal. Very handy.

Now that we have seen what OpenNebula is, let's move on to the next important part of OneAIOps, Prometheus. Prometheus is also an open-source platform, used for event monitoring and alerting. It records metrics in a time series database using an HTTP pull model, with flexible queries and real-time alerting. OpenNebula has a specific integration for Prometheus that allows the metrics of virtual machines and containers to be integrated into Prometheus through its own exporters. For this, OpenNebula implements inside the host the node exporter offered by Prometheus, to export basic metrics of resource usage on the physical host; OpenNebula's own exporter, to export metrics related to the virtual machines using the libvirt library; and finally, OpenNebula's own monitoring system, which exports the data to the OpenNebula monitor located in the front-end. This is to export all the collected data, together with general usage information from the OpenNebula instance, to Prometheus. This way, from Prometheus, we have a huge amount of information about the state of our cloud.

But what can we do with all this information? So it's time to talk about the third and last part of OneAIOps, which is obviously artificial intelligence. By adding AI to the formula, we can predict a lot of relevant information. We can predict through machine learning, the CPU, memory, and network traffic usage we may have in the future based on all past metrics. And then, thanks to the implementation of decision algorithms, we can allocate resources based on the current context of our cloud.
We also use the predictions calculated by the machine learning algorithms as a guide for the deployment.

So as a result, we obtain the architecture that you can see in this slide. OneAIOps is based, as we have said before, on artificial intelligence algorithms implemented on the OpenNebula architecture, using Prometheus for data extraction. It's an architecture that is currently still under development, so it's still subject to change. However, we can already see here some of the main components that we saw before: the current OpenNebula architecture with the monitoring system, the infrastructure manager, the physical infrastructure manager, all the VMs, containers, and all the types of resources that we can manage in OpenNebula, like on-premise resources, public resources, or edge resources.

On top of that, we have the new OneAIOps architecture layer, with a database with all the historical traces and logs from our own cloud; a prediction and anomaly detection module that is in charge of analyzing all the information that we have in the database; and some modules related to the elasticity manager, the virtual infrastructure orchestrator, and the physical infrastructure orchestrator. Thanks to these parts of the schema, we can allocate and optimize resources in our cloud. Of course, we also have a reporting and alert module in order to create alerts based on certain inputs. It's based on the Prometheus alert system, so it's more or less going to be the same for OpenNebula.

So thanks to OneAIOps and this architecture, we have CPU usage prediction per hour for individual VMs, the general CPU usage, and we also have accuracy. That means the accuracy between the usage on the last day for which we have data and the prediction generated by the tool for that day.
With OneAIOps we can also optimize the VM allocation, consulting a suggestion for each VM based on some algorithms, such as load balancing, to balance the workload in our cloud; reducing migrations, in order to reduce, well, the migrations in the optimization process; or the resource consolidation algorithm, in order to pack all the VMs onto a few hosts and save resources, very useful for on-premise infrastructures.

So here we can see the main dashboard in Grafana with all the metrics supported by Prometheus. We have a lot of interesting information such as graphs with the memory, CPU, and storage usage of the hosts, and an overview of the status of our cloud with all the resources deployed, as well as the status of our virtual machines and hosts. OneAIOps also has other dashboards like the following one, where we can see all the results of the predictions made. We can consult the CPU usage prediction for each host, which allows us to get an idea about the use of our infra in the next hours. OneAIOps also offers an average usage over our entire cloud, as well as an accuracy percentage that allows us to check the reliability of the prediction. In this case, we can see that it's a very good value. And below we can see the migrations that the tool suggests for a core optimization algorithm in this case. This way we can take better advantage of our cloud resources, which, as I said before, is very handy for on-premise infrastructures.

So here, finally, we can see a prototype of the OneAIOps implementation. We use the OpenNebula and libvirt exporters to provide general information about the status of OpenNebula and the VMs running on each host. Then we use the OneAIOps component in order to optimize the placement of our VMs based on the usage data collected. And then, once Prometheus has all the metrics collected, we can check all the information using the Grafana dashboards. Since it's a prototype, no automatic migration actions are performed here.
This is only limited to suggestions for the cloud administrator. However, according to the results of our labs, the recommendations are quite good and help greatly to optimize the cloud.

So what are the next steps? First of all, we will implement the PIO operations in order to apply the suggestions automatically, since, as I said before, OneAIOps currently only shows migration suggestions because it's still in the development stage. Once this step is done, the next step will be to include OneAIOps as part of the OpenNebula software distribution, installing it within the same package. Finally, we also hope to expand the functionality to support anomaly detection, allocation based on memory prediction, allocation based on network traffic, and alerts and warnings based on these metrics and detections.

OneAIOps is open source, so you can contribute to the project or check the source code in the repository that you can see here. And speaking of contributing, I would also like to mention the OpenNebula forum, where you can collaborate, discuss and help other OpenNebula users to get the best out of their clouds. I really recommend it. You will also have the opportunity to learn a lot about the platform there, apart of course from the official OpenNebula documentation.

So, as a closing slide, I would like to comment that this project is being funded by the European Union project named COGNIT, a Cognitive Serverless Framework for the Cloud-Edge Continuum. OneAIOps together with OpenNebula will be used as fundamental pillars within this project for the deployment and optimization of resources in the Cloud-Edge Continuum.

So, well, that's all. I hope this presentation has been of interest to you. I hope to be able to share more news about OneAIOps in the future, so stay tuned to the official OpenNebula information channels like the forum, Twitter, and so on for updates. So, thank you very much for your attention. I hope to see you next time.
So, I think that now we can move on to the Q&A section. In case you have any questions, we can take them now. So, no questions. Okay. So, let's get started. Okay.

TL;DR

OpenNebula has developed OneAIOps, an experimental open-source framework that combines cloud infrastructure management with AI-driven observability using Prometheus, Grafana, and machine learning algorithms to provide intelligent workload forecasting and automated resource optimization.
The framework addresses data sovereignty concerns by keeping operational data under organizational control while avoiding vendor lock-in, supporting hybrid cloud environments that span on-premise data centers, public cloud providers, and edge nodes through a unified management interface.
OneAIOps uses machine learning to predict CPU, memory, and network usage patterns, then applies decision algorithms to optimize VM placement through strategies like load balancing, migration reduction, and resource consolidation, currently operating in advisory mode with strong prediction accuracy.
Development roadmap includes implementing automated operations, expanding prediction capabilities, adding anomaly detection, and integrating OneAIOps directly into OpenNebula's software distribution, with funding from the EU's Horizon Europe COGNIT project.
The framework is open source and available for community contribution, with OpenNebula encouraging participation through its GitHub repository and user forum to advance AI-driven infrastructure management while maintaining transparency and organizational control.

OneAIOps Framework Architecture and Integration

Victor Palma introduces OneAIOps, an experimental open-source framework developed by OpenNebula that combines cloud infrastructure management with AI-driven observability. The framework integrates OpenNebula's cloud management platform with Prometheus for metrics collection and Grafana for visualization, layering machine learning algorithms on top to enable intelligent workload forecasting and automated infrastructure orchestration. The architecture addresses data sovereignty concerns by keeping sensitive operational data under organizational control rather than with third-party providers. OpenNebula manages hybrid cloud environments spanning on-premise data centers, public cloud resources, and edge nodes through a unified interface, supporting multiple hypervisors including VMware, KVM, and LXC while providing multi-tenancy, self-service provisioning, and elasticity capabilities.
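
To make the Prometheus side of this integration concrete, here is a minimal sketch of how a client could pull per-host CPU history through Prometheus's HTTP API and turn the response into plain Python data. It is an illustration only, not code from OneAIOps. The server address `http://localhost:9090` and the helper names `build_range_url` and `parse_matrix` are assumptions, and the sample response is hand-written. The query uses the standard node exporter counter `node_cpu_seconds_total`; the OpenNebula-specific exporter metrics are not named in the talk, so they are not used here.

```python
import json
from urllib.parse import urlencode

PROMETHEUS_URL = "http://localhost:9090"  # assumption: a local Prometheus on its default port

# Busy CPU fraction per host, derived from the node exporter's idle-time counter.
CPU_QUERY = '1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))'


def build_range_url(query, start, end, step="1h", base=PROMETHEUS_URL):
    """Build a URL for Prometheus's /api/v1/query_range endpoint (hypothetical helper)."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base}/api/v1/query_range?{params}"


def parse_matrix(payload):
    """Turn a query_range JSON response into {instance: [(timestamp, value), ...]}."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {body.get('error')}")
    series = {}
    for item in body["data"]["result"]:
        instance = item["metric"].get("instance", "unknown")
        # Sample values arrive as strings, so convert them to floats.
        series[instance] = [(float(ts), float(val)) for ts, val in item["values"]]
    return series


# Hand-written sample in the documented "matrix" response shape (not real data).
SAMPLE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {"metric": {"instance": "host-a:9100"},
             "values": [[1700000000, "0.25"], [1700003600, "0.5"]]},
        ],
    },
})

print(build_range_url(CPU_QUERY, 1700000000, 1700003600))
print(parse_matrix(SAMPLE))
```

In a live setup the URL would be fetched with any HTTP client and the response body passed to `parse_matrix`; the sample keeps the sketch runnable without a Prometheus server.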

Predictive Analytics and Resource Optimization

The OneAIOps framework leverages machine learning algorithms to predict CPU, memory, and network traffic usage based on historical metrics collected through Prometheus exporters deployed across the infrastructure. These predictions feed into decision algorithms that optimize VM placement and resource allocation across the cloud environment. The system offers multiple optimization strategies including load balancing to distribute workloads evenly, migration reduction to minimize disruption, and resource consolidation to maximize efficiency in on-premise infrastructures. Grafana dashboards provide visibility into prediction accuracy, with the demonstration environment showing strong correlation between predicted and actual usage patterns. The framework currently operates in advisory mode, providing migration suggestions to cloud administrators rather than executing changes automatically.
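
The talk does not say which prediction model, accuracy formula, or placement heuristic OneAIOps uses, so the following is only a rough sketch of the predict-then-suggest flow using simple stand-ins: an hour-of-day averaging baseline for the forecast, accuracy computed as 100 minus the mean absolute percentage error, and first-fit decreasing bin packing for consolidation. All function names and the toy numbers are hypothetical. Like the prototype described in the talk, the sketch only returns suggestions and migrates nothing.

```python
from statistics import mean


def forecast_next_day(history, period=24):
    """Baseline forecast: average each hour of the day over all complete past days."""
    days = [history[i:i + period] for i in range(0, len(history) - period + 1, period)]
    if not days:
        raise ValueError("need at least one full period of history")
    return [mean(day[h] for day in days) for h in range(period)]


def accuracy_pct(actual, predicted):
    """Accuracy as 100 minus the mean absolute percentage error, floored at 0."""
    errors = [abs(a - p) / a for a, p in zip(actual, predicted) if a > 0]
    if not errors:
        raise ValueError("no non-zero actual values to compare against")
    return max(0.0, 100.0 * (1.0 - mean(errors)))


def suggest_consolidation(vm_cpu, vm_host, host_capacity):
    """First-fit decreasing: pack VMs onto as few hosts as possible.

    Returns (vm, current_host, suggested_host) tuples; nothing is migrated.
    """
    load = {host: 0.0 for host in host_capacity}
    placement = {}
    # Place the largest predicted consumers first.
    for vm in sorted(vm_cpu, key=vm_cpu.get, reverse=True):
        target = next(
            (h for h in host_capacity if load[h] + vm_cpu[vm] <= host_capacity[h]),
            vm_host[vm],  # fits nowhere: leave the VM where it is
        )
        load[target] += vm_cpu[vm]
        placement[vm] = target
    return [(vm, vm_host[vm], placement[vm])
            for vm in sorted(placement) if placement[vm] != vm_host[vm]]


# Toy data (made up): two "days" of two samples each, then one placement round.
history = [10, 20, 30, 40]
prediction = forecast_next_day(history, period=2)
print(prediction, accuracy_pct([20, 40], prediction))

moves = suggest_consolidation(
    vm_cpu={"vm1": 2.0, "vm2": 1.5, "vm3": 1.0, "vm4": 0.5},  # predicted peak cores
    vm_host={"vm1": "h1", "vm2": "h2", "vm3": "h3", "vm4": "h3"},
    host_capacity={"h1": 4.0, "h2": 4.0, "h3": 4.0},
)
print(moves)
```

With this toy input the heuristic empties `h3` entirely, which is the consolidation goal, but it ignores the cost of each move. A migration-reduction strategy would instead penalize every suggested move, and a load-balancing strategy would prefer the least-loaded host rather than the first one that fits.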

Development Roadmap and EU Funding Context

OpenNebula plans to evolve OneAIOps from its current experimental state into a component integrated directly into the OpenNebula software distribution. Near-term development priorities include implementing automated operations to execute optimization suggestions without manual intervention, expanding allocation to use memory and network traffic predictions, and adding anomaly detection with alerts and warnings. The project receives funding from the European Union's Horizon Europe program through the COGNIT project, a cognitive serverless framework for the cloud-edge continuum. OpenNebula encourages community participation through its open-source repository and active user forum, positioning OneAIOps as a collaborative effort to advance AI-driven infrastructure management while maintaining data sovereignty and avoiding vendor lock-in.

Chapters

0:00 - Introduction
1:14 - About Victor Palma
1:35 - Why Observability Matters
3:03 - AI for Observability
3:49 - Data Sovereignty Challenges
5:49 - OneAIOps Framework Overview
6:18 - What is OpenNebula?
8:36 - Multi-Cloud Management
9:20 - Prometheus Integration
11:06 - Adding AI to the Formula
11:47 - OneAIOps Architecture
13:45 - Features and Capabilities
16:48 - Example Environment Demo
17:25 - Next Steps and Challenges
18:26 - Community Forum
18:51 - COGNIT EU Funding

Key Quotes

1:52 "It's nothing more than the ability to understand and analyze the inner workings of a system by collecting and analyzing relevant information. Or in other words, it's just transforming data input into information, into something that can be useful for us."
4:03 "Many organizations currently entrust sensitive data to external observability providers, raising concerns about data ownership and privacy. The information and the data is on the provider servers. So this in certain situations can be a risk."
5:43 "The solution for this is the OneAIOps Framework, the open source solution for AI-driven observability."
11:16 "We can predict through machine learning, the CPU, memory, and network traffic usage we may have in the future based on all past metrics. And then, thanks to the implementation of decision algorithms, we can allocate resources based on the current context of our cloud."
17:04 "Since it's a prototype, no automatic migration actions are performed here. This is only limited to suggestions for the cloud administrator. However, according to the results of our labs, the recommendations are quite good and help greatly to optimize the cloud."
