Veeam: Backup & Recovery for PGVector Databases on OpenShift

Uploaded: 2026-06-16T07:10:12-04:00
Duration: 3 min 11 s
Description: As organizations adopt and implement their own AI applications in production, data protection for those workloads becomes critical. Given that most AI applications are implemented on Kubernetes and require heterogeneous data backends, such as relationa...

Veeam

Transcript

with Veeam Kasten. This environment is an OpenShift cluster running container and VM workloads. For this demo, the key workload is a cloud-native Postgres cluster with the PGVector extension enabled. PGVector is commonly used by AI applications to store embeddings for Retrieval Augmented Generation, or RAG. The database is running as a highly available Postgres cluster in its own namespace. Kasten is also installed on the cluster and has already discovered the applications running across OpenShift.

Now we'll protect the PGVector workload. In the Kasten UI, I'll open the policy for the PGVector demo and run it on demand. I'll also set an expiration date so this recovery point does not live forever and consume unnecessary storage.

This policy is doing more than a basic snapshot. It is configured to protect the PGVector namespace, create a backup, and export that backup off the cluster to object storage. That gives us a real recovery point outside the local OpenShift environment.

The important part is application consistency. This policy uses a Kasten blueprint to run a pre- and post-action around the backup. In this case, the backup is taken from the secondary Postgres replica, not the primary. The pre-action pauses the replica, Kasten captures the data, and the post-action resumes it. That helps avoid impact to the production writer while still giving us a clean backup of the database.

Now the backup has completed successfully. Before we simulate a failure, let's confirm the database has data. Here we can see document embeddings stored in PGVector.

Next we'll simulate a bad event. This could be an accidental deletion, a rogue administrator, or even an AI-driven workflow that changes or deletes the wrong data. I'll drop the embeddings table and confirm the data is gone. At this point, the RAG application would lose access to the context stored in the vector database.

Now we'll recover with Kasten. I'll go to the restore points for the PGVector demo and select the most recent recovery point. For this demo, I'll use the local restore point, but the exported copy could also be used if the cluster copy was unavailable.

During restore, we use the same blueprint to run a before action that deletes the cluster resource first. This is important because the operator will otherwise try to recreate pods and persistent volumes while Kasten is restoring them. Removing the cluster resource, or CR, first allows Kasten to perform a clean recovery. Kasten restores the Kubernetes metadata and persistent volumes, then the Postgres cluster comes back online.

Finally, we'll verify the result. The restore is completed successfully, the Postgres cluster is running again, and the PGVector embeddings are back. This is how Veeam Kasten can protect and recover PGVector databases running on OpenShift AI, including application-consistent backup, off-cluster export, and clean recovery for AI and RAG workloads.
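The before-and-after checks in the demo (confirm the embeddings exist, drop the table, confirm the data is back after restore) can be sketched as a small helper. This is only an illustration under stated assumptions: the table name `public.embeddings` is hypothetical, since the demo refers only to "the embeddings table", and `run_sql` is a hypothetical callable that runs one query and returns a single scalar. The PostgreSQL function `to_regclass()` is real and returns NULL when the relation does not exist.

```python
from typing import Callable, Optional

# Hypothetical table name; the demo only mentions "the embeddings table".
TABLE = "public.embeddings"


def embeddings_row_count(
    run_sql: Callable[[str], object], table: str = TABLE
) -> Optional[int]:
    """Return the row count, or None if the table does not exist."""
    # to_regclass() yields NULL (None here) instead of raising
    # when the relation is missing, e.g. after DROP TABLE.
    if run_sql(f"SELECT to_regclass('{table}')") is None:
        return None
    return int(run_sql(f"SELECT count(*) FROM {table}"))


def restore_verified(before: Optional[int], after: Optional[int]) -> bool:
    """True when the table exists again with the pre-failure row count."""
    return before is not None and after == before
```

A caller would record the count before the simulated failure, expect `None` after the drop, and pass both the original and post-restore counts to `restore_verified`. Matching row counts are a coarse check, not proof that every embedding is identical.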

TL;DR

Veeam Kasten provides application-aware backup for PGVector databases on OpenShift, using a blueprint to run pre- and post-actions around the backup so that it is consistent while limiting impact on the production writer.
The backup is taken from the secondary Postgres replica rather than the primary: the pre-action pauses the replica, Kasten captures the data, and the post-action resumes it.
Recovery runs a before-restore action that deletes the Postgres cluster resource (CR) so the operator does not recreate pods and persistent volumes mid-restore; Kasten then restores the Kubernetes metadata and persistent volumes.

Summary

This technical demonstration shows how Veeam Kasten protects PGVector databases running on Red Hat OpenShift for AI Retrieval Augmented Generation (RAG) workloads. The demo walks through creating an application-consistent backup of a cloud-native Postgres cluster with the PGVector extension enabled, which stores the embeddings the RAG application depends on. A Kasten blueprint runs pre- and post-backup actions that pause the secondary Postgres replica while the data is captured and resume it afterwards, which helps avoid impact on the production writer. The policy also exports the backup off-cluster to object storage, so a recovery point exists outside the local OpenShift environment. The demo then simulates data loss by dropping the embeddings table and recovers from the most recent local restore point. During restore, the same blueprint first deletes the Postgres cluster resource (CR) so the operator does not try to recreate pods and persistent volumes while Kasten is restoring them.
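The restore ordering described above (delete the cluster resource, restore, then wait for Postgres) can be sketched as follows. This is a minimal illustration, not the blueprint used in the demo, which is not shown. It assumes the "cloud-native Postgres cluster" is managed by CloudNativePG, whose CRD is `clusters.postgresql.cnpg.io`; the three step callables are hypothetical stand-ins for whatever actually performs each step.

```python
from typing import Callable, List


def delete_cluster_cr_argv(name: str, namespace: str) -> List[str]:
    """Build a kubectl command that deletes the Postgres cluster CR.

    Assumption: the CR is CloudNativePG's clusters.postgresql.cnpg.io.
    """
    return [
        "kubectl", "delete", "clusters.postgresql.cnpg.io", name,
        "--namespace", namespace,
        "--ignore-not-found",  # a missing CR is fine, e.g. on a re-run
        "--wait=true",         # block until the object is actually gone
    ]


def restore_in_order(
    delete_cr: Callable[[], None],
    restore: Callable[[], None],
    wait_ready: Callable[[], None],
) -> List[str]:
    """Run the three steps strictly in order and return what ran.

    If any step raises, later steps do not run, so a restore is never
    attempted while the operator still owns a live cluster CR.
    """
    done: List[str] = []
    for label, step in (
        ("delete-cr", delete_cr),
        ("restore", restore),
        ("wait-ready", wait_ready),
    ):
        step()
        done.append(label)
    return done
```

In practice the first callable might pass `delete_cluster_cr_argv(...)` to `subprocess.run(..., check=True)`; the names and namespace are whatever the environment uses.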

Chapters

0:00 - Environment Overview
0:42 - Creating Application-Consistent Backup
1:35 - Verifying Data and Simulating Failure
2:04 - Restoring PGVector Database

Key Quotes

1:10 "The important part is application consistency."
1:18 "In this case, the backup is taken from the secondary Postgres replica, not the primary."
1:29 "That helps avoid impact to the production writer while still giving us a clean backup of the database."

FAQ

Why does Kasten back up from the secondary Postgres replica instead of the primary?

Backing up from the secondary replica lets the blueprint pause that replica and capture a consistent copy of the data without touching the primary, which is actively serving writes from the production AI application. This helps avoid impact on the production writer while still producing a clean backup.
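The pause, capture, resume pattern can be sketched as a context manager. The demo does not show the blueprint's commands, so this assumes "pausing the replica" means pausing WAL replay with the PostgreSQL functions `pg_wal_replay_pause()` and `pg_wal_replay_resume()`; `run_sql` is a hypothetical callable that executes one statement on the replica and returns a scalar.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def paused_replica(run_sql: Callable[[str], object]) -> Iterator[None]:
    """Pause WAL replay on a replica for the duration of the block."""
    # pg_is_in_recovery() is true only on a standby; never pause a primary.
    if not run_sql("SELECT pg_is_in_recovery()"):
        raise RuntimeError("refusing to pause: target is not a replica")
    run_sql("SELECT pg_wal_replay_pause()")   # pre-action
    try:
        yield                                 # the backup is captured here
    finally:
        # post-action: resume even if the backup step failed
        run_sql("SELECT pg_wal_replay_resume()")
```

The `try`/`finally` is the point of the design: the replica should resume replay whether or not the capture succeeds, otherwise it would fall further and further behind the primary.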

What happens if the local OpenShift cluster is completely unavailable during a disaster?

In the demo, the policy exports the backup off-cluster to object storage, creating a recovery point outside the local OpenShift environment. The restore shown uses the local restore point, but the exported copy can be used instead if the cluster copy is unavailable.
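The choice between the local restore point and the exported copy can be sketched as a simple selection rule. The `RestorePoint` type and its fields are hypothetical and do not correspond to Kasten's API; the sketch only illustrates "use the most recent point, prefer the local copy, fall back to the exported one".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class RestorePoint:
    created: datetime
    exported: bool   # True for the off-cluster copy in object storage
    available: bool  # whether this copy can currently be read


def pick_restore_point(points: Iterable[RestorePoint]) -> Optional[RestorePoint]:
    """Pick the newest readable point, preferring local over exported."""
    usable = [p for p in points if p.available]
    if not usable:
        return None
    # Sort key: newest first; for equal timestamps, local (not exported) wins.
    return max(usable, key=lambda p: (p.created, not p.exported))
```

With both copies readable this returns the local point, as in the demo; if the local copy is marked unavailable it returns the exported one.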
