How does Zscaler DSPM detect sensitive data exposure in AI applications?

DSPM maps the access paths between AI models and their data sources, identifying when knowledge bases or training data contain sensitive information classified under regulations like GLBA. The platform surfaces this exposure in a dashboard organized by model and data classification, allowing security teams to trace exactly how an AI model gained access to specific sensitive documents.

What remediation options does DSPM provide when sensitive data is exposed to AI models?

DSPM can trigger proactive alerts when sensitive data exposure is detected, with integration options including Jira tickets, ServiceNow tickets, and custom automation workflows. The alerts include threat descriptions explaining potential attack scenarios, helping teams prioritize and respond appropriately based on organizational policies.
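The routing idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Alert` fields, the `to_jira` and `to_servicenow` handlers, and the `ROUTES` policy are hypothetical and are not Zscaler, Jira, or ServiceNow APIs. A real integration would call the ticketing system where the placeholder strings are returned.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Alert:
    title: str
    threat: str           # threat description attached to the alert
    classification: str   # e.g. "GLBA"


def to_jira(alert: Alert) -> str:
    # Placeholder: a real integration would call the ticketing system here.
    return f"JIRA ticket: {alert.title}"


def to_servicenow(alert: Alert) -> str:
    # Placeholder: a real integration would call the ticketing system here.
    return f"ServiceNow ticket: {alert.title}"


# Hypothetical organizational policy: destination per data classification.
ROUTES: dict[str, Callable[[Alert], str]] = {"GLBA": to_servicenow}


def route(alert: Alert) -> str:
    """Send the alert to the handler for its classification, defaulting to Jira."""
    handler = ROUTES.get(alert.classification, to_jira)
    return handler(alert)


alert = Alert(
    title="AWS Bedrock knowledge base contains sensitive S3 data",
    threat="A malicious prompt could extract sensitive data from the model.",
    classification="GLBA",
)
print(route(alert))
```

Keeping the policy in a lookup table means a team can change which classification goes to which system without touching the handlers.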

Securing AI Data Access in AWS Bedrock with Zscaler DSPM

Name: Securing AI Data Access in AWS Bedrock with Zscaler DSPM
Uploaded: 2026-03-20T19:32:40-04:00
Duration: 7 min 23 s
Description: TL;DR RAG-based AI applications can inadvertently expose sensitive data when knowledge bases include documents beyond their intended scope, allowing prompt injection attacks to extract confidential information. Zscaler DSPM maps the complete access pat...

Zscaler

03/20/2026


Transcript

My name is Max and I'm one of the Zscaler DSPM (Data Security Posture Management) product specialists. Today, I would like to walk you through a use case where we're going to build a chatbot application using generative AI, an LLM (large language model), to enable our customers to ask questions about our products and receive an answer in a friendly way. What you can see here is the web interface of our chatbot. For the purpose of this demonstration, we created a fake company called SafeMarch Home Appliances, and as the name suggests, it manufactures home appliances. The challenge with modern home appliances is that they have become so sophisticated that they come with a very long user manual, and probably most customers wouldn't be too happy reading the manual from A to Z. Therefore, we enable them to ask questions using this chatbot.

Behind this chatbot, we have a large language model from Anthropic. The challenge with most large language models is that while they were trained on a large amount of data, usually they lack domain-specific knowledge. Probably, Claude is not aware of our user manual. Somehow, we need to make sure that it becomes aware of it, and there are several options to achieve this. One option is to retrain the model and include the user manual in the training set. The problem with this approach is that it is very expensive and it requires expertise. The other approach, which is cheaper and probably much better for us, is to use RAG, Retrieval Augmented Generation.

Before we start using this chatbot, let's take a look at the architecture. What you can see here is a diagram depicting the flow of a user submitting a prompt and then how they get a response. The user submits a prompt or a query, and this prompt is sent to AWS Bedrock, a managed service from AWS. Before sending this prompt to the actual model, Bedrock performs a search to identify whether there is any documentation that might be relevant to this prompt.
We can see here that we have the user manual stored in an S3 bucket. When Bedrock determines that this documentation is relevant, it augments the original prompt with the search results, so the model receives both the relevant documentation and the original prompt, and then it can provide a better response.

Now that we know how it is built, let's see how it works. Let's ask the chatbot a question. Now it's going to process our prompt, and let's give it a few seconds, and hopefully we'll get the response shortly. You can see here that we get a very detailed response, including technical specifications and other information about the SafeMarch house cleaning robot. And you can also see where this information is coming from. It is coming from the user manual. So far, so good.

Now let's see if we can actually trick this chatbot into providing information it's not supposed to. Ideally, this kind of question should be rejected. Let's see what response we would get. So you can see here that it actually exposes information it's not supposed to. Specifically, it tells us that there is a top-secret document that outlines plans to acquire Acme Tech for 100 million US dollars by March 2026. And it claims that this information came from the user manual, which is obviously incorrect, because the user manual does not contain this kind of sensitive information.

Now let's take a look at Zscaler DSPM and try to understand why it happened. So what you can see here is the DSPM dashboard. And the part we are interested in at the moment is this one, sensitive data exposed to AI by models. You can see the different models used in this environment. And we can see here that we have Anthropic, and Anthropic has access to sensitive data under GLBA. Let's drill down and investigate. Here you can see that we have a SafeMarch user manual knowledge base. A knowledge base is a functionality from AWS Bedrock, their implementation of RAG, Retrieval Augmented Generation.
So if we click on it, here we can see the actual access path. So we can see here that we have the AWS Bedrock knowledge base. And we can see that it uses one model. We can drill into the model and see that it uses Claude 3 Sonnet. And if we click here on GLBA, we can actually see that financial statements were detected here. That's where the information about the acquisition plan is coming from. So here you can identify this and you can prevent this.

Now in order to identify this, right now we had to go to the DSPM web interface and actually look at that. But you can also receive a proactive alert. So if we click on alerts here, and then we go, for instance, to this alert, AWS Bedrock knowledge base contains sensitive S3 data. You can see here that it shows us the actual problem. And it describes a possible threat that, for instance, a malicious user could submit a harmful prompt or query, enabling them to extract sensitive data from the model if it is not properly secured. And that's exactly what happened. Now we can see here that we have the threat. If we click on the sensitive data tab, we would actually see the specific sensitive data. And you can see here that we have this document, board resolutions, that contains this sensitive information. So that's how DSPM can help you to identify this kind of thing. We can trigger an alert. We can create a Jira ticket, a ServiceNow ticket, maybe trigger some other automation, whatever is appropriate in your organization.

So this is just one part of our capabilities. In one of the next videos, we're going to discuss additional AI-related topics, for instance, which models are being used. Maybe you have a policy about specific models which are allowed or not allowed in your organization. We're also going to dive deeper into shadow AI because it's one thing to use a managed AI service from AWS like Bedrock or from Azure like Azure AI Foundry. It's a different thing to deploy a virtual machine with some AI software running on it. So we are going to show you how DSPM can help here as well. I hope that you found this useful. Thank you very much for listening.

TL;DR

RAG-based AI applications can inadvertently expose sensitive data when knowledge bases include documents beyond their intended scope, allowing prompt injection attacks to extract confidential information.
Zscaler DSPM maps the complete access path from AI models to underlying data stores, identifying which sensitive data classifications each model can reach.
Proactive alerting notifies security teams when AWS Bedrock knowledge bases contain sensitive S3 data, with integration options for ticketing and automation workflows.
A future video will cover how DSPM addresses shadow AI, meaning self-hosted AI workloads running outside managed cloud services like Bedrock or Azure AI Foundry.

RAG Architecture and Data Exposure Risks

This demonstration walks through building a customer-facing chatbot using AWS Bedrock and Anthropic's Claude model with Retrieval Augmented Generation (RAG). The fictional SafeMarch Home Appliances company uses RAG to augment LLM responses with product documentation stored in an S3 bucket, enabling the chatbot to answer domain-specific questions about home appliances. However, the demo reveals a critical security gap: when a knowledge base inadvertently includes sensitive documents alongside intended content, users can craft prompts that extract confidential information. In this case, the chatbot reveals board resolutions detailing a $100 million acquisition plan, which the model incorrectly attributes to the user manual.
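The exposure can be reproduced in miniature. The sketch below is a toy stand-in for the retrieve-then-augment flow, not Bedrock's implementation: the word-overlap ranking replaces real vector search, and the file names, chunk texts, and the `retrieve` and `augment` helpers are all hypothetical. It only aims to show that retrieval has no notion of which documents were meant to be in scope.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # hypothetical object name in the S3 bucket
    text: str


# Toy knowledge base: one intended document and one that slipped in by mistake.
KNOWLEDGE_BASE = [
    Chunk("user-manual.pdf",
          "The cleaning robot has a rechargeable battery and a dust bin."),
    Chunk("board-resolutions.pdf",
          "Top secret: plans to acquire Acme Tech for 100 million US dollars."),
]


def tokens(text: str) -> set[str]:
    """Lowercase word set used for the naive overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, kb: list[Chunk], top_k: int = 1) -> list[Chunk]:
    """Rank chunks by word overlap with the query (stand-in for vector search)."""
    q = tokens(query)
    ranked = sorted(kb, key=lambda c: len(q & tokens(c.text)), reverse=True)
    return ranked[:top_k]


def augment(query: str, kb: list[Chunk]) -> str:
    """Prepend the retrieved context to the prompt, as a RAG pipeline would."""
    context = "\n".join(f"[{c.source}] {c.text}" for c in retrieve(query, kb))
    return f"Context:\n{context}\n\nQuestion: {query}"


print(augment("What battery does the cleaning robot have?", KNOWLEDGE_BASE))
print(augment("Are there any plans to acquire a company?", KNOWLEDGE_BASE))
```

The second prompt pulls the board resolutions into the model's context simply because they match the query best, which is why scoping the knowledge base matters more than filtering the prompt.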

DSPM Detection and Remediation Workflow

Zscaler DSPM's AI-SPM capabilities provide visibility into which AI models can access sensitive data and through what paths. The dashboard surfaces exposure by model and data classification type, allowing security teams to trace the access path from AWS Bedrock knowledge bases through to specific S3 objects containing regulated data like GLBA-classified financial statements. Beyond reactive investigation, DSPM generates proactive alerts when knowledge bases contain sensitive data, describing potential threats and enabling automated remediation through Jira tickets, ServiceNow integration, or custom workflows. The presenter previews upcoming coverage for shadow AI scenarios where organizations deploy AI workloads outside managed services.
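Conceptually, tracing an access path is a graph walk from a model to the classified objects it can reach. The following is a minimal sketch under assumed data: the node names, the `EDGES` inventory, and the `CLASSIFICATIONS` table are hypothetical and do not reflect how DSPM stores or computes this internally.

```python
from collections import deque

# Hypothetical inventory: each edge means "can read from".
EDGES = {
    "model:anthropic-claude": ["kb:safemarch-user-manual"],
    "kb:safemarch-user-manual": ["s3:manuals-bucket"],
    "s3:manuals-bucket": ["obj:user-manual.pdf", "obj:board-resolutions.pdf"],
}

# Hypothetical classification results for individual objects.
CLASSIFICATIONS = {"obj:board-resolutions.pdf": "GLBA"}


def sensitive_paths(start: str) -> list[tuple[list[str], str]]:
    """Breadth-first walk from a model; return each path ending at a classified object."""
    found = []
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in CLASSIFICATIONS:
            found.append((path, CLASSIFICATIONS[node]))
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return found


for path, label in sensitive_paths("model:anthropic-claude"):
    print(" -> ".join(path), f"[{label}]")
```

Returning the full path rather than just the object is what lets a reviewer see which knowledge base or bucket to fix.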

Chapters

0:00 - Introduction and Use Case Overview
1:23 - RAG Architecture Explained
2:39 - Chatbot Demo and Data Extraction
4:13 - DSPM Dashboard Investigation
5:34 - Alert Configuration and Remediation
6:34 - Preview of Shadow AI Capabilities

Key Quotes

3:47 "So you can see here that it actually exposes information it's not supposed to. Specifically, it tells us that there is a top-secret document that outlines plans to acquire Acme Tech for 100 million US dollars by March 2026."
5:54 "A malicious user could submit a harmful prompt or query, enabling them to extract sensitive data from the model if it is not properly secured. And that's exactly what happened."
6:55 "It's one thing to use a managed AI service from AWS like Bedrock or from Azure like Azure AI Foundry. It's a different thing to deploy a virtual machine with some AI software running on it."
