What's the difference between DLP dictionaries, engines, and policies in Zscaler?

Dictionaries define what content to protect (patterns and phrases), engines combine dictionaries with logical operators to create matching criteria, and policies determine the action taken (allow, block, or require confirmation) when engines detect matches.
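The relationship between the three pieces can be sketched with the two-dictionary example from the transcript below. This is a minimal illustration, not Zscaler's implementation: the `Dictionary` class, `engine_matches`, and `policy_action` are hypothetical names, matching is reduced to whole-word counting, and each entry of the data set is evaluated on its own so the single match is visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dictionary:
    """Hypothetical stand-in for a DLP dictionary: the 'what'."""
    name: str
    phrases: tuple

    def count_matches(self, text: str) -> int:
        # Simplified matching: count whole-word occurrences, case-insensitive.
        words = text.lower().split()
        return sum(words.count(p.lower()) for p in self.phrases)


def engine_matches(text: str, dictionaries, min_count: int = 1) -> bool:
    """The 'how': an AND engine where every dictionary needs min_count hits."""
    return all(d.count_matches(text) >= min_count for d in dictionaries)


def policy_action(text: str, dictionaries) -> str:
    """The action: block on an engine match, otherwise allow."""
    return "block" if engine_matches(text, dictionaries) else "allow"


dictionary_one = Dictionary("dictionary one", ("data",))
dictionary_two = Dictionary("dictionary two", ("personal",))
engine = [dictionary_one, dictionary_two]

data_set = ["test data", "work data", "bad data",
            "personal data", "sensitive data", "generic data"]

blocked = [entry for entry in data_set if policy_action(entry, engine) == "block"]
print(blocked)
```

Swapping `all` for `any` in `engine_matches` would turn the AND engine into an OR engine, which is the distinction the logical operator controls.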

How does exact data matching protect sensitive information while using cloud-based DLP?

EDM uses a customer-hosted index VM to process CSV data and generate hash-based templates. Only the hashes are shared with Zscaler's cloud, never the actual sensitive data, allowing the platform to identify matches without ever seeing the original content.
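The hash-only idea can be sketched as follows. This is an illustration under stated assumptions, not the Zscaler index tool: the source does not say which hash function or normalization EDM uses, so SHA-256 and whitespace trimming are assumptions, and `build_index`, `find_matches`, and the sample CSV row are hypothetical.

```python
import csv
import hashlib
import io

# Hypothetical sample data; the SSN is deliberately invalid.
SAMPLE_CSV = """name,ssn,street,city,zip
Jane Doe,000-12-3456,1 Example St,Springfield,00000
"""


def token_hash(value: str) -> str:
    # Assumption: SHA-256 over the trimmed value. The real scheme is not
    # described in the source.
    return hashlib.sha256(value.strip().encode("utf-8")).hexdigest()


def build_index(csv_text: str, primary_key: str) -> set:
    """Customer side: hash the primary-key column, skipping blank values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {token_hash(row[primary_key])
            for row in reader if row[primary_key].strip()}


def find_matches(document: str, index: set) -> list:
    """Inspection side: hash each token and compare against the index."""
    return [tok for tok in document.split() if token_hash(tok) in index]


index = build_index(SAMPLE_CSV, "ssn")
print(find_matches("Customer SSN 000-12-3456 on file", index))
```

Only the contents of `index` would need to leave the customer environment in this sketch; the raw CSV values never appear in it.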

DLP Building Blocks: Dictionaries, Engines & Policy

Name: DLP Building Blocks: Dictionaries, Engines & Policy
Uploaded: 2026-05-11T18:05:30-04:00
Duration: 6 min 5 s
Description: About Zscaler: Zscaler (NASDAQ: ZS) accelerates digital transformation so that customers can be more agile, efficient, resilient, and secure. The Zscaler Zero Trust Exchange protects thousands of customers from cyberattacks and data loss by securely con...

Zscaler

05/11/2026

Transcript

In this series of short videos, we're taking a look at the baseline recommendations for the configuration of a data protection policy inside ZIA. This is part 3, covering the building blocks of DLP policy.

In this video, we'll be talking about the three components that make up the fundamental building blocks of DLP policy. The first of these are DLP dictionaries. These are the what. These describe content, which is the data you want to protect. The next component are DLP engines. These are the how. DLP engines combine one or more dictionaries into a set of matching criteria. Finally, DLP policy is the action taken, i.e. allow, block, or require user confirmation. These match engines to optional criteria to make policy decisions.

Let's take a look at DLP dictionaries in more detail. Zscaler's data protection policy comes with a number of predefined dictionaries. Dictionaries have a confidence score assigned. These are specific to each dictionary. In general terms, a low confidence score indicates that the dictionary is looking for a pattern of data. A medium confidence score indicates the dictionary is looking for a popular format, for example, credit cards, social security numbers, passport numbers, etc., whereas a high confidence score indicates the presence of high confidence phrases with proximity. An example here: we have text that says "Visa card used to pay for this order" and a credit card number. The phrase here is "Visa". The pattern is the card number, and a proximity length indicates how far from each other these terms and patterns should be.

Additionally, Zscaler also supports creating custom dictionaries. These allow you to define patterns and phrases of your own choosing as well as select advanced classification types such as exact data matching, index document matching, or Microsoft MIP labels. Select your match type, matching any patterns and any phrases with or without proximity, matching any or matching all, as well as actions, i.e. count all or count unique.

Next are DLP engines. DLP engines collect one or more dictionaries along with a logical operator and a match count. For example, dictionaries could look for social security numbers and credit cards, or social security numbers or credit cards. You also have the possibility of configuring excludes. This should be used with caution, as any match excludes the entire data set. It's also possible to configure sub-expressions, distinguishing, for example, between a bank routing number and (a credit card or financial statement), or (a bank routing number and credit card) or a financial statement.

Let's take a look at a quick example here. We have two dictionaries, one that looks for the word "data" and one that looks for the word "personal", an engine that says you must have at least one match from dictionary one and at least one match from dictionary two, and a data set that contains test data, work data, bad data, personal data, sensitive data, generic data. In this case, since we're looking for the combination of "data" from dictionary one and "personal" from dictionary two, the only match is personal data, which combines both one and two. With the DLP policy configured to block anything that matches this engine, we've now got a file that's been blocked because it contains that combination of data.

A quick note to add that if you have Microsoft Purview Information Protection labels, or MIP labels, configured inside your O365 tenant, it's possible to connect your Microsoft admin account with the ZIA data protection engine in order to import these MIP labels and then create a dictionary that matches on MIP labels. This means if you've already done the work of classifying your data inside your M365 tenant, you don't need to do this work again inside Zscaler to benefit from these protections.

Moving on, let's talk about exact data match or EDM. EDM takes data from a CSV file. Here we have an example where we've got a name, SSN, street address, city, and zip code. This data is then fed to an index VM, which is hosted by you and the contents of which Zscaler never sees. This is to guarantee the safety and security of your personal data. This generates an index template. No data is ever shared with Zscaler, only hashes to allow the cloud to identify that data inside documents. This can then be leveraged inside engines in order to create policy that targets exact data matches.

Index templates generally require a primary key, at least one, no more than two. This is a unique field that will be matched to the data. For example, in this hypothetical CSV that we're indexing, you'd pick the SSN as the primary key. You can also define secondary keys. And for the DLP advanced license, there's also the option to do EDM with no primary key. There's a few considerations here. The primary key should be as unique as possible across all EDM templates. So if you have multiple templates, you should make sure that the primary key is not duplicated between templates. Unless using EDM with no primary key, obviously the primary key cannot be blank. Secondary keys can be blank, however. Additionally, note that special and non-ASCII characters are ignored. There is a field size limit. EDM will ignore any input that is shorter than three characters. And most importantly, you shouldn't use a primary key that is also a secondary key in another template or vice versa. This is to make sure that the correct template is applied to your data type.

Index document matching works with a similar concept, only taking documents in specific formats to build the index templates. This can index either specific files and detect an exact match to that file, or empty forms, detecting partial matches from a filled form and excluding 100% matches, which would only be empty forms. Note that there is a max file size for index document matching.

Finally, Zscaler DLP policy supports optical character recognition, using advanced ML/AI to extract text data from an image.

To round this video out, here are several recommendations for your next steps in your data protection journey. First, you'll want to enable optical character recognition to make sure that data in images is evaluated for your DLP policy. Secondly, you should deploy the Zscaler index tool, if you're licensed for it, in order to be able to perform exact data matching and index document matching. Finally, index templates should be created for your structured and unstructured data for both EDM and index documents. This will allow you to leverage these templates inside your DLP engines and your DLP policy. That's it for this video. Thank you for watching.

TL;DR

DLP dictionaries define what data to protect using patterns and phrases. Each has a confidence score, from low (a data pattern) through medium (a popular format such as credit card numbers) to high (high-confidence phrases in proximity to a pattern). Custom dictionaries can also use exact data matching, index document matching, or Microsoft MIP labels.
DLP engines combine one or more dictionaries with logical operators and match counts to create detection criteria, with optional exclusions and sub-expressions.
DLP policy ties engines to an action (allow, block, or require user confirmation). Optical character recognition extends evaluation to text in images, and index templates cover structured data (EDM) and unstructured data (index document matching).
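The phrase-proximity check behind the high-confidence case can be sketched like this. It is a simplified illustration, not how Zscaler's predefined dictionaries are implemented: the regular expression, the character-based proximity window, and the `phrase_near_pattern` helper are all assumptions, and the card number is a commonly used test value.

```python
import re

# Assumption: a 16-digit card number in groups of four, optionally separated
# by spaces or hyphens. Real dictionaries use more thorough validation.
CARD_PATTERN = re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b")


def phrase_near_pattern(text: str, phrases=("visa",), proximity: int = 60) -> bool:
    """Return True if any phrase occurs within `proximity` characters
    of a card-number pattern match."""
    lowered = text.lower()
    for hit in CARD_PATTERN.finditer(text):
        start = max(0, hit.start() - proximity)
        window = lowered[start:hit.end() + proximity]
        if any(phrase in window for phrase in phrases):
            return True
    return False


sample = "Visa card used to pay for this order 4111 1111 1111 1111"
print(phrase_near_pattern(sample))
print(phrase_near_pattern(sample, proximity=10))
```

With the default window the phrase and the pattern are close enough to count together; shrinking `proximity` to 10 characters leaves the pattern without a nearby phrase.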

Summary

This technical tutorial explains the three fundamental components that form the foundation of data loss prevention policy in Zscaler Internet Access. DLP dictionaries define the content to protect through pattern matching and confidence scoring, ranging from low confidence pattern detection to high confidence phrase-proximity combinations. DLP engines combine multiple dictionaries with logical operators and match counts to create sophisticated detection criteria, supporting advanced features like sub-expressions and exclusions. Finally, DLP policy ties engines to actions—allow, block, or user confirmation—enabling granular control over data protection decisions. The video demonstrates how these building blocks work together through practical examples, including integration with Microsoft Purview Information Protection labels, exact data matching using CSV-based indexing, and index document matching for form detection. Organizations already using MIP labels in Microsoft 365 can import these classifications directly into Zscaler, eliminating duplicate effort while maintaining data security through hash-based matching that never exposes sensitive data to the cloud platform.

Chapters

0:00 - Introduction to DLP Building Blocks
0:47 - DLP Dictionaries and Confidence Scores
1:57 - DLP Engines and Logical Operators
3:05 - Microsoft MIP Label Integration
3:30 - Exact Data Matching (EDM)
5:00 - Index Document Matching and OCR
5:31 - Implementation Recommendations

Key Quotes

0:20 "The first of these are DLP dictionaries. These are the what. These describe content, which is the data you want to protect."
3:24 "This means if you've already done the work of classifying your data inside your M365 tenant, you don't need to do this work again inside Zscaler to benefit from these protections."
3:47 "This is to guarantee the safety and security of your personal data. This generates an index template. No data is ever shared with Zscaler, only hashes to allow the cloud to identify that data inside documents."
