Truth in IT

DLP Building Blocks: Dictionaries, Engines & Policy

Zscaler
05/11/2026

Transcript


In this series of short videos, we're taking a look at the baseline recommendations for configuring a data protection policy in ZIA. This is part 3, covering the building blocks of DLP policy. In this video, we'll be talking about the three components that make up the fundamental building blocks of DLP policy. The first of these are DLP dictionaries. These are the what: they describe content, which is the data you want to protect. The next component is DLP engines. These are the how: DLP engines combine one or more dictionaries into a set of matching criteria. Finally, DLP policy is the action taken, i.e. allow, block, or require user confirmation. Policies match engines to optional criteria to make policy decisions.

Let's take a look at DLP dictionaries in more detail. Zscaler's data protection policy comes with a number of predefined dictionaries. Each dictionary has its own confidence score assigned. In general terms, a low confidence score indicates that the dictionary is looking for a pattern of data. A medium confidence score indicates the dictionary is looking for a popular format, for example credit cards, social security numbers, or passport numbers, whereas a high confidence score indicates the presence of high-confidence phrases with proximity. As an example, here we have text that says "Visa card used to pay for this order" followed by a credit card number. The phrase here is "Visa", the pattern is the card number, and a proximity length indicates how far from each other the phrase and the pattern can be. Zscaler also supports creating custom dictionaries. These allow you to define patterns and phrases of your own choosing, as well as select advanced classification types such as exact data matching, index document matching, or Microsoft MIP labels. You select a match type (any patterns and any phrases, with or without proximity; match any; or match all) as well as an action, i.e. count all or count unique.

Next are DLP engines. DLP engines combine one or more dictionaries with a logical operator and a match count. For example, an engine could look for social security numbers AND credit cards, or for social security numbers OR credit cards. You can also configure excludes. Use these with caution, as any match on an exclude excludes the entire data set. It's also possible to configure sub-expressions, distinguishing, for example, between (bank routing number AND credit card) OR financial statement, and bank routing number AND (credit card OR financial statement). Let's take a look at a quick example. We have two dictionaries, one that looks for the word "data" and one that looks for the word "personal"; an engine that says you must have at least one match from dictionary one and at least one match from dictionary two; and a data set that contains test data, work data, bad data, personal data, sensitive data, and generic data. Since we're looking for the combination of "data" from dictionary one and "personal" from dictionary two, the only match is "personal data", which combines both. With the DLP policy configured to block anything that matches this engine, we've now got a file that's been blocked because it contains that combination of data.

A quick note: if you have Microsoft Purview Information Protection labels, or MIP labels, configured in your O365 tenant, you can connect your Microsoft admin account to the ZIA data protection engine to import these MIP labels and then create a dictionary that matches on them. This means if you've already done the work of classifying your data inside your M365 tenant, you don't need to do this work again inside Zscaler to benefit from these protections.

Moving on, let's talk about exact data match, or EDM. EDM takes data from a CSV file. Here we have an example with a name, SSN, street address, city, and zip code.
This data is then fed to an index VM, which is hosted by you and whose contents Zscaler never sees. This is to guarantee the safety and security of your personal data. This generates an index template. No data is ever shared with Zscaler, only hashes that allow the cloud to identify that data inside documents. The template can then be used inside engines to create policy that targets exact data matches. Index templates generally require a primary key: at least one, and no more than two. This is a unique field that will be matched to the data. For example, in this hypothetical CSV, you'd pick the SSN as the primary key. You can also define secondary keys, and with the DLP advanced license there's also the option to do EDM with no primary key.

There are a few considerations here. The primary key should be as unique as possible across all EDM templates, so if you have multiple templates, make sure the primary key is not duplicated between them. Unless you're using EDM with no primary key, the primary key obviously cannot be blank; secondary keys can be blank, however. Additionally, note that special and non-ASCII characters are ignored, there is a field size limit, and EDM ignores any input shorter than three characters. Most importantly, you shouldn't use a primary key that is also a secondary key in another template, or vice versa. This makes sure the correct template is applied to your data type.

Index document matching works on a similar concept, but takes documents in specific formats to build the index templates. It can index either specific files, detecting an exact match to that file, or empty forms, detecting partial matches from a filled-in form and excluding 100% matches, which would only be empty forms. Note that there is a maximum file size for index document matching. Finally, Zscaler DLP policy supports optical character recognition (OCR), using advanced ML/AI to extract text data from images.
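
Index document matching against an empty form can be sketched roughly as follows. This is a minimal illustration, not Zscaler's algorithm: it assumes the index stores SHA-256 hashes of the form's distinct lower-cased words, scores a document by Jaccard similarity against that fingerprint, and uses a hypothetical threshold of 0.5. A score of exactly 1.0 means the document is the blank form itself, so it is excluded, mirroring the "exclude 100% matches" behavior described in the video.

```python
from __future__ import annotations

import hashlib


def fingerprint(text: str) -> set[str]:
    """Hash each distinct lower-cased word, so the index holds hashes rather than text."""
    return {hashlib.sha256(word.encode("utf-8")).hexdigest() for word in text.lower().split()}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared elements divided by all distinct elements (1.0 means identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def idm_match(template: set[str], document: str, threshold: float = 0.5) -> bool:
    """Hypothetical partial-match rule: similar enough to the form, but not identical."""
    score = jaccard(template, fingerprint(document))
    if score == 1.0:
        return False  # identical to the indexed blank form, so nothing was filled in
    return score >= threshold


BLANK_FORM = "Employee name: Date of birth: Social security number: Home address:"
template = fingerprint(BLANK_FORM)  # built once from the blank form

filled = ("Employee name: Jane Doe Date of birth: 01/02/1980 "
          "Social security number: 123-45-6789 Home address: 1 Main St")
print(idm_match(template, filled))      # True: 10 of 17 distinct words shared
print(idm_match(template, BLANK_FORM))  # False: an empty form is excluded
print(idm_match(template, "Quarterly revenue grew in the home market"))  # False
```

Word-level Jaccard is deliberately crude; it keeps the sketch short while still showing why a filled form scores high but below 100%.
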

To round this video out, here are several recommendations for the next steps in your data protection journey. First, enable optical character recognition to make sure that data in images is evaluated by your DLP policy. Second, if you're licensed for it, deploy the Zscaler index tool so that you can perform exact data matching and index document matching. Finally, create index templates for your structured and unstructured data, for both EDM and index documents. This will allow you to leverage these templates inside your DLP engines and your DLP policy. That's it for this video. Thank you for watching.
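
To make the EDM flow concrete, here is a minimal Python sketch of indexing a CSV like the one in the video and matching documents against the resulting hashes. Everything here is an assumption for illustration, not Zscaler's index tool or its hashing scheme: the hypothetical `build_index` stands in for the customer-hosted indexing step and returns only SHA-256 digests, the hypothetical `edm_matches` counts records whose primary key and at least `min_secondary` secondary values appear among a document's one- to three-word sequences, and normalization simply strips every character outside ASCII letters and digits. It follows the rules mentioned in the video: the primary key may not be blank, secondary keys may be, and values shorter than three characters are ignored.

```python
from __future__ import annotations

import csv
import hashlib
import io
import re

MIN_LEN = 3  # values shorter than three characters are ignored


def normalize(value: str) -> str:
    """Assumed normalization: keep only ASCII letters and digits, lower-cased."""
    return re.sub(r"[^A-Za-z0-9]", "", value).lower()


def digest(value: str) -> str | None:
    """Hash a normalized value, or return None if it is too short to index."""
    norm = normalize(value)
    if len(norm) < MIN_LEN:
        return None
    return hashlib.sha256(norm.encode("ascii")).hexdigest()


def build_index(csv_text: str, primary: str, secondary: list[str]) -> dict[str, list[str]]:
    """Runs on your own infrastructure; the result contains only hashes, never raw values."""
    index = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = digest(row[primary])
        if key is None:
            raise ValueError(f"primary key {primary!r} is blank or too short")
        # Secondary keys may be blank; those cells are simply skipped.
        index[key] = [h for h in (digest(row[col]) for col in secondary) if h is not None]
    return index


def document_hashes(text: str, max_words: int = 3) -> set[str]:
    """Hash every run of 1..max_words consecutive words in the document."""
    words = text.split()
    found = set()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            h = digest(" ".join(words[i:i + n]))
            if h is not None:
                found.add(h)
    return found


def edm_matches(index: dict[str, list[str]], text: str, min_secondary: int = 1) -> int:
    """Count indexed records whose primary key and >= min_secondary secondary values appear."""
    found = document_hashes(text)
    return sum(
        1
        for key, secondaries in index.items()
        if key in found and sum(h in found for h in secondaries) >= min_secondary
    )


CSV_TEXT = """name,ssn,street,city,zip
Jane Doe,123-45-6789,1 Main St,Springfield,62701
John Roe,987-65-4321,,Shelbyville,62565
"""
index = build_index(CSV_TEXT, primary="ssn", secondary=["name", "street", "city", "zip"])
print(edm_matches(index, "Customer Jane Doe (SSN 123-45-6789) moved to Springfield."))  # 1
print(edm_matches(index, "Ticket references 123-45-6789 only"))  # 0
```

The second document contains a valid SSN from the index but no corroborating secondary value, so it does not count unless `min_secondary` is lowered to 0. Requiring secondary fields is one way to cut false positives on a bare number.
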

TL;DR

  • DLP dictionaries define what data to protect using patterns and phrases with confidence scores indicating detection accuracy from low (pattern-based) to high (phrase-proximity combinations)
  • DLP engines combine dictionaries with logical operators and match counts to create detection criteria, supporting excludes, sub-expressions, and exact data match index templates; custom dictionaries can also match on Microsoft MIP labels imported from Microsoft 365
  • DLP policy connects engines to actions (allow, block, confirm) and enables sophisticated data protection through optical character recognition and index-based matching for structured and unstructured data
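
The phrase-plus-pattern idea behind high-confidence dictionaries, together with the count all and count unique actions, can be sketched in Python. This is an illustrative approximation, not Zscaler's matching engine: the card-number regex, the Luhn checksum used to validate candidates, the character-based proximity window, and the helper names `proximity_hits` and `count_matches` are all assumptions. The sample text reuses the "Visa card used to pay for this order" example from the video.

```python
from __future__ import annotations

import re

# Assumed pattern: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum: rejects most random digit runs that only look like card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def proximity_hits(text: str, phrase: str, proximity: int) -> list[str]:
    """Return normalized card numbers found within `proximity` characters of `phrase`."""
    phrase_spans = [m.span() for m in re.finditer(re.escape(phrase), text, re.IGNORECASE)]
    hits = []
    for m in CARD_PATTERN.finditer(text):
        if not luhn_valid(m.group()):
            continue  # pattern matched, but the checksum says it is not a card number
        start, end = m.span()
        for p_start, p_end in phrase_spans:
            # Characters between the phrase and the number (0 if they touch).
            gap = max(p_start - end, start - p_end, 0)
            if gap <= proximity:
                hits.append(re.sub(r"[ -]", "", m.group()))
                break
    return hits


def count_matches(hits: list[str], mode: str) -> int:
    """'count_all' counts every hit; 'count_unique' counts each distinct value once."""
    return len(hits) if mode == "count_all" else len(set(hits))


text = ("Visa card used to pay for this order: 4111 1111 1111 1111. "
        "Refund issued to Visa 4111-1111-1111-1111.")
hits = proximity_hits(text, "Visa", proximity=50)
print(count_matches(hits, "count_all"), count_matches(hits, "count_unique"))  # 2 1
```

The same card number appears twice with different separators, so count all reports 2 while count unique reports 1 once both are normalized.
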

Summary

This technical tutorial explains the three fundamental components that form the foundation of data loss prevention policy in Zscaler Internet Access. DLP dictionaries define the content to protect through pattern matching and confidence scoring, ranging from low-confidence pattern detection to high-confidence phrase-proximity combinations. DLP engines combine multiple dictionaries with logical operators and match counts to create sophisticated detection criteria, supporting advanced features like sub-expressions and exclusions. Finally, DLP policy ties engines to actions (allow, block, or require user confirmation), enabling granular control over data protection decisions. The video demonstrates how these building blocks work together through practical examples, including integration with Microsoft Purview Information Protection labels, exact data matching using CSV-based indexing, and index document matching for form detection. Organizations already using MIP labels in Microsoft 365 can import these classifications directly into Zscaler, eliminating duplicate effort. For exact data matching, the index is built on customer-hosted infrastructure and only hashes are shared, so the indexed sensitive data is never exposed to the cloud platform.
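
The way engines combine dictionaries can be modeled as a small Boolean expression tree. The sketch below is a hypothetical model, not Zscaler's configuration format or API: `DictRef` is a leaf requiring a minimum number of hits from one dictionary, `And` and `Or` nodes express logical operators and sub-expressions, any hit on an exclude dictionary suppresses the match (one reading of the video's warning about excludes), and `policy_action` maps a match to a rule action. It reproduces the video's "data"/"personal" example and the bank routing number sub-expression comparison.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class DictRef:
    """Leaf condition: dictionary `name` needs at least `min_count` hits."""
    name: str
    min_count: int = 1


@dataclass(frozen=True)
class And:
    children: tuple


@dataclass(frozen=True)
class Or:
    children: tuple


Expr = Union[DictRef, And, Or]


def evaluate(expr: Expr, hits: dict[str, int]) -> bool:
    """Recursively evaluate an engine expression against per-dictionary hit counts."""
    if isinstance(expr, DictRef):
        return hits.get(expr.name, 0) >= expr.min_count
    if isinstance(expr, And):
        return all(evaluate(child, hits) for child in expr.children)
    if isinstance(expr, Or):
        return any(evaluate(child, hits) for child in expr.children)
    raise TypeError(f"unsupported expression: {expr!r}")


def engine_matches(expr: Expr, hits: dict[str, int], excludes: tuple[str, ...] = ()) -> bool:
    # Assumption: a single hit on any exclude dictionary suppresses the whole match.
    if any(hits.get(name, 0) > 0 for name in excludes):
        return False
    return evaluate(expr, hits)


def policy_action(matched: bool, action: str = "block") -> str:
    """Apply the rule's action ("allow", "block" or "confirm") only when the engine matched."""
    return action if matched else "allow"


# The video's example: dictionary 1 looks for "data", dictionary 2 for "personal".
DICTIONARIES = {"dict1": "data", "dict2": "personal"}
ENGINE = And((DictRef("dict1", min_count=1), DictRef("dict2", min_count=1)))


def count_hits(text: str) -> dict[str, int]:
    """Count whole-word occurrences of each dictionary's term."""
    words = text.lower().split()
    return {name: words.count(term) for name, term in DICTIONARIES.items()}


dataset = "test data, work data, bad data, personal data, sensitive data, generic data"
items = [item.strip() for item in dataset.split(",")]
matched = [item for item in items if engine_matches(ENGINE, count_hits(item))]
print(matched)                       # ['personal data']
print(policy_action(bool(matched)))  # block

# Sub-expressions: the same hits give different results depending on grouping.
a = Or((And((DictRef("routing"), DictRef("card"))), DictRef("statement")))
b = And((DictRef("routing"), Or((DictRef("card"), DictRef("statement")))))
only_statement = {"statement": 1}
print(evaluate(a, only_statement), evaluate(b, only_statement))  # True False
```

With only a financial statement present, (routing AND card) OR statement matches, while routing AND (card OR statement) does not, which is exactly why the grouping has to be chosen deliberately.
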

Chapters

0:00 - Introduction to DLP Building Blocks
0:47 - DLP Dictionaries and Confidence Scores
1:57 - DLP Engines and Logical Operators
3:05 - Microsoft MIP Label Integration
3:30 - Exact Data Matching (EDM)
5:00 - Index Document Matching and OCR
5:31 - Implementation Recommendations

Key Quotes

0:20 "The first of these are DLP dictionaries. These are the what. These describe content, which is the data you want to protect."
3:24 "This means if you've already done the work of classifying your data inside your M365 tenant, you don't need to do this work again inside Zscaler to benefit from these protections."
3:47 "This is to guarantee the safety and security of your personal data. This generates an index template. No data is ever shared with Zscaler, only hashes to allow the cloud to identify that data inside documents."

Categories:
  • » Webinar Library » Zscaler
  • » Data Protection » Backup & Recovery
  • » Cybersecurity » Data Security
  • » Cybersecurity » Cloud Security
  • » Data Protection
Tags:
  • Data Protection
  • Cloud Security
  • Technical Deep Dive
  • How-To
  • Compliance & Governance
  • Data Loss Prevention
  • DLP Policy Configuration
  • Data Classification
  • Exact Data Matching
  • Microsoft Purview Integration
  • Index Document Matching
  • Optical Character Recognition