Transcript
In this series of short videos, we're taking a look at the baseline recommendations for configuring a data protection policy inside ZIA. This is part 3, covering the building blocks of DLP policy. In this video, we'll be talking about the three components that make up the fundamental building blocks of DLP policy. The first of these is DLP dictionaries. These are the what: they describe content, which is the data you want to protect. The next component is DLP engines. These are the how: DLP engines combine one or more dictionaries into a set of matching criteria. Finally, DLP policy is the action taken, i.e. allow, block, or require user confirmation. Policy rules match engines to optional criteria to make policy decisions.
Let's take a look at DLP dictionaries in more detail. Zscaler's data protection policy comes with a number of predefined dictionaries. Dictionaries have a confidence score assigned, and these scores are specific to each dictionary. In general terms, a low confidence score indicates that the dictionary is looking for a pattern of data. A medium confidence score indicates the dictionary is looking for a popular format, for example, credit cards, social security numbers, passport numbers, etc., whereas a high confidence score indicates the presence of high confidence phrases with proximity. As an example, here we have text that says "Visa card used to pay for this order" followed by a credit card number. The phrase here is Visa, the pattern is the card number, and a proximity length indicates how far from each other these terms and patterns can be.
Additionally, Zscaler also supports creating custom dictionaries. These allow you to define patterns and phrases of your own choosing, as well as select advanced classification types such as exact data matching, indexed document matching, or Microsoft MIP labels. You select your match type (matching any patterns and any phrases, with or without proximity, matching any or matching all) as well as actions, i.e. count all or count unique.
Next are DLP engines. DLP engines collect one or more dictionaries along with a logical operator and a match count. For example, an engine could look for social security numbers and credit cards, or for social security numbers or credit cards. You also have the possibility of configuring excludes. This should be used with caution, as any match excludes the entire data set. It's also possible to configure sub-expressions, distinguishing, for example, between a bank routing number and (a credit card or a financial statement), versus (a bank routing number and a credit card) or a financial statement.
Let's take a look at a quick example here. We have two dictionaries, one that looks for the word data and one that looks for the word personal, an engine that says you must have at least one match from dictionary one and at least one match from dictionary two, and a data set that contains test data, work data, bad data, personal data, sensitive data, generic data. In this case, since we're looking for the combination of data from dictionary one and personal from dictionary two, the only match is personal data, which combines both one and two. With the DLP policy configured to block anything that matches this engine, we've now got a file that's been blocked because it contains that combination of data.
A quick note to add: if you have Microsoft Purview Information Protection labels, or MIP labels, configured inside your M365 tenant, it's possible to connect your Microsoft admin account with the ZIA data protection engine in order to import these MIP labels and then create a dictionary that matches on MIP labels. This means that if you've already done the work of classifying your data inside your M365 tenant, you don't need to do this work again inside Zscaler to benefit from these protections.
Moving on, let's talk about exact data match, or EDM. EDM takes data from a CSV file. Here we have an example where we've got a name, SSN, street address, city, and zip code.
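Stepping back to the two-dictionary engine example for a moment, the matching logic can be sketched in a few lines of Python. This is only an illustration of the idea, not how Zscaler implements engines: the `Dictionary` class and the `engine_matches` function are hypothetical names, each dictionary is reduced to a single whole-word phrase, and each item in the data set is treated as its own piece of content so the per-item result is visible.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Dictionary:
    """Hypothetical stand-in for a DLP dictionary: a name plus one phrase."""
    name: str
    phrase: str

    def count_matches(self, text: str) -> int:
        # Count whole-word, case-insensitive occurrences of the phrase.
        pattern = rf"\b{re.escape(self.phrase)}\b"
        return len(re.findall(pattern, text, re.IGNORECASE))


def engine_matches(text, dictionaries, min_count=1):
    """AND engine: every dictionary must reach min_count matches in the text."""
    return all(d.count_matches(text) >= min_count for d in dictionaries)


dictionary_one = Dictionary("dictionary one", "data")
dictionary_two = Dictionary("dictionary two", "personal")
engine = [dictionary_one, dictionary_two]

# The data set from the example in the video.
items = ["test data", "work data", "bad data",
         "personal data", "sensitive data", "generic data"]

# Evaluate each item on its own to see which one satisfies the engine.
matched = [item for item in items if engine_matches(item, engine)]

# A file containing all of the items also satisfies the engine, so a
# policy rule set to block on this engine would block the whole file.
file_text = ", ".join(items)
file_action = "block" if engine_matches(file_text, engine) else "allow"

print(matched)
print(file_action)
```

Swapping `all` for `any` in `engine_matches` would turn this into the "or" form of the engine, where a match from either dictionary is enough.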
The CSV data is then fed to an index VM, which is hosted by you and the contents of which Zscaler never sees. This is to guarantee the safety and security of your personal data. The index VM generates an index template. No data is ever shared with Zscaler, only hashes that allow the cloud to identify that data inside documents. The template can then be leveraged inside engines in order to create policy that targets exact data matches.
Index templates generally require at least one primary key and no more than two. This is a unique field that will be matched to the data. For example, in this hypothetical CSV that we're indexing, you'd pick the SSN as the primary key. You can also define secondary keys. And for the DLP advanced license, there's also the option to do EDM with no primary key.
There are a few considerations here. The primary key should be as unique as possible across all EDM templates. So if you have multiple templates, you should make sure that the primary key is not duplicated between templates. Unless you're using EDM with no primary key, the primary key obviously cannot be blank. Secondary keys can be blank, however. Additionally, note that special and non-ASCII characters are ignored. There is a field size limit: EDM will ignore any input that is shorter than three characters. And most importantly, you shouldn't use a primary key that is also a secondary key in another template, or vice versa. This is to make sure that the correct template is applied to your data type.
Indexed document matching works on a similar concept, only it takes documents in specific formats to build the index templates. It can index either specific files, detecting an exact match to that file, or empty forms, detecting partial matches from a filled form and excluding 100% matches, which would only be empty forms. Note that there is a max file size for indexed document matching.
Finally, Zscaler DLP policy supports optical character recognition, using advanced ML/AI to extract text data from an image.
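To make the EDM considerations above more concrete, here is a rough Python sketch of a pre-check you could run on a CSV before indexing it. It is not the Zscaler index tool and does not reproduce its behavior. The function names, the sample rows, the choice of SHA-256, and the reading of "special and non-ASCII characters are ignored" as "keep only ASCII letters and digits" are all assumptions made for illustration.

```python
import csv
import hashlib
import io

MIN_LENGTH = 3  # the video says input shorter than three characters is ignored


def normalize(value: str) -> str:
    # Assumption: "special and non-ASCII characters are ignored" is modeled
    # here by keeping only ASCII letters and digits, lowercased.
    return "".join(ch for ch in value if ch.isascii() and ch.isalnum()).lower()


def hash_token(value: str) -> str:
    # Assumption: SHA-256 stands in for whatever hashing the real tool uses.
    return hashlib.sha256(value.encode("ascii")).hexdigest()


def build_index(csv_text, primary_key, secondary_keys):
    """Return (index, skipped_rows); only hashes end up in the index."""
    index, skipped = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        primary = normalize(row[primary_key])
        if len(primary) < MIN_LENGTH:
            # A blank or too-short primary key cannot be indexed.
            skipped.append(row_number)
            continue
        secondary = {}
        for key in secondary_keys:
            token = normalize(row[key])
            if len(token) >= MIN_LENGTH:  # secondary keys may be blank
                secondary[key] = hash_token(token)
        index.append({"primary": hash_token(primary), "secondary": secondary})
    return index, skipped


# Made-up sample rows using the columns mentioned in the video.
sample_csv = "\n".join([
    "name,ssn,street_address,city,zip",
    "Alex Example,000-12-3456,1 Sample Street,Testville,00001",
    "Sam Placeholder,,2 Sample Street,,00002",
])

index, skipped = build_index(
    sample_csv,
    primary_key="ssn",
    secondary_keys=["name", "street_address", "city", "zip"],
)
print(len(index), "row(s) indexed; skipped rows:", skipped)
```

The second sample row is skipped because its primary key is blank, while a blank secondary key (the city, had that row been kept) would simply be left out of the index.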
To round this video out, here are several recommendations for the next steps in your data protection journey. First, you'll want to enable optical character recognition to make sure that data in images is evaluated by your DLP policy. Secondly, you should deploy the Zscaler index tool, if you're licensed for it, in order to be able to perform exact data matching and indexed document matching. Finally, index templates should be created for your structured and unstructured data, for both EDM and indexed document matching. This will allow you to leverage these templates inside your DLP engines and your DLP policy. That's it for this video. Thank you for watching.