Know Your Data in the Microsoft 365 Compliance Center
Know Data to Protect Data
On May 14, Microsoft announced that discovery and review capabilities for labeled data and sensitive data types were generally available in the Microsoft 365 Compliance Center. Microsoft calls this “know your data,” part of their Information Protection and Governance framework. The idea is that if you understand your data, you can better protect what’s important. Or in marketing terms, “The first step in the journey to protect and govern your data is getting a holistic understanding of the sensitive data in your digital estate.” Not knowing that my tenant is a digital estate, I prefer my definition.
Maturing Microsoft 365 Compliance Features
IT systems tend to mature over time. The initial implementation is often rudimentary and requires a lot of manual processing before automation and insight are introduced. In the case of Microsoft 365 compliance, the journey started for Office 365 tenants about five years ago. In that time, we’ve seen major components become available to help companies manage important data stored in Office 365, including:
- Retention labels and policies.
- Sensitivity labels and policies.
- Data loss prevention (DLP) policies.
- Communication compliance policies.
- Office 365 audit log (unified auditing).
Some aspects of these components need Office 365 E5 or Microsoft 365 E5 Compliance licenses, but the basics of retention and sensitivity labels, DLP, and the audit log can be used with Office 365 E3.
Understanding What’s Happening
Tenants that have implemented some or all of these technologies in the last few years probably have a lot of labeled material. Perhaps that material is all labeled perfectly, but it’s more likely that some information was overlooked, mislabeled, or not considered in the original design. Apart from analyzing the application of labels through events in the Office 365 audit log (a messy process) or the basic Label Activity Explorer, up to now there hasn’t been a way to get a good overview of how a company’s data governance is working.
The data classification dashboard in the Microsoft 365 compliance center gives some useful statistics and insights to help compliance administrators figure out where things are working and where some tweaks are needed. Figure 1 shows the data from my (small) tenant. As always, the larger the tenant, the more data you have and the more useful these kinds of features are.
Major Sections of the Data Classification Dashboard
The sections of the dashboard are:
- Overview: As Figure 1 shows, you can see which sensitivity and retention labels are in use and in which workloads. The dashboard also highlights sensitive data types found in documents and messages.
- Trainable classifiers: This preview feature allows tenants to build their own sensitive data types by using AI training based on a set of examples. For example, if a specific business form is used to capture information, you can create a classifier based on the form and use it to apply retention or sensitivity labels or in communication compliance policies. Microsoft’s default classifiers include document types like resumes and source code, and classifiers used to detect objectionable behavior like profanity and threats.
- Sensitive data types: Lists the sensitive data types known in the tenant, including the default set (100 or so) created by Microsoft and those created by the tenant through digital fingerprinting, dictionaries, or simple rule matching.
- Content explorer: Allows compliance administrators to see where retention and sensitivity labels are applied in Exchange Online, SharePoint Online, and OneDrive for Business.
- Activity explorer: I could initially only see sensitivity label actions here, even though Microsoft’s blog post shows data for both sensitivity and retention labels. The solution is to select retention labels as a filter. Being forced to select retention labels like this seems a little odd, as you’d imagine that both label types would be shown by default. The older Label Activity Explorer, which displays retention label activity, is available in the Office 365 security and compliance center.
The overview is available with an Office 365 E3 license. Office 365 E5 or the Microsoft 365 E5 compliance licenses are needed for content explorer, activity explorer, and trainable classifiers.
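To make the "simple rule matching" idea behind some sensitive data types more concrete, here is a minimal sketch of a rule-based detector. It is not how Microsoft implements its sensitive data types. It only assumes that a rule can combine a text pattern with a checksum, using a 16-digit card number and the standard Luhn check as the example. The names CARD_PATTERN, luhn_valid, and find_card_numbers are hypothetical.

```python
import re

# Hypothetical pattern: 16 digits in groups of four, optionally
# separated by a space or a hyphen.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        # Double every second digit, counting from the right.
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return pattern matches in text that also pass the checksum."""
    found = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found
```

The checksum step is what keeps a rule like this from flagging every 16-digit reference number in a document, which matters when the matches drive labeling or DLP decisions.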
The value of the content explorer is that it exposes the usefulness and accuracy of labeling within a tenant. Clearly, there’s no point in defining sets of retention and sensitivity labels if they are not used. And when labels are used, you’d like to know that they are being used correctly to mark documents and email to be kept, removed, or protected. No one can doubt the worth of a tool that helps compliance administrators improve the effectiveness of data governance.
What some might choke on is that to improve label effectiveness, compliance administrators can view email and documents in the source locations if their account is assigned the right permissions. To use the content explorer, administrators need these permissions:
- Content Explorer List Viewer. Allows viewing locations (for instance, SharePoint sites) and the list of items in those locations.
- Content Explorer Content Viewer. Allows viewing the source content for each item.
For example, in Figure 2 the sensitivity labels defined in the tenant are shown in the left-hand pane. The Confidential label is selected, and we’ve selected SharePoint Online as the location, so the content explorer shows the sites where the Confidential label is used.
Selecting a site reveals the set of documents with the assigned label. If your account has the Content Explorer Content Viewer permission, you can then view the document source (Figure 3).
Interestingly, even though SharePoint Online support for sensitivity labels has been generally available since March 2020, the source view doesn’t work for documents assigned labels with protection. Apparently, this is by design to stop very sensitive documents being perused by people who shouldn’t be looking at them.
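The access rules described here can be summarized in a small model. This is only a sketch of the behavior as I observed it, not Microsoft's implementation or API. The Item class and the two functions are hypothetical, and the sketch assumes that viewing source also requires the list permission, because you reach an item through the list.

```python
from dataclasses import dataclass

# Permission names as they appear in the compliance center.
LIST_VIEWER = "Content Explorer List Viewer"
CONTENT_VIEWER = "Content Explorer Content Viewer"


@dataclass
class Item:
    """Hypothetical stand-in for a labeled document or message."""
    name: str
    label: str
    label_has_protection: bool


def can_list_items(roles: set[str]) -> bool:
    """Locations and item lists need the List Viewer permission."""
    return LIST_VIEWER in roles


def can_view_source(roles: set[str], item: Item) -> bool:
    """Source view needs the Content Viewer permission and is blocked
    for items whose label applies protection."""
    # Assumption: an item is only reachable through the list view.
    if not can_list_items(roles) or CONTENT_VIEWER not in roles:
        return False
    return not item.label_has_protection
```

For example, an account holding both permissions can open a document labeled Confidential when that label applies no protection, but gets no source view when the label does apply protection.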
Access to Personal Information
For compliance administrators, content explorer is a great step forward. Being able to open and examine the source of a document or email assigned a retention or sensitivity label or one marked as containing a sensitive data type is an excellent way to confirm the accuracy of user or automatic labeling.
However, some will be nervous when they read that compliance administrators can access information like this, including protected content. This ignores the simple fact that similar access is already available through content searches or eDiscovery cases. Nonetheless, people do worry about access to private information, so comprehensive oversight is needed before assigning anyone the content explorer permissions.