Google Adds Improvements to Data Loss Prevention API



In a recent post on the Google Cloud Platform blog, Google announced several improvements to its Data Loss Prevention API, a service for managing sensitive user data.

The Data Loss Prevention (DLP) API, which was released as a beta this past March, can be used to detect and secure many types of personal data, including names, credit card numbers, and social security numbers. It analyzes data with more than 50 predefined detectors that look for patterns, formats, and checksums to determine whether the data contains personally identifiable information.
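To illustrate how a detector might combine a pattern with a checksum, here is a minimal sketch in plain Python. It is not the DLP API or its client library; the regular expression and the function names are assumptions made for this example. It flags a string as a likely credit card number only if it both looks like one and passes the Luhn checksum.

```python
import re

# Hypothetical pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return substrings that match the pattern and pass the checksum."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(match.group())
    return hits
```

The checksum step is what keeps an arbitrary 16-digit order number from being reported as a card number, which is presumably why detectors of this kind check more than the format alone.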

The DLP API offers built-in support for Google’s Cloud Storage, BigQuery, and Cloud Datastore platforms. Those who use third-party data storage services can also check whether their data contains potentially sensitive information by sending it to the Data Loss Prevention API.

Some of the features that were recently added to the DLP API include:

Redaction and Suppression – These techniques keep data out of the hands of those who don’t require access to it. Redaction “covers up” an identifying value so that users can’t see the underlying data, while suppression removes the value or the entire record. For example, the phone number “555-555-5555” may become “***-***-****” when redacted, or may not be included at all when suppressed.
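The difference between the two can be sketched in a few lines of plain Python. This is an illustration only, not the DLP API; the phone pattern and the function names `redact` and `suppress` are assumptions made for this example.

```python
import re

# Hypothetical pattern for phone numbers written like 555-555-5555.
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def redact(text: str) -> str:
    """Replace each digit of every matched phone number with '*', keeping hyphens."""
    return PHONE_PATTERN.sub(lambda m: re.sub(r"\d", "*", m.group()), text)


def suppress(record: dict, sensitive_fields: set) -> dict:
    """Return a copy of the record with the sensitive fields left out."""
    return {k: v for k, v in record.items() if k not in sensitive_fields}


print(redact("Call 555-555-5555 today"))
# Call ***-***-**** today
print(suppress({"name": "Ann", "phone": "555-555-5555"}, {"phone"}))
# {'name': 'Ann'}
```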

Partial Masking – Partial masking hides part of a value while leaving the rest visible. Examples include showing only the last four digits of a credit card or social security number, or only the first few digits of a telephone number. This maintains privacy while still showing enough of the data to be useful, for instance when asking users to confirm that previously entered data is still correct.
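A minimal sketch of the "last four digits" case, written in plain Python rather than against the DLP API. The function name `mask_all_but_last` and its parameters are hypothetical.

```python
def mask_all_but_last(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask every digit except the last `visible` digits; separators are kept."""
    total_digits = sum(ch.isdigit() for ch in value)
    to_mask = max(total_digits - visible, 0)
    out = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


print(mask_all_but_last("4111-1111-1111-1111"))
# ****-****-****-1111
print(mask_all_but_last("123-45-6789"))
# ***-**-6789
```

Keeping the separators in place means the masked value still has the familiar shape of a card or social security number, which helps a user recognize what they are being asked to confirm.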

Tokenization and Secure Hashing – In situations where it’s necessary to maintain context without showing actual user data, tokenization and hashing can be useful. One can still see what the data should look like or where it should be, but not the actual data itself, because each value is replaced by a token produced with either format-preserving encryption or secure hashing. Tokens are key-based and can be configured as either reversible or non-reversible, depending on the preference of the DLP API user.
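As a rough illustration of a key-based, non-reversible token, the sketch below uses HMAC-SHA256 from Python's standard library. It is an assumption about how such a token could be built, not a description of how the DLP API implements it, and it does not cover the reversible, format-preserving case. The function name `tokenize` is hypothetical.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Return a deterministic, non-reversible token for `value` under `key`."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"example-secret-key"  # Placeholder; a real key would come from a key manager.
token_a = tokenize("555-555-5555", key)
token_b = tokenize("555-555-5555", key)
print(token_a == token_b)  # Same value and key always give the same token.
```

Because the same input always maps to the same token, records can still be joined or counted by the tokenized field without exposing the underlying value, and someone without the key cannot reproduce the tokens.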

Dynamic Data Masking – Dynamic data masking hides sensitive data in real time while leaving the original data intact. It makes sense for live chat or streaming scenarios, where users may be asked to provide personal details so that a customer service representative can access their accounts. Organizations can configure who can see what data, depending on their specific needs.
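One way to picture this is masking applied at read time according to the viewer's role, with the stored record left untouched. The sketch below is plain Python, not the DLP API; the roles, the `VISIBLE_FIELDS` policy, and the `view_record` function are all hypothetical.

```python
# Hypothetical policy: which fields each role may see unmasked.
VISIBLE_FIELDS = {
    "support_agent": {"name"},
    "billing_admin": {"name", "card_number"},
}


def view_record(record: dict, role: str) -> dict:
    """Return a masked copy of the record for `role`; the original is not modified."""
    allowed = VISIBLE_FIELDS.get(role, set())
    return {
        field: value if field in allowed else "*" * len(str(value))
        for field, value in record.items()
    }


stored = {"name": "Ann", "card_number": "4111111111111111"}
print(view_record(stored, "support_agent"))
# {'name': 'Ann', 'card_number': '****************'}
print(stored["card_number"])
# 4111111111111111
```

An unknown role falls back to an empty set of visible fields, so the sketch masks everything by default rather than revealing data when the policy has no entry.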

Google’s Data Loss Prevention API also includes several other methods for securing user data, including bucketing, k-anonymity, and l-diversity. A full list of data protection techniques is available in the “Documentation” section of the Data Loss Prevention API’s website.
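To give a sense of how bucketing and k-anonymity relate, here is a small sketch in plain Python. It does not use the DLP API; the helper names `bucket_age` and `k_anonymity` and the sample rows are assumptions made for this example. A dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k rows, and bucketing exact values into ranges is one way to raise k.

```python
from collections import Counter


def bucket_age(age: int, width: int = 10) -> str:
    """Map an exact age to a range label such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def k_anonymity(rows: list, quasi_identifiers: list) -> int:
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


rows = [{"age": a, "zip": "94043"} for a in (31, 34, 42, 47)]
print(k_anonymity(rows, ["age", "zip"]))  # 1: every exact age is unique

bucketed = [{"age": bucket_age(r["age"]), "zip": r["zip"]} for r in rows]
print(k_anonymity(bucketed, ["age", "zip"]))  # 2: two rows per age range
```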

As more services move to the cloud, it is increasingly important for organizations to maintain the security of user and customer data, especially personally identifiable information and other sensitive data like credit card and social security numbers. With a service like Google’s Data Loss Prevention API, organizations can rest a bit easier knowing that sensitive data can be obscured or even hidden altogether.