Skip to content

Files

This branch is 8757 commits behind GoogleCloudPlatform/java-docs-samples:main.

dlp

Cloud Data Loss Prevention (DLP) API Samples

Open in Cloud Shell

The Data Loss Prevention API provides programmatic access to a powerful detection engine for personally identifiable information and other privacy-sensitive data in unstructured data streams.

Setup

  • A Google Cloud project with billing enabled
  • Enable the DLP API.
  • (Local testing) Create a service account and set the GOOGLE_APPLICATION_CREDENTIALS environment variable pointing to the downloaded credentials file.
  • (Local testing) Set the DLP_DEID_WRAPPED_KEY environment variable to an AES-256 key encrypted ('wrapped') with a Cloud Key Management Service (KMS) key.
  • (Local testing) Set the DLP_DEID_KEY_NAME environment variable to the path-name of the Cloud KMS key you wrapped DLP_DEID_WRAPPED_KEY with.

Build

This project uses the Assembly Plugin to build an uber jar. Run:

   mvn clean package

Retrieve InfoTypes

An InfoType identifier represents an element of sensitive data.

Info types are updated periodically. Use the API to retrieve the most current info types for a given category. eg. HEALTH or GOVERNMENT.

  java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Metadata -category GOVERNMENT

Retrieve Categories

Categories provide a way to easily access a group of related InfoTypes.

  java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Metadata

Run the quickstart

The Quickstart demonstrates using the DLP API to identify an InfoType in a given string.

   java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.QuickStart

Inspect data for sensitive elements

Inspect strings, files locally and on Google Cloud Storage and Cloud Datastore kinds with the DLP API.

Note: image scanning is not currently supported on Google Cloud Storage. For more information, refer to the API documentation. Optional flags are explained in this resource.

Commands:
  -s <string>                   Inspect a string using the Data Loss Prevention API.
  -f <filepath>                 Inspects a local text, PNG, or JPEG file using the Data Loss Prevention API.
  -gcs -bucketName <bucketName> -fileName <fileName>  Inspects a text file stored on Google Cloud Storage using the Data Loss
                                          Prevention API.
  -ds -projectId [projectId] -namespace [namespace] - kind <kind> Inspect a Datastore instance using the Data Loss Prevention API.

Options:
  --help               Show help 
  -minLikelihood       [string] [choices: "LIKELIHOOD_UNSPECIFIED", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
                       [default: "LIKELIHOOD_UNSPECIFIED"]
                       specifies the minimum reporting likelihood threshold.
  -f, --maxFindings    [number] [default: 0]
                       maximum number of results to retrieve
  -q, --includeQuote   [boolean] [default: true] include matching string in results
  -t, --infoTypes      restrict to limited set of infoTypes [ default: []]
                       [ eg. PHONE_NUMBER US_PASSPORT]

Examples

  • Inspect a string:
    java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -s "My phone number is (123) 456-7890 and my email address is me@somedomain.com"
    
  • Inspect a local file (text / image):
      java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -f resources/test.txt
      java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -f resources/test.png
    
  • Inspect a file on Google Cloud Storage:
      java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -gcs -bucketName my-bucket -fileName my-file.txt
    
  • Inspect a Google Cloud Datastore kind:
      java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -ds -kind my-kind
    

Automatic redaction of sensitive data

Automatic redaction produces an output with sensitive data matches removed.

Commands:
  -s <string>                   Source input string
  -r <replacement string>       String to replace detected info types
 Options:
  --help               Show help
  -minLikelihood       choices: "LIKELIHOOD_UNSPECIFIED", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
                       [default: "LIKELIHOOD_UNSPECIFIED"]
                       specifies the minimum reporting likelihood threshold.
  
  -infoTypes     restrict operation to limited set of info types [ default: []]
                      [ eg. PHONE_NUMBER US_PASSPORT]

Example

  • Replace sensitive data in text with _REDACTED_:
      java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Redact -s "My phone number is (123) 456-7890 and my email address is me@somedomain.com" -r "_REDACTED_"
    

Integration tests

Setup

Run

Run all tests:

   mvn clean verify