The Data Loss Prevention API provides programmatic access to a powerful detection engine for personally identifiable information and other privacy-sensitive data in unstructured data streams.
- A Google Cloud project with billing enabled
- Enable the DLP API.
- (Local testing) Create a service account and set the GOOGLE_APPLICATION_CREDENTIALS environment variable pointing to the downloaded credentials file.
- (Local testing) Set the DLP_DEID_WRAPPED_KEY environment variable to an AES-256 key encrypted ('wrapped') with a Cloud Key Management Service (KMS) key.
- (Local testing) Set the DLP_DEID_KEY_NAME environment variable to the path-name of the Cloud KMS key you wrapped DLP_DEID_WRAPPED_KEY with.
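Before running the samples locally, it can help to confirm that the environment variables listed above are actually set. The following is a minimal sketch, not part of the samples: the class name DlpEnvCheck and its missing method are hypothetical, and treating all three variables as required is an assumption (the wrapped-key variables may only matter for some samples).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of the samples): reports which variables are unset. */
class DlpEnvCheck {
    // Variable names taken from the prerequisites list; treating all as required is an assumption.
    static final List<String> REQUIRED = Arrays.asList(
        "GOOGLE_APPLICATION_CREDENTIALS",
        "DLP_DEID_WRAPPED_KEY",
        "DLP_DEID_KEY_NAME");

    /** Returns the names in REQUIRED that are absent or blank in the given environment. */
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> missing = missing(System.getenv());
        if (missing.isEmpty()) {
            System.out.println("All expected environment variables are set.");
        } else {
            System.out.println("Missing environment variables: " + missing);
        }
    }
}
```

The check takes the environment as a Map so it can be exercised without changing the real process environment.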
This project uses the Assembly Plugin to build an uber jar. Run:
mvn clean package
An InfoType identifier represents an element of sensitive data.
Info types are updated periodically. Use the API to retrieve the most current info types for a given category, e.g. HEALTH or GOVERNMENT.
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Metadata -category GOVERNMENT
Categories provide a way to easily access a group of related InfoTypes.
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Metadata
The Quickstart demonstrates using the DLP API to identify an InfoType in a given string.
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.QuickStart
Inspect strings, local files, files on Google Cloud Storage, and Cloud Datastore kinds with the DLP API.
Note: image scanning is not currently supported on Google Cloud Storage. For more information, refer to the API documentation. Optional flags are explained in this resource.
Commands:
-s <string> Inspects a string using the Data Loss Prevention API.
-f <filepath> Inspects a local text, PNG, or JPEG file using the Data Loss Prevention API.
-gcs -bucketName <bucketName> -fileName <fileName> Inspects a text file stored on Google Cloud Storage using the Data Loss Prevention API.
-ds -projectId [projectId] -namespace [namespace] -kind <kind> Inspects a Datastore kind using the Data Loss Prevention API.
Options:
--help Show help
-minLikelihood [string] [choices: "LIKELIHOOD_UNSPECIFIED", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
[default: "LIKELIHOOD_UNSPECIFIED"]
specifies the minimum reporting likelihood threshold.
-f, --maxFindings [number] [default: 0]
maximum number of results to retrieve
-q, --includeQuote [boolean] [default: true] include matching string in results
-t, --infoTypes restrict to limited set of infoTypes [ default: []]
[ eg. PHONE_NUMBER US_PASSPORT]
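The -minLikelihood option above acts as a reporting threshold: findings below the chosen level are left out. The sketch below illustrates that idea locally and is not the API's implementation. The LikelihoodFilter class, its Finding type, and the atLeast method are hypothetical names, and the sample findings in main are made up for illustration. In this sketch LIKELIHOOD_UNSPECIFIED simply keeps everything; the service may apply its own default when that value is sent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical local illustration of a minimum-likelihood threshold. */
class LikelihoodFilter {
    /** Likelihood levels in the order listed in the options, lowest first. */
    enum Likelihood {
        LIKELIHOOD_UNSPECIFIED, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, VERY_LIKELY
    }

    /** Hypothetical finding: an info type name plus the likelihood assigned to it. */
    static final class Finding {
        final String infoType;
        final Likelihood likelihood;

        Finding(String infoType, Likelihood likelihood) {
            this.infoType = infoType;
            this.likelihood = likelihood;
        }
    }

    /** Keeps only findings whose likelihood is at or above the given minimum. */
    static List<Finding> atLeast(List<Finding> findings, Likelihood min) {
        List<Finding> kept = new ArrayList<>();
        for (Finding f : findings) {
            // Enum constants compare by declaration order, lowest likelihood first.
            if (f.likelihood.compareTo(min) >= 0) {
                kept.add(f);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up findings, for illustration only.
        List<Finding> findings = Arrays.asList(
            new Finding("PHONE_NUMBER", Likelihood.VERY_LIKELY),
            new Finding("US_PASSPORT", Likelihood.UNLIKELY));
        for (Finding f : atLeast(findings, Likelihood.LIKELY)) {
            System.out.println(f.infoType + " " + f.likelihood);
        }
    }
}
```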
- Inspect a string:
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -s "My phone number is (123) 456-7890 and my email address is me@somedomain.com"
- Inspect a local file (text / image):
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -f resources/test.txt
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -f resources/test.png
- Inspect a file on Google Cloud Storage:
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -gcs -bucketName my-bucket -fileName my-file.txt
- Inspect a Google Cloud Datastore kind:
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -ds -kind my-kind
Automatic redaction produces an output with sensitive data matches removed.
Commands:
-s <string> Source input string
-r <replacement string> String to replace detected info types
Options:
--help Show help
-minLikelihood [string] [choices: "LIKELIHOOD_UNSPECIFIED", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
[default: "LIKELIHOOD_UNSPECIFIED"]
specifies the minimum reporting likelihood threshold.
-infoTypes restrict operation to limited set of info types [ default: []]
[ eg. PHONE_NUMBER US_PASSPORT]
- Replace sensitive data in text with _REDACTED_:
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Redact -s "My phone number is (123) 456-7890 and my email address is me@somedomain.com" -r "_REDACTED_"
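To make the effect of the replacement string concrete, here is a minimal local sketch of substituting a fixed string for a set of matched character ranges. It does not call the DLP API and is not how the service is implemented: the RedactSketch class, its Span type, and the redact method are hypothetical, the matched ranges are supplied by hand rather than detected, and the sketch assumes the ranges do not overlap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical local illustration of replacing matched ranges with a fixed string. */
class RedactSketch {
    /** Half-open character range [start, end) marking one hypothetical match. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Replaces each span in the input with the replacement; spans must not overlap. */
    static String redact(String input, List<Span> spans, String replacement) {
        List<Span> ordered = new ArrayList<>(spans);
        // Work from right to left so earlier offsets stay valid after each replacement.
        ordered.sort(Comparator.comparingInt((Span s) -> s.start).reversed());
        StringBuilder out = new StringBuilder(input);
        for (Span s : ordered) {
            out.replace(s.start, s.end, replacement);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "My phone number is (123) 456-7890 and my email address is me@somedomain.com";
        // The ranges are located by hand here; a real run would take them from detection results.
        String phone = "(123) 456-7890";
        String email = "me@somedomain.com";
        List<Span> spans = new ArrayList<>();
        int p = input.indexOf(phone);
        spans.add(new Span(p, p + phone.length()));
        int e = input.indexOf(email);
        spans.add(new Span(e, e + email.length()));
        // prints "My phone number is _REDACTED_ and my email address is _REDACTED_"
        System.out.println(redact(input, spans, "_REDACTED_"));
    }
}
```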
- Create a Google Cloud Storage bucket and upload test.txt.
- Create a Google Cloud Datastore kind and add an entity with properties:
  - property1: john@doe.com
  - property2: 343-343-3435
- Update the Google Cloud Storage path and Datastore kind in InspectIT.java.
- Ensure that GOOGLE_APPLICATION_CREDENTIALS points to an authorized service account credentials file.
Run all tests:
mvn clean verify