* Package docs and example analysis updates
* Version bump
Showing 11 changed files with 124 additions and 85 deletions.
@@ -8,7 +8,44 @@ The following are not integrated into the service, but have PII type mapping and | |
<li>Azure PII Detection Cognitive Skill (Requires Azure Account) [<a href="https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-pii-detection">docs</a>]</li> | ||
</ol> | ||
|
||
In the case you are using the built-in Presidio functionality, you can call the analysis service as follows: | ||
For those using pre-detected results, adapters are provided to convert types and results to the expected DetectionResult/DetectionResultItem format (see diagram below): | ||
|
||
![Converting And Analyzing Existing Detections](UC1_Converting_Existing_Detections.png) | ||
|
||
To supply the analyzer module with a collection of pre-detected results from your own Microsoft Presidio, Azure, or AWS Comprehend analysis process, first convert the detections to a list of DetectionResult objects, then feed that list into the analyzer as follows:
|
||
```python
from typing import List
from pii_codex.models.common import (
    AnalysisProviderType,
)
from presidio_analyzer import RecognizerResult
from pii_codex.services.analysis_service import PIIAnalysisService
from pii_codex.services.adapters.detection_adapters.presidio_detection_adapter import PresidioPIIDetectionAdapter
from pii_codex.models.analysis import DetectionResult

presidio_detection_service = PresidioPIIDetectionAdapter()

list_of_detections: List[RecognizerResult] = []  # your list of detections
converted_detections: List[DetectionResult] = presidio_detection_service.convert_analyzed_collection(
    pii_detections=list_of_detections
)
pii_analysis_service = PIIAnalysisService(
    analysis_provider=AnalysisProviderType.PRESIDIO.name
)  # If you don't intend to use presidio, override the analysis_provider value

results = pii_analysis_service.analyze_detection_collection(
    detection_collection=converted_detections,
    collection_name="Data Set Label",  # this is more for those that intend to find a way to label collections
    collection_type="SAMPLE"  # defaults to POPULATION, input used for standard deviation and variance calculations
)
```
|
||
The other two detection adapters available are AWSComprehendPIIDetectionAdapter and AzurePIIDetectionAdapter. | ||
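As a rough illustration of what such an adapter does, the sketch below normalizes provider-specific detections into one common shape. It does not use the pii_codex API: `CommonDetection`, `from_presidio_like`, `from_aws_comprehend`, and `convert_collection` are hypothetical names, and the input dictionaries only mimic the field names that Presidio (`entity_type`, `start`, `end`, `score`) and AWS Comprehend (`Type`, `BeginOffset`, `EndOffset`, `Score`) report for a detected entity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class CommonDetection:
    """Hypothetical provider-neutral detection record (not a pii_codex class)."""
    entity_type: str
    score: float
    start: int
    end: int


def from_presidio_like(item: Dict) -> CommonDetection:
    # Presidio-style field names: entity_type, score, start, end
    return CommonDetection(item["entity_type"], item["score"], item["start"], item["end"])


def from_aws_comprehend(item: Dict) -> CommonDetection:
    # AWS Comprehend-style field names: Type, Score, BeginOffset, EndOffset
    return CommonDetection(item["Type"], item["Score"], item["BeginOffset"], item["EndOffset"])


def convert_collection(
    items: List[Dict], converter: Callable[[Dict], CommonDetection]
) -> List[CommonDetection]:
    # Apply one provider-specific converter to every detection in the list.
    return [converter(item) for item in items]


# Made-up detections for the same span of text, one per provider format.
presidio_items = [{"entity_type": "PERSON", "start": 15, "end": 19, "score": 0.85}]
aws_items = [{"Type": "NAME", "Score": 0.99, "BeginOffset": 15, "EndOffset": 19}]

converted = (
    convert_collection(presidio_items, from_presidio_like)
    + convert_collection(aws_items, from_aws_comprehend)
)
for detection in converted:
    print(detection)
```

This sketch only renames fields. It leaves the provider's type labels (`PERSON`, `NAME`) unchanged, whereas the adapters described above also convert types.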
|
||
<hr> | ||
|
||
In the case you require the built-in Presidio functionality, you can call the analysis service as follows: | ||
|
||
```python | ||
from pii_codex.services.analysis_service import PIIAnalysisService | ||
|
@@ -24,6 +61,11 @@ results = pii_analysis_service.analyze_collection( | |
) | ||
``` | ||
|
||
This functionality accepts a single text item or a collection of them and runs each one through the Presidio analysis and assessment services, as presented in the diagram below.
|
||
![Converting And Analyzing Text with Presidio Builtin](UC2_Using_Presidio_Builtin_Service_for_Detection_and_Analysis.png) | ||
|
||
|
||
When analyzing social media posts, you can also supply metadata for each text sample in a dataframe so that it is analyzed along with the text.
|
||
```python | ||
|
@@ -56,32 +98,6 @@ results = pii_analysis_service.analyze_collection( | |
) | ||
``` | ||
|
||
To supply the analyzer module with a collection of pre-detected results from your own Microsoft Presidio, Azure, or AWS Comprehend analysis process, you will need to first convert the detection to a set of DetectionResult objects to feed into the analyzer as follows: | ||
|
||
```python
from typing import List
from presidio_analyzer import RecognizerResult
from pii_codex.services.analysis_service import PIIAnalysisService
from pii_codex.services.adapters.detection_adapters.presidio_detection_adapter import PresidioPIIDetectionAdapter
from pii_codex.models.analysis import DetectionResult

presidio_detection_service = PresidioPIIDetectionAdapter()

list_of_detections: List[RecognizerResult] = []  # your list of MSFT Presidio detections
converted_detections: List[DetectionResult] = presidio_detection_service.convert_analyzed_collection(
    pii_detections=list_of_detections
)
pii_analysis_service = PIIAnalysisService()

results = pii_analysis_service.analyze_detection_collection(
    detection_collection=converted_detections,
    collection_name="Data Set Label",  # this is more for those that intend to find a way to label collections
    collection_type="SAMPLE"  # defaults to POPULATION, input used for standard deviation and variance calculations
)
```
|
||
The other two detection services available are AWSComprehendPIIDetectionAdapter and AzurePIIDetectionAdapter. | ||
|
||
Sample output: | ||
|
||
``` | ||
|
@@ -93,7 +109,6 @@ Sample output: | |
"analysis": [ | ||
{ | ||
"pii_type_detected": "PERSON", | ||
"sanitized_text": "Hi! My name is <REDACTED>", | ||
"risk_level": 3, | ||
"risk_level_definition": "Identifiable", | ||
"cluster_membership_type": "Financial Information", | ||
|
@@ -108,6 +123,7 @@ Sample output: | |
], | ||
"index": 0, | ||
"risk_score_mean": 3, | ||
"sanitized_text": "Hi! My name is <REDACTED>", | ||
}, | ||
{ | ||
"analysis": [ | ||
|
@@ -153,6 +169,7 @@ Sample output: | |
], | ||
"index": 1, | ||
"risk_score_mean": 2.6666666666666665, | ||
"sanitized_text": "Hi! My phone number is 555-555-5555. You can also reach me by email at [email protected]", | ||
}, | ||
{ | ||
"analysis": [ | ||
|
@@ -168,6 +185,7 @@ Sample output: | |
], | ||
"index": 2, | ||
"risk_score_mean": 1, | ||
"sanitized_text": "Hi! What is the title of this book?", | ||
}, | ||
{ | ||
"analysis": [ | ||
|
@@ -187,6 +205,7 @@ Sample output: | |
], | ||
"index": 3, | ||
"risk_score_mean": 2, | ||
"sanitized_text": "<REDACTED>", | ||
}, | ||
{ | ||
"analysis": [ | ||
|
@@ -202,6 +221,7 @@ Sample output: | |
], | ||
"index": 4, | ||
"risk_score_mean": 1, | ||
"sanitized_text": "Hi! I have a cat too.", | ||
}, | ||
], | ||
"detection_count": 5, | ||
|
@@ -229,4 +249,4 @@ Sample output: | |
``` | ||
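To make the `risk_score_mean` field and the `collection_type` option more concrete, the following sketch recomputes them with Python's standard `statistics` module. It is an illustration only, not the pii_codex implementation: the per-detection risk levels `[3, 3, 2]` are assumed values chosen to reproduce the mean reported for the item at index 1, and all variable names are hypothetical.

```python
import statistics

# Assumed per-detection risk levels for one text item (hypothetical values).
item_risk_levels = [3, 3, 2]
item_mean = sum(item_risk_levels) / len(item_risk_levels)

# risk_score_mean values of the five items in the sample output.
risk_score_means = [3, 2.6666666666666665, 1, 2, 1]

# POPULATION: treat the collection as the whole data set (divide by n).
population_variance = statistics.pvariance(risk_score_means)
population_std_dev = statistics.pstdev(risk_score_means)

# SAMPLE: treat the collection as a sample of a larger data set (divide by n - 1).
sample_variance = statistics.variance(risk_score_means)
sample_std_dev = statistics.stdev(risk_score_means)

print("item mean:", item_mean)
print("population:", population_variance, population_std_dev)
print("sample:", sample_variance, sample_std_dev)
```

Because the sample formulas divide by n - 1 instead of n, they always yield a slightly larger variance and standard deviation for the same data, so the choice of `collection_type` matters most for small collections.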
|
||
Check out full analysis example in the notebook: notebooks/pii-batch-analysis-ms-presidio. | ||
Check out full analysis example in the notebook: notebooks/pii-analysis-ms-presidio. |
Binary file added: docs/UC2_Using_Presidio_Builtin_Service_for_Detection_and_Analysis.png (BIN, +17.4 KB)