Er.package edits (#27)
* Package docs and example analysis updates

* Version bump
EdyVision authored Dec 28, 2022
1 parent 1e96ef2 commit f51f9d9
Showing 11 changed files with 124 additions and 85 deletions.
2 changes: 1 addition & 1 deletion .zenodo.json
@@ -1,6 +1,6 @@
{
"access_right": "open",
"version": "0.4.2",
"version": "0.4.3",
"creators": [
{
"orcid": "0000-0003-0665-098X",
4 changes: 2 additions & 2 deletions CITATION.cff
@@ -5,6 +5,6 @@ authors:
given-names: Eidan J.
orcid: https://orcid.org/0000-0003-0665-098X
title: "pii-codex: a Python library for PII detection, categorization, and severity assessment"
version: 0.4.2
version: 0.4.3
doi: 10.5281/zenodo.7212576
date-released: 2022-12-26
date-released: 2022-12-28
2 changes: 1 addition & 1 deletion README.md
@@ -70,7 +70,6 @@ Sample output (results object converted to `dict` from notebook):
"analysis": [
{
"pii_type_detected": "PERSON",
"sanitized_text": "Hi! My name is <REDACTED>",
"risk_level": 3,
"risk_level_definition": "Identifiable",
"cluster_membership_type": "Financial Information",
@@ -85,6 +84,7 @@ Sample output (results object converted to `dict` from notebook):
],
"index": 0,
"risk_score_mean": 3,
"sanitized_text": "Hi! My name is <REDACTED>",
},
...
],
78 changes: 49 additions & 29 deletions docs/DETECTION_AND_ANALYSIS.md
@@ -8,7 +8,44 @@ The following are not integrated into the service, but have PII type mapping and
<li>Azure PII Detection Cognitive Skill (Requires Azure Account) [<a href="https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-pii-detection">docs</a>]</li>
</ol>

In the case you are using the built-in Presidio functionality, you can call the analysis service as follows:
For those using pre-detected results, adapters are provided to convert types and results to the expected DetectionResult/DetectionResultItem format (see diagram below):

![Converting And Analyzing Existing Detections](UC1_Converting_Existing_Detections.png)

To supply the analyzer module with a collection of pre-detected results from your own Microsoft Presidio, Azure, or AWS Comprehend analysis process, first convert the detections to a set of DetectionResult objects, then feed them into the analyzer as follows:

```python
from typing import List
from pii_codex.models.common import (
    AnalysisProviderType,
)
from presidio_analyzer import RecognizerResult
from pii_codex.services.analysis_service import PIIAnalysisService
from pii_codex.services.adapters.detection_adapters.presidio_detection_adapter import PresidioPIIDetectionAdapter
from pii_codex.models.analysis import DetectionResult

presidio_detection_service = PresidioPIIDetectionAdapter()

list_of_detections: List[RecognizerResult] = []  # your list of detections
converted_detections: List[DetectionResult] = presidio_detection_service.convert_analyzed_collection(
    pii_detections=list_of_detections
)
pii_analysis_service = PIIAnalysisService(
    analysis_provider=AnalysisProviderType.PRESIDIO.name
)  # If you don't intend to use presidio, override the analysis_provider value

results = pii_analysis_service.analyze_detection_collection(
    detection_collection=converted_detections,
    collection_name="Data Set Label",  # this is more for those that intend to find a way to label collections
    collection_type="SAMPLE"  # defaults to POPULATION, input used for standard deviation and variance calculations
)
```
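
The `collection_type` comment above says the value feeds the standard deviation and variance calculations. The sketch below only illustrates the general statistical difference between the two settings using Python's standard `statistics` module; it is not taken from pii-codex's implementation, and the score list is an assumed example.

```python
import statistics

# Assumed example: one mean risk score per analyzed text.
risk_score_means = [3, 2.6666666666666665, 1, 2, 1]

# POPULATION: the collection is the complete data set (divide by N).
population_variance = statistics.pvariance(risk_score_means)
population_std_dev = statistics.pstdev(risk_score_means)

# SAMPLE: the collection is a subset of a larger data set (divide by N - 1).
sample_variance = statistics.variance(risk_score_means)
sample_std_dev = statistics.stdev(risk_score_means)

print(f"population: variance={population_variance:.4f}, std dev={population_std_dev:.4f}")
print(f"sample:     variance={sample_variance:.4f}, std dev={sample_std_dev:.4f}")
```

For the same scores, the sample figures are always slightly larger than the population figures, so `"SAMPLE"` is the more conservative choice when the collection is only part of the data you care about.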

The other two detection adapters available are AWSComprehendPIIDetectionAdapter and AzurePIIDetectionAdapter.

<hr>

In the case you require the built-in Presidio functionality, you can call the analysis service as follows:

```python
from pii_codex.services.analysis_service import PIIAnalysisService
@@ -24,6 +61,11 @@ results = pii_analysis_service.analyze_collection(
)
```

This functionality accepts a single text item or a collection of them and runs each one through the Presidio analysis and assessment service files, as presented in the diagram below.

![Converting And Analyzing Text with Presidio Builtin](UC2_Using_Presidio_Builtin_Service_for_Detection_and_Analysis.png)


For those analyzing social media posts, you can also supply metadata per text sample to be analyzed in a dataframe.

```python
@@ -56,32 +98,6 @@ results = pii_analysis_service.analyze_collection(
)
```

To supply the analyzer module with a collection of pre-detected results from your own Microsoft Presidio, Azure, or AWS Comprehend analysis process, you will need to first convert the detection to a set of DetectionResult objects to feed into the analyzer as follows:

```python
from typing import List
from presidio_analyzer import RecognizerResult
from pii_codex.services.analysis_service import PIIAnalysisService
from pii_codex.services.adapters.detection_adapters.presidio_detection_adapter import PresidioPIIDetectionAdapter
from pii_codex.models.analysis import DetectionResult

presidio_detection_service = PresidioPIIDetectionAdapter()

list_of_detections: List[RecognizerResult] = [] # your list of MSFT Presidio detections
converted_detections: List[DetectionResult] = presidio_detection_service.convert_analyzed_collection(
    pii_detections=list_of_detections
)
pii_analysis_service = PIIAnalysisService()

results = pii_analysis_service.analyze_detection_collection(
    detection_collection=converted_detections,
    collection_name="Data Set Label",  # this is more for those that intend to find a way to label collections
    collection_type="SAMPLE"  # defaults to POPULATION, input used for standard deviation and variance calculations
)
```

The other two detection services available are AWSComprehendPIIDetectionAdapter and AzurePIIDetectionAdapter.

Sample output:

```
@@ -93,7 +109,6 @@ Sample output:
"analysis": [
{
"pii_type_detected": "PERSON",
"sanitized_text": "Hi! My name is <REDACTED>",
"risk_level": 3,
"risk_level_definition": "Identifiable",
"cluster_membership_type": "Financial Information",
@@ -108,6 +123,7 @@ Sample output:
],
"index": 0,
"risk_score_mean": 3,
"sanitized_text": "Hi! My name is <REDACTED>",
},
{
"analysis": [
@@ -153,6 +169,7 @@ Sample output:
],
"index": 1,
"risk_score_mean": 2.6666666666666665,
"sanitized_text": "Hi! My phone number is 555-555-5555. You can also reach me by email at [email protected]",
},
{
"analysis": [
@@ -168,6 +185,7 @@ Sample output:
],
"index": 2,
"risk_score_mean": 1,
"sanitized_text": "Hi! What is the title of this book?",
},
{
"analysis": [
@@ -187,6 +205,7 @@ Sample output:
],
"index": 3,
"risk_score_mean": 2,
"sanitized_text": "<REDACTED>",
},
{
"analysis": [
@@ -202,6 +221,7 @@ Sample output:
],
"index": 4,
"risk_score_mean": 1,
"sanitized_text": "Hi! I have a cat too.",
},
],
"detection_count": 5,
@@ -229,4 +249,4 @@ Sample output:
```
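
The `risk_score_mean` field appears to be the arithmetic mean of the per-detection `risk_level` values for one text. The sketch below checks that reading against the value `2.6666666666666665` shown for index 1. The three risk levels are an assumption, because the individual detections for that entry are collapsed in the sample output.

```python
import statistics

# Assumption: the text at index 1 produced three detections with these
# risk levels (the actual per-detection values are not shown above).
assumed_risk_levels = [3, 3, 2]

risk_score_mean = statistics.mean(assumed_risk_levels)
print(risk_score_mean)
```

Any three risk levels that sum to 8 would give the same mean, so this shows the value is consistent with a plain average but does not confirm which detections were involved.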

Check out the full analysis example in the notebook: notebooks/pii-batch-analysis-ms-presidio.
Check out the full analysis example in the notebook: notebooks/pii-analysis-ms-presidio.
Binary file added docs/UC1_Converting_Existing_Detections.png