Er.package edits (#27)

* Package docs and example analysis updates
* Version bump
EdyVision · Dec 28, 2022 · f51f9d9
1 parent 1e96ef2
commit f51f9d9

Showing 11 changed files with 124 additions and 85 deletions.
diff --git a/.zenodo.json b/.zenodo.json
@@ -1,6 +1,6 @@
 {
   "access_right": "open",
-  "version": "0.4.2",
+  "version": "0.4.3",
   "creators": [
     {
       "orcid": "0000-0003-0665-098X",

diff --git a/CITATION.cff b/CITATION.cff
@@ -5,6 +5,6 @@ authors:
     given-names: Eidan J.
     orcid: https://orcid.org/0000-0003-0665-098X
 title: "pii-codex: a Python library for PII detection, categorization, and severity assessment"
-version: 0.4.2
+version: 0.4.3
 doi: 10.5281/zenodo.7212576
-date-released: 2022-12-26
+date-released: 2022-12-28
diff --git a/README.md b/README.md
@@ -70,7 +70,6 @@ Sample output (results object converted to `dict` from notebook):
             "analysis": [
                 {
                     "pii_type_detected": "PERSON",
-                    "sanitized_text: "Hi! My name is <REDACTED>",
                     "risk_level": 3,
                     "risk_level_definition": "Identifiable",
                     "cluster_membership_type": "Financial Information",
@@ -85,6 +84,7 @@ Sample output (results object converted to `dict` from notebook):
             ],
             "index": 0,
             "risk_score_mean": 3,
+            "sanitized_text": "Hi! My name is <REDACTED>",
         },
         ...
     ],

diff --git a/docs/DETECTION_AND_ANALYSIS.md b/docs/DETECTION_AND_ANALYSIS.md
@@ -8,7 +8,44 @@ The following are not integrated into the service, but have PII type mapping and
     <li>Azure PII Detection Cognitive Skill (Requires Azure Account) [<a href="https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-pii-detection">docs</a>]</li>
 </ol>
 
-In the case you are using the built-in Presidio functionality, you can call the analysis service as follows:
+For those using pre-detected results, adapters are provided to convert types and results to the expected DetectionResult/DetectionResultItem format (see diagram below):
+
+![Converting And Analyzing Existing Detections](UC1_Converting_Existing_Detections.png)
+
+To supply the analyzer module with a collection of pre-detected results from your own Microsoft Presidio, Azure, or AWS Comprehend analysis process, you will first need to convert the detections to a set of DetectionResult objects to feed into the analyzer as follows:
+
+```python
+from typing import List
+from pii_codex.models.common import (
+    AnalysisProviderType,
+)
+from presidio_analyzer import RecognizerResult
+from pii_codex.services.analysis_service import PIIAnalysisService
+from pii_codex.services.adapters.detection_adapters.presidio_detection_adapter import PresidioPIIDetectionAdapter
+from pii_codex.models.analysis import DetectionResult
+
+presidio_detection_service = PresidioPIIDetectionAdapter()
+
+list_of_detections: List[RecognizerResult] = [] # your list of detections 
+converted_detections: List[DetectionResult] = presidio_detection_service.convert_analyzed_collection(
+                                                    pii_detections=list_of_detections
+                                              )
+pii_analysis_service = PIIAnalysisService(
+    analysis_provider=AnalysisProviderType.PRESIDIO.name
+)  # If you don't intend to use presidio, override the analysis_provider value
+
+results = pii_analysis_service.analyze_detection_collection(
+    detection_collection=converted_detections,
+    collection_name="Data Set Label",  # this is more for those that intend to find a way to label collections
+    collection_type="SAMPLE"  # defaults to POPULATION, input used for standard deviation and variance calculations
+)
+```
+
+The other two detection adapters available are AWSComprehendPIIDetectionAdapter and AzurePIIDetectionAdapter. 
+
+<hr>
+
+If you require the built-in Presidio functionality, you can call the analysis service as follows:
 
 ```python
 from pii_codex.services.analysis_service import PIIAnalysisService
@@ -24,6 +61,11 @@ results = pii_analysis_service.analyze_collection(
 )
 ```
 
+This functionality can take a single text item or a collection of them and run them through the Presidio analysis and assessment service files, as presented in the diagram below.
+
+![Converting And Analyzing Text with Presidio Builtin](UC2_Using_Presidio_Builtin_Service_for_Detection_and_Analysis.png)
+
+
 For those analyzing social media posts, you can also supply metadata per text sample to be analyzed in a dataframe.
 
 ```python
@@ -56,32 +98,6 @@ results = pii_analysis_service.analyze_collection(
 )
 ```
 
-To supply the analyzer module with a collection of pre-detected results from your own Microsoft Presidio, Azure, or AWS Comprehend analysis process, you will need to first convert the detection to a set of DetectionResult objects to feed into the analyzer as follows:
-
-```python
-from typing import List
-from presidio_analyzer import RecognizerResult
-from pii_codex.services.analysis_service import PIIAnalysisService
-from pii_codex.services.adapters.detection_adapters.presidio_detection_adapter import PresidioPIIDetectionAdapter
-from pii_codex.models.analysis import DetectionResult
-
-presidio_detection_service = PresidioPIIDetectionAdapter()
-
-list_of_detections: List[RecognizerResult] = [] # your list of MSFT Presidio detections 
-converted_detections: List[DetectionResult] = presidio_detection_service.convert_analyzed_collection(
-                                                    pii_detections=list_of_detections
-                                              )
-pii_analysis_service = PIIAnalysisService()
-
-results = pii_analysis_service.analyze_detection_collection(
-    detection_collection=converted_detections,
-    collection_name="Data Set Label",  # this is more for those that intend to find a way to label collections
-    collection_type="SAMPLE"  # defaults to POPULATION, input used for standard deviation and variance calculations
-)
-```
-
-The other two detection services available are AWSComprehendPIIDetectionAdapter and AzurePIIDetectionAdapter.
-
 Sample output:
 
 ```
@@ -93,7 +109,6 @@ Sample output:
             "analysis": [
                 {
                     "pii_type_detected": "PERSON",
-                    "sanitized_text": "Hi! My name is <REDACTED>",
                     "risk_level": 3,
                     "risk_level_definition": "Identifiable",
                     "cluster_membership_type": "Financial Information",
@@ -108,6 +123,7 @@ Sample output:
             ],
             "index": 0,
             "risk_score_mean": 3,
+            "sanitized_text": "Hi! My name is <REDACTED>",
         },
         {
             "analysis": [
@@ -153,6 +169,7 @@ Sample output:
             ],
             "index": 1,
             "risk_score_mean": 2.6666666666666665,
+            "sanitized_text": "Hi! My phone number is 555-555-5555. You can also reach me by email at [email protected]",
         },
         {
             "analysis": [
@@ -168,6 +185,7 @@ Sample output:
             ],
             "index": 2,
             "risk_score_mean": 1,
+            "sanitized_text": "Hi! What is the title of this book?",
         },
         {
             "analysis": [
@@ -187,6 +205,7 @@ Sample output:
             ],
             "index": 3,
             "risk_score_mean": 2,
+            "sanitized_text": "<REDACTED>",
         },
         {
             "analysis": [
@@ -202,6 +221,7 @@ Sample output:
             ],
             "index": 4,
             "risk_score_mean": 1,
+            "sanitized_text": "Hi! I have a cat too.",
         },
     ],
     "detection_count": 5,
@@ -229,4 +249,4 @@ Sample output:
 
 ```
 
-Check out full analysis example in the notebook: notebooks/pii-batch-analysis-ms-presidio.
+Check out the full analysis example in the notebook: notebooks/pii-analysis-ms-presidio.
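
Review note: the sample-output hunks above move `sanitized_text` out of each `analysis` entry and up to the enclosing per-text result. For anyone holding results in the older shape, the same move can be sketched on plain dicts. This is only an illustration: the helper name `hoist_sanitized_text` is hypothetical and not part of pii-codex, and the sketch assumes nothing beyond the key names visible in the sample output.

```python
from copy import deepcopy
from typing import Any, Dict


def hoist_sanitized_text(result: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one per-text result with `sanitized_text` moved
    from its `analysis` items to the top level of the result.

    Hypothetical helper for illustration; not a pii-codex API.
    """
    migrated = deepcopy(result)
    sanitized = migrated.get("sanitized_text")
    for item in migrated.get("analysis", []):
        # Drop the per-item copy; keep the first value seen if the
        # result does not already carry one at the top level.
        item_text = item.pop("sanitized_text", None)
        if sanitized is None:
            sanitized = item_text
    if sanitized is not None:
        migrated["sanitized_text"] = sanitized
    return migrated


# Old shape, trimmed from the removed lines of the sample output.
old_shape = {
    "analysis": [
        {
            "pii_type_detected": "PERSON",
            "sanitized_text": "Hi! My name is <REDACTED>",
            "risk_level": 3,
        }
    ],
    "index": 0,
    "risk_score_mean": 3,
}

new_shape = hoist_sanitized_text(old_shape)
print(new_shape["sanitized_text"])
```

The helper works on a deep copy, so the caller's original result is left untouched.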
diff --git a/docs/UC1_Converting_Existing_Detections.png b/docs/UC1_Converting_Existing_Detections.png
diff --git a/docs/UC2_Using_Presidio_Builtin_Service_for_Detection_and_Analysis.png b/docs/UC2_Using_Presidio_Builtin_Service_for_Detection_and_Analysis.png
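
Review note: the `collection_type` comment in the docs example says the value (`"SAMPLE"` or the default `POPULATION`) feeds the standard deviation and variance calculations. A minimal sketch of what that distinction usually means, using Python's standard `statistics` module on the per-text `risk_score_mean` values from the sample output. It assumes pii-codex follows the conventional population (divide by n) and sample (divide by n - 1) formulas; it does not show the library's own implementation.

```python
import statistics

# Per-text risk_score_mean values copied from the sample output in this diff.
risk_score_means = [3, 2.6666666666666665, 1, 2, 1]

# "POPULATION": the collection is treated as the complete data set (divide by n).
population_variance = statistics.pvariance(risk_score_means)
population_std_dev = statistics.pstdev(risk_score_means)

# "SAMPLE": the collection is treated as a sample of a larger set (divide by n - 1).
sample_variance = statistics.variance(risk_score_means)
sample_std_dev = statistics.stdev(risk_score_means)

print(f"population: variance={population_variance:.4f}, std dev={population_std_dev:.4f}")
print(f"sample:     variance={sample_variance:.4f}, std dev={sample_std_dev:.4f}")
```

With only five texts the two conventions differ noticeably, which is why the choice is worth setting explicitly for small collections.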