Unexpected silent exits of presidio application #1505
Comments
Hi, thanks for raising this. Would it be possible to create a slightly more detailed reproducible example?
Hi,
anonymize_text() is then basically called in a loop that fetches data from a SQL (MariaDB) table and writes the anonymized data into another table. Are there any other trace options that would produce further output?
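One trace option that may help with silent exits is Python's standard-library faulthandler module, which dumps the Python traceback when the interpreter dies on a fatal signal such as a segmentation fault. The following is a minimal sketch, not a confirmed fix for this issue; the helper name enable_crash_tracing and the log file name are hypothetical.

```python
import faulthandler


def enable_crash_tracing(log_path: str = "crash_trace.log"):
    """Dump Python tracebacks to log_path if the process dies on a fatal signal.

    The helper name and default path are illustrative assumptions.
    The returned file object must stay open for as long as tracing is enabled.
    """
    trace_file = open(log_path, "w")
    # all_threads=True also dumps the stacks of threads other than the crashing one
    faulthandler.enable(file=trace_file, all_threads=True)
    return trace_file
```

Calling enable_crash_tracing() once at startup, before the anonymization loop, should be enough. If the process is terminated by a signal that cannot be caught (for example SIGKILL), no traceback will be written.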
I also tried to check whether the problem lies with one of the registered recognizers by excluding some of them with variations of analyzer.registry.recognizers = analyzer.registry.recognizers[0:1], to no avail.
Hi, below is the function that I'm using:
Thanks, we're trying to reproduce this. @janorivera in your case, I see that you're collecting exceptions into the body of the scrubbed message. Do you have instances where the scrubbed message contains an error and not the scrubbed text? Also, it could be more scalable to use the Could you please check if this happens with batch mode too?
@grafandreas I'm trying to reproduce your case. I'm using this code. Is it different in any way from yours?

from logging import getLogger

from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import NlpEngineProvider
import presidio_anonymizer

logger = getLogger()

configuration = {
    "nlp_engine_name": "spacy",
    "models": [{"lang_code": "de", "model_name": "de_core_news_lg"}],
}

# Create NLP engine based on configuration
provider = NlpEngineProvider(nlp_configuration=configuration)
nlp_engine = provider.create_engine()

# the languages are needed to load country-specific recognizers
# for finding phones, passport numbers, etc.
analyzer = AnalyzerEngine(nlp_engine=nlp_engine,
                          supported_languages=["de"])


def anonymize_text(text: str) -> str:
    logger.info(f"Anonymizing text: {text}")
    analyzer_results = analyzer.analyze(text=text,
                                        language='de')
    logger.info(f"Anonymizer results: {analyzer_results}")

    engine = presidio_anonymizer.AnonymizerEngine()
    result = engine.anonymize(text=text, analyzer_results=analyzer_results)
    logger.info(result)

    # Restructuring anonymizer results
    anonymization_results = {"anonymized": result.text,
                             "found": [entity.to_dict() for entity in analyzer_results]}
    return anonymization_results["anonymized"]


text = """
Hier sind ein paar Beispielsätze, die wir derzeit unterstützen:
Hallo, mein Name ist David Johnson, und ich komme ursprünglich aus Liverpool.
Meine Kreditkartennummer ist 4095-2609-9393-4932, und meine Krypto-Wallet-ID ist 16Yeky6GMjeNkAiNcBY7ZhrLoMSgg1BoyZ.
Am 11.10.2024 habe ich www.microsoft.com besucht und eine E-Mail an [email protected] von der IP-Adresse 192.168.0.1 gesendet.
Mein Reisepass: 191280342 und meine Telefonnummer: (212) 555-1234.
Dies ist eine gültige internationale Bankkontonummer: IL150120690000003111111. Können Sie bitte den Status des Bankkontos 954567876544 überprüfen?
Kates Sozialversicherungsnummer ist 078-05-1126. Ihr Führerschein? Er lautet 1234567A.
"""

# Run the same text through the pipeline many times, printing progress every 100 iterations
for i in range(100000):
    if i % 100 == 0:
        print(i)
    anonymize_text(text)
@omri374 Yes, that looks very much like the code I use, with the obvious exception of me using different texts.
Are the texts much longer? Do they contain non-Unicode values? Is there anything else that could be special about them?
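To answer the questions above for rows coming out of a database, a small profiling helper could be run on each text before it is passed to anonymize_text(). This is only a sketch; the function name describe_text and the particular set of checks are assumptions, not something taken from Presidio.

```python
import unicodedata


def describe_text(text: str) -> dict:
    """Summarize properties of a text that might make it unusual input.

    Hypothetical helper: the chosen checks are illustrative, not exhaustive.
    """
    # Control characters other than common whitespace (newline, carriage return, tab)
    control = [ch for ch in text
               if unicodedata.category(ch) == "Cc" and ch not in "\n\r\t"]
    # Lone surrogates usually indicate badly decoded input
    surrogates = [ch for ch in text if unicodedata.category(ch) == "Cs"]
    # Characters outside the Basic Multilingual Plane (for example emoji)
    non_bmp = [ch for ch in text if ord(ch) > 0xFFFF]
    return {
        "length": len(text),
        "utf8_bytes": len(text.encode("utf-8", errors="replace")),
        "control_chars": len(control),
        "lone_surrogates": len(surrogates),
        "non_bmp_chars": len(non_bmp),
    }
```

Logging this summary for each row before the call means the last logged entry describes the text that was being processed when the application exited.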
First of all, thanks for the great work on this project.
I am encountering the following problem: the Python app silently exits nondeterministically during a call of anonymize_text().
Activating logging level DEBUG shows the following:
DEBUG:presidio-analyzer:Returning a total of 10 recognizers
INFO:presidio-analyzer:Fetching all recognizers for language de
DEBUG:presidio-analyzer:Returning a total of 10 recognizers
And that is the last output before the application just returns to command line. Other texts passed before are anonymized correctly.
Any pointers / hints on what might cause this problem?
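One way to narrow down a silent exit is to launch the script from a small wrapper and inspect the exit status. On POSIX systems, subprocess reports a negative return code when the child was terminated by a signal, which would distinguish a native crash (for example SIGSEGV) or a kill (for example SIGKILL, which the Linux out-of-memory killer uses) from an ordinary exit. The sketch below assumes the application can be started as a script; run_isolated and describe_exit are hypothetical helper names.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, a negative code -N means the child was terminated by signal N
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_isolated(script_path: str, *args: str) -> str:
    """Run a Python script in a child process and describe how it ended."""
    completed = subprocess.run([sys.executable, script_path, *args])
    return describe_exit(completed.returncode)
```

For example, print(run_isolated("anonymize_job.py")) would report how the run ended, where anonymize_job.py is a placeholder for the actual script.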