This project implements Named Entity Recognition (NER) over an example text and serves it through FastAPI.
The NER pipeline uses the following components:
- Document assembler
- Tokenizer
- Pretrained spell checker spellcheck_dl
- Pretrained word embeddings glove_100d
- Pretrained NER model onto_100
- NER converter to create NER chunks.
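The way these six components chain together can be sketched as follows. This is a minimal sketch, not the project's actual code: the column names (`document`, `token`, `checked`, `embeddings`, `ner`, `ner_chunk`), the helper names `STAGES`, `check_wiring` and `build_pipeline`, and the choice of `ContextSpellCheckerModel` and `NerDLModel` as the annotator classes for `spellcheck_dl` and `onto_100` are assumptions. It also downloads the models with `pretrained()`, whereas the project keeps local copies and may load them from disk instead.

```python
# Assumed column wiring: (stage, input columns, output column).
# The column names are illustrative; the project's own names are not shown.
STAGES = [
    ("document_assembler", ["text"], "document"),
    ("tokenizer", ["document"], "token"),
    ("spell_checker", ["token"], "checked"),
    ("embeddings", ["document", "checked"], "embeddings"),
    ("ner", ["document", "checked", "embeddings"], "ner"),
    ("ner_converter", ["document", "checked", "ner"], "ner_chunk"),
]


def check_wiring(stages, source_cols=("text",)):
    """Raise ValueError if a stage reads a column that no earlier stage wrote."""
    available = set(source_cols)
    for name, inputs, output in stages:
        missing = [col for col in inputs if col not in available]
        if missing:
            raise ValueError(f"{name} reads unknown columns: {missing}")
        available.add(output)
    return True


def build_pipeline():
    """Assemble the six stages in order.

    Requires pyspark and spark-nlp to be installed and a Spark session to be
    running (for example one created with sparknlp.start()).
    """
    # Imported here so the wiring check above runs without Spark installed.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import (
        ContextSpellCheckerModel,
        NerConverter,
        NerDLModel,
        Tokenizer,
        WordEmbeddingsModel,
    )

    check_wiring(STAGES)
    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
    spell = (ContextSpellCheckerModel.pretrained("spellcheck_dl", "en")
             .setInputCols(["token"]).setOutputCol("checked"))
    embeddings = (WordEmbeddingsModel.pretrained("glove_100d", "en")
                  .setInputCols(["document", "checked"]).setOutputCol("embeddings"))
    ner = (NerDLModel.pretrained("onto_100", "en")
           .setInputCols(["document", "checked", "embeddings"]).setOutputCol("ner"))
    converter = (NerConverter()
                 .setInputCols(["document", "checked", "ner"]).setOutputCol("ner_chunk"))
    return Pipeline(stages=[document, tokenizer, spell, embeddings, ner, converter])
```

In this sketch the spell checker's output column replaces the raw tokens for every later stage, so the embeddings and the NER model see the corrected text.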
The pipeline created in this project can detect the following entity types in a given text:
CARDINAL, EVENT, WORK_OF_ART, ORG, DATE, GPE, PERSON, PRODUCT, NORP, ORDINAL, MONEY, LOC, FAC, LAW, TIME, PERCENT, QUANTITY, LANGUAGE
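As an illustration of working with these labels, the sketch below groups detected chunks by entity type and rejects any label outside the list above. The `(text, label)` pair format, the `group_by_type` helper, and the sample chunks are assumptions made for the example. They are hand-written and not real model output.

```python
from collections import defaultdict

# The 18 entity types listed above.
ENTITY_TYPES = frozenset({
    "CARDINAL", "EVENT", "WORK_OF_ART", "ORG", "DATE", "GPE", "PERSON",
    "PRODUCT", "NORP", "ORDINAL", "MONEY", "LOC", "FAC", "LAW", "TIME",
    "PERCENT", "QUANTITY", "LANGUAGE",
})


def group_by_type(chunks):
    """Group (text, label) pairs by label, rejecting labels outside ENTITY_TYPES."""
    grouped = defaultdict(list)
    for text, label in chunks:
        if label not in ENTITY_TYPES:
            raise ValueError(f"unexpected entity type: {label}")
        grouped[label].append(text)
    return dict(grouped)


# Hypothetical chunks, written by hand for illustration.
sample = [("Google", "ORG"), ("Paris", "GPE"), ("Microsoft", "ORG")]
print(group_by_type(sample))  # {'ORG': ['Google', 'Microsoft'], 'GPE': ['Paris']}
```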
- extraction_model/
  - config/
    - config.py: Configuration settings for the NER model.
  - pretrained_models/: A folder containing pretrained Spark NLP models.
    - glove_100d_en_2.4.0_2.4_1579690104032: Pretrained word embeddings model.
    - onto_100_en_2.4.0_2.4_1579729071672: Pretrained NER model.
    - spellcheck_dl_en_3.4.1_3.0_1648457196011: Pretrained spell checker model.
  - saved_ner_pipeline/: A folder containing the saved NER pipeline.
    - ner_pipeline: The Named Entity Recognition pipeline created by this project.
  - saprknlp_jar/
    - spark-nlp-assembly-5.3.2.jar: Jar file for the Spark NLP library.
  - extraction.py: FastAPI application code containing the endpoints for NER.
  - log_manager.py: Module for initializing the logger.
  - pipeline_manager.py: Module for managing the Spark NLP pipeline.
- requirements.txt: List of Python dependencies required to run the project.
- README.md: This README file, which provides information about the project.
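To give a rough idea of how a FastAPI endpoint for NER might be laid out, here is a minimal sketch. It is not the contents of extraction.py: the route `/ner`, the names `run_ner`, `create_app` and `NerRequest`, and the response shape are all hypothetical. The `annotate` argument stands in for whatever wraps the saved Spark NLP pipeline and is assumed to return `(chunk_text, entity_type)` pairs.

```python
def run_ner(text, annotate):
    """Validate the input and shape the response for the endpoint.

    `annotate` is any callable that maps a string to a list of
    (chunk_text, entity_type) pairs (an assumed interface).
    """
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    entities = [{"text": chunk, "type": label} for chunk, label in annotate(text)]
    return {"text": text, "entities": entities}


def create_app(annotate):
    """Build a FastAPI app exposing a hypothetical POST /ner endpoint."""
    # Imported here so run_ner can be used and tested without FastAPI installed.
    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel

    class NerRequest(BaseModel):
        text: str

    app = FastAPI()

    @app.post("/ner")
    def ner(payload: NerRequest):
        try:
            return run_ner(payload.text, annotate)
        except ValueError as err:
            # Report invalid input as a client error instead of a server error.
            raise HTTPException(status_code=422, detail=str(err))

    return app
```

Keeping the request handling in a plain function such as `run_ner` lets it be exercised with a stub annotator, without starting Spark or a web server. An app built with `create_app` could then be served by an ASGI server such as uvicorn.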