Through this project we develop a pipeline to detoxify an input sentence without using parallel detoxified data to train our model.
We identify the toxic word, mask it, and run our fine-tuned BERT to generate the most appropriate non-toxic word as a replacement.
- To mask the toxic word we use the following approaches:
  - Bag of words: we simply mask the words present in a pre-existing list of toxic and non-toxic words. We obtained this list by running `NgramSalienceCalculator` on the previous lists (code present here). A sketch of the salience idea follows this list.
  - Linear Regression: a coefficient is generated for each word by fitting a linear classifier on the toxic and non-toxic corpora.
  - Using `Roberta`, a toxic sentence classifier: for a toxic sentence of n words, we create n variations of the sentence by masking a different word in each variant. If a variant falls below the toxicity threshold, we return it; otherwise we mask a further token in it. This repeats until the sentence is below the threshold (see the sketch after this list).
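A minimal sketch of the salience idea behind the bag-of-words list (the real `NgramSalienceCalculator` lives in the linked code; the counting logic, threshold, and smoothing below are assumptions):

```python
from collections import Counter

def salient_words(toxic_corpus, normal_corpus, threshold=4.0, smoothing=0.5):
    """Return words that are far more frequent in the toxic corpus.

    `threshold` and `smoothing` are illustrative values, not the project's.
    """
    toxic_counts = Counter(w for s in toxic_corpus for w in s.lower().split())
    normal_counts = Counter(w for s in normal_corpus for w in s.lower().split())
    vocab = []
    for word, count in toxic_counts.items():
        # Salience = smoothed ratio of toxic to non-toxic frequency.
        salience = (count + smoothing) / (normal_counts[word] + smoothing)
        if salience >= threshold:
            vocab.append(word)
    return vocab
```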
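And a sketch of the RoBERTa-driven masking loop; the checkpoint name and the 0.5 threshold are assumptions, not the project's exact choices:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Any RoBERTa toxicity classifier fits here; this checkpoint is an assumption.
tox_tok = AutoTokenizer.from_pretrained("s-nlp/roberta_toxicity_classifier")
tox_clf = AutoModelForSequenceClassification.from_pretrained("s-nlp/roberta_toxicity_classifier")

def toxicity(sentence: str) -> float:
    """Probability that `sentence` is toxic, according to the classifier."""
    with torch.no_grad():
        logits = tox_clf(**tox_tok(sentence, return_tensors="pt")).logits
    return logits.softmax(dim=-1)[0, 1].item()

def mask_until_non_toxic(sentence: str, threshold: float = 0.5, mask: str = "[MASK]") -> str:
    words = sentence.split()
    while toxicity(" ".join(words)) >= threshold:
        # Build the n single-mask variants and keep the least toxic one.
        variants = [words[:i] + [mask] + words[i + 1:]
                    for i, w in enumerate(words) if w != mask]
        if not variants:
            break
        words = min(variants, key=lambda v: toxicity(" ".join(v)))
    return " ".join(words)
```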
- To generate substitutes for the masked tokens we use a BERT model trained with the masked language modelling (MLM) objective.
  - A list of masked positive and negative sentences is used to fine-tune this model. During training, we pass the masked positive sentences with flag 0 and the negative ones with flag 1 into the model, with the masked token as the target (a training-step sketch follows this list).
  - While generating the actual tokens for our specific task, we always pass flag 0, as we want the output to be non-toxic.
  - We use the following approaches to mask tokens for fine-tuning (both are sketched after this list):
    - Random approach: one word is masked at random in the sentence. This approach was suggested by the paper Conditional BERT Contextual Augmentation.
    - Targeted approach: mask the words present in the pre-existing list of toxic and non-toxic words.
  - This fine-tuned BERT then gives us 10 possible alternatives to the originally masked token (see the candidate-generation sketch after this list).
  - We evaluate the alternatives using the product of the similarity to the original masked token, the fluency, and the non-toxicity.
  - Similarity is determined via cosine similarity, fluency via `GPT2LMHeadModel`, and non-toxicity via `Roberta` (a reranking sketch follows this list).
  - We also provide multi-token alternatives by replacing the single mask with a double mask and comparing the output against the single-mask result (sketched at the end of this list).
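Following the Conditional BERT paper, the flag can be fed in through BERT's segment (token type) embeddings. A minimal sketch of one fine-tuning step under that assumption; it also assumes the masked and original sentences tokenize to the same length:

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def training_step(masked_sentence: str, original_sentence: str, flag: int) -> float:
    """One step of conditional MLM fine-tuning; flag 0 = non-toxic, flag 1 = toxic."""
    inputs = tok(masked_sentence, return_tensors="pt")
    # Conditional BERT: reuse the segment embedding to encode the label.
    inputs["token_type_ids"] = torch.full_like(inputs["input_ids"], flag)

    labels = tok(original_sentence, return_tensors="pt")["input_ids"]
    # Compute the loss only at the [MASK] positions.
    labels[inputs["input_ids"] != tok.mask_token_id] = -100

    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```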
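The two masking strategies used to build that fine-tuning data can be sketched as follows (the word list is the vocab described above):

```python
import random

def mask_random(words: list, mask: str = "[MASK]") -> list:
    # Random approach: mask one uniformly chosen word.
    i = random.randrange(len(words))
    return words[:i] + [mask] + words[i + 1:]

def mask_targeted(words: list, vocab: set, mask: str = "[MASK]") -> list:
    # Targeted approach: mask every word found in the pre-built word list.
    return [mask if w.lower() in vocab else w for w in words]
```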
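Generating the 10 substitutes is then a top-k lookup over the MLM logits at the masked position, always with flag 0 (reusing `tok` and `model` from the fine-tuning sketch above):

```python
def top_candidates(masked_sentence: str, k: int = 10) -> list:
    inputs = tok(masked_sentence, return_tensors="pt")
    # Flag 0: ask the conditional model for non-toxic substitutes.
    inputs["token_type_ids"] = torch.zeros_like(inputs["input_ids"])
    with torch.no_grad():
        logits = model(**inputs).logits
    # Locate the [MASK] position and take the k highest-scoring tokens there.
    mask_pos = (inputs["input_ids"][0] == tok.mask_token_id).nonzero()[0, 0]
    top_ids = logits[0, mask_pos].topk(k).indices
    return tok.convert_ids_to_tokens(top_ids.tolist())
```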
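A sketch of the candidate reranking; `toxicity()` is the RoBERTa scorer sketched earlier, and comparing tokens via BERT's context-free input embeddings is an assumption:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

gpt2_tok = GPT2Tokenizer.from_pretrained("gpt2")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")

def fluency(sentence: str) -> float:
    """Inverse GPT-2 perplexity: higher is more fluent."""
    ids = gpt2_tok(sentence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        loss = gpt2(input_ids=ids, labels=ids).loss
    return float(torch.exp(-loss))

def token_embedding(token: str) -> torch.Tensor:
    # Assumption: use BERT's input embedding for the token.
    return model.get_input_embeddings().weight[tok.convert_tokens_to_ids(token)]

def rerank(candidates: list, original_token: str, fill) -> str:
    """Pick the candidate maximizing similarity * fluency * non-toxicity.

    `fill(candidate)` should return the sentence with the mask replaced.
    """
    orig_emb = token_embedding(original_token)

    def score(cand: str) -> float:
        sent = fill(cand)
        sim = torch.nn.functional.cosine_similarity(
            token_embedding(cand), orig_emb, dim=0).item()
        return sim * fluency(sent) * (1.0 - toxicity(sent))

    return max(candidates, key=score)
```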
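Finally, a sketch of the multi-token variant: widen the single mask to two masks and fill them greedily left to right (the greedy decoding order is our assumption; `tok` and `model` are reused from above):

```python
def double_mask_fill(masked_sentence: str) -> str:
    """Replace the single [MASK] with two masks and fill them one at a time."""
    doubled = masked_sentence.replace(
        tok.mask_token, f"{tok.mask_token} {tok.mask_token}", 1)
    inputs = tok(doubled, return_tensors="pt")
    inputs["token_type_ids"] = torch.zeros_like(inputs["input_ids"])  # flag 0
    ids = inputs["input_ids"]
    for pos in (ids[0] == tok.mask_token_id).nonzero()[:, 0]:
        with torch.no_grad():
            logits = model(input_ids=ids,
                           token_type_ids=inputs["token_type_ids"]).logits
        # Greedily commit the best token at this mask before filling the next.
        ids[0, pos] = logits[0, pos].argmax()
    return tok.decode(ids[0], skip_special_tokens=True)
```

The two-mask output can then be scored with the same reranking function above and compared against the best single-mask substitute.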
The script folder contains the notebooks for the baseline models, fine-tuning BERT, and the final detoxifying pipeline.
The data folder contains:
- jigsaw dataset: can be downloaded from here. This data is used to calculate the accuracy of our model.
- training data: lists of positive and negative sentences used to fine-tune BERT.
- vocab: lists of positive and negative words obtained via `NgramSalienceCalculator`.
Run `pip install -r requirements.txt` to install the required dependencies.
- To detoxify a given sentence, refer to this.
Further,
- To fine-tune the BERT model, refer to this.
- To run the baseline model, refer to this.
The fine-tuned BERT models can be found here.
Some examples of the output obtained:
The target-masked fine-tuned model showed better results than the random-masked model suggested by the paper Conditional BERT Contextual Augmentation. For example:
The metric used to evaluate our model was (similarity × fluency) / perplexity. The scores obtained were as follows (a one-line sketch of the formula follows the list):
- Baseline Model (T5 Paraphraser): 4.370629371e-3
- Random-masked BERT: 1.419647927e-2
- Target-masked BERT: 1.744591766e-2
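For reference, the reported metric as a one-liner (how the per-sentence scores are aggregated across the dataset is not shown here):

```python
def evaluation_score(similarity: float, fluency: float, perplexity: float) -> float:
    # Metric from the table above: similarity * fluency / perplexity.
    return similarity * fluency / perplexity
```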