ModelJack

Author: Nerses Nersesyan

ModelJack is a project for emulating interesting language APIs with simple models that run locally, avoiding network latency, per-request costs and request limits.

Example 1: Comment Toxicity

In the first example we show how to train a model that emulates the Google Perspective API using data from the Wikipedia Talk project and the fasttext library.

The Perspective API is a demo released by the Google Jigsaw team. The API scores a comment based on its potential impact on a conversation, detecting personal attacks. More detailed information about the project can be found here.

Detecting and reducing toxic comments and personal attacks is very important for most platforms with user-generated content. The Perspective API is potentially very useful, but is a demo limited to 1000 requests.

Can we emulate it so that developers can integrate this functionality into their platforms today?

Running

Download the data to this directory and run:

python ft_cls.py
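
The script itself is not reproduced here. As a rough sketch of what a fastText training step of this kind might look like, the snippet below formats labeled comments into fastText's supervised input format (one example per line, label prefixed with `__label__`) and trains a classifier on them. The helper names, the label names `attack` and `ok`, and the hyperparameters are assumptions made for illustration and are not taken from `ft_cls.py`.

```python
import re

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def to_fasttext_line(comment, is_attack):
    """Format one labeled comment as a fastText supervised training line."""
    # fastText reads one example per line, so collapse newlines, tabs
    # and repeated spaces into single spaces before lowercasing.
    text = re.sub(r"\s+", " ", comment).strip().lower()
    label = "attack" if is_attack else "ok"  # hypothetical label names
    return f"{LABEL_PREFIX}{label} {text}"


def write_training_file(examples, path):
    """Write (comment, is_attack) pairs to `path`, one example per line."""
    with open(path, "w", encoding="utf-8") as f:
        for comment, is_attack in examples:
            f.write(to_fasttext_line(comment, is_attack) + "\n")


def train(path):
    """Train a supervised fastText classifier on a file written above."""
    import fasttext  # imported lazily so the helpers above work without it

    # Illustrative hyperparameters, not tuned values from this project.
    return fasttext.train_supervised(input=path, epoch=10, wordNgrams=2)
```

With the `fasttext` package installed, `train("train.txt").predict("some comment")` returns the predicted labels together with their probabilities, which could serve as a local stand-in for an API score.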

Results

See Results.md.

Datasets

To train and evaluate the model, we used the Wikipedia Talk project dataset.

The Wikipedia Talk project release includes:

a large historical corpus of discussion comments on Wikipedia talk pages
a sample of over 100k comments with human labels for whether the comment contains a personal attack
a sample of over 100k comments with human labels for whether the comment has an aggressive tone

Please refer to meta.wikimedia.org/wiki/Research:Detox/Data_Release for documentation of the schema of each data set.
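
Because each labeled comment is rated by several human annotators, the per-annotator labels have to be collapsed into one label per comment before training. A minimal sketch of one common approach, majority vote, is shown below. The field names `rev_id` and `attack` and the placeholder tokens `NEWLINE_TOKEN` and `TAB_TOKEN` are assumptions about the release format; check them against the schema documentation linked above. The helper names are hypothetical.

```python
from collections import defaultdict


def majority_labels(annotations):
    """Return {rev_id: bool}, True when more than half of the annotators
    of that comment flagged it as an attack.

    `annotations` is an iterable of (rev_id, attack) pairs with attack in {0, 1}.
    """
    votes = defaultdict(list)
    for rev_id, attack in annotations:
        votes[rev_id].append(int(attack))
    # A tie (exactly half) is treated as "not an attack".
    return {rev_id: sum(v) / len(v) > 0.5 for rev_id, v in votes.items()}


def clean_comment(text):
    """Replace the placeholder tokens assumed to encode newlines and tabs."""
    return text.replace("NEWLINE_TOKEN", " ").replace("TAB_TOKEN", " ").strip()
```

Treating ties as non-attacks is a design choice; a stricter or looser threshold than 0.5 trades precision against recall in the resulting training labels.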

References

Ex Machina: Personal Attacks Seen at Scale - documentation on the data collection and modeling methodology from Google and Wikimedia

Conversation AI - The Conversation AI Research Github Organization at Google

