Bootstrap Intent Classification/Utterance Generation #21

benhoff · 2017-12-27T00:54:02Z

Intent Classification

Rasa NLU uses a linear SVM to classify intents, vectorizing each utterance with spaCy's word vectors (averaged over the utterance's tokens).
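Here is a minimal sketch of the idea in plain Python, so it runs without Rasa or spaCy installed. Each utterance becomes the average of its word vectors, and a one-vs-rest linear SVM is trained on top. The word vectors, intents, and examples are hypothetical toy data. The SVM is a hand-rolled Pegasos (stochastic subgradient) trainer standing in for scikit-learn, not Rasa's actual code.

```python
import random

# Hypothetical toy word vectors (3 dimensions). A real pipeline would use
# pretrained vectors, e.g. spaCy's doc.vector (the mean of token vectors).
WORD_VECTORS = {
    "hello": [1.0, 0.0, 0.0], "hi": [0.9, 0.1, 0.0], "hey": [0.8, 0.0, 0.1],
    "weather": [0.0, 1.0, 0.0], "rain": [0.1, 0.9, 0.0], "forecast": [0.0, 0.8, 0.2],
    "play": [0.0, 0.0, 1.0], "song": [0.1, 0.0, 0.9], "music": [0.0, 0.1, 0.8],
}
DIM = 3


def vectorize(utterance):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vecs = [WORD_VECTORS[t] for t in utterance.lower().split() if t in WORD_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_binary_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic subgradient descent on the regularized hinge loss.
    ys are +1/-1; a constant 1.0 feature is appended to act as a bias."""
    rng = random.Random(seed)
    data = [(x + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * (DIM + 1)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * dot(w, x)  # margin under the current weights
            w = [(1 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1:  # hinge loss active: step toward y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def train_one_vs_rest(examples):
    """examples: list of (utterance, intent). Returns {intent: weights}."""
    xs = [vectorize(u) for u, _ in examples]
    intents = sorted({i for _, i in examples})
    return {
        intent: train_binary_svm(xs, [1 if i == intent else -1 for _, i in examples])
        for intent in intents
    }


def classify(models, utterance):
    """Pick the intent whose linear score is highest."""
    x = vectorize(utterance) + [1.0]
    return max(models, key=lambda intent: dot(models[intent], x))


EXAMPLES = [
    ("hello", "greet"), ("hi there", "greet"), ("hey", "greet"),
    ("what is the weather", "weather"), ("will it rain", "weather"),
    ("show the forecast", "weather"),
    ("play a song", "music"), ("play some music", "music"),
]

models = train_one_vs_rest(EXAMPLES)
print(classify(models, "rain forecast"))
```

With real vectors the same structure holds. Only `vectorize` changes, and a library SVM would replace `train_binary_svm`.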

There has also been research into using seq2seq models to classify intent and do slot filling jointly, as seen in this paper from Microsoft. Here's also a Python implementation.
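Joint intent/slot models of this kind are typically trained on data where every token carries a slot tag (BIO scheme) and the whole utterance carries one intent label. The sketch below prepares and decodes that format, assuming whitespace tokenization. The intent and slot names (`book_flight`, `from_city`, `to_city`) are hypothetical, and the model itself is omitted.

```python
def to_bio(tokens, spans):
    """Convert slot spans to BIO tags.

    spans: list of (start, end, slot) token indices, end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, slot in spans:
        tags[start] = "B-" + slot
        for i in range(start + 1, end):
            tags[i] = "I-" + slot
    return tags


def from_bio(tokens, tags):
    """Recover (slot, text) pairs from a (predicted) BIO tag sequence.
    A stray I- tag that does not continue the current slot starts a new one."""
    entities, current, words = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and current != tag[2:]):
            if current:
                entities.append((current, " ".join(words)))
            current, words = tag[2:], [token]
        elif tag.startswith("I-"):
            words.append(token)
        else:
            if current:
                entities.append((current, " ".join(words)))
            current, words = None, []
    if current:
        entities.append((current, " ".join(words)))
    return entities


def make_example(utterance, intent, spans):
    """One training example: token sequence, slot-tag sequence, intent label."""
    tokens = utterance.split()
    return {"tokens": tokens, "tags": to_bio(tokens, spans), "intent": intent}


example = make_example(
    "book a flight from boston to new york",
    "book_flight",
    [(4, 5, "from_city"), (6, 8, "to_city")],
)
print(example["tags"])
print(from_bio(example["tokens"], example["tags"]))
```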

Entity Extraction

Rasa NLU has several methods of entity extraction, as documented here. These include a conditional random field (CRF) for custom entity extraction (not pretrained). spaCy also provides entity extraction, in the form of an averaged perceptron. The third option is a Duckling server, which uses a context-free grammar. Facebook has an open-source implementation of this context-free grammar approach.
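To illustrate what a linear-chain CRF does at prediction time, here is a Viterbi decoder. It picks the highest-scoring BIO label sequence from per-token (emission) and label-to-label (transition) scores. The weights are hand-set and hypothetical. A trained CRF learns them from token features, and training is not shown.

```python
LABELS = ["O", "B-city", "I-city"]
CITY_WORDS = {"boston", "new", "york"}  # hypothetical stand-in for learned features


def emission(token, label):
    """Score for assigning `label` to `token` (hand-set, not learned)."""
    is_city = token in CITY_WORDS
    if label == "O":
        return 0.0 if is_city else 1.0
    return 1.0 if is_city else -1.0


def transition(prev, label):
    """Score for moving from label `prev` to `label`."""
    if label == "I-city":
        # I- should continue a B- or I- of the same type
        return 0.5 if prev in ("B-city", "I-city") else -5.0
    return 0.0


def viterbi(tokens, labels, emission, transition):
    """Highest-scoring label sequence under a linear-chain model."""
    # best[i][y]: best score of any path ending in label y at position i
    best = [{y: emission(tokens[0], y) for y in labels}]
    back = [{}]
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(
                ((p, best[i - 1][p] + transition(p, y)) for p in labels),
                key=lambda pair: pair[1],
            )
            best[i][y] = score + emission(tokens[i], y)
            back[i][y] = prev
    # follow back-pointers from the best final label
    last = max(labels, key=lambda y: best[-1][y])
    path = [last]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))


tokens = "fly from boston to new york".split()
print(viterbi(tokens, LABELS, emission, transition))
```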

As mentioned above, a seq2seq approach can also be used for entity extraction (slot filling), as documented here.

Bootstrapping Utterances

Writing utterances by hand is a pain in the rear. There might be a way to bootstrap utterance generation and reduce the need to write them manually.

Here's a list of corpora that should prove useful in that regard. That paper also has an overview of useful methods for building dialogue systems.

The paper also has an interesting reference: "Luke, I am your father: dealing with out-of-domain requests by using movies subtitles". This should be useful for one-off responses.
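One way to sketch the subtitle approach, assuming the subtitles have already been split into (trigger, answer) pairs, is to find the trigger most similar to the incoming request and reply with its answer. TF-IDF cosine similarity serves here as a simple stand-in scorer, not necessarily what the paper uses. The pairs and the `SubtitleResponder` name are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical (trigger, answer) pairs mined from subtitles.
PAIRS = [
    ("who is your father", "Luke, I am your father."),
    ("do you like music", "I only listen to the classics."),
    ("what is the meaning of life", "Nobody really knows."),
]


def tokenize(text):
    return [t.strip(".,!?").lower() for t in text.split()]


class SubtitleResponder:
    def __init__(self, pairs):
        self.answers = [a for _, a in pairs]
        docs = [Counter(tokenize(t)) for t, _ in pairs]
        n = len(docs)
        df = Counter(term for doc in docs for term in doc)
        # smoothed idf: terms found in every trigger still get a small weight
        self.idf = {term: math.log((1 + n) / (1 + c)) + 1 for term, c in df.items()}
        self.vectors = [self._weigh(doc) for doc in docs]

    def _weigh(self, counts):
        # terms never seen in any trigger get weight 0
        return {t: c * self.idf.get(t, 0.0) for t, c in counts.items()}

    @staticmethod
    def _cosine(a, b):
        num = sum(w * b.get(t, 0.0) for t, w in a.items())
        den = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
            sum(w * w for w in b.values())
        )
        return num / den if den else 0.0

    def respond(self, request):
        """Return (answer, score); a score of 0.0 means no word overlap at all,
        so a caller may want a fallback below some threshold."""
        query = self._weigh(Counter(tokenize(request)))
        scores = [self._cosine(query, v) for v in self.vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.answers[best], scores[best]


responder = SubtitleResponder(PAIRS)
print(responder.respond("Who is your father?"))
```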

This Google Research blog post has an example of how to rank responses by uniqueness, which will be necessary for generating unique responses.
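The blog post's own ranking method is not reproduced here. As a hypothetical illustration of the general idea, the sketch below re-ranks candidate responses greedily. It keeps the highest-scoring ones while skipping any that overlap too much (token Jaccard similarity) with a response already kept. `pick_diverse`, `k`, and `max_overlap` are names chosen for this example.

```python
def jaccard(a, b):
    """Token-set overlap between two responses, in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def pick_diverse(candidates, k=3, max_overlap=0.5):
    """candidates: list of (response, score) from some upstream model.
    Greedily keep the highest-scoring responses, skipping any whose overlap
    with an already chosen response exceeds max_overlap."""
    chosen = []
    for response, _score in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(jaccard(response, kept) <= max_overlap for kept in chosen):
            chosen.append(response)
        if len(chosen) == k:
            break
    return chosen


CANDIDATES = [
    ("how about tomorrow", 0.9),
    ("how about tomorrow then", 0.85),
    ("sorry, I can't", 0.8),
    ("sure, sounds good", 0.7),
    ("sure, sounds great", 0.6),
]
print(pick_diverse(CANDIDATES))
```

The near-duplicate "how about tomorrow then" is skipped even though it outscores the rest.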

This repo uses the Cornell Movie-Dialogs Corpus and a seq2seq neural net to implement the approach from the Google blog post.
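Here is a sketch of turning the Cornell Movie-Dialogs Corpus into (prompt, response) pairs for seq2seq training. It assumes the corpus's `movie_lines.txt` / `movie_conversations.txt` layout with ` +++$+++ ` as the field separator. The function names are hypothetical, and the sample rows at the bottom are invented in the corpus format rather than copied from it.

```python
import ast

SEP = " +++$+++ "


def parse_lines(lines):
    """movie_lines.txt rows: lineID, characterID, movieID, character name, text."""
    id_to_text = {}
    for row in lines:
        fields = row.rstrip("\n").split(SEP)
        if len(fields) == 5:
            id_to_text[fields[0]] = fields[4]
    return id_to_text


def parse_pairs(conversations, id_to_text):
    """movie_conversations.txt rows end with a Python-style list of line IDs,
    e.g. ['L194', 'L195']. Each consecutive pair becomes (prompt, response)."""
    pairs = []
    for row in conversations:
        fields = row.rstrip("\n").split(SEP)
        if len(fields) != 4:
            continue
        line_ids = ast.literal_eval(fields[3])
        for first, second in zip(line_ids, line_ids[1:]):
            if first in id_to_text and second in id_to_text:
                pairs.append((id_to_text[first], id_to_text[second]))
    return pairs


def load_cornell(lines_path, conversations_path):
    # The files are commonly read as ISO-8859-1, which decodes every byte.
    with open(lines_path, encoding="iso-8859-1") as f:
        id_to_text = parse_lines(f)
    with open(conversations_path, encoding="iso-8859-1") as f:
        return parse_pairs(f, id_to_text)


# Invented sample rows in the corpus format.
SAMPLE_LINES = [
    "L1 +++$+++ u0 +++$+++ m0 +++$+++ ANNA +++$+++ Are you coming?\n",
    "L2 +++$+++ u1 +++$+++ m0 +++$+++ BEN +++$+++ In a minute.\n",
    "L3 +++$+++ u0 +++$+++ m0 +++$+++ ANNA +++$+++ Hurry up.\n",
]
SAMPLE_CONVERSATIONS = ["u0 +++$+++ u1 +++$+++ m0 +++$+++ ['L1', 'L2', 'L3']\n"]

pairs = parse_pairs(SAMPLE_CONVERSATIONS, parse_lines(SAMPLE_LINES))
print(pairs)
```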

It should also be possible to leverage Reddit using the movie corpus code I've already written.

Context

The real challenge is going to be handling context.

The Ubuntu Dialogue Corpus paper proposes a way to handle context: an affinity model that scores a candidate response against a context c (for example, five consecutive utterances). The paper is here.
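The paper's dual encoder scores a (context, response) pair as sigmoid(c^T M r + b), where c and r are RNN encodings of the context and the response, and M and b are learned. The sketch below keeps that scoring function but makes two assumptions to stay runnable. It encodes text by averaging hypothetical toy word vectors instead of running an LSTM, and it uses an untrained identity matrix for M.

```python
import math

# Hypothetical toy word vectors; the paper encodes c and r with an RNN (LSTM).
VECS = {
    "install": [1.0, 0.0], "package": [0.9, 0.1], "apt": [0.8, 0.2],
    "sudo": [0.7, 0.1], "wifi": [0.0, 1.0], "driver": [0.1, 0.9],
}


def encode(text):
    """Stand-in encoder: mean of known word vectors."""
    vs = [VECS[t] for t in text.lower().split() if t in VECS]
    return [sum(col) / len(vs) for col in zip(*vs)] if vs else [0.0, 0.0]


def affinity(context_utterances, response, M, b):
    """Probability-like score that `response` fits the context:
    sigmoid(c^T M r + b)."""
    c = encode(" ".join(context_utterances[-5:]))  # last five turns as context
    r = encode(response)
    Mr = [sum(M[i][j] * r[j] for j in range(len(r))) for i in range(len(M))]
    z = sum(ci * mri for ci, mri in zip(c, Mr)) + b
    return 1.0 / (1.0 + math.exp(-z))


IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
context = ["how do i install a package", "use apt", "it says permission denied"]
for candidate in ["try sudo apt install", "check the wifi driver"]:
    print(candidate, round(affinity(context, candidate, IDENTITY, 0.0), 3))
```

In practice the candidate with the highest affinity would be chosen, and M and b would come from training on true and false (context, response) pairs.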

Final Thoughts

The easiest would be to follow the movie-subtitles paper to build a one-off responder for out-of-domain requests: a sort of pithy response bot, as it were.


benhoff · 2017-12-29T16:01:41Z

Sentence-level similarity could also be used.
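Here is a minimal sketch of using sentence-level similarity to route a new utterance to the intent of its closest labeled example. Below a threshold, the request is treated as out-of-domain. Sentence embeddings here are plain averages of hypothetical toy word vectors; the linked article covers stronger embedding methods. `nearest_intent` and the 0.8 threshold are illustrative choices.

```python
import math

# Hypothetical toy word vectors standing in for pretrained embeddings.
VECS = {
    "hello": [1.0, 0.0, 0.0], "hi": [0.9, 0.1, 0.0],
    "weather": [0.0, 1.0, 0.0], "rain": [0.1, 0.9, 0.0],
    "song": [0.0, 0.1, 0.9], "music": [0.0, 0.0, 1.0],
}


def embed(sentence):
    """Sentence embedding as the mean of known word vectors (None if none known)."""
    vs = [VECS[w] for w in sentence.lower().split() if w in VECS]
    return [sum(c) / len(vs) for c in zip(*vs)] if vs else None


def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def nearest_intent(utterance, labeled, threshold=0.8):
    """Return the intent of the most similar labeled utterance, or None when
    nothing is similar enough (a candidate out-of-domain request)."""
    query = embed(utterance)
    if query is None:
        return None
    best_intent, best_sim = None, -1.0
    for text, intent in labeled:
        vec = embed(text)
        if vec is None:
            continue
        sim = cosine(query, vec)
        if sim > best_sim:
            best_intent, best_sim = intent, sim
    return best_intent if best_sim >= threshold else None


LABELED = [("hello", "greet"), ("what is the weather", "weather"), ("play a song", "music")]
print(nearest_intent("hi there", LABELED))
```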

https://towardsdatascience.com/sentence-embedding-3053db22ea77

https://www.microsoft.com/en-us/research/project/deep-reinforcement-learning-goal-oriented-dialogue/
