Hybrid Code Networks (HCN)

HCN combines an RNN with domain-specific knowledge encoded as software and system action templates.

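The four major components of HCN are a recurrent neural network, domain-specific software, domain-specific action templates, and a conventional entity extraction module for identifying entity mentions in text.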

Both the RNN and the developer code maintain dialog state. Each action template can be a textual communicative action or an API call.

1. Utterance : The user provides an utterance as text.
2. BoW : A bag-of-words vector is formed from the utterance.
3. Utterance Embedding : The average of the word embeddings in the utterance is computed, using 300-dimensional word2vec vectors. The word embeddings are kept fixed (not updated) during training.
4. Entity Extraction : Entities in the utterance, such as name, place, and time, are identified; the paper uses a simple string-matching approach.
5. Entity Tracking : Entities in the utterance are mapped to a row in a database; new entities override existing entities held in the buffer.
6. Action Mask : A bit vector indicating which actions are allowed given the context at the current timestep (turn).
- Context Features : Handwritten features extracted from entities that may be useful for distinguishing among actions.
- A feature vector is formed from the information extracted in (1) to (5).
The feature vector is passed to an RNN.
The RNN computes and maintains a hidden (internal) state.
Based on the hidden state, an output distribution over the distinct system action templates is computed.
The action mask from (6) is applied by elementwise multiplication, which zeroes out non-permitted actions in the probability distribution over all actions; the result is then renormalized to a probability distribution.
An action is selected from the output distribution.
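
The featurization steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the paper's implementation: every function name, the toy 2-dimensional embeddings (standing in for 300-dimensional word2vec), the binary bag-of-words encoding, and the "is this slot filled?" context features are assumptions made for the example.

```python
from typing import Dict, List, Sequence


def bag_of_words(tokens: Sequence[str], vocab: Dict[str, int]) -> List[float]:
    """Binary bag-of-words vector (assumption: presence, not counts)."""
    vec = [0.0] * len(vocab)
    for tok in tokens:
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] = 1.0
    return vec


def average_embedding(tokens: Sequence[str],
                      embeddings: Dict[str, List[float]],
                      dim: int) -> List[float]:
    """Mean of the fixed embeddings of known tokens; zeros if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def extract_entities(tokens: Sequence[str],
                     known_values: Dict[str, str]) -> Dict[str, str]:
    """Simple string matching: map each recognized token to its entity type."""
    return {known_values[t]: t for t in tokens if t in known_values}


def track_entities(tracked: Dict[str, str],
                   found: Dict[str, str]) -> Dict[str, str]:
    """New entity values override the ones already held in the buffer."""
    updated = dict(tracked)
    updated.update(found)
    return updated


def context_features(tracked: Dict[str, str],
                     slots: Sequence[str]) -> List[float]:
    """Hypothetical handwritten features: 1.0 if a slot has a value."""
    return [1.0 if slot in tracked else 0.0 for slot in slots]


# Toy data, invented for this example.
vocab = {w: i for i, w in enumerate(["book", "a", "table", "in", "paris"])}
embeddings = {"book": [1.0, 0.0], "table": [0.0, 1.0], "paris": [1.0, 1.0]}
known_values = {"paris": "location", "london": "location", "italian": "cuisine"}

tokens = "book a table in paris".split()
tracked = track_entities({"location": "london"},
                         extract_entities(tokens, known_values))

# Concatenate the components into the vector that is passed to the RNN.
features = (bag_of_words(tokens, vocab)
            + average_embedding(tokens, embeddings, dim=2)
            + context_features(tracked, ["location", "cuisine"]))
```

Here the earlier value "london" is replaced by "paris" in the entity buffer, and the resulting vector has one block per feature source, so new context features can be appended without touching the other blocks.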

In a supervised learning setting, the action with the maximum probability is selected.
In a reinforcement learning setting, an action is sampled from the distribution.
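
Masking and the two selection rules can be illustrated with a small stdlib-only sketch. The function names and the `mode` strings are hypothetical, and the logits are made up; the only points being shown are that masked actions end up with probability zero after renormalization, and that selection is either an argmax or a draw from the distribution.

```python
import math
import random
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw action scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_action_mask(probs: Sequence[float],
                      mask: Sequence[int]) -> List[float]:
    """Elementwise multiply by the bit vector, then renormalize."""
    masked = [p * m for p, m in zip(probs, mask)]
    total = sum(masked)
    if total == 0.0:
        raise ValueError("the action mask forbids every action")
    return [p / total for p in masked]


def select_action(probs: Sequence[float],
                  mode: str = "supervised",
                  rng: Optional[random.Random] = None) -> int:
    """Argmax in the supervised setting, sampling in the RL setting."""
    indices = range(len(probs))
    if mode == "supervised":
        return max(indices, key=lambda i: probs[i])
    rng = rng or random.Random()
    return rng.choices(indices, weights=probs, k=1)[0]


# Three candidate actions; the mask forbids action 0 at this turn.
probs = apply_action_mask(softmax([2.0, 1.0, 0.5]), [0, 1, 1])
greedy = select_action(probs)  # index of the most probable permitted action
sampled = select_action(probs, mode="reinforcement", rng=random.Random(0))
```

Although action 0 has the highest raw score, the mask removes it, so the greedy choice falls to action 1 and sampling can only ever return action 1 or 2.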

Entity Output : Based on the selected action, entity values are substituted into the template, producing a fully formed action.
Depending on the action type:

If the action is an API call, the call is made; it can fetch information and present rich content to the user and, depending on the dialog state, contribute features to the vector passed to the RNN at the next timestep (turn).
If the action is just text, it is rendered to the user.
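
The last two steps, filling entities into a template and then either calling an API or rendering text, might look like the sketch below. The `<slot>` placeholder syntax, the `ActionTemplate` class, the `"text"`/`"api"` labels, and the `find_restaurants` handler are all hypothetical names chosen for illustration, not an API from the paper.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ActionTemplate:
    kind: str     # "text" or "api" (hypothetical labels)
    pattern: str  # may contain <slot> placeholders


def fill_template(pattern: str, entities: Dict[str, str]) -> str:
    """Substitute tracked entity values for <slot> placeholders."""
    def substitute(match):
        # Leave the placeholder untouched if the entity is not known yet.
        return entities.get(match.group(1), match.group(0))
    return re.sub(r"<(\w+)>", substitute, pattern)


ApiHandler = Callable[[List[str]], List[float]]


def execute_action(template: ActionTemplate,
                   entities: Dict[str, str],
                   api_handlers: Dict[str, ApiHandler]) -> Tuple[str, List[float]]:
    """Return (text to render, features for the next turn's feature vector)."""
    action = fill_template(template.pattern, entities)
    if template.kind == "api":
        name, *args = action.split()
        return "", api_handlers[name](args)
    return action, []


def find_restaurants(args: List[str]) -> List[float]:
    """Stand-in for a real lookup: one feature saying whether it had arguments."""
    return [1.0 if args else 0.0]


entities = {"cuisine": "italian", "location": "paris"}
text_result = execute_action(
    ActionTemplate("text", "Which <cuisine> place in <location>?"), entities, {})
api_result = execute_action(
    ActionTemplate("api", "api_call <cuisine> <location>"),
    entities, {"api_call": find_restaurants})
```

Returning the API's features alongside the rendered text keeps the loop simple: the caller appends them to the feature vector it builds for the next turn.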