
Hybrid Code Networks (HCN)


HCN combines an RNN with domain-specific knowledge encoded as software and system action templates.

The four major components of an HCN are:

  1. RNN
  2. Domain Specific Software
  3. Domain Specific Action Template
  4. Entity Extraction Module

Both the RNN and the developer code maintain dialog state. Each action template can be a textual communicative action or an API call.
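
For concreteness, here is a minimal sketch of what a set of action templates might look like, with entity placeholders; the specific templates, the weather domain, and the api_call convention below are illustrative assumptions, not taken from the paper.

# Hypothetical action templates: entity-abstracted text responses plus API calls.
ACTION_TEMPLATES = [
    'Hello, how can I help you?',
    'Which city would you like the weather for?',
    '<city> is currently <temperature> degrees.',
    'api_call weather <city>',   # resolved by developer code, not spoken to the user
]
action_size = len(ACTION_TEMPLATES)  # the RNN outputs a distribution of this size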

The operational loop for a single dialog turn is:

  1. Utterance : The user provides an utterance as text.
  2. BoW : A bag-of-words representation of the utterance is formed.
  3. Utterance Embedding : The average word embedding of the utterance is computed, using 300-dimensional word2vec vectors. The word embeddings are kept fixed (not updated) during training.
  4. Entity Extraction : Entities in the utterance (e.g. name, place, time) are identified; a simple string-matching approach is proposed in the paper.
  5. Entity Tracking : Entities in the utterance are mapped to a row in the database; existing entities in the buffer are overridden by new ones.
  6. Action Mask : A bit vector indicating which actions are allowed given the context at the current time step (turn).
    • Context Features : Hand-written features extracted from the entities that may be useful for distinguishing among actions.
    • A feature vector is formed from the information extracted in steps (1) to (5); see the featurization sketch after this list.
  7. The feature vector is passed to an RNN.
  8. The RNN computes and maintains a hidden (internal) state.
  9. From the hidden state, an output distribution over the list of distinct system action templates is produced.
  10. The action mask from step (6) is applied by elementwise multiplication, which eliminates non-permitted actions from the probability distribution over all actions.
  11. An action is selected from the output distribution (see the selection sketch below).
    • In a supervised learning setting, the action with the maximum probability is selected.
    • In a reinforcement learning setting, an action is sampled from the distribution.
  12. Entity Output : Based on the selected action, entities are filled in and a fully-formed action is produced.
  13. Depending on the action:
    • An API call is made, which fetches information and returns rich content to the user; depending on the dialog state, it also contributes to the feature vector passed to the RNN at the next time step (turn).
    • If the action is just text, it is rendered to the user.
  14. The action taken also contributes to the feature vector at the next time step.
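
A minimal sketch of steps 2 to 6, assuming a fixed vocabulary list VOCAB, a dict W2V mapping words to 300-dimensional word2vec vectors, and a small entity lexicon CITIES used for string matching; all of these names, and the single 'city' entity, are assumptions for illustration.

import numpy as np

def featurize(utterance, entity_buffer, prev_action_onehot):
    tokens = utterance.lower().split()

    # (2) Bag-of-words vector over the fixed vocabulary
    bow = np.zeros(len(VOCAB), dtype=np.float32)
    for t in tokens:
        if t in VOCAB:
            bow[VOCAB.index(t)] = 1.0

    # (3) Average of the 300-dim word2vec embeddings (embeddings stay frozen)
    vecs = [W2V[t] for t in tokens if t in W2V]
    utt_emb = np.mean(vecs, axis=0) if vecs else np.zeros(300, dtype=np.float32)

    # (4)-(5) Entity extraction by simple string matching; new values override old ones
    for t in tokens:
        if t in CITIES:
            entity_buffer['city'] = t

    # (6) Hand-written context features, e.g. "is the city known yet?"
    context = np.array([float('city' in entity_buffer)], dtype=np.float32)

    # The feature vector passed to the RNN at this turn
    return np.concatenate([bow, utt_emb, context, prev_action_onehot])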
  • An LSTM is used as the RNN.
  • The AdaDelta optimizer is used for training.
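
A sketch of steps 10 and 11 above: masking the softmax output of the RNN and selecting an action. The renormalization mirrors the normalize activation in the Keras model below; the variable names are assumptions.

import numpy as np

def select_action(action_probs, action_mask, mode='SL'):
    # (10) Elementwise multiplication zeroes out non-permitted actions,
    # then the remaining probabilities are renormalized.
    masked = action_probs * action_mask
    masked = masked / masked.sum()

    # (11) SL: pick the most probable action; RL: sample from the distribution.
    if mode == 'SL':
        return int(np.argmax(masked))
    return int(np.random.choice(len(masked), p=masked))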

Reinforcement Learning

Once a system operates at scale, interacting with a large number of users, it is desirable for it to continue learning autonomously using reinforcement learning. A policy gradient method is used for optimization.

A model $$\pi$$ is parameterized by w and outputs a distribution from which actions are sampled at each time step. At the end of the dialog, the return G for the dialog is computed, along with the gradients of the log-probabilities of the actions taken with respect to the model weights. The weights are then adjusted by taking a gradient step proportional to the return minus a baseline, with learning rate $$\alpha$$:

$$w \leftarrow w + \alpha \left( \sum_t \nabla_w \log \pi(a_t \mid h_t; w) \right) (G - b)$$

The LSTM in the network represents the stochastic policy $$\pi$$, which outputs a distribution over actions given a dialog history h, parameterized by w. The baseline b is an estimate of the average return of the current policy.

"better" dialogs receive a positive gradient step, making the actions selected more likely and "worse" dialogs receive a negative gradient step, making the actions selected less likely.

Supervised Learning and Reinforcement Learning can be applied to the same network. After each RL gradient step, we check whether the updated policy reconstructs the training set. If not, we re-run SL gradient steps on the training set until the model reproduces the training set. This allows new training dialogs to be added at any time during RL optimization.
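
One way this interleaving might be organized is sketched below; rl_gradient_step, sl_gradient_step, and reconstructs (which checks whether the model reproduces every labeled action in the training set) are hypothetical helpers, not functions from the paper.

def rl_step_with_sl_check(model, dialog, training_set):
    # One RL policy-gradient step on a newly collected dialog ...
    rl_gradient_step(model, dialog)

    # ... then re-run SL gradient steps until the supervised
    # training set is reproduced again.
    while not reconstructs(model, training_set):
        sl_gradient_step(model, training_set)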

Implementation

Keras Model Specification

# Given:
#   obs_size, action_size, nb_hidden

# Legacy Keras (0.x) Graph API
from keras.models import Graph
from keras.layers.core import Activation, TimeDistributedDense
from keras.layers.recurrent import LSTM
from keras.optimizers import Adadelta
from keras import backend as K

def normalize(x):
    # Custom activation: renormalize the masked distribution so it sums to 1.
    return x / K.sum(x, axis=-1, keepdims=True)

g = Graph()

# Inputs: observation features, previous system action, and the action mask.
g.add_input(name='obs', input_shape=(None, obs_size))
g.add_input(name='prev_action', input_shape=(None, action_size))
g.add_input(name='avail_actions', input_shape=(None, action_size))

# LSTM over the concatenated inputs; returns its hidden state at every turn.
g.add_node(
    LSTM(nb_hidden, return_sequences=True, activation='tanh'),
    name='h1',
    inputs=['obs', 'prev_action', 'avail_actions']
)

# Per-turn softmax over the distinct system action templates.
g.add_node(
    TimeDistributedDense(action_size, activation='softmax'),
    name='h2',
    input='h1'
)

# Apply the action mask (elementwise multiplication) and renormalize.
g.add_node(
    Activation(normalize),
    name='action',
    inputs=['h2', 'avail_actions'],
    merge_mode='mul',
    create_output=True
)

g.compile(
    optimizer=Adadelta(clipnorm=1.),
    sample_weight_modes={'action': 'temporal'},
    loss={'action': 'categorical_crossentropy'}
)
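
A hedged usage sketch for supervised training, assuming the legacy Keras Graph API in which fit takes a single dict keyed by input and output node names; X_obs, X_prev, X_mask and Y_action are placeholder arrays of shape (num_dialogs, num_turns, ...) and are not from the original code.

g.fit(
    {
        'obs': X_obs,              # (num_dialogs, num_turns, obs_size)
        'prev_action': X_prev,     # previous system action, one-hot per turn
        'avail_actions': X_mask,   # action mask per turn
        'action': Y_action         # one-hot target action templates
    },
    batch_size=1,
    nb_epoch=12
)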