A TensorFlow implementation of Graph Attention Recurrent Neural Networks based on the work of Cirstea et al., 2019. In addition, we offer stand-alone implementations of the Graph Attention Mechanism (Veličković et al., 2017) and Diffusion Graph Convolution (Li et al., 2017).
Simply run `pip install garnn`. The only dependencies are `numpy` and `tensorflow`.
The core data structure is the graph signal. If we have N nodes in a graph each having F observed features then the graph signal is the tensor with shape (batch, N, F) corresponding to the data produced by all nodes. Often we have sequences of graph signals in a time series. We will call them temporal graph signals and assume a shape of (batch, time steps, N, F). We also need to know the adjacency matrix E of the underlying graph with shape (N, N).
All but the recurrent layers work with non-temporal data, i.e. the data points are individual graph signals and not sequences of graph signals.
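For concreteness, here is a minimal sketch of these shapes using plain `numpy` arrays; the sizes below are arbitrary and only for illustration:

```python
import numpy as np

N, F = 4, 3             # 4 nodes, each with 3 observed features (toy values)
batch, timesteps = 2, 5

# Adjacency matrix E of the underlying graph: shape (N, N).
E = np.random.randint(0, 2, size=(N, N))

# A batch of graph signals: shape (batch, N, F).
X = np.random.rand(batch, N, F)

# A batch of temporal graph signals: shape (batch, time steps, N, F).
X_temporal = np.random.rand(batch, timesteps, N, F)
```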
The `AttentionMechanism` found in `garnn.components.attention` takes a graph signal and returns an attention matrix as described in Veličković et al., 2017. The layer is initialised with the following parameters:
Parameter | Function
---|---
`F` (required) | Dimension of the internal embedding.
`adjacency_matrix` (required) | Adjacency matrix of the graph.
`num_heads` (default: 1) | Number of attention matrices that are averaged to return the output attention.
`use_reverse_diffusion` (default: True) | Whether or not to calculate A_in and A_out as done by Cirstea et al., 2019. If E is symmetric then the value will be set to False.
The output is of shape (batch, N, N). If `use_reverse_diffusion` is True then we obtain 2 attention matrices and thus the shape is (batch, 2, N, N).
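As a minimal sketch of the mechanism on its own, assuming a toy 4-node graph with a randomly generated adjacency matrix (illustrative values only):

```python
import numpy as np

from tensorflow.keras.layers import Input

from garnn.components.attention import AttentionMechanism

# Toy graph: 4 nodes with 3 features each and a random adjacency matrix.
E = np.random.randint(0, 2, size=(4, 4))

# Graph signal input of shape (batch, N, F) = (batch, 4, 3).
inputs = Input(shape=(4, 3))

# With use_reverse_diffusion left at its default of True the output
# contains both attention matrices, i.e. it has shape (batch, 2, N, N).
attention = AttentionMechanism(8, adjacency_matrix=E, num_heads=2)(inputs)
```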
The `GraphDiffusionConvolution` layer in `garnn.layers.diffconv` offers diffusion graph convolution as described by Li et al., 2017. It operates on a tuple containing a graph signal X and a transition matrix A (usually an attention matrix returned by an attention mechanism) and is initialised with the following parameters:
Parameter | Function
---|---
`features` (required) | Number of output features. Q in the paper.
`num_diffusion_steps` (required) | Number of hops done by the diffusion process. K in the paper.
There are more specialised parameters, such as regularisers and initialisers; these can be found in the docstring. The convolutional layer returns a diffused graph signal of shape (batch, N, Q).
Thus, if we have 10 nodes with 5 features each and we would like to apply diffusion graph convolution with 20 output features using a 5-head attention mechanism with an internal embedding of 64 units, then we would run:
```python
import numpy as np

from tensorflow.keras.layers import Input

from garnn.components.attention import AttentionMechanism
from garnn.layers.diffconv import GraphDiffusionConvolution

# Adjacency matrix of the 10-node graph. Here we simply use a
# random matrix so that the example is self-contained.
E = np.random.randint(0, 2, size=(10, 10))

# Input of 10-by-5 graph signals, i.e. shape (batch, N, F) = (batch, 10, 5).
inputs = Input(shape=(10, 5))

# Initiating the 5-head attention mechanism with an embedding size of 64.
Attn = AttentionMechanism(64, adjacency_matrix=E, num_heads=5)(inputs)

# Now the convolutional layer. Make sure you use the correct order in the
# input tuple: the graph signal is always first!
output = GraphDiffusionConvolution(features=20, num_diffusion_steps=5)((inputs, Attn))
```
Both `AttentionMechanism` and `GraphDiffusionConvolution` naturally extend to temporal graph signals. The output then simply has an additional time-step dimension.
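For instance, the example from above could be rewritten for temporal graph signals as sketched below, again assuming a randomly generated adjacency matrix E purely for illustration:

```python
import numpy as np

from tensorflow.keras.layers import Input

from garnn.components.attention import AttentionMechanism
from garnn.layers.diffconv import GraphDiffusionConvolution

# Random adjacency matrix for the 10-node toy graph.
E = np.random.randint(0, 2, size=(10, 10))

# Temporal graph signals of shape (batch, time steps, N, F);
# "None" allows variable-length time series.
inputs = Input(shape=(None, 10, 5))

Attn = AttentionMechanism(64, adjacency_matrix=E, num_heads=5)(inputs)

# The diffused graph signal now also carries the time dimension,
# i.e. its shape is (batch, time steps, N, features).
output = GraphDiffusionConvolution(features=20, num_diffusion_steps=5)((inputs, Attn))
```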
The `garnn_gru` layer found in `garnn.components.garnn_gru` is the diffusion- and attention-based GRU introduced by Cirstea et al., 2019. It operates on temporal graph signals and an attention mechanism and is initialised with the following parameters:
Parameter | Function
---|---
`num_hidden_features` (required) | Number of features in the hidden state. Q in the paper.
`num_diffusion_steps` (default: 5) | Number of hops done by the diffusion process in all internal convolutions. K in the paper.
`return_sequences` (default: False) | Whether or not the RNN should return the hidden state at each time step or only at the final time step. Set it to True if you stack another RNN layer on top of this one.
Hence, if we would like to rebuild a model similar to the one used by Cirstea et al., 2019, we would run:
```python
import numpy as np

from tensorflow.keras import Model
from tensorflow.keras.layers import Input

from garnn.components.attention import AttentionMechanism
from garnn.components.garnn_gru import garnn_gru

# We assume we have 207 nodes with 10 features each.
# For simplicity we create a random adjacency matrix E.
E = np.random.randint(0, 2, size=(207, 207))

# Input of the temporal graph signals. Note that "None" allows
# us to pass in variable-length time series.
X = Input(shape=(None, 207, 10))

# Creating an attention mechanism with 3 heads and an
# embedding size of 16.
A = AttentionMechanism(16, adjacency_matrix=E, num_heads=3)(X)

# Piping X and A into the 2 stacked GRU layers.
# The first layer stretches the features into 64 diffused features.
# We assume that we are using 6-hop diffusions.
gru_1 = garnn_gru(num_hidden_features=64, num_diffusion_steps=6, return_sequences=True)((X, A))

# And then we use the 2nd GRU to shrink it back to 1 feature - the feature
# that we are predicting. We use the same attention for this layer, but note
# that we could also have introduced a new attention mechanism for this GRU.
output = garnn_gru(num_hidden_features=1, num_diffusion_steps=6)((gru_1, A))

garnn_model = Model(inputs=X, outputs=output)
```
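Continuing the snippet above, the model can then be compiled and trained like any other Keras model. The sketch below uses randomly generated data purely for illustration and assumes that, because the final GRU keeps the default `return_sequences=False`, the model predicts one value per node at the final time step only:

```python
# 32 random training samples with 12 time steps each (illustrative only).
X_train = np.random.rand(32, 12, 207, 10)
# One target value per node, taken at the final time step (assumption based
# on return_sequences=False in the last GRU layer).
y_train = np.random.rand(32, 207, 1)

garnn_model.compile(optimizer="adam", loss="mse")
garnn_model.fit(X_train, y_train, batch_size=8, epochs=2)
```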
Bug reports, fixes and additional features are always welcome! Make sure to run the tests with `python setup.py test` and to write your own tests for new features. Thanks.