Attention Network

| Model | Paper |
| --- | --- |
| HGT (WWW 2020) | Heterogeneous Graph Transformer |
| SimpleHGN (KDD 2021) | Are we really making much progress? Revisiting, benchmarking, and refining heterogeneous graph neural networks |
| HetSANN (AAAI 2020) | An Attention-Based Graph Neural Network for Heterogeneous Structural Learning |
| ieHGCN (TKDE 2021) | Interpretable and Efficient Heterogeneous Graph Convolutional Network |

Attention mechanism

This part defines the attention mechanism as used in GAT and in Transformer.

  • GAT defines the attention mechanism as follows. A shared linear transformation, parametrized by a weight matrix $W\in\mathcal{R}^{F^{'}\times F}$, is applied to every node. A shared attentional mechanism $a: \mathcal{R}^{F^{'}}\times \mathcal{R}^{F^{'}}\rightarrow \mathcal{R}$ then computes the attention coefficients:

$$ e_{ij} = a(Wh_i, Wh_j) $$

  • This indicates the importance of node $j$'s features to node $i$. In GAT, $a$ is a single-layer feedforward neural network. Finally, the coefficients are normalized across all choices of $j$ using the softmax function:

$$ \alpha_{ij} = \text{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k\in \mathcal{N}_i} \exp(e_{ik})} $$
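The two GAT formulas above can be sketched in plain Python. This is a minimal illustration rather than the GAT or DGL implementation: it assumes $a$ is a dot product with a weight vector followed by LeakyReLU with slope 0.2 (the choice made in the GAT paper), and the helper names and toy numbers are made up for the example.

```python
import math

def leaky_relu(x, slope=0.2):
    # Slope 0.2 follows the GAT paper; treat it as an assumption here.
    return x if x > 0 else slope * x

def matvec(W, h):
    # W is F' x F (list of rows), h has length F.
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def gat_attention(W, a, h, i, neighbors):
    """Return {j: alpha_ij} for every j in the neighborhood of node i."""
    Wh_i = matvec(W, h[i])
    scores = {}
    for j in neighbors:
        Wh_j = matvec(W, h[j])
        concat = Wh_i + Wh_j  # [Wh_i || Wh_j], length 2F'
        scores[j] = leaky_relu(sum(ak * ck for ak, ck in zip(a, concat)))
    # Softmax over the neighborhood, shifted by the max for numerical stability.
    m = max(scores.values())
    exps = {j: math.exp(e - m) for j, e in scores.items()}
    total = sum(exps.values())
    return {j: v / total for j, v in exps.items()}

# Toy data: F = 3 input features, F' = 2 output features.
W = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
a = [0.1, -0.2, 0.3, 0.4]
h = {0: [1.0, 2.0, 3.0], 1: [0.5, 0.5, 0.5], 2: [-1.0, 0.0, 1.0]}
alpha = gat_attention(W, a, h, 0, [0, 1, 2])
```

The returned coefficients are positive and sum to 1 over the neighborhood, which is the property that makes them usable as aggregation weights.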

  • In Transformer, an attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is computed as a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function of the query with the corresponding key. e.g. Scaled Dot-Product Attention:

$$ Attention(Q, K, V) = softmax(\frac{QK^T}{\sqrt{d_k}})V $$
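As a rough sketch of the formula above, scaled dot-product attention can be written in plain Python for small inputs. The function names and the toy matrices are illustrative only; a real model would use batched tensor operations.

```python
import math

def softmax(row):
    m = max(row)  # shift by the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Q: n x d_k, K: m x d_k, V: m x d_v, all as lists of rows."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Compatibility of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output row = weighted sum of the value rows.
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out = scaled_dot_product_attention(Q, K, V)
```

Each output row is a convex combination of the rows of `V`, so with these toy values every output row sums to 1.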

DGL API

This part describes the DGL APIs we use. Since DGL 0.8.0, more APIs support heterogeneous graphs, such as TypedLinear and HeteroLinear, so we give some details of these APIs below.

class TypedLinear(in_size, out_size, num_types, regularizer=None, num_bases=None)

Apply linear transformation according to types.

Parameters:

  • in_size(int): Input feature size.

  • out_size(int): Output feature size.

  • num_types(int): Number of types(node or edge).

  • regularizer(str, optional): Which weight regularizer to use, either “basis” or “bdd”. Default: None.

    • basis: basis-decomposition.
    • bdd: block-diagonal-decomposition.
  • num_bases(int, optional): Number of bases. Needed when regularizer is specified. Typically smaller than num_types. Default: None.

forward(x, x_type, sorted_by_type=False)

Parameters:

  • x(tensor): The input tensor.
  • x_type(tensor): 1D tensor storing the type of the element in x.
  • sorted_by_type(boolean): Whether the inputs have been sorted by the types. Forward on pre-sorted inputs may be faster.

This API is useful after dgl.to_homogeneous converts a heterogeneous graph to a homogeneous one, because every node and edge then carries an integer type ID that can be passed as x_type.
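To make the semantics concrete, here is a plain-Python sketch of what a typed linear transformation computes: row `k` of `x` is multiplied by the weight matrix selected by `x_type[k]`. It only illustrates the idea and is not DGL's implementation; the function name `typed_linear` and the toy weights are assumptions made for the example, and no regularizer is modeled.

```python
def typed_linear(x, x_type, weights):
    """Apply weights[t] (in_size x out_size) to each row of x whose type is t."""
    out = []
    for row, t in zip(x, x_type):
        W = weights[t]
        out.append([sum(row[k] * W[k][c] for k in range(len(row)))
                    for c in range(len(W[0]))])
    return out

# Two types: type 0 uses the identity, type 1 doubles every feature.
weights = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
]
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x_type = [0, 1, 0]
y = typed_linear(x, x_type, weights)
```

With DGL installed, the same computation on tensors would go through the class listed above, constructed with `in_size=2`, `out_size=2`, `num_types=2` and called as `forward(x, x_type)`.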

class HeteroLinear(in_size, out_size, bias=True)

Apply linear transformations on heterogeneous inputs.

Parameters:

  • in_size(dict[key, int]): Input feature size for heterogeneous inputs. A key can be a string or a tuple of strings.
  • out_size(int): Output feature size.
  • bias(boolean): If True, learns a bias term.

forward(feat)

Parameters:

  • feat(dict[key, tensor]): Heterogeneous input features.

This API is useful when each type needs its own linear transformation, for example to project inputs of different sizes to a common output size.
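The following plain-Python sketch illustrates the dictionary-in, dictionary-out behavior: every key has its own weight matrix (and bias), so inputs of different sizes can be mapped to one output size. The function name `hetero_linear`, the keys, and the numbers are hypothetical; this is not DGL's code.

```python
def hetero_linear(feat, weights, biases):
    """feat[key]: rows of size in_size[key]; weights[key]: in_size[key] x out_size."""
    out = {}
    for key, rows in feat.items():
        W, b = weights[key], biases[key]
        out[key] = [
            [sum(r[k] * W[k][c] for k in range(len(r))) + b[c]
             for c in range(len(W[0]))]
            for r in rows
        ]
    return out

# A key can be a string or a tuple of strings, as in the parameter list above.
rel = ("user", "follows", "user")
feat = {"user": [[1.0, 2.0]], rel: [[1.0, 2.0, 3.0]]}
weights = {
    "user": [[1.0, 0.0], [0.0, 1.0]],            # 2 -> 2
    rel: [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],   # 3 -> 2
}
biases = {"user": [0.5, 0.5], rel: [0.0, 0.0]}
out_feat = hetero_linear(feat, weights, biases)
```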

class HeteroGraphConv(mods, aggregate='sum')

The heterograph convolution applies sub-modules on their associating relation graphs, which reads the features from source nodes and writes the updated ones to destination nodes. If multiple relations have the same destination node types, their results are aggregated by the specified method. If the relation graph has no edge, the corresponding module will not be called.

Parameters:

  • mods(dict[str, nn.Module]): Modules associated with each edge type.
  • aggregate (str, callable, optional): Method for aggregating node features generated by different relations. Allowed string values are ‘sum’, ‘max’, ‘min’, ‘mean’, ‘stack’. Users can also customize the aggregator by providing a callable instance.

forward(g, inputs, mod_args=None, mod_kwargs=None)

Parameters:

  • g(DGLHeteroGraph): Graph data.
  • inputs(dict[str, Tensor] or pair of dict[str, Tensor]): Input node features.

This API is useful when we need to extract the relation subgraphs and apply an nn.Module to each of them.
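The control flow described above can be sketched without DGL: run one module per relation, skip relations that have no edges, and sum the results that land on the same destination node type. Everything here (the edge-list graph format, `sum_neighbors`, and the relation names) is a hypothetical stand-in chosen for illustration.

```python
def sum_neighbors(edges, src_feat, num_dst, dim):
    """Toy per-relation module: sum source features over incoming edges."""
    out = [[0.0] * dim for _ in range(num_dst)]
    for s, d in edges:
        for c in range(dim):
            out[d][c] += src_feat[s][c]
    return out

def hetero_graph_conv(relations, mods, inputs, num_nodes, dim):
    per_dst = {}
    for (stype, etype, dtype), edges in relations.items():
        if not edges:  # a relation without edges is not called
            continue
        h = mods[etype](edges, inputs[stype], num_nodes[dtype], dim)
        per_dst.setdefault(dtype, []).append(h)
    # aggregate='sum': add up the per-relation results for each destination type.
    return {
        dtype: [[sum(h[n][c] for h in hs) for c in range(dim)]
                for n in range(num_nodes[dtype])]
        for dtype, hs in per_dst.items()
    }

relations = {
    ("user", "follows", "user"): [(0, 1), (1, 2)],
    ("game", "played-by", "user"): [(0, 0), (1, 0)],
    ("user", "plays", "game"): [],
}
mods = {"follows": sum_neighbors, "played-by": sum_neighbors, "plays": sum_neighbors}
inputs = {"user": [[1.0], [2.0], [3.0]], "game": [[10.0], [20.0]]}
num_nodes = {"user": 3, "game": 2}
result = hetero_graph_conv(relations, mods, inputs, num_nodes, dim=1)
```

Only "user" appears in `result`, because the single relation that targets "game" has no edges in this toy graph.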

Typical model

Depending on whether they fit HeteroGraphConv, we divide the attention models into two categories: Direct-Aggregation models and Dual-Aggregation models.

Direct-Aggregation models

| Model | Attention coefficient |
| --- | --- |
| HGT | $W_{Q_{\phi(s)}}h_s W^{ATT}_{\psi(r)}(W_{K_{\phi(t)}}h_t)^T$ |
| SimpleHGN | $LeakyReLU(a^T[Wh_s \parallel Wh_t \parallel W_r r_{\psi(<s,t>)}])$ |
| HetSANN | $LeakyReLU([W_{\phi(t),\phi(s)} h_s\parallel W_{\phi(t),\phi(s)} h_t]a_r)$ |

These models have only one aggregation step and do not distinguish between edge types when aggregating, so they are not suitable for HeteroGraphConv.

Dual-Aggregation model

ieHGCN

| Model | Attention coefficient |
| --- | --- |
| ieHGCN | $ELU(a^T_{\phi(s)}[W_{Q_{\phi(s)}}h_s\parallel W_{K_{\phi(t)}}h_t])$ |

This model has two aggregation steps and distinguishes between edge types when aggregating, so it is suitable for HeteroGraphConv.

Implementation Details

Direct-Aggregation models

  • We first implement the convolution layers of SimpleHGN and HetSANN; for HGT we use hgtconv. The __init__ parameters differ because each model needs different parameters, but the forward parameters are the same: g is the homogeneous graph, h is the node features, ntype is the type of each node, etype is the type of each edge, and presorted indicates whether ntype and etype are already sorted, so that TypedLinear in dgl.nn can be used efficiently. If the features come from dgl.to_homogeneous, they are presorted.

  • Then we use the convolution layers to build the corresponding models. We need dgl.to_homogeneous to obtain a homogeneous graph because edge_softmax computes the attention coefficients over all edges together instead of distinguishing edge types.

  • After the convolution layers, the output features must be converted back into a feature dictionary for the heterogeneous graph. We wrote a helper named to_hetero_feat in openhgnn/utils/utils.py for this, because we found no better way to get a feature dictionary with DGL: dgl.to_heterogeneous works, but it performs many additional operations that slow the program down. Once we have the feature dictionary, the model is complete.
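The conversion back to a feature dictionary amounts to grouping rows by their node-type ID. The sketch below shows that idea on plain lists; `split_by_type` is a hypothetical name and this is not the actual to_hetero_feat code. It assumes the type IDs index into the list of node type names, which is how the type IDs produced by dgl.to_homogeneous relate to the node types of the original graph.

```python
def split_by_type(h, ntype, type_names):
    """Group the rows of h into {type_name: rows} using per-row type IDs."""
    h_dict = {name: [] for name in type_names}
    for row, t in zip(h, ntype):
        h_dict[type_names[t]].append(row)
    return h_dict

# Output of a homogeneous convolution layer: one row per node.
h = [[0.1], [0.2], [0.3], [0.4]]
ntype = [0, 0, 1, 1]                 # type ID of each node
type_names = ["author", "paper"]     # hypothetical node types
h_dict = split_by_type(h, ntype, type_names)
```

Row order within each type is preserved, so the rows of `h_dict["paper"]` line up with the paper node IDs as long as the homogeneous graph kept nodes of one type in their original order.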

Dual-Aggregation model

  • We follow the implementation idea of dgl.nn.HeteroGraphConv. We extract the relation subgraph for each edge type and complete the first aggregation with the convolution layers. Then, to aggregate type-specific features across different relations, we compute the attention coefficients step by step.
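The second aggregation step can be pictured as a softmax-weighted sum over the per-relation results for one destination node. The sketch below is generic and does not reproduce ieHGCN's exact scoring function: the unnormalized scores are taken as given inputs, and the function and relation names are made up for the example.

```python
import math

def fuse_relations(per_relation, scores):
    """per_relation: {relation: feature vector}; scores: {relation: raw score}."""
    # Normalize the raw scores across relations with a softmax.
    m = max(scores.values())
    exps = {r: math.exp(s - m) for r, s in scores.items()}
    total = sum(exps.values())
    weights = {r: e / total for r, e in exps.items()}
    # Attention-weighted sum of the relation-specific features.
    dim = len(next(iter(per_relation.values())))
    fused = [sum(weights[r] * per_relation[r][c] for r in per_relation)
             for c in range(dim)]
    return fused, weights

# Features one node received from two relations after the first aggregation.
per_relation = {"follows": [1.0, 0.0], "played-by": [0.0, 1.0]}
scores = {"follows": 0.0, "played-by": 0.0}
fused, weights = fuse_relations(per_relation, scores)
```

Equal scores give equal weights, so the fused vector here is the plain average of the two relation-specific vectors.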

How to run

  • Clone the Openhgnn-DGL

    # For node classification task
    # You may select model HGT, SimpleHGN, HetSANN
    python main.py -m HGT -t node_classification -d imdb4MAGNN -g 0 --use_best_config

    If you do not have a GPU, set -g -1.

Performance

Task: Node classification

Evaluation metric: Micro/Macro-F1

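| Model | HGBn-ACM Micro-F1 | HGBn-ACM Macro-F1 | acm4GTN Micro-F1 | acm4GTN Macro-F1 | imdb4MAGNN Micro-F1 | imdb4MAGNN Macro-F1 | dblp4MAGNN Micro-F1 | dblp4MAGNN Macro-F1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HGT | 88.95 | 89.18 | 90.21 | 90.24 | 49.37 | 49.18 | 87.23 | 86.46 |
| SimpleHGN | 92.27 | 92.36 | 89.27 | 89.28 | 52.25 | 48.78 | 87.72 | 87.08 |
| HetSANN | 88.4 | 88.7 | 92.24 | 92.31 | 52.88 | 47.44 | 89.54 | 90.24 |
| ieHGCN | 91.71 | 91.99 | 92.47 | 92.56 | 55.03 | 52.18 | 88.36 | 87.37 |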

Hyper-parameters specific to the model

You can modify the parameters [HGT], [SimpleHGN], [HetSANN], [ieHGCN] in openhgnn/config.ini.

More

Contributor

Yaoqi Liu[GAMMA LAB]

If you have any questions, submit an issue or email [email protected].