Commit

Add BiparteGraphSage tutorial to show use of categorical_attrs_desc (#…
acezen authored Aug 13, 2021
1 parent f8fa4f6 commit 6ce1e8a
Showing 7 changed files with 551 additions and 9 deletions.
3 changes: 2 additions & 1 deletion docs/tutorials.rst
Original file line number Diff line number Diff line change
@@ -23,4 +23,5 @@ All tutorials are listed as follows:
Interactive Query with Gremlin <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/6_interactive_query_with_gremlin.ipynb>
Unsupervised Learning with GraphSage <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/7_unsupervised_learning_with_graphsage.ipynb>
Supervised Learning with GCN <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/8_supervised_learning_with_gcn.ipynb>
A Complex Workflow: Node Classification on Citation Network <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/9_node_classification_on_citation_network.ipynb>
Unsupervised Learning with BipartiteGraphSage <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/9_unsupervised_learning_with_bipartite_graphsage.ipynb>
A Complex Workflow: Node Classification on Citation Network <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/10_node_classification_on_citation_network.ipynb>
3 changes: 2 additions & 1 deletion docs/zh/tutorials.rst
@@ -22,4 +22,5 @@
基于Gremlin的交互式查询 <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/zh/6_interactive_query_with_gremlin.ipynb>
基于GraphSage的无监督学习 <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/zh/7_unsupervised_learning_with_graphsage.ipynb>
基于GCN的有监督学习 <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/zh/8_supervised_learning_with_gcn.ipynb>
论文引用网络中的节点分类任务 <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/zh/9_node_classification_on_citation_network.ipynb>
基于BipartiteGraphSage的二部图无监督学习 <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/zh/9_unsupervised_learning_with_bipartite_graphsage.ipynb>
论文引用网络中的节点分类任务 <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/zh/10_node_classification_on_citation_network.ipynb>
268 changes: 268 additions & 0 deletions tutorials/9_unsupervised_learning_with_bipartite_graphsage.ipynb
@@ -0,0 +1,268 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Unsupervised Graph Learning with BipartiteGraphSage\n",
"\n",
"\n",
"Bipartite graphs are very common in e-commerce recommendation scenarios. In this tutorial, we demonstrate how GraphScope trains a BipartiteGraphSage model on a bipartite graph.\n",
"\n",
"The task is link prediction, which estimates the probability of links between user and item nodes in a graph.\n",
"\n",
"In this task, we use our implementation of the BipartiteGraphSage algorithm to build a model that predicts user-item links in the [U2I](http://graph-learn-dataset.oss-cn-zhangjiakou.aliyuncs.com/u2i.zip) dataset, in which nodes represent users and items. The task can be treated as unsupervised link prediction on a bipartite network.\n",
"\n",
"BipartiteGraphSage compresses both the structural and the attribute information of the graph into a low-dimensional embedding vector for each node. These embeddings can then be used to predict links between nodes.\n",
"\n",
"This tutorial has the following steps:\n",
"- Creating a session and loading the graph\n",
"- Launching the learning engine and attaching it to the loaded graph\n",
"- Defining the training process with the built-in BipartiteGraphSage model and hyperparameters\n",
"- Training and evaluating\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's create a session and load the dataset as a graph."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import graphscope\n",
"from graphscope.framework.loader import Loader\n",
"\n",
"k8s_volumes = {\n",
" \"data\": {\n",
" \"type\": \"hostPath\",\n",
" \"field\": {\n",
" \"path\": \"/testingdata\",\n",
" \"type\": \"Directory\"\n",
" },\n",
" \"mounts\": {\n",
" \"mountPath\": \"/home/jovyan/datasets\",\n",
" \"readOnly\": True\n",
" }\n",
" }\n",
"}\n",
"\n",
"# create session\n",
"graphscope.set_option(show_log=True)\n",
"sess = graphscope.session(k8s_volumes=k8s_volumes)\n",
"\n",
"# load the u2i graph\n",
"graph = sess.g()\n",
"graph = graph.add_vertices(\n",
" Loader(\"/home/jovyan/datasets/u2i/node.csv\", delimiter=\"\\t\"),\n",
" label=\"u\",\n",
" properties=[(\"feature\", \"str\")],\n",
" vid_field=\"id\"\n",
")\n",
"graph = graph.add_vertices(\n",
" Loader(\"/home/jovyan/datasets/u2i/node.csv\", delimiter=\"\\t\"),\n",
" label=\"i\",\n",
" properties=[(\"feature\", \"str\")],\n",
" vid_field=\"id\"\n",
")\n",
"graph = graph.add_edges(\n",
" Loader(\"/home/jovyan/datasets/u2i/edge.csv\", delimiter=\"\\t\"),\n",
" label=\"u-i\",\n",
" properties=[\"weight\"],\n",
" src_label=\"u\",\n",
" dst_label=\"i\",\n",
" src_field=\"src_id\",\n",
" dst_field=\"dst_id\"\n",
")\n",
"graph = graph.add_edges(\n",
" Loader(\"/home/jovyan/datasets/u2i/edge.csv\", delimiter=\"\\t\"),\n",
" label=\"u-i_reverse\",\n",
" properties=[\"weight\"],\n",
" src_label=\"i\",\n",
" dst_label=\"u\",\n",
" src_field=\"dst_id\",\n",
" dst_field=\"src_id\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Launch the learning engine\n",
"Then, we need to define a feature list for training. The training features should be selected from the vertex properties. In this case, we choose the `feature` property as the training feature.\n",
"\n",
"With the feature list, we next launch a learning engine with the `learning` method of the session. (You may find details of the method in [Session](https://graphscope.io/docs/reference/session.html).)\n",
"\n",
"In this case, we specify BipartiteGraphSage training over the `u` (user) and `i` (item) nodes and the `u-i` and `u-i_reverse` edges.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"# launch a learning engine.\n",
"lg = sess.learning(\n",
" graph,\n",
" nodes=[(\"u\", [\"feature\"]), (\"i\", [\"feature\"])],\n",
" edges=[((\"u\", \"u-i\", \"i\"), [\"weight\"]), ((\"i\", \"u-i_reverse\", \"u\"), [\"weight\"])],\n",
")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"We use the built-in BipartiteGraphSage model to define the training process. You can find more details about all the built-in learning models in [Graph Learning Model](https://graphscope.io/docs/learning_engine.html#data-model).\n",
"\n",
"In this example, we use TensorFlow as the NN backend trainer.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"import numpy as np\n",
"import tensorflow as tf\n",
"from graphscope.learning.examples import BipartiteGraphSage\n",
"from graphscope.learning.graphlearn.python.model.tf.trainer import LocalTFTrainer\n",
"from graphscope.learning.graphlearn.python.model.tf.optimizer import get_tf_optimizer\n",
"\n",
"# unsupervised BipartiteGraphSage.\n",
"\n",
"def train(config, graph):\n",
" def model_fn():\n",
" return BipartiteGraphSage(graph,\n",
" config['batch_size'],\n",
" config['hidden_dim'],\n",
" config['output_dim'],\n",
" config['hops_num'],\n",
" config['u_neighs_num'],\n",
" config['i_neighs_num'],\n",
" u_features_num=config['u_features_num'],\n",
" u_categorical_attrs_desc=config['u_categorical_attrs_desc'],\n",
" i_features_num=config['i_features_num'],\n",
" i_categorical_attrs_desc=config['i_categorical_attrs_desc'],\n",
" neg_num=config['neg_num'],\n",
" use_input_bn=config['use_input_bn'],\n",
" act=config['act'],\n",
" agg_type=config['agg_type'],\n",
" need_dense=config['need_dense'],\n",
" in_drop_rate=config['drop_out'],\n",
" ps_hosts=config['ps_hosts'])\n",
" trainer = LocalTFTrainer(model_fn,\n",
" epoch=config['epoch'],\n",
" optimizer=get_tf_optimizer(\n",
" config['learning_algo'],\n",
" config['learning_rate'],\n",
" config['weight_decay']))\n",
"\n",
" trainer.train()\n",
"\n",
" u_embs = trainer.get_node_embedding(\"u\")\n",
" np.save('u_emb', u_embs)\n",
" i_embs = trainer.get_node_embedding(\"i\")\n",
" np.save('i_emb', i_embs)\n",
"\n",
"# define hyperparameters\n",
"config = {'batch_size': 128,\n",
" 'hidden_dim': 128,\n",
" 'output_dim': 128,\n",
" 'u_features_num': 1,\n",
" 'u_categorical_attrs_desc': {\"0\":[\"u_id\",10000,64]},\n",
" 'i_features_num': 1,\n",
" 'i_categorical_attrs_desc': {\"0\":[\"i_id\",10000,64]},\n",
" 'hops_num': 1,\n",
" 'u_neighs_num': [10],\n",
" 'i_neighs_num': [10],\n",
" 'neg_num': 10,\n",
" 'learning_algo': 'adam',\n",
" 'learning_rate': 0.001,\n",
" 'weight_decay': 0.0005,\n",
" 'epoch': 10,\n",
" 'use_input_bn': True,\n",
" 'act': tf.nn.leaky_relu,\n",
" 'agg_type': 'gcn',\n",
" 'need_dense': True,\n",
" 'drop_out': 0.0,\n",
" 'ps_hosts': None\n",
" }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run training process\n",
"\n",
"After defining the training process and hyperparameters, we can start the training with the learning engine `lg` and the hyperparameter configuration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train(config, lg)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, don't forget to close the session."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sess.close()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
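The commit's headline parameter is `categorical_attrs_desc`, set in the config above as `{"0": ["u_id", 10000, 64]}`: each key is the index of a categorical attribute, and the value lists the attribute name, the bucket (vocabulary) size, and the dimension of the embedding learned for it. Below is a minimal NumPy sketch of that interpretation; the hash-into-buckets lookup, the function names, and the random initialization are illustrative assumptions, not GraphScope's actual implementation.

```python
import numpy as np

def build_embedding_tables(categorical_attrs_desc, seed=0):
    """One randomly initialized embedding table per categorical attribute.

    Hypothetical sketch: each descriptor entry is
    attr_index -> [attribute_name, bucket_size, embedding_dim].
    """
    rng = np.random.default_rng(seed)
    tables = {}
    for attr_idx, (name, bucket_size, emb_dim) in categorical_attrs_desc.items():
        tables[attr_idx] = rng.standard_normal((bucket_size, emb_dim))
    return tables

def lookup(tables, attr_idx, raw_id, bucket_size):
    """Hash the raw categorical value into a bucket and fetch its row."""
    return tables[attr_idx][hash(raw_id) % bucket_size]

# Same shape as the notebook's u_categorical_attrs_desc.
desc = {"0": ["u_id", 10000, 64]}
tables = build_embedding_tables(desc)
vec = lookup(tables, "0", 42, 10000)
print(vec.shape)  # (64,)
```

In this reading, `bucket_size` bounds memory (ids beyond the vocabulary collide into existing buckets) while `embedding_dim` trades capacity against model size.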
14 changes: 7 additions & 7 deletions tutorials/zh/7_unsupervised_learning_with_graphsage.ipynb
@@ -83,12 +83,12 @@
"metadata": {},
"outputs": [],
"source": [
"# define the features for learning\n",
"# 定义特征集\n",
"paper_features = []\n",
"for i in range(50):\n",
" paper_features.append(\"feat-\" + str(i))\n",
"\n",
"# launch a learning engine.\n",
"# 启动learning engine\n",
"lg = sess.learning(graph, nodes=[(\"protein\", paper_features)],\n",
" edges=[(\"protein\", \"link\", \"protein\")],\n",
" gen_labels=[\n",
@@ -150,7 +150,7 @@
" embs = trainer.get_node_embedding()\n",
" np.save(config['emb_save_dir'], embs)\n",
"\n",
"# define hyperparameters\n",
"# 定义超参\n",
"config = {\n",
" \"class_num\": 128, # output dimension\n",
" \"features_num\": 50,\n",
@@ -177,7 +177,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run training process\n",
"## 执行训练过程\n",
"\n",
"\n",
"在定义完训练过程和超参后,现在我们可以使用学习引擎和定义的超参开始训练过程。"
@@ -219,7 +219,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -233,9 +233,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
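The new tutorial's `train` function finishes by saving the node embeddings with `np.save` (`u_emb` and `i_emb`). As a hedged sketch of downstream use, the saved arrays can score user-item pairs by inner product; the stand-in arrays, `top_k_items` helper, and ranking scheme below are illustrative assumptions rather than part of the tutorial.

```python
import numpy as np

# Stand-ins for np.load("u_emb.npy") / np.load("i_emb.npy"); small random
# arrays so the sketch is self-contained.
rng = np.random.default_rng(7)
u_embs = rng.standard_normal((5, 8))   # 5 users, 8-dim embeddings
i_embs = rng.standard_normal((20, 8))  # 20 items, 8-dim embeddings

def top_k_items(u_embs, i_embs, user_idx, k=3):
    """Rank items for one user by embedding inner product, highest first."""
    scores = i_embs @ u_embs[user_idx]
    return np.argsort(scores)[::-1][:k]

recs = top_k_items(u_embs, i_embs, user_idx=0)
print(recs)  # indices of the 3 highest-scoring items for user 0
```

With real embeddings, the same inner-product scoring is what the unsupervised link-prediction objective optimizes, so high scores correspond to likely user-item links.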