Commit

Add BiparteGraphSage tutorial to show use of categorical_attrs_desc (#…
acezen authored Aug 13, 2021
1 parent f8fa4f6 commit 6ce1e8a
Showing 7 changed files with 551 additions and 9 deletions.
3 changes: 2 additions & 1 deletion docs/tutorials.rst
Original file line number Diff line number Diff line change
@@ -23,4 +23,5 @@ All tutorials are listed as follows:
Interactive Query with Gremlin <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/6_interactive_query_with_gremlin.ipynb>
Unsupervised Learning with GraphSage <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/7_unsupervised_learning_with_graphsage.ipynb>
Supervised Learning with GCN <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/8_supervised_learning_with_gcn.ipynb>
A Complex Workflow: Node Classification on Citation Network <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/9_node_classification_on_citation_network.ipynb>
Unsupervised Learning with BipartiteGraphSage <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/9_unsupervised_learning_with_bipartite_graphsage.ipynb>
A Complex Workflow: Node Classification on Citation Network <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/10_node_classification_on_citation_network.ipynb>
3 changes: 2 additions & 1 deletion docs/zh/tutorials.rst
@@ -22,4 +22,5 @@
基于Gremlin的交互式查询 <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/zh/6_interactive_query_with_gremlin.ipynb>
基于GraphSage的无监督学习 <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/zh/7_unsupervised_learning_with_graphsage.ipynb>
基于GCN的有监督学习 <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/zh/8_supervised_learning_with_gcn.ipynb>
论文引用网络中的节点分类任务 <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/zh/9_node_classification_on_citation_network.ipynb>
基于BipartiteGraphSage的二部图无监督学习 <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/zh/9_unsupervised_learning_with_bipartite_graphsage.ipynb>
论文引用网络中的节点分类任务 <https://nbviewer.jupyter.org/github/alibaba/GraphScope/blob/main/tutorials/zh/10_node_classification_on_citation_network.ipynb>
268 changes: 268 additions & 0 deletions tutorials/9_unsupervised_learning_with_bipartite_graphsage.ipynb
@@ -0,0 +1,268 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Unsupervised Graph Learning with BipartiteGraphSage\n",
"\n",
"\n",
"Bipartite graphs are very common in e-commerce recommendation scenarios. In this tutorial, we demonstrate how GraphScope trains a BipartiteGraphSage model on a bipartite graph.\n",
"\n",
"The task is link prediction, which estimates the probability of links between user and item nodes in a graph.\n",
"\n",
"In this task, we use our implementation of the BipartiteGraphSage algorithm to build a model that predicts user-item links in the [U2I](http://graph-learn-dataset.oss-cn-zhangjiakou.aliyuncs.com/u2i.zip) dataset, in which nodes represent users and items. The task can be treated as unsupervised link prediction on a bipartite network.\n",
"\n",
"BipartiteGraphSage compresses both the structural and the attribute information of the graph into a low-dimensional embedding vector for each node. These embeddings can then be used to predict links between nodes.\n",
"\n",
"This tutorial has the following steps:\n",
"- Creating a session and loading the graph\n",
"- Launching the learning engine and attaching it to the loaded graph\n",
"- Defining the training process with the built-in BipartiteGraphSage model and hyperparameters\n",
"- Training and evaluating\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's create a session and load the dataset as a graph."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import graphscope\n",
"from graphscope.framework.loader import Loader\n",
"\n",
"k8s_volumes = {\n",
" \"data\": {\n",
" \"type\": \"hostPath\",\n",
" \"field\": {\n",
" \"path\": \"/testingdata\",\n",
" \"type\": \"Directory\"\n",
" },\n",
" \"mounts\": {\n",
" \"mountPath\": \"/home/jovyan/datasets\",\n",
" \"readOnly\": True\n",
" }\n",
" }\n",
"}\n",
"\n",
"# create session\n",
"graphscope.set_option(show_log=True)\n",
"sess = graphscope.session(k8s_volumes=k8s_volumes)\n",
"\n",
"# load the u2i graph\n",
"graph = sess.g()\n",
"graph = graph.add_vertices(\n",
" Loader(\"/home/jovyan/datasets/u2i/node.csv\", delimiter=\"\\t\"),\n",
" label=\"u\",\n",
" properties=[(\"feature\", \"str\")],\n",
" vid_field=\"id\"\n",
")\n",
"graph = graph.add_vertices(\n",
" Loader(\"/home/jovyan/datasets/u2i/node.csv\", delimiter=\"\\t\"),\n",
" label=\"i\",\n",
" properties=[(\"feature\", \"str\")],\n",
" vid_field=\"id\"\n",
")\n",
"graph = graph.add_edges(\n",
" Loader(\"/home/jovyan/datasets/u2i/edge.csv\", delimiter=\"\\t\"),\n",
" label=\"u-i\",\n",
" properties=[\"weight\"],\n",
" src_label=\"u\",\n",
" dst_label=\"i\",\n",
" src_field=\"src_id\",\n",
" dst_field=\"dst_id\"\n",
")\n",
"graph = graph.add_edges(\n",
" Loader(\"/home/jovyan/datasets/u2i/edge.csv\", delimiter=\"\\t\"),\n",
" label=\"u-i_reverse\",\n",
" properties=[\"weight\"],\n",
" src_label=\"i\",\n",
" dst_label=\"u\",\n",
" src_field=\"dst_id\",\n",
" dst_field=\"src_id\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Launch the learning engine\n",
"Then, we need to define a feature list for training. The training features should be selected from the vertex properties. In this case, we choose the `feature` property as the training feature.\n",
"\n",
"With the feature list, we next launch a learning engine with the `learning` method of the session. (You may find details of the method in [Session](https://graphscope.io/docs/reference/session.html).)\n",
"\n",
"In this case, we specify BipartiteGraphSage training over the `u` (user) and `i` (item) nodes and the `u-i` and `u-i_reverse` edges.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"# launch a learning engine.\n",
"lg = sess.learning(\n",
" graph,\n",
" nodes=[(\"u\", [\"feature\"]), (\"i\", [\"feature\"])],\n",
" edges=[((\"u\", \"u-i\", \"i\"), [\"weight\"]), ((\"i\", \"u-i_reverse\", \"u\"), [\"weight\"])],\n",
")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"We use the built-in BipartiteGraphSage model to define the training process. You can find more details about all the built-in learning models in [Graph Learning Model](https://graphscope.io/docs/learning_engine.html#data-model).\n",
"\n",
"In this example, we use TensorFlow as the NN backend trainer.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"import numpy as np\n",
"import tensorflow as tf\n",
"from graphscope.learning.examples import BipartiteGraphSage\n",
"from graphscope.learning.graphlearn.python.model.tf.trainer import LocalTFTrainer\n",
"from graphscope.learning.graphlearn.python.model.tf.optimizer import get_tf_optimizer\n",
"\n",
"# unsupervised BipartiteGraphSage.\n",
"\n",
"def train(config, graph):\n",
" def model_fn():\n",
" return BipartiteGraphSage(graph,\n",
" config['batch_size'],\n",
" config['hidden_dim'],\n",
" config['output_dim'],\n",
" config['hops_num'],\n",
" config['u_neighs_num'],\n",
" config['i_neighs_num'],\n",
" u_features_num=config['u_features_num'],\n",
" u_categorical_attrs_desc=config['u_categorical_attrs_desc'],\n",
" i_features_num=config['i_features_num'],\n",
" i_categorical_attrs_desc=config['i_categorical_attrs_desc'],\n",
" neg_num=config['neg_num'],\n",
" use_input_bn=config['use_input_bn'],\n",
" act=config['act'],\n",
" agg_type=config['agg_type'],\n",
" need_dense=config['need_dense'],\n",
" in_drop_rate=config['drop_out'],\n",
" ps_hosts=config['ps_hosts'])\n",
" trainer = LocalTFTrainer(model_fn,\n",
" epoch=config['epoch'],\n",
" optimizer=get_tf_optimizer(\n",
" config['learning_algo'],\n",
" config['learning_rate'],\n",
" config['weight_decay']))\n",
"\n",
" trainer.train()\n",
"\n",
" u_embs = trainer.get_node_embedding(\"u\")\n",
" np.save('u_emb', u_embs)\n",
" i_embs = trainer.get_node_embedding(\"i\")\n",
" np.save('i_emb', i_embs)\n",
"\n",
"# define hyperparameters\n",
"config = {'batch_size': 128,\n",
" 'hidden_dim': 128,\n",
" 'output_dim': 128,\n",
" 'u_features_num': 1,\n",
" 'u_categorical_attrs_desc': {\"0\":[\"u_id\",10000,64]},\n",
" 'i_features_num': 1,\n",
" 'i_categorical_attrs_desc': {\"0\":[\"i_id\",10000,64]},\n",
" 'hops_num': 1,\n",
" 'u_neighs_num': [10],\n",
" 'i_neighs_num': [10],\n",
" 'neg_num': 10,\n",
" 'learning_algo': 'adam',\n",
" 'learning_rate': 0.001,\n",
" 'weight_decay': 0.0005,\n",
" 'epoch': 10,\n",
" 'use_input_bn': True,\n",
" 'act': tf.nn.leaky_relu,\n",
" 'agg_type': 'gcn',\n",
" 'need_dense': True,\n",
" 'drop_out': 0.0,\n",
" 'ps_hosts': None\n",
" }"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run training process\n",
"\n",
"After defining the training process and hyperparameters, we can start the training with the learning engine `lg` and the hyperparameter configuration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train(config, lg)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, don't forget to close the session."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sess.close()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
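The commit's headline parameter is `categorical_attrs_desc`, set in the config above as `{"0": ["u_id", 10000, 64]}`: each key is the index of a categorical attribute, and the value lists the attribute name, the bucket (vocabulary) size, and the dimension of the embedding learned for it. Below is a minimal NumPy sketch of that interpretation; the hash-into-buckets lookup, the function names, and the random initialization are illustrative assumptions, not GraphScope's actual implementation.

```python
import numpy as np

def build_embedding_tables(categorical_attrs_desc, seed=0):
    """One randomly initialized embedding table per categorical attribute.

    Hypothetical sketch: each descriptor entry is
    attr_index -> [attribute_name, bucket_size, embedding_dim].
    """
    rng = np.random.default_rng(seed)
    tables = {}
    for attr_idx, (name, bucket_size, emb_dim) in categorical_attrs_desc.items():
        tables[attr_idx] = rng.standard_normal((bucket_size, emb_dim))
    return tables

def lookup(tables, attr_idx, raw_id, bucket_size):
    """Hash the raw categorical value into a bucket and fetch its row."""
    return tables[attr_idx][hash(raw_id) % bucket_size]

# Same shape as the notebook's u_categorical_attrs_desc.
desc = {"0": ["u_id", 10000, 64]}
tables = build_embedding_tables(desc)
vec = lookup(tables, "0", 42, 10000)
print(vec.shape)  # (64,)
```

In this reading, `bucket_size` bounds memory (ids beyond the vocabulary collide into existing buckets) while `embedding_dim` trades capacity against model size.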
14 changes: 7 additions & 7 deletions tutorials/zh/7_unsupervised_learning_with_graphsage.ipynb
@@ -83,12 +83,12 @@
"metadata": {},
"outputs": [],
"source": [
"# define the features for learning\n",
"# 定义特征集\n",
"paper_features = []\n",
"for i in range(50):\n",
" paper_features.append(\"feat-\" + str(i))\n",
"\n",
"# launch a learning engine.\n",
"# 启动learning engine\n",
"lg = sess.learning(graph, nodes=[(\"protein\", paper_features)],\n",
" edges=[(\"protein\", \"link\", \"protein\")],\n",
" gen_labels=[\n",
@@ -150,7 +150,7 @@
" embs = trainer.get_node_embedding()\n",
" np.save(config['emb_save_dir'], embs)\n",
"\n",
"# define hyperparameters\n",
"# 定义超参\n",
"config = {\n",
" \"class_num\": 128, # output dimension\n",
" \"features_num\": 50,\n",
@@ -177,7 +177,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run training process\n",
"## 执行训练过程\n",
"\n",
"\n",
"在定义完训练过程和超参后,现在我们可以使用学习引擎和定义的超参开始训练过程。"
@@ -219,7 +219,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -233,9 +233,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
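The new tutorial's `train` function finishes by saving the node embeddings with `np.save` (`u_emb` and `i_emb`). As a hedged sketch of downstream use, the saved arrays can score user-item pairs by inner product; the stand-in arrays, `top_k_items` helper, and ranking scheme below are illustrative assumptions rather than part of the tutorial.

```python
import numpy as np

# Stand-ins for np.load("u_emb.npy") / np.load("i_emb.npy"); small random
# arrays so the sketch is self-contained.
rng = np.random.default_rng(7)
u_embs = rng.standard_normal((5, 8))   # 5 users, 8-dim embeddings
i_embs = rng.standard_normal((20, 8))  # 20 items, 8-dim embeddings

def top_k_items(u_embs, i_embs, user_idx, k=3):
    """Rank items for one user by embedding inner product, highest first."""
    scores = i_embs @ u_embs[user_idx]
    return np.argsort(scores)[::-1][:k]

recs = top_k_items(u_embs, i_embs, user_idx=0)
print(recs)  # indices of the 3 highest-scoring items for user 0
```

With real embeddings, the same inner-product scoring is what the unsupervised link-prediction objective optimizes, so high scores correspond to likely user-item links.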