A graph machine learning enabled engine (GML-Enabled)


CoDS-GCS/KGNET


KGNET was published at ICDE 2023.

KGNET - A GML-Enabled RDF Engine

[Figure: KGNET architecture]

This vision paper proposes KGNet, an on-demand graph machine learning (GML) service on top of RDF engines that supports GML-enabled SPARQL queries. KGNet automates the training of GML models on a KG by identifying a task-specific subgraph. Training on this subgraph reduces task-irrelevant KG structure and properties, improving scalability and accuracy. While training a GML model on a KG, KGNet collects metadata about the trained models in an RDF graph called KGMeta, which is interlinked with the relevant subgraphs in the KG. All trained models are then accessible via a SPARQL-like query, which we call a GML-enabled query and refer to as SPARQLML. KGNet supports SPARQLML on top of existing RDF engines as an interface for querying and inferencing over KGs using GML models. The development of KGNet opens research opportunities in several areas, including meta-sampling for identifying task-specific subgraphs, GML pipeline automation under computational constraints such as limited time and memory budgets, and SPARQLML query optimization. KGNet supports different GML tasks, such as node classification, link prediction, and semantic entity matching. We evaluated KGNet using two real KGs from different application domains. Compared to training on the entire KG, KGNet significantly reduced training time and memory usage while maintaining comparable or improved accuracy. The KGNet source code is available for further study.

SPARQL-ML Demo Video

SPARQL-ML Demo
SPARQL-ML Demo Colab notebook.

Installation

  • Clone the kgnet repository.
  • Create the kgnet Conda environment (Python 3.8) and install the pip requirements.
  • Activate the kgnet environment:
conda activate kgnet

Quickstart

Supported Features

  • Train GNN models
    • Node classification methods (RGCN, Shadow-GNN, GraphSaint)
    • Link prediction methods (RGCN, MorsE)
  • Inference for seen nodes
    • Node classification methods (RGCN, GraphSaint); more methods to be supported soon
    • Node classification methods (RGCN); more methods to be supported soon
  • SPARQL-ML queries
    • A single GML task per query (a user-defined predicate, e.g. kgnet:types/NodeClassifier)
    • Nested queries, UNION, and GROUP BY are not supported
  • RDF engines
    • OpenLink Virtuoso
    • Stardog

KGNET Configuration settings

  • Storage settings (local file system, remote file system, or S3 storage)
    • Datasets path: path to the sampled task subgraph data used for training
    • Inference data path: temporary path for inference subgraph data
    • Trained model path: path to the trained task model files
    • Model storage manager API IP/port: RESTful API IP/port
  • Inference API IP/port: RESTful API IP/port
  • Data KG
    • Endpoint url
    • Named graph URI
  • KGMeta KG
    • Endpoint url
    • Named graph URI
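The settings above can be grouped into a single typed object. The sketch below is a hypothetical mirror of that list, not KGNET's own configuration class (KGNET's real defaults live in KGNET/Constants.py); the class and field names, the host:port placeholders, and the KGMeta named graph IRI are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class EndpointSettings:
    """A SPARQL endpoint plus the named graph that holds a KG."""
    endpoint_url: str
    named_graph_iri: str


@dataclass
class KGNETSettings:
    """Hypothetical mirror of the README settings; not KGNET's own config class."""
    storage_backend: str      # "local", "remote", or "s3"
    datasets_path: str        # sampled task subgraph data used for training
    inference_data_path: str  # temporary inference subgraph data
    trained_model_path: str   # trained task model files
    model_storage_api: str    # "host:port" of the model storage manager API
    inference_api: str        # "host:port" of the inference API
    data_kg: EndpointSettings
    kgmeta_kg: EndpointSettings

    def __post_init__(self) -> None:
        # Reject storage backends other than the three listed in the README.
        allowed = {"local", "remote", "s3"}
        if self.storage_backend not in allowed:
            raise ValueError(f"storage_backend must be one of {sorted(allowed)}")


# The endpoint comes from the README demo; paths, ports, and the KGMeta graph IRI are placeholders.
DEMO_ENDPOINT = "http://206.12.100.35:5820/kgnet_kgs/query"
settings = KGNETSettings(
    storage_backend="local",
    datasets_path="/path/to/datasets/",
    inference_data_path="/path/to/inference/",
    trained_model_path="/path/to/models/",
    model_storage_api="localhost:8000",
    inference_api="localhost:8001",
    data_kg=EndpointSettings(DEMO_ENDPOINT, "https://dblp2022.org"),
    kgmeta_kg=EndpointSettings(DEMO_ENDPOINT, "https://example.org/kgmeta"),
)
```

Validating in `__post_init__` means a misspelled storage backend fails when the settings are built rather than later, during training or inference.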

1. Initializing KGNET

Step 1: Importing KGNET and setting up the paths.

Use the following code to import KGNET and set up the path where you want to store your datasets:

from KGNET.KGNET import KGNET
KGNET.KGNET_Config.datasets_output_path="/path/to/datasets/"
...

Note: We suggest reviewing the default paths in KGNET/Constants.py and adjusting them as needed.

Step 2: Create a KGNET instance and load your Knowledge Graph (KG).

A KGNET object contains all the necessary details about the KG. You can instantiate a KGNET object with the following example:

# RDFEngine must also be imported from the KGNET package before this call.
kgnet = KGNET(KG_endpointUrl="http://206.12.100.35:5820/kgnet_kgs/query",
              KGMeta_endpointUrl="http://206.12.100.35:5820/kgnet_kgs/query",
              KG_NamedGraph_IRI='https://dblp2022.org',
              RDFEngine=RDFEngine.stardog)

Note: The above arguments are for the demo scenario. You can replace them with your own KG.
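Before pointing KGNET at your own KG, it can help to confirm that the endpoint answers SPARQL queries and that the named graph is not empty. The sketch below uses only the standard SPARQL 1.1 Protocol (query via HTTP GET in the `query` parameter) and the standard JSON result format for ASK. The helper names are hypothetical, and authentication (which a Stardog endpoint may require) is not handled.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def sparql_get_url(endpoint_url: str, query: str) -> str:
    """Build a SPARQL 1.1 Protocol query-via-GET URL (query text in the `query` parameter)."""
    separator = "&" if "?" in endpoint_url else "?"
    return endpoint_url + separator + urlencode({"query": query})


def endpoint_has_graph(endpoint_url: str, graph_iri: str, timeout: float = 10.0) -> bool:
    """Send an ASK query and report whether the named graph contains any triple.

    Performs a network request; authentication is not handled here.
    """
    query = f"ASK {{ GRAPH <{graph_iri}> {{ ?s ?p ?o }} }}"
    request = Request(
        sparql_get_url(endpoint_url, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        # SPARQL JSON results for ASK carry a top-level "boolean" member.
        return bool(json.load(response)["boolean"])
```

With the demo arguments, the check would be `endpoint_has_graph("http://206.12.100.35:5820/kgnet_kgs/query", "https://dblp2022.org")`.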

2. Performing Node Classification on your KG

Step 1: Train a Node Classification (NC) model.

To train an NC model for your KG, call the train_GML function of KGNET. It takes four required arguments:

  • operatorType: the operation type, Node Classification or Link Prediction.
  • GNNMethod: the GNN method used to train the model, e.g. GraphSaint or RGCN.
  • targetNodeType: the target node type in your KG whose nodes will be classified.
  • labelNodeType: the label node you want to predict.

Below is an example of training a model on the DBLP 2022 subgraph dataset.

kgnet.train_GML(operatorType=KGNET.GML_Operator_Types.NodeClassification,
                GNNMethod=KGNET.GNN_Methods.Graph_SAINT,
                targetNodeType="dblp2022:Publication",labelNodeType="dblp2022:publishedIn_Obj")

Note: Once your model is trained, it is uploaded to KGNET.
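The targetNodeType and labelNodeType arguments above are prefixed names. The hypothetical helper below shows which full IRIs they abbreviate, using the dblp2022 prefix that this README's SPARQL-ML queries declare. Whether KGNET expands the names in exactly this way internally is an assumption; the sketch only illustrates standard prefixed-name expansion.

```python
from typing import Dict

# The dblp2022 prefix is the one declared in this README's SPARQL-ML queries.
PREFIXES: Dict[str, str] = {"dblp2022": "https://dblp.org/rdf/schema#"}


def expand_prefixed_name(name: str, prefixes: Dict[str, str]) -> str:
    """Expand 'prefix:local' into a full IRI; full http(s) IRIs are returned unchanged."""
    if name.startswith(("http://", "https://")):
        return name
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown or missing prefix in {name!r}")
    return prefixes[prefix] + local


# The two node arguments passed to train_GML in the node-classification example.
for arg in ("dblp2022:Publication", "dblp2022:publishedIn_Obj"):
    print(arg, "->", expand_prefixed_name(arg, PREFIXES))
```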

Step 2: Perform NC Inference on your KG.

Once you have trained your model, you can use it to perform inference on your KG. For this purpose, you can write a SPARQL query as shown in the example below:

query = """
            prefix dblp2022:<https://dblp.org/rdf/schema#>
            prefix kgnet:<http://kgnet/>
            select ?Publication ?Title ?dblp_Venue ?Pred_Venue
            from <https://dblp2022.org>
            where
            {
            ?Publication a dblp2022:Publication .
            ?Publication  dblp2022:publishedIn ?dblp_Venue .
            ?Publication  dblp2022:title ?Title .
            ?Publication ?NodeClassifier ?Pred_Venue .
            ?NodeClassifier a <kgnet:types/NodeClassifier>.
            ?NodeClassifier <kgnet:targetNode> dblp2022:Publication.
            ?NodeClassifier <kgnet:labelNode> dblp2022:publishedIn_Obj.
            }
            limit 100
        """

Once you have the query, you can execute it with the following function:

kgnet.executeSPARQLMLInferenceQuery(query)

This function runs the inference pipeline and returns the predictions along with stats associated with the inference.
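If you run node-classification inference for several target and label types, generating the query can be less error-prone than editing it by hand. The sketch below is a hypothetical helper, not part of KGNET. It emits the same SPARQL-ML triple pattern as the example above (NodeClassifier type, targetNode, labelNode) but leaves out that example's title and venue columns. The variable names ?node and ?pred are illustrative choices.

```python
from typing import Dict


def build_nc_query(target_type: str, label_node: str, named_graph: str,
                   prefixes: Dict[str, str], limit: int = 100) -> str:
    """Hypothetical helper emitting the node-classification SPARQL-ML pattern.

    target_type and label_node are prefixed names (e.g. dblp2022:Publication)
    whose prefixes must appear in `prefixes`.
    """
    prefix_lines = "\n".join(f"prefix {p}:<{iri}>" for p, iri in prefixes.items())
    return f"""{prefix_lines}
select ?node ?pred
from <{named_graph}>
where {{
  ?node a {target_type} .
  ?node ?NodeClassifier ?pred .
  ?NodeClassifier a <kgnet:types/NodeClassifier> .
  ?NodeClassifier <kgnet:targetNode> {target_type} .
  ?NodeClassifier <kgnet:labelNode> {label_node} .
}}
limit {int(limit)}"""


query = build_nc_query(
    target_type="dblp2022:Publication",
    label_node="dblp2022:publishedIn_Obj",
    named_graph="https://dblp2022.org",
    prefixes={"dblp2022": "https://dblp.org/rdf/schema#", "kgnet": "http://kgnet/"},
)
```

The resulting string would then be passed to kgnet.executeSPARQLMLInferenceQuery(query), as in the example above.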

3. Performing Link Prediction on your KG

The process of performing Link Prediction on your KG is almost identical to that of NC.

Step 1: Train a Link Prediction (LP) model.

To train an LP model for your KG, call the train_GML function of KGNET. It takes three required arguments:

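# Assumption: TargetEdge holds the IRI of the edge (predicate) to predict. The DBLP
# authoredBy predicate matches the inference query below; replace it with your own edge.
TargetEdge = "https://dblp.org/rdf/schema#authoredBy"
kgnet.train_GML(operatorType=KGNET.GML_Operator_Types.LinkPrediction,
                targetEdge=TargetEdge,
                GNNMethod=KGNET.GNN_Methods.MorsE)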

  • operatorType: the operation type, Node Classification or Link Prediction.
  • GNNMethod: the GNN method used to train the model, e.g. MorsE or RGCN.
  • targetEdge: the target edge (predicate) you want to predict in your KG.
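Not every GNN method is available for every task. The sketch below encodes the training methods listed under "Supported Features" in this README and rejects unsupported combinations before any training starts. The Task enum and the method strings are hypothetical stand-ins for KGNET.GML_Operator_Types and KGNET.GNN_Methods, whose actual members may differ.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Task(Enum):
    NODE_CLASSIFICATION = "NodeClassification"
    LINK_PREDICTION = "LinkPrediction"


# Training methods listed under "Supported Features" in this README.
TRAINABLE_METHODS: Dict[Task, FrozenSet[str]] = {
    Task.NODE_CLASSIFICATION: frozenset({"RGCN", "Shadow-GNN", "GraphSaint"}),
    Task.LINK_PREDICTION: frozenset({"RGCN", "MorsE"}),
}


def check_training_request(task: Task, method: str) -> None:
    """Raise ValueError if `method` is not listed as trainable for `task`."""
    allowed = TRAINABLE_METHODS[task]
    if method not in allowed:
        raise ValueError(
            f"{method} is not listed for {task.value}; choose one of {sorted(allowed)}"
        )


check_training_request(Task.LINK_PREDICTION, "MorsE")  # passes silently
```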

Step 2: Perform LP Inference on your KG.

Much like NC, you can write a query to perform inference on your choice of nodes as shown in the example below.

query = """         prefix dblp2022:<https://dblp.org/rdf/schema#>
                    prefix kgnet:<https://kgnet/>
                    select ?publication ?Title ?pred_author
                    from <https://dblp2022.org>
                    where {
                    ?publication a dblp2022:Publication.
                    ?publication dblp2022:title ?Title .
                    ?publication dblp2022:authoredBy ?auth .
                    ?publication ?LinkPredictor ?pred_author .
                    ?LinkPredictor  a <kgnet:types/LinkPredictor>.
                    ?LinkPredictor  <kgnet:targetEdge>  """+ "\""+TargetEdge+"\""+ """ .
                    ?LinkPredictor <kgnet:GNNMethod> "MorsE" .
                    ?LinkPredictor <kgnet:topK> 4.
                    }
                    order by ?publication
                    limit 300
            """

Once you have the query, you can run the executeSPARQLMLInferenceQuery command to perform the inference:

kgnet.executeSPARQLMLInferenceQuery(query)
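The query above splices TargetEdge into the SPARQL text by string concatenation. That works for ordinary IRIs, but it breaks if the value ever contains a double quote or a backslash. The hypothetical helper below builds the same link-prediction query and quotes string values according to SPARQL's escape rules. The default target edge used in the example (the DBLP authoredBy predicate) is an assumption; use the edge your model was trained on.

```python
def sparql_string_literal(value: str) -> str:
    """Quote `value` as a SPARQL double-quoted string literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def build_lp_query(target_edge: str, named_graph: str, gnn_method: str = "MorsE",
                   top_k: int = 4, limit: int = 300) -> str:
    """Hypothetical helper emitting the link-prediction SPARQL-ML query from this README."""
    return f"""prefix dblp2022:<https://dblp.org/rdf/schema#>
prefix kgnet:<https://kgnet/>
select ?publication ?Title ?pred_author
from <{named_graph}>
where {{
  ?publication a dblp2022:Publication .
  ?publication dblp2022:title ?Title .
  ?publication dblp2022:authoredBy ?auth .
  ?publication ?LinkPredictor ?pred_author .
  ?LinkPredictor a <kgnet:types/LinkPredictor> .
  ?LinkPredictor <kgnet:targetEdge> {sparql_string_literal(target_edge)} .
  ?LinkPredictor <kgnet:GNNMethod> {sparql_string_literal(gnn_method)} .
  ?LinkPredictor <kgnet:topK> {int(top_k)} .
}}
order by ?publication
limit {int(limit)}"""


# Assumed target edge: the DBLP authoredBy predicate.
query = build_lp_query("https://dblp.org/rdf/schema#authoredBy", "https://dblp2022.org")
```

Casting top_k and limit with int() also prevents arbitrary text from reaching those positions in the query.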

4. Exploring KGMeta

KGMeta stores many GML tasks, and each task may be associated with several trained GNN models. To list the models trained for a given taskId, use the following function:

kgnet.KGMeta_Governer.getGMLTaskModelsBasicInfoByID(taskId)

Create your local KGMeta KG

  • Load the template KGMeta KG into your endpoint.
  • Add your KG's graph metadata to KGMeta.
  • Update KGNET.KGNETConfig with your KGMeta KG endpoint URL and named graph IRI.
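One engine-neutral way to load the template KGMeta KG is the SPARQL 1.1 Update LOAD operation, sent with the SPARQL 1.1 Protocol's form-encoded `update` parameter. The sketch below assumes that your endpoint has SPARQL Update enabled and can dereference the template's URL. The template URL, the KGMeta graph IRI, and the update endpoint are placeholders, and the helper names are hypothetical. Stardog and Virtuoso also provide their own bulk loaders, which may be preferable for large files.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Characters that may not appear in a SPARQL IRI reference (<...>).
_FORBIDDEN_IRI_CHARS = set('<>"{}|^`\\ ')


def load_into_graph(source_url: str, graph_iri: str, silent: bool = False) -> str:
    """Build a SPARQL 1.1 Update LOAD operation: LOAD [SILENT] <source> INTO GRAPH <graph>."""
    for iri in (source_url, graph_iri):
        if not iri or _FORBIDDEN_IRI_CHARS & set(iri):
            raise ValueError(f"not a usable IRI: {iri!r}")
    keyword = "LOAD SILENT" if silent else "LOAD"
    return f"{keyword} <{source_url}> INTO GRAPH <{graph_iri}>"


def post_update(update_endpoint: str, update: str, timeout: float = 30.0) -> int:
    """POST an update using the SPARQL 1.1 Protocol form encoding; returns the HTTP status."""
    data = urlencode({"update": update}).encode("utf-8")
    request = Request(
        update_endpoint,
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    with urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder IRIs; the real template location and KGMeta graph IRI depend on your setup.
update = load_into_graph("https://example.org/kgmeta-template.nt", "https://example.org/kgmeta")
```

Sending the update is then a single call such as post_update(your_update_endpoint, update). Each engine exposes updates at its own URL, so check your engine's documentation for the right one.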

Run your own GML Inference API

  • Run a GMLWebServiceApis.Inference_API.py instance and set your IP/port.
  • Update KGNET.KGNETConfig with the API URL.
  • Configure the model and metadata file paths.

Citing Our Work

If you find our work useful, please cite it in your research:

@INPROCEEDINGS{10184515,
  author={Abdallah, Hussein and Mansour, Essam},
  booktitle={2023 IEEE 39th International Conference on Data Engineering (ICDE)}, 
  title={Towards a GML-Enabled Knowledge Graph Platform}, 
  year={2023},
  volume={},
  number={},
  pages={2946-2954},
  doi={10.1109/ICDE55515.2023.00225}
}


@article{abdallah2023demonstration,
  title={Demonstration of SPARQL ML: An Interfacing Language for Supporting Graph Machine Learning for RDF Graphs},
  author={Abdallah, Hussein and Afandi, Waleed and Mansour, Essam},
  journal={Proceedings of the VLDB Endowment},
  volume={16},
  number={12},
  pages={3974--3977},
  year={2023},
  publisher={VLDB Endowment}
}

@article{KGTOSA,
  title={Task-Oriented GNNs Training on Large Knowledge Graphs for Accurate and Efficient Modeling},
  author={Abdallah, Hussein and Afandi, Waleed and Kalnis, Panos and Mansour, Essam},
  publisher={CORR}
}

Publicity

This repository is part of our submission to ICDE-2023.

Questions

For any questions, please contact us at:
[email protected], [email protected], [email protected]