MariaVincent edited this page Sep 25, 2017 · 8 revisions

Alignment

(9/25/2017) Alignment is the process of merging a new subgraph generated by rt with the full knowledge graph. The alignment code receives all content from a single source as a single graph. For example, when STUCCO loads content from a CVE source file, all of the content is transformed into a GraphSON structure and passed to alignment. The alignment code assumes that the string it receives is a subgraph in GraphSON form. Alignment merges new content as individual vertices and edges, without taking topology or connectivity into account.
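
As an illustration of the input described above, here is a minimal sketch of splitting a GraphSON-style string into independent vertex and edge lists. The `_id`, `_type`, `_outV`, `_inV`, and `_label` keys follow the TinkerPop 2 GraphSON convention; the sample IDs, the `vertexType` and `description` properties, the edge label, and the `split_subgraph` helper are all made up for illustration and may not match what STUCCO actually emits.

```python
import json

# Hypothetical GraphSON-style payload; every ID and property value below is a
# placeholder, not real STUCCO output.
SAMPLE_SUBGRAPH = """
{
  "vertices": [
    {"_id": "CVE-0000-0001", "_type": "vertex",
     "vertexType": "vulnerability", "description": "placeholder text"},
    {"_id": "example-software", "_type": "vertex", "vertexType": "software"}
  ],
  "edges": [
    {"_id": "e1", "_type": "edge", "_outV": "example-software",
     "_inV": "CVE-0000-0001", "_label": "hasVulnerability"}
  ]
}
"""


def split_subgraph(graphson_text):
    """Parse a GraphSON-style string and return (vertices, edges).

    The two lists are returned separately because alignment treats vertices
    and edges as individual items rather than as a connected structure.
    """
    data = json.loads(graphson_text)
    return data.get("vertices", []), data.get("edges", [])


vertices, edges = split_subgraph(SAMPLE_SUBGRAPH)
```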

There are two broad categories of alignment:

  1. Merging new nodes that have canonical names / unique IDs (e.g. CVE #):
  • If a matching canonical name is not found in the knowledge graph, add the node.
  • If a matching canonical name is found in the knowledge graph, merge properties and merge edges.
  2. Merging nodes without canonical names / unique IDs (e.g. malware). Some of these nodes may not have a canonical name; others may have a canonical name that is not available.

Development Activities

This section documents the state of the alignment code and the future activities that are planned.

Of the two broad categories of alignment, our first implementation covers only the first. Here are the steps:

  1. Using only the vertices first:
    • Each vertex’s unique ID is searched for within the knowledge graph (i.e., Titan).
    • If no vertex is found, then this vertex is created within the knowledge graph.
    • If a vertex is found, its properties are “merged” with the vertex in the knowledge graph. Properties that were not present are added; existing properties are appended to, overridden, or retained if they are newer than those of the vertex being merged. The merge decision for each property depends on the property type, which is encoded in the GraphSON schema associated with the ontology definition. There are currently four merge methods: keepNew, appendList, keepUpdates, and keepConfidence.
  2. Once all the vertices have been added the edges can be added.
    • Note, vertices must be added first or the new edges won’t find the vertices within the knowledge graph.
    • Using the edge definition (i.e., which vertex IDs define an edge), we look up the incoming and outgoing vertices in the knowledge graph.
    • If not all of the vertices in an edge’s definition can be found, an error is logged and processing moves to the next edge.
    • When the respective vertices are found, the process creates a property map for the edge, adds the edge properties to that map, and finally commits the edge to the knowledge graph. NOTE: If the edge already exists, no alignment is performed with it, so duplicate edges may be created.
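
The two steps above can be sketched against an in-memory stand-in for the knowledge graph. This is not the STUCCO implementation. The meanings given to the four merge methods are assumptions inferred from their names, the `timestamp` and `confidence` fields are hypothetical, and the `MERGE_METHODS` table stands in for information that the real code reads from the GraphSON schema.

```python
import logging

log = logging.getLogger("alignment_sketch")

# Assumed property-to-method table (hypothetical property names).
MERGE_METHODS = {
    "description": "keepNew",
    "aliases": "appendList",
    "status": "keepUpdates",
    "score": "keepConfidence",
}
META_KEYS = ("_id", "timestamp", "confidence")


def merge_vertex(existing, incoming, methods):
    """Merge incoming properties into existing, in place (assumed semantics)."""
    newer = incoming.get("timestamp", 0) >= existing.get("timestamp", 0)
    more_confident = incoming.get("confidence", 0) > existing.get("confidence", 0)
    for key, value in incoming.items():
        if key in META_KEYS:
            continue
        if key not in existing:
            existing[key] = value          # property was not present: add it
            continue
        method = methods.get(key, "keepNew")
        if method == "appendList":         # extend the list, skipping duplicates
            existing[key] = existing[key] + [v for v in value if v not in existing[key]]
        elif method == "keepUpdates":      # override only if incoming is newer
            if newer:
                existing[key] = value
        elif method == "keepConfidence":   # override only if incoming is more confident
            if more_confident:
                existing[key] = value
        else:                              # keepNew: always take the incoming value
            existing[key] = value
    # Simplification: track the best timestamp/confidence per vertex, not per property.
    existing["timestamp"] = max(existing.get("timestamp", 0), incoming.get("timestamp", 0))
    existing["confidence"] = max(existing.get("confidence", 0), incoming.get("confidence", 0))


def align(graph, vertices, edges, methods=MERGE_METHODS):
    """graph is {"vertices": {id: properties}, "edges": [(out, label, in, properties)]}."""
    # Step 1: vertices first, so that edges can find their endpoints.
    for vertex in vertices:
        existing = graph["vertices"].get(vertex["_id"])
        if existing is None:
            graph["vertices"][vertex["_id"]] = dict(vertex)
        else:
            merge_vertex(existing, vertex, methods)
    # Step 2: edges. Missing endpoints are logged and skipped.
    for edge in edges:
        if edge["_outV"] not in graph["vertices"] or edge["_inV"] not in graph["vertices"]:
            log.error("skipping edge %s: endpoint vertex not found", edge.get("_id"))
            continue
        properties = {k: v for k, v in edge.items() if not k.startswith("_")}
        # No check for an existing edge, so re-aligning the same edge duplicates it.
        graph["edges"].append((edge["_outV"], edge["_label"], edge["_inV"], properties))
    return graph
```

Calling `align` twice with the same edge appends it twice, which mirrors the duplicate-edge behavior noted above. A duplicate check would need a notion of edge identity that the text does not define.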

KNOWNS:

  • To perform alignment with Titan we go through the Rexster web interface and use Gremlin as a means to acquire and submit content to Titan.
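
As a rough illustration of that access path, the sketch below builds a request URL for Rexster's Gremlin extension, which accepts a `script` query parameter under `/graphs/<graph>/tp/gremlin`. The host, port, graph name, the `name` property key, and both helper functions are assumptions made for this example; the actual STUCCO configuration and queries may differ.

```python
from urllib.parse import quote, urlencode


def find_by_name_script(name):
    """Build a Gremlin (TinkerPop 2 style) lookup on an assumed 'name' property key."""
    escaped = name.replace("\\", "\\\\").replace("'", "\\'")
    return f"g.V('name', '{escaped}')"


def gremlin_url(base_url, graph_name, script):
    """Build the URL for a Gremlin script request against a Rexster server."""
    query = urlencode({"script": script})
    return f"{base_url.rstrip('/')}/graphs/{quote(graph_name)}/tp/gremlin?{query}"


# Hypothetical server location, graph name, and placeholder ID.
url = gremlin_url("http://localhost:8182/", "graph", find_by_name_script("CVE-0000-0001"))
```

The URL could then be fetched with any HTTP client, for example `urllib.request.urlopen(url)`, and the JSON response parsed for the matching vertices.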