alignment
(9/25/2017) Alignment is the process of merging a new subgraph generated by rt with the full knowledge graph. The alignment code receives all content from a single source as a single graph. For example, when STUCCO loads content from a CVE source file, all the content is transformed into a GraphSON structure and passed to alignment. The alignment code assumes that the string it receives is a subgraph in GraphSON form. Alignment merges new content as individual vertices and edges, without taking topology or connectivity into account.
There are two broad categories of alignment:
- Merging new nodes that have unique names / IDs (e.g. CVE #):
  - If a matching name / ID is not found in the knowledge graph, add the node.
  - If a matching name / ID is found in the knowledge graph, merge properties and merge edges.
- Merging nodes without names / IDs (e.g. malware). Some of these nodes may not have a name; others may have a name that is not available:
  - Identify equivalent nodes and score the confidence that the two nodes refer to the same domain concept.
  - If a suitable match is found, merge properties and merge edges as above.
  - If a suitable match is not found, add the new node, and merge edges as above if needed.
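
The second category depends on a confidence score, which this page does not define. As a minimal sketch only, the following assumes a score equal to the fraction of shared properties whose values agree, plus a cutoff below which no match is accepted. The names `match_confidence` and `find_match`, the `threshold` parameter, and the sample property names are all hypothetical and are not taken from the Stucco code.

```python
def match_confidence(node_a, node_b):
    """Fraction of properties present in both nodes whose values agree.

    This scoring rule is an assumption made for illustration.
    """
    shared = set(node_a) & set(node_b)
    if not shared:
        return 0.0
    agreeing = sum(1 for key in shared if node_a[key] == node_b[key])
    return agreeing / len(shared)


def find_match(new_node, existing_nodes, threshold=0.75):
    """Return (node_id, score) for the best candidate at or above threshold.

    Returns None when no candidate is suitable, in which case the caller
    would add new_node to the graph as a new vertex.
    """
    best = None
    for node_id, props in existing_nodes.items():
        score = match_confidence(new_node, props)
        if score >= threshold and (best is None or score > best[1]):
            best = (node_id, score)
    return best


# Hypothetical malware nodes without a unique name / ID.
existing = {
    "m1": {"family": "x", "hash": "abc", "platform": "win"},
    "m2": {"family": "y", "hash": "def"},
}
candidate = {"family": "x", "hash": "abc", "platform": "linux"}
print(find_match(candidate, existing, threshold=0.6))
```

A real scorer would probably weight properties differently (a matching hash is stronger evidence than a matching platform), so the uniform weighting here is only a placeholder.
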
This section documents the state of the alignment code and what future activities are planned.
Of the two broad categories of alignment, our first implementation covers only the first. Here are the steps:
- Using only the vertices first:
  - Each vertex’s unique name / ID is searched for within the knowledge graph (i.e., unique ID in the case of Titan and unique name / alias in the case of Postgres).
  - If no vertex is found, the vertex is created within the knowledge graph.
  - If a vertex is found, its properties are “merged” with the vertex in the knowledge graph. Properties that were not present are added, and existing properties are appended to, overridden, or retained if they are newer than the vertex being merged. The merge decision for each property depends on its property type, which is encoded in the Stucco ontology definition. There are currently four merge methods: keepNew, appendList, keepUpdates, and keepConfidence. Note: the Postgres version supports only two of them: appendList and keepUpdates.
- Once all the vertices have been added, the edges can be added.
  - Note: vertices must be added first, or the new edges won’t find their vertices within the knowledge graph.
  - Using the edge definition (i.e., which vertex IDs define an edge), we look up the incoming and outgoing vertices in the knowledge graph.
  - If not all of an edge’s vertices can be found, an error is logged and processing moves to the next edge.
  - When the respective vertices are found, the process creates a property map for the edge, adds the edge properties to that map, and finally commits the edge to the knowledge graph. NOTE: If an edge already exists, we do not perform alignment with it, which may/will create duplicate edges.
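
The steps above can be sketched in a few lines of Python over plain dictionaries. This is an illustration under stated assumptions, not the Stucco implementation: the in-memory graph layout, the `align` and `merge_property` functions, the `MERGE_METHODS` table, and the `timestamp` and `confidence` property names are all hypothetical. The behavior attached to each merge method is inferred from its name, since this page does not spell it out.

```python
import logging

logger = logging.getLogger("alignment_sketch")

# Hypothetical property-to-method table; in Stucco this association comes from
# the ontology definition. Unlisted properties fall back to keepNew here.
MERGE_METHODS = {
    "alias": "appendList",
    "timestamp": "keepUpdates",
    "confidence": "keepConfidence",
}


def merge_property(method, old, new, incoming_newer, incoming_more_confident):
    """Merge one property value; each method's behavior is an assumption."""
    if method == "appendList":
        merged = list(old) if isinstance(old, list) else [old]
        for item in (new if isinstance(new, list) else [new]):
            if item not in merged:
                merged.append(item)
        return merged
    if method == "keepUpdates":
        return new if incoming_newer else old
    if method == "keepConfidence":
        return new if incoming_more_confident else old
    return new  # keepNew: the incoming value overrides the stored one


def align(graph, subgraph):
    """Merge subgraph into graph in place: all vertices first, then edges."""
    for vertex_id, props in subgraph.get("vertices", {}).items():
        existing = graph["vertices"].get(vertex_id)
        if existing is None:
            graph["vertices"][vertex_id] = dict(props)  # no match: create
            continue
        # Compare once, before any property (including timestamp) is changed.
        newer = props.get("timestamp", 0) >= existing.get("timestamp", 0)
        surer = props.get("confidence", 0) > existing.get("confidence", 0)
        for key, value in props.items():
            if key not in existing:
                existing[key] = value  # property not present yet: add it
            else:
                method = MERGE_METHODS.get(key, "keepNew")
                existing[key] = merge_property(
                    method, existing[key], value, newer, surer)

    for edge in subgraph.get("edges", []):
        endpoints = (edge["outV"], edge["inV"])
        if any(v not in graph["vertices"] for v in endpoints):
            logger.error("skipping edge %s: endpoint vertex not found", edge)
            continue
        # Build the edge with its property map and commit it. Existing edges
        # are not checked, so loading the same edge twice duplicates it.
        graph["edges"].append({
            "outV": edge["outV"],
            "inV": edge["inV"],
            "label": edge.get("label"),
            "properties": dict(edge.get("properties", {})),
        })
    return graph
```

Calling `align` twice with the same subgraph leaves the vertices unchanged after the first call but appends each edge a second time, which mirrors the duplicate-edge caveat in the note above.
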
KNOWNS:
- To perform alignment with Titan we go through the Rexster web interface and use Gremlin as a means to acquire and submit content to Titan.
- To perform alignment with Postgres, we load alignment rules written in PL/pgSQL into the database during initialization, and the entire alignment process runs inside the database.