[Enh] Extended attribute selection for graph aggregation #24
This PR adds two features to the existing provenance graph aggregations:
Previously, for nodes with a given label, grouping by an attribute that was not present in all of those nodes raised an exception. It is now possible to specify attributes that are not shared among all nodes: if a node lacks the attribute, its value is taken as `None` when evaluating the grouping.
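The sketch below illustrates the `None` fallback with a hypothetical `group_nodes` helper. It is not the Alpaca implementation. A plain list of `(node, data)` pairs stands in for NetworkX's `G.nodes(data=True)`, and storing the node label under a `label` key is an assumption made only for this example.

```python
from collections import defaultdict


def group_nodes(nodes, label, attributes):
    """Hypothetical helper: group (node, data) pairs with `label` by attribute values."""
    groups = defaultdict(list)
    for node, data in nodes:
        # Assumption for this sketch: the node label is stored under "label".
        if data.get("label") != label:
            continue
        # dict.get returns None for a missing attribute, so nodes lacking it
        # share a None group instead of raising a KeyError.
        key = tuple(data.get(attr) for attr in attributes)
        groups[key].append(node)
    return dict(groups)


nodes = [
    ("f1", {"label": "File", "File_path": "/data/a.nix"}),
    ("f2", {"label": "File", "File_path": "/data/a.nix"}),
    ("f3", {"label": "File"}),  # no File_path attribute
    ("o1", {"label": "Object"}),  # different label, ignored
]
print(group_nodes(nodes, "File", ["File_path"]))
# {('/data/a.nix',): ['f1', 'f2'], (None,): ['f3']}
```

Here `f3` no longer breaks the grouping. It forms its own `(None,)` group.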
In addition to strings with node attribute names, it is now possible to pass callables of the form `callable(graph, node, data)`. The callable receives the NetworkX graph object, the node ID, and the full dictionary of node attributes. This makes it possible to group nodes by transformed attribute values or by their relationships in the graph. For example, when grouping files (label `File`) by the `File_path` attribute captured by Alpaca, passing the attribute name creates a separate supernode in the aggregation for every distinct file path. With a callable, the nodes can be grouped by other similarities, such as the start of the path string in `File_path`.
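A minimal sketch of mixing attribute names and callables follows, assuming the `callable(graph, node, data)` signature described above. `group_nodes`, `attribute_value`, and `path_root` are hypothetical names used only for illustration. A plain dict mapping node IDs to attribute dicts stands in for the NetworkX graph, and the `label` key is an assumption of this example.

```python
from collections import defaultdict


def attribute_value(graph, node, data, attr):
    # Strings are looked up in the node's attributes (None if missing);
    # callables receive (graph, node, data), as described in the PR.
    if callable(attr):
        return attr(graph, node, data)
    return data.get(attr)


def group_nodes(graph, label, attributes):
    """Hypothetical helper: `graph` maps node IDs to attribute dicts."""
    groups = defaultdict(list)
    for node, data in graph.items():
        if data.get("label") != label:  # assumption: label stored under "label"
            continue
        key = tuple(attribute_value(graph, node, data, a) for a in attributes)
        groups[key].append(node)
    return dict(groups)


def path_root(graph, node, data):
    # Hypothetical callable: keep only the first directory of File_path,
    # e.g. "/data/a.nix" -> "data".
    path = data.get("File_path")
    if path is None:
        return None
    return path.strip("/").split("/")[0]


graph = {
    "f1": {"label": "File", "File_path": "/data/a.nix"},
    "f2": {"label": "File", "File_path": "/data/b.nix"},
    "f3": {"label": "File", "File_path": "/results/c.nix"},
}
print(group_nodes(graph, "File", ["File_path"]))  # one group per distinct path
print(group_nodes(graph, "File", [path_root]))  # grouped by leading directory
```

Grouping by the attribute name yields three groups, one per path. Grouping with `path_root` merges `f1` and `f2` under `("data",)` and leaves `f3` under `("results",)`.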