[Enh] Extended attribute selection for graph aggregation #24

Merged: 4 commits, Nov 9, 2023

@kohlerca commented Nov 9, 2023

This PR adds two features to the existing provenance graph aggregations:

  1. Previously, grouping nodes with a given label by an attribute that was not present in all of those nodes raised an exception. It is now possible to specify attributes that are not shared by all nodes. If an attribute is missing from a node, its value is treated as None when evaluating the grouping.

  2. In addition to strings with node attribute names, it is now possible to pass callables of the form callable(graph, node, data). The callable receives the NetworkX graph object, the node id, and the full node attributes dictionary, and its return value is used for grouping. This makes it possible to group on transformed attribute values or on the relationships of a node in the graph.
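The two behaviors above can be sketched in plain Python. This is only an illustration of the described semantics, not Alpaca's actual implementation: `grouping_key` and `group_nodes` are hypothetical helpers, a plain dict stands in for the NetworkX graph to keep the sketch dependency-free, and storing the node label under a `"label"` key is an assumption.

```python
from collections import defaultdict


def grouping_key(graph, node, data, selectors):
    """Build a hashable grouping key for one node (hypothetical helper)."""
    key = []
    for selector in selectors:
        if callable(selector):
            # Feature 2: callables receive the graph, node id and attributes
            key.append(selector(graph, node, data))
        else:
            # Feature 1: a missing attribute is treated as None
            key.append(data.get(selector))
    return tuple(key)


def group_nodes(graph, nodes, selectors_by_label):
    """Group node ids by (label, key); `nodes` maps node id -> attributes."""
    groups = defaultdict(list)
    for node, data in nodes.items():
        label = data.get("label")  # assumption: label stored under "label"
        selectors = selectors_by_label.get(label, ())
        key = (label, grouping_key(graph, node, data, selectors))
        groups[key].append(node)
    return dict(groups)


# Stand-in data: "f3" has no File_path attribute
nodes = {
    "f1": {"label": "File", "File_path": "/outputs/a.txt"},
    "f2": {"label": "File", "File_path": "/outputs/a.txt"},
    "f3": {"label": "File"},
}
groups = group_nodes(nodes, nodes, {"File": ("File_path",)})
```

In this sketch, `f1` and `f2` end up in one group and `f3` in a separate group keyed by None, instead of an exception being raised for the missing attribute.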

For example, when grouping files (label File) by the File_path attribute captured by Alpaca, passing the attribute name creates one supernode in the aggregation for every distinct file path:

aggregated = graph.aggregate({'File': ('File_path',)})

With a callable, nodes can be grouped by other criteria, such as whether the path string in File_path starts with a given prefix:

is_output_file = lambda graph, node, data: data['File_path'].startswith("/outputs/")
aggregated = graph.aggregate({'File': (is_output_file,)})
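Because a callable receives the full attributes dictionary, indexing with `data['File_path']` raises a KeyError for any node that lacks the attribute. Whether File nodes can lack File_path in practice is not stated here, so the following is a defensive sketch under that assumption. `top_level_dir` is a hypothetical second grouping function added for illustration.

```python
def is_output_file(graph, node, data):
    """True if the node's File_path starts with "/outputs/".

    `graph` and `node` are unused but kept to match the
    callable(graph, node, data) signature.
    """
    path = data.get("File_path")  # None when the attribute is missing
    return path is not None and path.startswith("/outputs/")


def top_level_dir(graph, node, data):
    """Return the first path component, or None if unavailable (hypothetical)."""
    path = data.get("File_path")
    if not path:
        return None
    first = path.strip("/").split("/")[0]
    return first or None


# Intended usage, with `graph` being the provenance graph object from above:
# aggregated = graph.aggregate({'File': (is_output_file,)})
# aggregated = graph.aggregate({'File': (top_level_dir,)})
```

Named functions are used instead of a lambda so that the missing-attribute case can be handled explicitly and documented.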

@kohlerca added the enhancement (New feature or request) label Nov 9, 2023
@kohlerca merged commit 12f2e4d into main Nov 9, 2023
5 checks passed
@kohlerca deleted the enh/aggregation_attributes branch November 9, 2023 17:16