You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In preparation for some future work, it would be convenient to be able to calculate the hash of a connected graph, assuming no directivity in edges. That is, if we query for all nodes connected to a node using django-postgresql-dag's connected_graph() method, we should be able to calculate a hash value of the current state of all connected elements. While potentially expensive to compute †, this should allow guaranteed identification of changes to the graph.
It is possible to have multiple DAGs using django-postgresql-dag. For this issue, we want to calculate the hash of a particular DAG, not all nodes/graphs within the project. To do this, we might approach the problem in the following manner:
Initially treat the DAG as an undirected graph (which is done with the connected_graph() query) to retrieve all nodes in the current graph.
For each identified node within the QuerySet, annotate the node with a sorted list of the parent id values of that node.
Sort the QuerySet by node id.
Convert the resulting QuerySet to a tuple of tuples (which contains the id and parents for each node).
Iterate through the tuple, calculating the combined hash all elements.
Return the calculated hash value.
Recommendations for improvement are welcome. There's likely a more memory-friendly approach that will still get the job done.
Alternative approaches considered: NetworkX Provides a weisfeiler_lehman_graph_hash() function, but it doesn't work well on graphs that are not isomorphic. The graphs produced with django-postgresql-dag are likely to be non-isomorphic.
† Potentially expensive in time and memory. The tuple we use to calculate the hash contains the id of every node in the graph, and for each node, the ids of the parent nodes. Using hashllib's blake2s hash will probably be a speedy approach, but the entire process of querying, converting, and iterating to calculate the hash will take a certain amount of time.
The text was updated successfully, but these errors were encountered:
The graphs produced with django-postgresql-dag are likely to be non-isomorphic.
What is meant by this sentence? isomorphism AFAIK is a property of a pair of graphs ( graph X is isomorphic to graph Y ), and not a stand alone property of a graph.
In preparation for some future work, it would be convenient to be able to calculate the hash of a connected graph, assuming no directivity in edges. That is, if we query for all nodes connected to a node using django-postgresql-dag's
connected_graph()
method, we should be able to calculate a hash value of the current state of all connected elements. While potentially expensive to compute †, this should allow guaranteed identification of changes to the graph.It is possible to have multiple DAGs using django-postgresql-dag. For this issue, we want to calculate the hash of a particular DAG, not all nodes/graphs within the project. To do this, we might approach the problem in the following manner:
connected_graph()
query) to retrieve all nodes in the current graph.id
values of that node.Recommendations for improvement are welcome. There's likely a more memory-friendly approach that will still get the job done.
Alternative approaches considered: NetworkX Provides a
weisfeiler_lehman_graph_hash()
function, but it doesn't work well on graphs that are not isomorphic. The graphs produced with django-postgresql-dag are likely to be non-isomorphic.† Potentially expensive in time and memory. The tuple we use to calculate the hash contains the
id
of every node in the graph, and for each node, theid
s of the parent nodes. Using hashllib's blake2s hash will probably be a speedy approach, but the entire process of querying, converting, and iterating to calculate the hash will take a certain amount of time.The text was updated successfully, but these errors were encountered: