Skip to content
edubkendo edited this page Sep 8, 2013 · 2 revisions

Semantic Diffing, WTF?

Typically, when a tool creates a diff, it goes through the text, comparing the text line by line. This can tell us a lot, but when changes get complex, it often fails to match those changes back to the original lines in the source code. For instance, consider the scenario where PersonA has moved a method on their branch, while PersonB has changed something internal to that method. Most diff/merge tools would absolutely require human intervention here.

NodeDiff, however, looks at the AST instead of the text, and so its able to match those methods back up as long as the method name is the same.

Usage

NodeDiff compares two Node objects. Typically, these would be Nodes representing the original and current state of the AST. The simplest way to use would be to simply create a NodeDiff object with the two nodes, and then create a diff. This returns an ArrayList of Change objects. A change object will hold a reference to both the old and new versions of the Node which was altered, and some additional information about node complexity.

NodeDiff nd = new NodeDiff(nodeA, nodeB);
ArrayList<Change> diff = nd.getDiff();

This is the simple version of the diff, and for many purposes it may be all that you need. However, NodeDiff can also create a DeepDiff. DeepDiffs attach a subdiff of each of the Change objects, to the Change objects. Using Levenshtein distances, and relying on the fact that there are less nodes to compare against, these subdiffs make a second attempt to match up nodes which have moved or been changed too dramatically for the original diff to recognize. In order to do this, NodeDiff needs the strings which the Node objects were parsed from.

NodeDiff nd = new NodeDiff(nodeA, codeStringA, nodeB, codeStringB);
ArrayList<DeepDiff> deepdiff = nd.getDeepDiff();

Careful use of this subdiff information can allow you to be much more precise. For an example of a project which uses this information in a powerful way, check out RubyDiff .

Finally, it may be that you don't want NodeDiff to compare every node. NodeDiff provides for that via IsJunk. If you pass NodeDiff an object which implements the IsJunk interface and, specifically, the #checkJunk() method, NodeDiff will call this callback on the nodes before comparing them. Any node for which #checkJunk() returns true will be skipped over.

String codeStringA = "def foo(bar)\n bar\n end\n foo('astring')";
String codeStringB = "'a'\ndef bloo(bar)\n puts bar\n end\n foo('astring')";

//The implementation of parseContents is left to the reader.
Node nodeA = parseContents(codeStringA);
Node nodeB = parseContents(codeStringB);

NodeDiff nd = new NodeDiff(nodeA, codeStringA, nodeB, codeStringB, new
IsJunk() {
  public boolean checkJunk(Node node) {
      if (!(node instanceof ILocalScope && !(node instanceof RootNode))) {
          return true;
      }
        return false;
    }
});

ArrayList<DeepDiff> diff = nd.getDeepDiff();

System.out.println(diff);
 // Output:
 //[
 // Change:
 // New Node: (DefnNode:foo, (MethodNameNode:foo), (ArgsNode, (ListNode, (
 // ArgumentNode:bar))), (NewlineNode, (LocalVarNode:bar))) Complexity: 7
 // Position: <code>:[0,2]:[0,22]
 //,
 // Change:
 // Old Node: (DefnNode:bloo, (MethodNameNode:bloo), (ArgsNode, (ListNode, (
 // ArgumentNode:bar))), (NewlineNode, (FCallNode:puts, (ArrayNode, (
 // LocalVarNode:bar))))) Complexity: 9 Position: <code>:[1,3]:[4,32]
 //]

In the above example, we are instructing NodeDiff to skip any node which isn't an instance of ILocalScope. Essentially, we are saying: if it isn't a class, a module, or a method skip it.

More Examples

These are examples of NodeDiff being used in the wild:

Clone this wiki locally