Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged with the original to update the gem, but keep our customizations #1

Open
wants to merge 45 commits into
base: master
Choose a base branch
from

Conversation

grumpit
Copy link

@grumpit grumpit commented Jul 21, 2018

No description provided.

liufengyun and others added 30 commits October 6, 2014 20:59
These vars were already created at line 13 but they weren't used before.
Use a_start and b_start variables in HashDiff.lcs
A hash key with anything not matching \w (word characters) like -|, etc
would not match the regex detecting arrays in the patch path and would
not patch correctly.
Fix bug with array under hash key with non-word characters.
- Add 2.2, 2.3 and 2.4 to Travis CI.
- Modern version of `rake` fails with old RSpec so its version should be restricted.
- Fixnum is deprecated in Ruby 2.4.0 so replace it with Integer.
Mention 2 compelling reasons to start using HashDiff
HashDiff#similar? compares values based on "node count" using
HashDiff#count_nodes and HashDiff#diff (if the counts don't cancel out). If
`a` and `b` are arrays/hashes `count_nodes` recursively counts the elements,
otherwise returns 1. If `a` and `b` are not arrays/hashes, HashDiff#similar?
needlessly counts/diffs values that will ultimately end up passing
through HashDiff#compare_values.

A considerable performance improvement can be had by circumventing the
needless recursion and call HashDiff#compare_values if `a` and `b` are
not arrays/hashes.

This has been benchmarked with this snippet:

```ruby
$LOAD_PATH << File.join(File.dirname(__FILE__), 'lib')
require 'hashdiff'
require 'benchmark'

seq1 = %w(a b c e h j l m n p)
seq2 = %w(a b c d e f j k l m r s t)

n = 10000
Benchmark.bm do |x|
   x.report('lcs') { n.times do ; HashDiff.lcs(seq1, seq2); end }
end
```

Before:

```sh
$ ruby benchmark.rb
       user     system      total        real
lcs  6.640000   0.010000   6.650000 (  6.649822)
```

After:

```sh
$ ruby benchmark.rb
       user     system      total        real
lcs  1.030000   0.010000   1.040000 (  1.037542)
```
Greatly improve performance of HashDiff#similar?
The patches suggested in the README only work when you have string keys
on the hash.
This introduces an array_path option that can be used when generating a
diff. This represents the path to aspects of the diff as an array rather
than a string.

eg.
```
x = {'a' => 1}
y = {'a' => 2}

HashDiff.diff(x, y)
=> [["~", "a", 1, 2]]h

HashDiff.diff(x, y, :array_path => true)
=> [["~", ["a"], 1, 2]]
```

This allows there to be more flexibility with the types used as keys in
a hash. Allowing workarounds for issues such as:
liufengyun#25

eg
```
x = {'a'=>1}
y = {:a=>1}
HashDiff.diff(x, y)
=> [["-", "a", 1], ["+", "a", 1]]
HashDiff.diff(x, y, :array_path => true)
=> [["-", ["a"], 1], ["+", [:a], 1]]
```

And improved ability to patch hashes with keys:

eg
```
x = {a: {b: :c}}
y = {a: {b: :d}}
diff = HashDiff.diff(x, y)
=> [["~", "a.b", :c, :d]]
HashDiff.patch!(x, diff)
    NoMethodError: undefined method `[]=' for nil:NilClass
diff = HashDiff.diff(x, y, array_path: true)
=> [["~", [:a, :b], :c, :d]]
HashDiff.patch!(x, diff)
=> {:a=>{:b=>:d}}
```

This updates the `patch!` and `unpatch!` methods to accept diffs with
either paths as strings or as arrays.
kevindew and others added 15 commits August 3, 2017 22:49
s/comparisions/comparisons
The LCS algorithm produces excellent diffs. Unfortunately it has a
complexity, running at n^2 for the number of items in an array - which
can lead to extremely slow computations.

```
> require 'hashdiff';require 'benchmark'
> x = (1..100).map { |i| { key: i, foo: :bar } }
> puts Benchmark.measure { HashDiff.diff(x, x) }
  0.420000   0.000000   0.420000 (  0.430212)
```

If the size of the array is 1000 then we see it get really painful

```
> x = (1..1000).map { |i| { key: i, foo: :bar } }
> puts Benchmark.measure { HashDiff.diff(x, x) }
 42.680000   0.590000  43.270000 ( 43.530287)
```

This commit introduces an option to sacrifice the quality of the diff
for a faster computational result with the `use_lcs` option, which can
be set to false to disable use of the LCS algorithm.

With `use_lcs` as false the array comparison is much simpler with a
complexity of at worst 2n for an array.

```
> x = (1..100).map { |i| { key: i, foo: :bar } }
> puts Benchmark.measure { HashDiff.diff(x, x, use_lcs: false) }
  0.010000   0.000000   0.010000 (  0.004894)
> x = (1..1000).map { |i| { key: i, foo: :bar } }
> puts Benchmark.measure { HashDiff.diff(x, x, use_lcs: false) }
  0.040000   0.000000   0.040000 (  0.042547)
```

The linear approach to comparing the array works on the basis that if
arrays are the same length it treats the array as having no additions or
deletions, only changes.

```
> HashDiff.diff([0,1,2], [3,4,5], use_lcs: false)
=> [["~", "[0]", 0, 3], ["~", "[1]", 1, 4], ["~", "[2]", 2, 5]]
```

compared to:

```
> HashDiff.diff([0,1,2], [3,4,5])
=> [["-", "[2]", 2], ["-", "[1]", 1], ["-", "[0]", 0], ["+", "[0]", 3], ["+", "[1]", 4], ["+", "[2]", 5]]
```

Whereas if there are more items in one array than the other it checks
the items surrounding the index for a match to calculate additions and
removals.

```
> HashDiff.diff([0, 3, 5], [0, 1, 2, 3, 4, 5], use_lcs: false)
=> [["+", "[1]", 1], ["+", "[2]", 2], ["+", "[4]", 4]]
> HashDiff.diff([0, 3, 5], [0, 1, 2, 3, 4, 5], use_lcs: false) == HashDiff.diff([0, 3, 5], [0, 1, 2, 3, 4, 5])
=> true
```

For a combination of added and changed items the diff will appear
different to the lcs approach:

```
> HashDiff.diff([0, 1, 2], [0, 2, 2, 3], use_lcs: false)
=> [["~", "[1]", 1, 2], ["+", "[3]", 3]]
> HashDiff.diff([0, 1, 2], [0, 2, 2, 3])
=> [["-", "[1]", 1], ["+", "[1]", 2], ["+", "[3]", 3]]
```

However all diffs produce same results through `patch!` and `unpatch!`
methods.
Option to allow array comparisons in linear complexity
update minimum ruby to reflect actual support
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.