forked from liufengyun/hashdiff
-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged with the original to update the gem, but keep our customizations #1
Open
grumpit
wants to merge
45
commits into
EntropyZero:master
Choose a base branch
from
ARPC:merged
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
[ci skip]
These vars were already created at line 13 but they weren't used before.
Use a_start and b_start variables in HashDiff.lcs
A hash key with anything not matching \w (word characters) like -|, etc would not match the regex detecting arrays in the patch path and would not patch correctly.
Fix bug with array under hash key with non-word characters.
fix an error when a hash has mixed types
- Add 2.2, 2.3 and 2.4 to Travis CI. - Modern version of `rake` fails with old RSpec so its version should be restricted. - Fixnum is deprecated in Ruby 2.4.0 so replace it with Integer.
New rubies support.
Mention 2 compelling reasons to start using HashDiff
HashDiff#similar? compares values based on "node count" using HashDiff#count_nodes and HashDiff#diff (if the counts don't cancel out). If `a` and `b` are arrays/hashes `count_nodes` recursively counts the elements, otherwise returns 1. If `a` and `b` are not arrays/hashes, HashDiff#similar? needlessly counts/diffs values that will ultimately end up passing through HashDiff#compare_values. A considerable performance improvement can be had by circumventing the needless recursion and call HashDiff#compare_values if `a` and `b` are not arrays/hashes. This has been benchmarked with this snippet: ```ruby $LOAD_PATH << File.join(File.dirname(__FILE__), 'lib') require 'hashdiff' require 'benchmark' seq1 = %w(a b c e h j l m n p) seq2 = %w(a b c d e f j k l m r s t) n = 10000 Benchmark.bm do |x| x.report('lcs') { n.times do ; HashDiff.lcs(seq1, seq2); end } end ``` Before: ```sh $ ruby benchmark.rb user system total real lcs 6.640000 0.010000 6.650000 ( 6.649822) ``` After: ```sh $ ruby benchmark.rb user system total real lcs 1.030000 0.010000 1.040000 ( 1.037542) ```
Greatly improve performance of HashDiff#similar?
add codecov to show coverage
The patches suggested in the README only work when you have string keys on the hash.
This introduces an array_path option that can be used when generating a diff. This represents the path to aspects of the diff as an array rather than a string. eg. ``` x = {'a' => 1} y = {'a' => 2} HashDiff.diff(x, y) => [["~", "a", 1, 2]]h HashDiff.diff(x, y, :array_path => true) => [["~", ["a"], 1, 2]] ``` This allows there to be more flexibility with the types used as keys in a hash. Allowing workarounds for issues such as: liufengyun#25 eg ``` x = {'a'=>1} y = {:a=>1} HashDiff.diff(x, y) => [["-", "a", 1], ["+", "a", 1]] HashDiff.diff(x, y, :array_path => true) => [["-", ["a"], 1], ["+", [:a], 1]] ``` And improved ability to patch hashes with keys: eg ``` x = {a: {b: :c}} y = {a: {b: :d}} diff = HashDiff.diff(x, y) => [["~", "a.b", :c, :d]] HashDiff.patch!(x, diff) NoMethodError: undefined method `[]=' for nil:NilClass diff = HashDiff.diff(x, y, array_path: true) => [["~", [:a, :b], :c, :d]] HashDiff.patch!(x, diff) => {:a=>{:b=>:d}} ``` This updates the `patch!` and `unpatch!` methods to accept diffs with either paths as strings or as arrays.
Introduce an array_path option
The LCS algorithm produces excellent diffs. Unfortunately it has a complexity, running at n^2 for the number of items in an array - which can lead to extremely slow computations. ``` > require 'hashdiff';require 'benchmark' > x = (1..100).map { |i| { key: i, foo: :bar } } > puts Benchmark.measure { HashDiff.diff(x, x) } 0.420000 0.000000 0.420000 ( 0.430212) ``` If the size of the array is 1000 then we see it get really painful ``` > x = (1..1000).map { |i| { key: i, foo: :bar } } > puts Benchmark.measure { HashDiff.diff(x, x) } 42.680000 0.590000 43.270000 ( 43.530287) ``` This commit introduces an option to sacrifice the quality of the diff for a faster computational result with the `use_lcs` option, which can be set to false to disable use of the LCS algorithm. With `use_lcs` as false the array comparison is much simpler with a complexity of at worst 2n for an array. ``` > x = (1..100).map { |i| { key: i, foo: :bar } } > puts Benchmark.measure { HashDiff.diff(x, x, use_lcs: false) } 0.010000 0.000000 0.010000 ( 0.004894) > x = (1..1000).map { |i| { key: i, foo: :bar } } > puts Benchmark.measure { HashDiff.diff(x, x, use_lcs: false) } 0.040000 0.000000 0.040000 ( 0.042547) ``` The linear approach to comparing the array works on the basis that if arrays are the same length it treats the array as having no additions or deletions, only changes. ``` > HashDiff.diff([0,1,2], [3,4,5], use_lcs: false) => [["~", "[0]", 0, 3], ["~", "[1]", 1, 4], ["~", "[2]", 2, 5]] ``` compared to: ``` > HashDiff.diff([0,1,2], [3,4,5]) => [["-", "[2]", 2], ["-", "[1]", 1], ["-", "[0]", 0], ["+", "[0]", 3], ["+", "[1]", 4], ["+", "[2]", 5]] ``` Whereas if there are more items in one array than the other it checks the items surrounding the index for a match to calculate additions and removals. ``` > HashDiff.diff([0, 3, 5], [0, 1, 2, 3, 4, 5], use_lcs: false) => [["+", "[1]", 1], ["+", "[2]", 2], ["+", "[4]", 4]] > HashDiff.diff([0, 3, 5], [0, 1, 2, 3, 4, 5], use_lcs: false) == HashDiff.diff([0, 3, 5], [0, 1, 2, 3, 4, 5]) => true ``` For a combination of added and changed items the diff will appear different to the lcs approach: ``` > HashDiff.diff([0, 1, 2], [0, 2, 2, 3], use_lcs: false) => [["~", "[1]", 1, 2], ["+", "[3]", 3]] > HashDiff.diff([0, 1, 2], [0, 2, 2, 3]) => [["-", "[1]", 1], ["+", "[1]", 2], ["+", "[3]", 3]] ``` However all diffs produce same results through `patch!` and `unpatch!` methods.
Option to allow array comparisons in linear complexity
Documentation for the :use_lcs option
update minimum ruby to reflect actual support
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.