Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/sorted generic v2 #16

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

koodeex
Copy link
Contributor

@koodeex koodeex commented Sep 11, 2022

Hello!

I have some feature proposal, that can increase perforamance for sorted array case for few times.

Idea is about changing places for elements of the left array and iterating on both arrays at the same time, we can win a lot of perforamnce cause it makes complexity O(n + x) where n is length of the shortest array and x duplicate cases in the longest array.

n+x always less or equal length of the longest array.

It can be unsafe in case when its important to save order of input. Also it will find duplate matches only for the shortest array too.

Usage example:

	a = []int{1, 2}
	b = []int{2}
	s = SortedGenericV2(a, b, func(i, j int) bool { return a[i] > b[j] })

Output:

[2]

New a's order:

[2, 1]

vanila function benchmarks:

Name Runs Average Allocated Allocations from heap
BenchmarkSortedGeneric/SortedGeneric_-_1-16 41 944 108 28.25 ns/op 8 B/op 1 allocs/op
BenchmarkSortedGeneric/SortedGeneric_-_10-16 10 445 590 111.6 ns/op 0 B/op 0 allocs/op
BenchmarkSortedGeneric/SortedGeneric_-_100-16 745 248 1 624 ns/op 0 B/op 0 allocs/op
BenchmarkSortedGeneric/SortedGeneric_-_1000-16 38 613 31 097 ns/op 0 B/op 0 allocs/op
BenchmarkSortedGeneric/SortedGeneric_-_10000-16 3 799 308 313 ns/op 0 B/op 0 allocs/op
BenchmarkSortedGeneric/SortedGeneric_-_100000-16 327 3 679 350 ns/op 0 B/op 0 allocs/op

proposal benchmarks:

Name Runs Average Allocated Allocations from heap
BenchmarkSortedGeneric/SortedGenericV2_-_1-16 199 261 370 5.992 ns/op 0 B/op 0 allocs/op
BenchmarkSortedGeneric/SortedGenericV2_-_10-16 20 496 318 57.31 ns/op 0 B/op 0 allocs/op
BenchmarkSortedGeneric/SortedGenericV2_-_100-16 1 909 890 626.2 ns/op 0 B/op 0 allocs/op
BenchmarkSortedGeneric/SortedGenericV2_-_1000-16 188 733 6 367 ns/op 0 B/op 0 allocs/op
BenchmarkSortedGeneric/SortedGenericV2_-_10000-16 9 190 134 617 ns/op 0 B/op 0 allocs/op
BenchmarkSortedGeneric/SortedGenericV2_-_100000-16 1 537 783 528 ns/op 0 B/op 0 allocs/op

speed diffs:

size Average diff, ns/op Average diff, %
1 -22,258 ns/op -78,78%
10 -54,29 ns/op -48,64%
100 -997,8 ns/op -61,45%
1 000 -24 730 ns/op -99,97%
10 000 -173 696 ns/op -56,33%
100 000 -2 895 822 ns/op -78,70%

As you see, with a little risks (like losing order of input array and weird behaviour for a lot of duplicate matches) you can get a huge performance boost.

P.S. alogorythm idea is not mine, my collegue (@painkuter) shared it with me.

@koodeex
Copy link
Contributor Author

koodeex commented Oct 15, 2022

Gentle ping

@juliangruber
Copy link
Owner

I haven't had the time to properly look at this PR yet. Maybe some of the previous contributors are interested in reviewing?

@3schwartz @colynn @cmin764 @kalashnikovisme

Copy link
Contributor

@3schwartz 3schwartz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR and sorry for the late response.

IMHO having functions with side effects isn't great - if one has ordered arrays and they loose ordering because of a call to intersect it is confusing.

Instead if I were you I would create a new method - one which doesn't expect ordering. Then within the method sort those two arrays (by copying) and then use this algorithm,

It will bring a lot more memory overhead, but I think the usage would be more realistic and hence the benchmark would give a better interpretation of the real gain.

// Best case complexity: O(n) where n is length of the shortest array (all values unique)
// Worst case complexity: O(n) where n is length of the longest array (all values of the longest array are duplicates of intersect match)
// Warning: Function will change left array order
func SortedGenericV2[T comparable](a []T, b []T, leftGreater Comparator) []T {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If one always should use a "leftgreater" comparer, then I would give this as a parameter.

// SortedGenericV2 has complexity: O(n + x) where n is length of the shortest array and x duplicate cases in the longest array.
// Best case complexity: O(n) where n is length of the shortest array (all values unique)
// Worst case complexity: O(n) where n is length of the longest array (all values of the longest array are duplicates of intersect match)
// Warning: Function will change left array order
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I'm not fan of side effects - if one has a sorted array, then loosing ordering because of a call to a intersect would be suppressing in my humble opinion.

@@ -63,6 +64,39 @@ func TestHash(t *testing.T) {
assert.Equal(t, s, []interface{}{2})
}

func TestSortedGenericV2(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like two test cases - Please create a data struct which each test and then loop through them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants