Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pattern De-Duplication based on Subsequence Detection #1031

Merged
merged 3 commits into from
Oct 23, 2023

Conversation

miles-grant-ibigroup
Copy link
Collaborator

Description:

Possibly a slightly overbuilt way to use subsequence detection to remove patterns that are simply a subsequence of another, larger pattern.

There are definitely a few optimizations possible here. I am looking for feedback on some of these algorithms and if there's a way to make the more efficient.

I believe this should be possible in O(n) time, right now it's O(n^2) although reversing the sorted array should help remove the most useless of these comparisons

Copy link
Contributor

@amy-corson-ibigroup amy-corson-ibigroup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we think of this possible cleanup? Approving because this does work as-is!!

Comment on lines +665 to +679
.filter((pattern) => {
// Compare to all other patterns TODO: make this beat O(n^2)
return !patternsSortedByLength.find((p) => {
// Don't compare against ourself
if (p.id === pattern.id) return false

// If our pattern is longer, it's not a subset
if (p.stops.length < pattern.stops.length) return false

return isValidSubsequence(
p.stops.map((s) => s.id),
pattern.stops.map((s) => s.id)
)
})
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about

Suggested change
.filter((pattern) => {
// Compare to all other patterns TODO: make this beat O(n^2)
return !patternsSortedByLength.find((p) => {
// Don't compare against ourself
if (p.id === pattern.id) return false
// If our pattern is longer, it's not a subset
if (p.stops.length < pattern.stops.length) return false
return isValidSubsequence(
p.stops.map((s) => s.id),
pattern.stops.map((s) => s.id)
)
})
})
.filter((pattern, patternIndex) => {
// Compare to all other patterns larger than the current pattern
return !patternsSortedByLength.find((p, pIndex) => {
if (pIndex >= patternIndex) return false
return isValidSubsequence(
p.stops.map((s) => s.id),
pattern.stops.map((s) => s.id)
)
})
})

You're already sorting by length so checking anything above a certain index number feels like a waste of time, and you could remove the if (p.id === pattern.id) block because p and pattern should have the same index. This gets the same result for 1-line patterns but might need more testing!!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to check the pattern id because the indexes don't always line up. I agree the length check is not required but to do that we need to start checking the second array at the right index and that logic is currently not present

Copy link
Contributor

@daniel-heppner-ibigroup daniel-heppner-ibigroup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat algorithm, I don't see a way to make it faster immediately. Good start!

@miles-grant-ibigroup miles-grant-ibigroup merged commit c628631 into dev Oct 23, 2023
@miles-grant-ibigroup miles-grant-ibigroup deleted the pattern-de-duplication branch October 23, 2023 20:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants