Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculation bugs in statistics.ts #98

Open
jcberquist opened this issue Sep 14, 2023 · 0 comments
Open

Calculation bugs in statistics.ts #98

jcberquist opened this issue Sep 14, 2023 · 0 comments

Comments

@jcberquist
Copy link

There are a couple of issues that we have noticed in the statistics calculations int the statistics.ts utility file, specifically in the mergeWordIntervals function. First in the code path where the reducer checks for what it calls "same start inclusion":

    // same start inclusion
    // [---p---]  or
    // [--n--]
    else if (next.start === prev.start && next.end < prev.end) {
      if (next.type < prev.type) {
        joined[joined.length - 1] = next;
        joined.push({ ...prev, start: next.start });
      }
    }

Specifically on the joined.push the start of the interval is set to next.start but I think that should be set to next.end. As it stands now, the two intervals will overlap with each other.

Secondly, there is one code path where two new intervals are pushed onto the stack at the same time, in the "complete inclusion" path:

    // complete inclusion
    // [---p---]
    //  [--n--]
    else if (prev.start < next.start && next.end < prev.end) {
      if (next.type < prev.type) {
        joined[joined.length - 1] = { ...prev, end: next.start };
        joined.push(next);
        joined.push({ ...prev, start: next.end });
      }
    }

Effectively what this path does is it takes the last interval in the joined array and the next interval passed in by the reducer, and converts them to three sequential non-overlapping intervals. However, since the reducer only ever compares the last item in the joined array to the next item coming in, there are scenarios where the interval on the next pass will start before the last item on the joined array (and overlap with the second to last item in the joined array), with the result that when that next interval is compared to its previous, it will just be pushed onto the array. For example, consider the following sequence of intervals:

[----p----]
   [--n--]
   [---f---]

After the first loop (assuming p is in the joined array, and n is next) you have:

[-]
   [--n--]
          [-]
   [---f---]

But since the next item (f) in this case now starts before the last item on the joined array (and overlaps with the second to last item in the array), there is no code path for that situation, and it hits the else block, and is pushed onto the joined array as is. Again, this means that matches can overlap even after a pass through the mergeWordIntervals function.

These overlapping intervals result in calculations of match percentage that are erroneously high.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant