Skip to content
This repository has been archived by the owner on Nov 13, 2021. It is now read-only.

Question regarding ForwardUpdate #18

Open
torfsen opened this issue Oct 23, 2015 · 3 comments
Open

Question regarding ForwardUpdate #18

torfsen opened this issue Oct 23, 2015 · 3 comments

Comments

@torfsen
Copy link

torfsen commented Oct 23, 2015

I'm currently going through the source and I'm having trouble to understand how interval tree A is updated in ForwardUpdate: As far as I understand it, the idea is to move the location candidate one step to the right (Line 198) and then update the trees by removing distances related to the point on the left which is now outside the window and by adding distances related to the point on the right which is now inside the window.

Since the window for tree A contains min_size points, I think we should add and remove min_size - 1 distances from tree A, respectively. However, as far as I understand lines 201-219, the code actually does the following:

  • Add the min_distance - 1 distances from Z[tau1 - min_size] - Z[tau1 - 1] through Z[tau1 - 2] - Z[tau1 - 1] (inclusive). So far so good.
  • Remove the min_distance (!) distances from Z[tau1 - min_size] - Z[tau1 - min_size - 1] through Z[tau1 - 1] - Z[tau1 - min_size - 1] (inclusive). I don't think that last distance should be removed here, since Z[tau1 - 1] is the new point and Z[tau1 - min_size - 1] is the point that was just dropped.
  • Add the single distance Z[tau1 - min_size - 1] - Z[tau1 - min_size]. I don't think that's correct: this is a distance related to the point we just dropped, so it should be removed. In addition, we just did remove it in the previous step!

I therefore think that the loop in line 208 should have the limit i < tau1 - 1 (just as the loop before it) and that the addition of the single distance after it (lines 215-219) should be removed.

Is this reasoning correct or did I miss something?

@putnam120
Copy link
Contributor

Sorry about the delayed response. I've been rather busy these last few
weeks.

I agree with you analysis in the first two points. However, as mentioned in
the third we add back the point that was previously (incorrectly) removed.
So I don't see where the issue is. I think the reason it was implemented
this way was for ease of understanding. If the loop limits were changed it
would result in a small computational improvement but it would make
matching up the code and the corresponding parts of the research paper more
difficult.

It has been a while since I've looked at this project but I am pretty sure
that was our reasoning. Hopefully this was able to answer your question.

On Fri, Oct 23, 2015 at 2:12 PM, Torf [email protected] wrote:

I'm currently going through the source and I'm having trouble to
understand how interval tree A is updated in ForwardUpdate
https://github.com/twitter/BreakoutDetection/blob/master/src/edmTail.cpp#L194:
As far as I understand it, the idea is to move the location candidate one
step to the right (Line 198
https://github.com/twitter/BreakoutDetection/blob/master/src/edmTail.cpp#L198)
and then update the trees by removing distances related to the point on the
left which is now outside the window and by adding distances related to the
point on the right which is now inside the window.

Since the window for tree A contains min_size points, I think we should
add and remove min_size - 1 distances from tree A, respectively. However,
as far as I understand lines 201-219
https://github.com/twitter/BreakoutDetection/blob/master/src/edmTail.cpp#L201-L219,
the code actually does the following:

  • Add the min_distance - 1 distances from Z[tau1 - min_size] - Z[tau1
  • 1] through Z[tau1 - 2] - Ztau1 - 1. So far so good.
  • Remove the min_distance (!) distances from Z[tau1 - min_size] -
    Z[tau1 - min_size - 1] through Z[tau1 - 1] - Ztau1 - min_size - 1. I don't think that last distance should be removed here, since Z[tau1
  • 1] is the new point and Z[tau1 - min_size - 1] is the point that was
    just dropped.
  • Add the single distance Z[tau1 - min_size - 1] - Z[tau1 - min_size].
    I don't think that's correct: this is a distance related to the point we
    just dropped, so it should be removed. In addition, we just did remove it
    in the previous step!

I therefore think that the loop in line 208
https://github.com/twitter/BreakoutDetection/blob/master/src/edmTail.cpp#L208
should have the limit i < tau1 - 1 (just as the loop before it) and that
the addition of the single distance after it (lines 215-219
https://github.com/twitter/BreakoutDetection/blob/master/src/edmTail.cpp#L215-L219)
should be removed.

Is this reasoning correct or did I miss something?


Reply to this email directly or view it on GitHub
#18.

  • Nicholas James

@torfsen
Copy link
Author

torfsen commented Oct 26, 2015

Thanks for your quick response, Nicholas! I still think that there's a bug, though, because the problematic distance in step 2 is not the same as the one in step 3:

The code removes Z[tau1 - 1] - Z[tau1 - min_size - 1] (step 2) but doesn't remove Z[tau1 - min_size - 1] - Z[tau1 - min_size] (or, more precisely, the distance is removed in step 2 and added back in step 3). Both of these are incorrect, as far as I understand: Z[tau1 - 1] - Z[tau1 - min_size - 1] hasn't been added before and should therefore not be removed, and Z[tau1 - min_size - 1] - Z[tau1 - min_size] refers to the dropped point Z[tau1 - min_size - 1] and therefore should be removed.

Please forgive my persistence, but the paper is a bit fuzzy on the details here so I'm trying to make sure I understand it correctly.

@utkrist
Copy link

utkrist commented Jul 9, 2016

This must be a bug. It looked fishy to me as well was happy to find a discussion already in the issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants