This repository has been archived by the owner on Nov 13, 2021. It is now read-only.

'Interesting' behaviour in EDM-X when data has a small SD #22

Open
richdrich opened this issue May 2, 2016 · 1 comment

richdrich commented May 2, 2016

I found some unusual behavior as the standard deviation of some test data (on either side of a step change) drops.

When the SD is less than 1, the detected change point becomes inaccurate, and in a very well-defined manner. [EDIT: I'd note that the '1' is a big coincidence; the knee moves as the data range changes, as you might expect.]

See the code below. My data actually changes at point 500. With an SD above the knee, EDM-X finds this to within two intervals; with an SD below it, the result is out by 50 intervals.

I'd be interested in any comments on this...

library(BreakoutDetection)
library(zoo)  # zoo() is used below, so load it explicitly

# Try EDM-X on SDs over a (log) range
logSds <- seq(from=-0.2, to=0.2, by=.05)
sds <- 10 ^ logSds
errs <- numeric(length(sds))

for(k in seq_along(sds)) {
  # same seed on every pass, so only the noise scale changes between runs
  set.seed(123)
  # construct datasets: 500 hourly points at mean 100, then 400 at mean 110
  s1 <- zoo(rnorm(500,mean=100,sd=sds[k]), seq.POSIXt(as.POSIXlt("2016-01-01"), by=3600,length.out=500))
  s2 <- zoo(rnorm(400,mean=110,sd=sds[k]), seq.POSIXt(as.POSIXlt("2016-01-21 20:00:00"), by=3600,length.out=400))

  st <- rbind(s1, s2)

  zdata <- data.frame(timestamp=time(st), count=as.vector(st))

  br <- breakout(zdata,min.size=100, method='amoc', plot=TRUE)

  # absolute distance of the detected location from the true change at 500
  errs[k] <- abs(br$loc - 500)
}

plot(sds, errs)

richdrich commented (Author)

Further to this, I think the issue arises when the SD and the number of observations are such that the two medians tend to 1 and 0 for all data ranges (values of tau2 and tau1). That leads to the medians being ignored and the algorithm converging at a point determined by tau2 and tau1 (and hence the sizes of the two datasets), which isn't the actual breakout. Or something like that.
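
A rough way to look at the medians mentioned above, using base R only. This is a sketch under assumptions, not the BreakoutDetection internals: `segment_medians` is a hypothetical helper, the min-max scaling to [0, 1] is assumed preprocessing, and the medians are taken over pairwise absolute differences within each segment and between the two segments, with the split fixed at the true change point.

```r
# Hypothetical helper (not part of BreakoutDetection): medians of pairwise
# absolute differences within and between the two segments split at `tau`.
segment_medians <- function(x, tau) {
  # Assumed preprocessing: min-max scale the series to [0, 1].
  z <- (x - min(x)) / (max(x) - min(x))
  left  <- z[seq_len(tau)]
  right <- z[(tau + 1):length(z)]
  c(
    within_left  = median(as.vector(dist(left))),
    within_right = median(as.vector(dist(right))),
    between      = median(abs(outer(left, right, "-")))
  )
}

# Same shape as the test data in this issue: a step from 100 to 110 at point 500.
set.seed(123)
for (s in c(0.1, 0.5, 1, 2)) {
  x <- c(rnorm(500, mean = 100, sd = s), rnorm(400, mean = 110, sd = s))
  print(round(c(sd = s, segment_medians(x, 500)), 3))
}
```

If the hypothesis holds, the within-segment medians should shrink towards 0 and the between-segment median should grow towards 1 as the SD drops, which leaves the segment sizes as the main thing still varying with the split point.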

I'm thinking this won't be too much of a problem with real data (I found it with a naive test case), but I would be interested in any comments.
