-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible bug in degenerate EM update for U #156
Comments
Hi, Great to hear that MARSS is getting translated to Python! I have had some inquiries on that but didn't have any bandwidth to work on it. I'll be keen to see how it works out. Regarding the question. So think of the stochasticity as as a network. Is a state at time t stochastic (via Q at time t) or connected to a stochastic state via B. In your case, with a B(2,1) non-zero value, state 2 is always stochastic because it is connected to state 1. Note if you had a stochastic x0 (at t=1 or t=0), you have to look if there was a connection to that stochastic x also. Keep in mind that the connection via B is not just t-1. You have to think of it as a network. In the derivation it talks about raising B to a power to find the network connections at time t+n. It can be rather slow to compute this "connection" matrix (B raised to a power) to find which states with Q=0 are actually stochastic via a connection to a stochastic state via B. That said it is possible there is a bug somewhere. So definitely keep me updated on what you find (with code to replicate) and I'll dig into it. |
Great, i'll make sure to pass it along if I get my Python version into a publishable state. So, the issue i'm concerned with is at t=1 in the model I described. Certainly at t=2 and beyond both states are non-deterministic. My understanding from reading the paper is that at:
What instead I think is happening in the library is that at this line the IId matrix is being updated to its t=2 form, but then at this subsequent line it's assuming it's still in t=1 form. Essentially it seems like an off by one, where IId is getting updated one time too many early on. I tried making a couple of changes that do it the way I was imagining it, which you can see here: Here's an example of something that doesn't work without the patch:
I'm just using a such a trivial dataset because I wanted something easy to generate identically in Python, so I could compare my EM updates to yours. The model isn't intended to mean anything or represent anything useful, but without the patch this fails to fit with a "denominv" error. I'm just learning all of this myself though so sorry if i'm just not understanding it correctly. |
Thanks for the example code and new explanation! I will dive into it next week. |
Goodness the degenerate section of EMDerivation.pdf is exceedingly confusing and has quite a few typos. I am still aways away from determining if what you write is a bug or not. However, note that the issue of u not being estimable is not related to this "bug" (if it is indeed a bug). It has to do with the u rows associated with indirectly stochastic x not being estimable. See section 7.2.2 in EMDerivation.pdf and the last paragraph in that section. It is a constraint arising from the partial differential of the expected log-likelihood not being possible. I will ponder more how to better explain why the partial differentiation wrt u^is is not possible. But read the end of paragraph 2 in section 7.2.2 and maybe you'll see why. If x_t = B x_t-1 + u for the indirectly stochastic x, then it is impossible to hold x_t and x_t-1 constant while varying u, which is what has to be done for partial differentiation in the M step of the EM algorithm. For the stochastic x, x_t = B x_t-1 + u + w_t. So it is possible since w_t is an error term. |
Ah yes you're totally right about that. The constraint section for the degenerate variance condition is the section I probably struggled to parse the most. I found most of the paper to be really well and clearly written though, really appreciated e.g. the primer on matrix calculus and similar things peppered throughout. The level of detail/clarity was unusually good for things like this at least in my experience. |
So, yes it's a bug. The line you identified is correct for t>(tinitx+1) (which is going to be t=2 or t=3 depending on tinitx) but not correct for tinitx+1 (so t=1 for tinitx=0 and t=2 for tinitx=1). The M matrix (the matrix that shows where connectivity) needs to be different for tinitx+1 since x at tinitx could be deterministic (as it is in the model you specified). I'll have to think a bit how to fix this so it works correctly for all V0 matrices. |
I have implemented a correction to this bug on the 'degenfix' branch. Thanks for alerting me to this! With the fix, your code runs. It shouldn't run. I should block this model in MARSSkemcheck as part of the degenerate model constraints. But I am not going to do that right away. I want to revisit my logic for why estimation of u associated with indirectly stochastic x (so in this case u2) should be impossible. Your model with
should definitely be blocked by MARSSkemcheck. But I am not sure about
I am also updating the EMDerivation section, though it will still be very very dense. The derivation is tedious and I have yet to figure out a clear way to write it out. |
Great, thanks for the quick turnaround. Glad I didn't end up wasting your time with this. I should have mentioned it before, but I think the same issue might be present in the EM update for x0 too. Re: the degenerate variance section, one little thing that I still don't entirely understand is the notation in equation 204 for delta 4, at t=0. It's expressed as:
This is actually what led me to check out the code for the library in the first place. I see that in the library it just initializes it to Du, which makes sense, but I still don't quite comprehend what the 0_mxm part of the notation is trying to express, and the same goes for delta 3. Overall though the clarity of the derivation is exceptional, i've implemented almost all of it directly from the paper over the last month or so. In most cases where I struggled for a while it ended up being because I hadn't read the prior sections as thoroughly as I should have and going back and re-reading cleared things up. It's complicated in some places, but the underlying subject matter is complicated, so there's only so much you can do to make it simple. |
Ah, yes same issue is in the x0 update, but I fear there is a problem somewhere. Adding a NA causes the LL to drop.
The estimates are fine. Compare:
Back to the drawing board. Something subtle is wrong. |
What's the model being used here? |
Same one as you posted.
I think this is related to the issue of estimated u associated with indirectly stochastic x rows, so in this case u2. I could just block via MARSSkemcheck, but curiously enough the one deterministic step from t=0 to t=1 should allow estimation. Not much information but should work. So there is something subtle amiss in the derivation. I'll dive into again later today but it will probably take a few days to sort out. |
I might have noticed another small inconsistency in the calculation of x0: https://github.com/atsa-es/MARSS/blob/degenfix/R/MARSSkem.r#L491 Compared to equation 209 in the derivation, the numerator seems to be missing a bold-L/star$V0 factor in its calculation. |
I've been implementing some of the model described in [your paper[(https://cran.r-project.org/web/packages/MARSS/vignettes/EMDerivation.pdf) in Python, and to some extent using this library as a reference implementation. However, I recently ran into something that I can't seem to reconcile, and which either I just don't understand correctly from the paper, or is not quite right in the library's implementation.
Let's take a simple example, suppose we have an AR(2) model, like so:
My understanding from the paper is that the identifier matrix for determinism rows at t=0 should be the identity, and the library matches that. But at t=1 in this model, the second component of the state is still deterministic. In the library as implemented now it appears to treat the entire state vector as non-deterministic, which leads to an uninvertible denominator.
Sorry if i'm just misunderstanding something simple, as that's entirely possible.
The text was updated successfully, but these errors were encountered: