Systematic error in the indirect identification #4

Open
moorepants opened this issue Mar 20, 2015 · 10 comments

@moorepants (Member)

@tvdbogert I've created an explanation of the systematic error that I believe is present in the indirect identification. I've written it up as an IPython notebook, and it uses a simpler problem to demonstrate my complete method for the identification process. This simplified code shows the process I'm using, demonstrates the bias issue, and shows that it is exactly the same bias as in the direct identification problem.

http://nbviewer.ipython.org/github/csu-hmc/inverted-pendulum-sys-id-paper/blob/master/notebooks/systematic_error.ipynb

@moorepants (Member, Author)

Hmm, I just ran the notebook with a = 0 and a large w. If our theory is that the ratio a/w needs to be large for the identification to work, then this debunks it. The error in the gain identification seems to decrease when a/w decreases, not when it increases.

@tvdbogert (Member)

Nice to have a simpler test case. The 1 DOF pendulum is what van der Kooij also used.

I am looking at the notebook and I see a with an amplitude of about 1 m/s^2 and w with an amplitude of about 20 deg. w introduces a torque noise of about 1/5 rad * k = 20 Nm. This is much larger than the effect of a (which is equivalent to m*a*L = 1 Nm), so the system is dominated by the process noise. But there should still be no bias, just random error (if my original understanding is correct...)

I did not see the a=0 case, but did you find that the results were less biased in that case?

I had some questions/comments:

  • k is very high relative to gravity, about 10x more than the minimum of m*g*L that is needed to stabilize the system. So the system is almost a harmonic oscillator plus a random torque k*w(t). Theoretically the natural frequency should be sqrt(k/(m*L^2)) = 10 rad/s = 1.6 Hz, which I see in the simulations (see the quick check after this list).
  • The simulated movement gradually increases its amplitude. It reminds me of an undamped harmonic oscillator driven by a random excitation. But why does the amplitude stop increasing if there is no energy dissipation or derivative term in the controller? It looks like it stabilizes at about 90 degrees amplitude, which is where the a(t) effect becomes zero. Maybe I was wrong that the effect of a(t) is small relative to w(t)?
  • The indirect id gave you k=104. Do you consistently find k>100 if you repeat the experiment? A random error is what I would expect.
  • The direct collocation seems to have done a good job finding the best fit. The movement simulated with DC does not have an increasing amplitude. This makes sense because the process noise is not there to excite the system. There is still some excitation from a(t), though. Why does that not increase the amplitude at least somewhat? Especially in the linear model, where sin(theta) = theta and cos(theta) = 1, the effect of a(t) does not diminish as the pendulum goes horizontal. Is this the effect of backward Euler numerical damping?
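A quick numeric check of the natural frequency quoted in the first bullet (a minimal sketch; the values k = 100 Nm/rad, m = 1 kg, L = 1 m are assumptions chosen only to reproduce the quoted 10 rad/s, and the notebook's actual values may differ):

```python
import numpy as np

# Assumed parameter values, chosen to match the quoted numbers;
# only the ratio k/(m*L**2) matters for the natural frequency.
k, m, L = 100.0, 1.0, 1.0
omega_n = np.sqrt(k / (m * L**2))  # natural frequency in rad/s
print(omega_n)                     # -> 10.0 rad/s
print(omega_n / (2 * np.pi))       # -> ~1.6 Hz
```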

Since the linear system already seems to demonstrate the problem, let's focus on that one. This is just a harmonic oscillator, right? If we set some constants to 1, this reduces to:

xdd = -k*x + w(t) + a(t)

And we do the system identification by fitting the observed movement of this system to a model without process noise, and the known a(t):

xdd = -k*x + a(t)

I feel like this can be solved analytically to prove whether or not the k estimate is biased.
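As a numerical counterpart to that analytical question, a minimal sketch of the indirect identification for this reduced system might look like the following. Everything here (the semi-implicit Euler simulation, the `simulate` helper, the signal amplitudes, and the parameter values) is an illustrative assumption, not the notebook's code:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
k_true, h, n = 10.0, 0.01, 2000
t = np.arange(n) * h
a = np.sin(2 * np.pi * 0.5 * t)    # known external input a(t)
w = 0.5 * rng.standard_normal(n)   # unknown process noise w(t)

def simulate(k, noise=None):
    """Semi-implicit Euler simulation of xdd = -k*x + a(t) [+ noise]."""
    x = np.zeros(n)
    xd = 0.0
    for i in range(n - 1):
        xdd = -k * x[i] + a[i] + (0.0 if noise is None else noise[i])
        xd += h * xdd
        x[i + 1] = x[i] + h * xd
    return x

x_meas = simulate(k_true, noise=w)               # "experimental" data with w(t)
cost = lambda k: np.sum((simulate(k) - x_meas) ** 2)  # trajectory fit error
k_id = minimize_scalar(cost, bounds=(5.0, 15.0), method='bounded').x
print(k_true, k_id)  # rerun with new seeds to see whether the error averages out
```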

I am going to look again at van der Kooij's paper for some ideas.

Maybe my questions/comments trigger some new thoughts on your side.

@tvdbogert (Member)

I believe I can prove analytically for this linear system that there is no bias when estimating k by least-squares. I will post the proof here as soon as I have a chance to write it down.

If that holds up, any remaining issue you see must be caused by numerics. I see two potential causes:

  1. Difference between DC (backward Euler or midpoint Euler) and the variable step integrator that produced the data. If that is the cause, the problem would diminish with more time points in the DC, and you said this was not the case. Are you sure of that?
  2. Interpolation of the w(t) and a(t) signals during the variable step integration. Since w(t) and a(t) are analytical functions of time (sum of sines), try to evaluate them exactly like that, instead of interpolating a sampled signal (see the sketch after this list).
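A sketch of point 2: keep w(t) as an explicit sum of sines and evaluate it analytically inside the right-hand side, so the variable-step integrator never sees an interpolated, sampled signal. The frequencies, amplitudes, and k below are illustrative assumptions:

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(1)
n_sines, k = 12, 10.0
freqs = 2 * np.pi * rng.uniform(0.1, 5.0, n_sines)  # rad/s (assumed band)
amps = 0.1 * rng.uniform(0.0, 1.0, n_sines)
phases = rng.uniform(0.0, 2 * np.pi, n_sines)

def w_exact(t):
    """Evaluate the sum of sines analytically at any time t."""
    return np.sum(amps * np.sin(freqs * t + phases))

def rhs(t, y):
    x, xd = y
    return [xd, -k * x + w_exact(t)]  # a(t) can be handled the same way

sol = solve_ivp(rhs, (0.0, 60.0), [0.0, 0.0], rtol=1e-8, atol=1e-10)
```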

But I'm getting ahead of myself. You need to see the proof first. More later.

@moorepants (Member, Author)

> I did not see the a=0 case, but did you find that the results were less biased in that case?

The notebook is interactive; I've just been trying out numbers. I didn't post those results.

> k is very high relative to gravity, about 10x more than the minimum of m*g*L that is needed to stabilize the system. So the system is almost a harmonic oscillator plus a random torque k*w(t). Theoretically the natural frequency should be sqrt(k/(m*L^2)) = 10 rad/s = 1.6 Hz, which I see in the simulations.

Yes, I just chose random numbers to start.

> The indirect id gave you k=104. Do you consistently find k>100 if you repeat the experiment? A random error is what I would expect.

You can get numbers above and below the known gain by rerunning the notebook.

> Why does that not increase the amplitude at least somewhat?

Not sure.

> Is this the effect of backward Euler numerical damping?

I'm using midpoint integration in this problem.
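For context, a generic midpoint-rule defect for direct collocation looks like the sketch below (an assumed form, not necessarily the notebook's exact implementation). Its relevance to the damping question: the midpoint rule is symmetric, so it introduces no artificial energy loss for an undamped linear oscillator, unlike backward Euler:

```python
import numpy as np

def midpoint_defects(x, t, f):
    """x: (n, n_states) state trajectory, t: (n,) times, f(t, x) -> xdot.

    Returns the defect x[i+1] - x[i] - h*f(t_mid, x_mid) for each
    interval; all defects are zero at a collocation solution.
    """
    h = np.diff(t)[:, None]
    x_mid = 0.5 * (x[:-1] + x[1:])
    t_mid = 0.5 * (t[:-1] + t[1:])
    xdot_mid = np.array([f(ti, xi) for ti, xi in zip(t_mid, x_mid)])
    return x[1:] - x[:-1] - h * xdot_mid
```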

@moorepants (Member, Author)

> Difference between DC (backward Euler or midpoint Euler) and the variable step integrator that produced the data. If that is the cause, the problem would diminish with more time points in the DC, and you said this was not the case. Are you sure of that?

With low process noise, more nodes gives exact identification of the parameter. With high process noise, more nodes improves the identification, but you never reach the correct solution.

> Interpolation of the w(t) and a(t) signals during the variable step integration. Since w(t) and a(t) are analytical functions of time (sum of sines), try to evaluate them exactly like that, instead of interpolating a sampled signal.

I was using the interpolation because I had white noise for the additive reference noise. If I switch that to a sum of sines, you're right that the interpolation is not needed.

@moorepants (Member, Author)

After playing around with the notebook last night, I noticed that if you make a=0 and w=something, then the only way the identification model can have motion is to adjust the initial conditions. So the identification process essentially finds the best k and the best initial conditions such that the difference in the trajectories is minimized. If I were to constrain the initial conditions to the measured initial conditions, there wouldn't be a solution, because the identification model has no external inputs when a=0. But when this identification does solve, the identified gain interestingly turns out to still be close to the known gain. I think this is simply because both systems have the same natural frequency; the optimization just fits the best sine wave it can to the data.

@moorepants (Member, Author)

> I believe I can prove analytically for this linear system that there is no bias when estimating k by least-squares. I will post the proof here as soon as I have a chance to write it down.

That's cool. But I still don't understand how this could possibly be true. If the id model doesn't have the noise term and you set a=0 and w=something when creating the data, it is impossible to identify the correct gain. But you seem to be suggesting that if I run that identification enough times the mean will be the known gain (i.e. the error I see is random). I can run that test to see. From running it a handful of times, it seems like it may be true.

@tvdbogert (Member)

I understand now why the simulation model does not blow up: the sum of sines does not contain the exact natural frequency of the system. With a random signal, it would blow up eventually. But the sum of sines should be fine for identification.

To test whether the error is random, run everything N times (with new random seeds for a and w) but the same system parameters. If the mean of the identified k's is within the standard error of the mean (SEM = SD/sqrt(N)) of the true k, there is no bias, only random error. If the bias is small, you will need a high N to detect it, but at some point you can say: "the bias was less than ... (the SEM value)".
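In code, that test could look like the sketch below, where `identify_k` is a placeholder for whatever identification routine the notebook runs, and each seed is assumed to produce fresh realizations of a(t) and w(t):

```python
import numpy as np

def bias_test(identify_k, k_true, N=200):
    """Repeat the identification N times and compare the mean to k_true."""
    k_ids = np.array([identify_k(seed=i) for i in range(N)])
    mean = k_ids.mean()
    sem = k_ids.std(ddof=1) / np.sqrt(N)  # SEM = SD / sqrt(N)
    if abs(mean - k_true) <= sem:
        print(f"no bias detected: |{mean:.3f} - {k_true}| <= SEM {sem:.3f}")
    else:
        print(f"possible bias: mean {mean:.3f} vs true {k_true}, SEM {sem:.3f}")
    return mean, sem
```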

Proof will come soon. If it goes as I expect, it will convince you.

@moorepants (Member, Author)

I see why we get a random-looking error for this example. When a=0 and w=something, the identification simply fits a sine wave to the data. The best sine wave will have a dominant frequency similar to that of the data. So if you generate N different w's, the mean k_id will be k. I believe that without actually running it N times. But it isn't clear to me that this extends to more complicated models.

@tvdbogert (Member)

Apologies for using pencil. Scan attached; I hope GitHub posts it.

I was able to solve the identification problem analytically for a 1-D linear system, and I am pretty confident that this will extend to higher dimensional linear systems. Should not be too hard. Not sure about nonlinear systems, but we can definitely use the linear system to work on the numerical methods until there is no more systematic error, because I can now state some (limit) conditions for which the bias should go away. It is possible that the bias also disappears under looser conditions; I think this can be refined with some more work.

The easiest way to remove the bias is a >> w (which we already knew), but we don't want to do that because w is not under our control and a can't be too large for human subjects.

If you go through my analysis, you see that the bias is gone for any combination of a and w, if all of the following are true:

  1. h (time step in DC) -> 0
  2. T (duration of data) -> infinity
  3. n (number of sines) -> infinity
  4. the identification model is simulated with correct initial conditions

I am pretty sure that condition 3 is required, so we need white noise rather than a sum of sines (see the sketch below). The first condition may not be needed if a finite h only introduces a random error. The second condition was used to eliminate the integrals of products of two different sines; even when T is not infinity, these effects may still disappear. I can have another look at this. The 4th condition may not be necessary; we can probably prove that the terms due to initial conditions disappear from the gradient of the cost function when the other conditions are true.
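To make condition 3 concrete, the sketch below shows how a sum of sines with random frequencies and phases approaches band-limited white noise as the number of sines grows. The band limit, amplitude scaling, and sizes are assumptions for demonstration:

```python
import numpy as np

rng = np.random.default_rng(2)

def sum_of_sines(t, n_sines, f_max=5.0):
    """Sum of n_sines random-frequency, random-phase sines on [0, f_max] Hz."""
    freqs = 2 * np.pi * rng.uniform(0.0, f_max, n_sines)
    phases = rng.uniform(0.0, 2 * np.pi, n_sines)
    # 1/sqrt(n) scaling keeps the signal variance fixed as n grows
    return np.sin(np.outer(t, freqs) + phases).sum(axis=1) / np.sqrt(n_sines)

t = np.linspace(0.0, 60.0, 3001)
w_few = sum_of_sines(t, 10)     # visibly a sum of a few tones
w_many = sum_of_sines(t, 1000)  # approaches band-limited white noise
```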

So I now see that this is not quite complete, but perhaps a good starting point.

You would think that this would already be known, but I just needed to do this for myself rather than read it in a book.

