Dealing with Clock Wander #2

Open
ddrown opened this issue Oct 1, 2016 · 12 comments

@ddrown

ddrown commented Oct 1, 2016

Copied from a YouTube comment:
I took the toprecorder/data10.txt data and looked specifically at the offset and frequency differences between all the clocks:

Using .241's broadcasts as the "master" clock:

  • .179 is running 0.316ppm slower
  • .147 is running 2.218ppm slower
  • .169 is running 13.519ppm slower and jumped by 508.763 microseconds somewhere between .241's pktids 9254 and 9255
  • .213 is running 7.877ppm slower and jumped by 508.513 microseconds somewhere between .241's pktids 10715 and 10716
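
A minimal sketch of how per-node numbers like these could be estimated, assuming you have .241's broadcast timestamps and a node's receive timestamps for the same pktids, both in microseconds. `fit_ppm` and `find_jumps` are hypothetical helpers, not code from toprecorder, and the 100 µs jump threshold is an arbitrary assumption. Jumps should be removed before fitting, since a ~500 µs step would otherwise bias the slope.

```python
# Hypothetical helpers; timestamps are microseconds, one pair per pktid.

def fit_ppm(ref_us, local_us):
    """Least-squares slope of (local - ref) against ref, in ppm.

    Negative means the local clock runs slower than the reference.
    """
    n = len(ref_us)
    offsets = [loc - ref for ref, loc in zip(ref_us, local_us)]
    mean_x = sum(ref_us) / n
    mean_y = sum(offsets) / n
    sxx = sum((x - mean_x) ** 2 for x in ref_us)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ref_us, offsets))
    return sxy / sxx * 1e6


def find_jumps(ref_us, local_us, threshold_us=100.0):
    """Return (index, step_us) wherever the offset changes by more than
    threshold_us between consecutive samples."""
    offsets = [loc - ref for ref, loc in zip(ref_us, local_us)]
    return [
        (i, offsets[i] - offsets[i - 1])
        for i in range(1, len(offsets))
        if abs(offsets[i] - offsets[i - 1]) > threshold_us
    ]
```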

Removing the average frequency differences and the two clock jumps, I get this graph, which shows the clock wander: https://dan.drown.org/clocks/data10.png
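
Removing the frequency difference and the jumps could be sketched like this. These are hypothetical helpers: offsets are node-minus-.241 in microseconds, and each jump is given as the index where it starts plus its size. Whatever is left after subtracting a best-fit line is the wander.

```python
# Hypothetical helpers: offsets_us are node-minus-.241 offsets (microseconds),
# jumps are (index, step_us) pairs meaning every sample from index onward
# was shifted by step_us.

def remove_jumps(offsets_us, jumps):
    corrected = list(offsets_us)
    for index, step_us in jumps:
        for i in range(index, len(corrected)):
            corrected[i] -= step_us
    return corrected


def detrend(ref_us, offsets_us):
    """Subtract the least-squares line (constant offset + constant rate).

    What remains is the clock wander."""
    n = len(ref_us)
    mean_x = sum(ref_us) / n
    mean_y = sum(offsets_us) / n
    sxx = sum((x - mean_x) ** 2 for x in ref_us)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ref_us, offsets_us))
    slope = sxy / sxx
    return [y - (mean_y + slope * (x - mean_x)) for x, y in zip(ref_us, offsets_us)]
```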

Maybe insulating the ESP8266s from any airflow would reduce their temperature changes, which should in turn reduce their clock wander.

Also, maybe using a PID control loop on each node would work to sync the frequencies. Here's what I've done along those lines with NTP on the ESP8266: https://github.com/ddrown/Arduino_ClockPID
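
For reference, a bare-bones PI loop (a PID without the D term) could look like the sketch below. It is not the Arduino_ClockPID code; `ClockPI`, `simulate`, and the gains are illustrative assumptions, and the simulation assumes a clock that is 5 ppm fast, measured once per second.

```python
class ClockPI:
    """Hypothetical PI loop: turns measured phase offsets into frequency
    corrections. 1 ppm of frequency error == 1 us of drift per second."""

    def __init__(self, kp=0.5, ki=0.1):
        self.kp = kp          # proportional gain, ppm per us of offset
        self.ki = ki          # integral gain, ppm per (us * s)
        self.integral = 0.0   # accumulated offset, us * s

    def update(self, offset_us, dt_s):
        """offset_us: local minus reference. Returns a correction in ppm."""
        self.integral += offset_us * dt_s
        return -(self.kp * offset_us + self.ki * self.integral)


def simulate(true_error_ppm=5.0, steps=200, dt_s=1.0):
    """Run the loop against a clock with a fixed frequency error."""
    pi = ClockPI()
    offset_us = 0.0
    correction_ppm = 0.0
    for _ in range(steps):
        correction_ppm = pi.update(offset_us, dt_s)
        # Whatever frequency error is left uncorrected accumulates as phase offset.
        offset_us += (true_error_ppm + correction_ppm) * dt_s
    return offset_us, correction_ppm
```

With these gains the offset settles to zero and the correction to about -5 ppm. The integral term is what ends up holding the steady-state frequency correction, while the proportional term pulls the phase back.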

NTP uses the round-trip time to try to eliminate the phase offset due to one-way latency. I'm not sure that would be needed for this application: knowing the distances between the fixed points should make it possible to cancel out those terms in the equation.
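
Cancelling the propagation term from known distances could look roughly like this. Positions are hypothetical (x, y) coordinates in feet and the function names are made up for illustration; the thread rounds light speed to 1 ft/ns, while this sketch uses the exact value (about 0.9836 ft/ns).

```python
import math

# Speed of light in vacuum converted to feet per nanosecond (~0.9836).
C_FT_PER_NS = 299_792_458 / 0.3048 / 1e9


def propagation_delay_ns(a_ft, b_ft):
    """One-way RF delay between two (x, y) positions given in feet."""
    return math.dist(a_ft, b_ft) / C_FT_PER_NS


def remove_path_delay(observed_offset_ns, tx_ft, rx_ft):
    """Subtract the expected propagation delay, leaving clock error plus
    any tx/rx latency."""
    return observed_offset_ns - propagation_delay_ns(tx_ft, rx_ft)
```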

Lastly, the rx and tx timestamp accuracy will add errors as well, but I haven't measured how accurate they are.

@cnlohr
Owner

cnlohr commented Oct 1, 2016

I tried simply finding the slew rate over time, then slowly adjusting for it to back it out, but I don't know if I'm doing it right. Could you generalize your algorithm to try to determine how all that goes?

Additionally, can you try verifying the send time using your algorithm, or rather, make sure that the send time doesn't have a great deal of jitter in it? It looks like everything else you have here is great for results.

@ddrown
Author

ddrown commented Oct 5, 2016

Ok, I have some updated data here: https://dan.drown.org/clocks/

The data and tools I used are here: https://github.com/ddrown/esp8266rawpackets-proc

Instead of a full PID controller, I'm just calculating rate differences and applying those. The remaining offsets come from one of: receiver jitter, transmitter jitter, or fast clock frequency changes. I'm feeding in 32 samples at a time, which works out to about a second and a half's worth of data at 22 packets per second.
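
One reading of the block approach, sketched under assumptions: split the samples into blocks of 32, fit a rate difference (and mean offset) per block, and keep the residuals. `blockwise_residuals` is a hypothetical name and this is not the esp8266rawpackets-proc code; in particular it applies each block's fit to that same block.

```python
def blockwise_residuals(ref_us, local_us, block=32):
    """Hypothetical sketch: fit offset + rate of (local - ref) against ref
    within each block of `block` samples and return what the fit misses."""
    residuals = []
    for start in range(0, len(ref_us), block):
        xs = ref_us[start:start + block]
        ys = [loc - ref for ref, loc in zip(xs, local_us[start:start + block])]
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        # Rate difference for this block (dimensionless; multiply by 1e6 for ppm).
        rate = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
        residuals.extend(y - (mean_y + rate * (x - mean_x)) for x, y in zip(xs, ys))
    return residuals
```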

The end result was: 50% of the time, all clocks were within +14ns -7ns (ignoring phase differences due to propagation delay). 98% of the time, all clocks were within +208ns -135ns.
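
Ranges like "50% of the time within +14ns -7ns" could be computed from the residuals with percentiles. A sketch using Python's `statistics.quantiles`, assuming residuals from all clocks are pooled into one list (the original analysis may instead have looked at the spread across clocks per packet); `central_ranges` is a hypothetical helper.

```python
import statistics


def central_ranges(residuals_ns):
    """Return ((p25, p75), (p1, p99)): bounds holding the middle 50% and the
    middle 98% of the residuals."""
    # n=100 gives 99 cut points: cuts[0] is the 1st percentile, cuts[98] the 99th.
    cuts = statistics.quantiles(residuals_ns, n=100)
    return (cuts[24], cuts[74]), (cuts[0], cuts[98])
```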

+/-10ns is about +/-10ft, so that might be the accuracy limit.
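
The arithmetic behind that rule of thumb, using the exact speed of light:

```latex
c = \frac{299\,792\,458\ \text{m/s}}{0.3048\ \text{m/ft}} \approx 9.836 \times 10^{8}\ \text{ft/s} \approx 0.9836\ \text{ft/ns}
\qquad\Rightarrow\qquad
\pm 10\ \text{ns} \times 0.9836\ \text{ft/ns} \approx \pm 9.8\ \text{ft}
```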

@cnlohr
Owner

cnlohr commented Oct 5, 2016

Does the data seem centered around the expected locations of the target ESPs (and the differential receive times, i.e. diagonal nodes have a ~10' difference)? Coincidence? Additionally, can you zoom in on your last two graphs? The data looks /really/ good! It looks like, given enough data, it should center around the expected locations.

@cnlohr
Owner

cnlohr commented Oct 5, 2016

I just can't get over how good those last few graphs look, and really hope to be able to zoom in on them!

@ddrown
Author

ddrown commented Oct 6, 2016

Ok, I added a second series of graphs showing the -250ns to +250ns range. I also added a histogram series: https://dan.drown.org/clocks/

Clock sync has two pieces: phase and frequency. This is just the frequency part; the phase differences aren't handled yet.

@cnlohr
Owner

cnlohr commented Oct 6, 2016

Hmm... Your results are much, much better than mine. I don't know how you got everything to match the skew so well. Considering light travels at ~1ft/ns (which is why I use feet for this sort of stuff), those results look /really/ good. What do you suppose causes the periodic groupings of several similar packets? In all of my analysis, I was seeing random meandering and many, many outliers. You still have outliers, but you also seem to have bunches of grouped data within the 99th percentile and outside the 25th percentile. Any idea what to attribute that bunching to?

I really can't wait to see what happens when you start to correlate this, i.e. use each node as a master and correlate the time differences. Actually... that would give you better time density, so there would be less drift/shift between time syncs. Right now it's 30-50ms between packets being sent; if you use all the nodes' tx's, it could go down to ~10ms between syncs to arbitrary nodes. I wonder if that would be much better?

Charles

@ddrown
Author

ddrown commented Oct 7, 2016

The grouping/bunching is probably an artifact of how I'm doing clock sync. I'm not limiting changes from one group of 32 to the next, so a high/low average can throw the whole group off.
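
One way to keep a single high/low group average from throwing off the correction would be to clamp how far the rate estimate may move from one group to the next. A hypothetical sketch; `limit_rate_changes` and the 0.5 ppm limit are illustrative assumptions.

```python
def limit_rate_changes(block_rates_ppm, max_step_ppm=0.5):
    """Let each block's rate move at most max_step_ppm away from the previously
    accepted rate, so one bad group can't yank the correction around."""
    accepted = []
    for rate in block_rates_ppm:
        if accepted:
            prev = accepted[-1]
            step = max(-max_step_ppm, min(max_step_ppm, rate - prev))
            rate = prev + step
        accepted.append(rate)
    return accepted
```

The trade-off is slower tracking of genuine frequency changes (e.g. from temperature), so the limit would need to sit above the real drift rate per group.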

The next thing I want to do is apply this clock sync to the data from the other transmitters and see if those offsets are the expected values. The change in distance should show as a straight translation up or down on these graphs (but remain as a straight horizontal line).

@cnlohr
Owner

cnlohr commented Oct 7, 2016

That would be awesome. Is there any way you can "window" the groups, i.e. each sample calculates for the next 32? If you get outlier syncs, throw them out? But yes! Keep going!
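
A windowed fit with outlier rejection might be sketched as below. This swaps in two standard techniques as assumptions: a Theil-Sen first pass (median of pairwise slopes) to find the trend without being dragged by outliers, then median-absolute-deviation rejection before an ordinary least-squares refit. `robust_rate`, `sliding_rates`, `k`, and `floor` are hypothetical.

```python
import statistics
from itertools import combinations


def robust_rate(xs, ys, k=3.0, floor=1e-3):
    """Fit ys ~ intercept + rate * xs for one window, dropping outliers.

    xs must be distinct and at least two samples must survive.
    Returns (rate, indices_kept)."""
    # Theil-Sen first pass: the median of pairwise slopes resists outliers.
    rate = statistics.median(
        (yj - yi) / (xj - xi) for (xi, yi), (xj, yj) in combinations(zip(xs, ys), 2)
    )
    intercept = statistics.median(y - rate * x for x, y in zip(xs, ys))
    resid = [y - (intercept + rate * x) for x, y in zip(xs, ys)]
    # Reject points far outside the median absolute deviation (scaled to ~sigma).
    mad = statistics.median(abs(r) for r in resid)
    limit = max(k * 1.4826 * mad, floor)
    kept = [i for i, r in enumerate(resid) if abs(r) <= limit]
    # Second pass: ordinary least squares on the surviving samples.
    kx = [xs[i] for i in kept]
    ky = [ys[i] for i in kept]
    mx, my = sum(kx) / len(kx), sum(ky) / len(ky)
    sxx = sum((x - mx) ** 2 for x in kx)
    rate = sum((x - mx) * (y - my) for x, y in zip(kx, ky)) / sxx
    return rate, kept


def sliding_rates(xs, ys, window=32):
    """Rate estimate for every window of `window` consecutive samples."""
    return [robust_rate(xs[i:i + window], ys[i:i + window])[0]
            for i in range(len(xs) - window + 1)]
```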

@NeuralSpaz

You guys rock.
I like your stuff. Thought I would leave this here. No time to code it up atm, but these analysis techniques would be applicable, if not just interesting reads. It might be even better if applied with some clustering and/or a Kalman filter to the estimated position/clock drift.
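
Following the Kalman filter suggestion, a minimal two-state filter (clock offset and drift) might look like this. It's a generic constant-drift textbook model, not something taken from the linked papers; `ClockKalman` is a hypothetical name, and the noise values `q_offset`, `q_drift`, and `r` are placeholder assumptions that would need tuning against real data.

```python
class ClockKalman:
    """Two-state Kalman filter. State x = [offset_us, drift_ppm], where
    1 ppm == 1 us of offset change per second."""

    def __init__(self, q_offset=1e-4, q_drift=1e-6, r=0.01):
        self.x = [0.0, 0.0]
        self.P = [[1e6, 0.0], [0.0, 1e2]]  # large initial uncertainty
        self.q_offset, self.q_drift, self.r = q_offset, q_drift, r

    def step(self, z_us, dt_s):
        """Predict forward by dt_s, then update with a measured offset z_us."""
        # Predict: offset advances by drift * dt, drift stays constant.
        o, d = self.x
        o += d * dt_s
        P = self.P
        p00 = P[0][0] + dt_s * (P[1][0] + P[0][1]) + dt_s * dt_s * P[1][1] + self.q_offset
        p01 = P[0][1] + dt_s * P[1][1]
        p10 = P[1][0] + dt_s * P[1][1]
        p11 = P[1][1] + self.q_drift
        # Update: only the offset is measured (H = [1, 0]).
        s = p00 + self.r
        k0, k1 = p00 / s, p10 / s
        innovation = z_us - o
        self.x = [o + k0 * innovation, d + k1 * innovation]
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]
        return self.x
```

Feeding it one offset measurement per received packet gives a running drift estimate that can replace the fixed 32-sample blocks.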

https://www.cs.umd.edu/class/spring2010/cmsc818g/slides/2010-03-25-TimeBasedLocation.pdf
http://kilyos.ee.bilkent.edu.tr/~gezici/papers/2013_TCOM.pdf

@ddrown
Author

ddrown commented Oct 9, 2016

Ok, here's another set of graphs: https://dan.drown.org/clocks/index2.html

I used the time and frequency data from the first set, which is using .241 as the phase and frequency reference. I applied those corrections to each module's local clock and calculated the offsets of the other transmitters.
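
Applying a phase and frequency correction to a module's local clock amounts to mapping its timestamps onto .241's timescale. A hedged sketch, where `to_reference_time` and its anchor-pair parameters are hypothetical names: the anchor pair carries the phase correction and `rate_ppm` the frequency correction.

```python
def to_reference_time(t_local_us, anchor_local_us, anchor_ref_us, rate_ppm):
    """Map a local timestamp onto the reference (.241) timescale.

    anchor_*: one matched (local, reference) timestamp pair, i.e. the phase
    correction. rate_ppm: how much faster the local clock runs than the
    reference (negative = slower), i.e. the frequency correction.
    """
    elapsed_local = t_local_us - anchor_local_us
    return anchor_ref_us + elapsed_local / (1.0 + rate_ppm * 1e-6)
```

Once two modules' receive timestamps for the same packet are both mapped this way, the offset for that transmitter is just the difference between the two mapped times.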

An interesting pattern shows up in this data: .241 is around 38 microseconds higher (about twice as high) than the other modules. I believe this is due to tx and rx delays.

The local timestamps on each module are relative to:
.241 = 0
.179 = .241 + 25ns + txdelay + rxdelay
.147 = .241 + 25ns + txdelay + rxdelay
.213 = .241 + 26.925ns + txdelay + rxdelay
.169 = .241 + 35.355ns + txdelay + rxdelay

The rebroadcast timestamps (these graphs) can be calculated as:
tx_timeref + rf_delay + txdelay + rxdelay - rx_timeref

So, for the .179 transmitter this looks like:
.179->.241 = (.241 + 25ns + txdelay + rxdelay) + 25ns + txdelay + rxdelay - 0
.179->.169 = (.241 + 25ns + txdelay + rxdelay) + 25ns + txdelay + rxdelay - (.241 + 35.355ns + txdelay + rxdelay)
.179->.147 = (.241 + 25ns + txdelay + rxdelay) + 35.355ns + txdelay + rxdelay - (.241 + 25ns + txdelay + rxdelay)
.179->.213 = (.241 + 25ns + txdelay + rxdelay) + 37.165ns + txdelay + rxdelay - (.241 + 26.925ns + txdelay + rxdelay)

This leads to .179->.241 having 2 * (txdelay + rxdelay), while the other paths cancel out one set of txdelay + rxdelay (on average, since txdelay + rxdelay isn't a static number).

So I believe txdelay + rxdelay ~= 38 microseconds.
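
The bookkeeping above can be checked mechanically. A hedged sketch: it encodes the timeref list and the .179 path delays exactly as given, with d standing for txdelay + rxdelay treated as a constant, and shows that subtracting the geometry leaves 2d on the .241 path and d on the others. The helper names are hypothetical.

```python
# Clock origins relative to .241 from the list above, in ns, without the
# shared d = txdelay + rxdelay term (added separately below).
TIMEREF_NS = {".241": 0.0, ".179": 25.0, ".147": 25.0, ".213": 26.925, ".169": 35.355}

# RF path delays for packets sent by .179, from the worked example (ns).
RF_FROM_179_NS = {".241": 25.0, ".169": 25.0, ".147": 35.355, ".213": 37.165}


def timeref_ns(node, d_ns):
    """Every module except the .241 reference carries one d in its timeref."""
    return TIMEREF_NS[node] + (0.0 if node == ".241" else d_ns)


def expected_offset_ns(tx, rx, rf_ns, d_ns):
    """tx_timeref + rf_delay + txdelay + rxdelay - rx_timeref."""
    return timeref_ns(tx, d_ns) + rf_ns + d_ns - timeref_ns(rx, d_ns)


def estimate_d_ns(measured_ns):
    """measured_ns: {receiver: measured offset for .179's packets}.

    Strip the geometry (the model with d = 0); the .241 path is left with
    2d and the others with d, so their difference estimates d."""
    excess = {
        rx: off - expected_offset_ns(".179", rx, RF_FROM_179_NS[rx], 0.0)
        for rx, off in measured_ns.items()
    }
    others = [v for rx, v in excess.items() if rx != ".241"]
    return excess[".241"] - sum(others) / len(others)
```

With real measurements, d would come out as an average, since the actual tx/rx delays vary from packet to packet.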

@cnlohr
Owner

cnlohr commented Oct 9, 2016

I can believe that's about the right number. My fear is that tx can't be trusted AT ALL. It sounds like you've confirmed those fears.

@cnlohr
Owner

cnlohr commented Oct 9, 2016

I am bookmarking it and will read it more tomorrow.
