Critical Time recordings might be significantly wrong due to clock synchronization issues #79

Open
tmasternak opened this issue Mar 8, 2018 · 6 comments

Comments

@tmasternak
Member

Overview

The CriticalTime for a message is calculated in the processing endpoint partially based on the time header set by the sending endpoint.

If the system clocks on the sending and processing endpoints differ significantly, the recorded CT values might be considerably off. E.g. it's possible for a CT value to be smaller than the Processing Time, or even negative.
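
To illustrate the effect, here is a minimal Python sketch. It is not the NServiceBus.Metrics implementation: the function name `critical_time`, the simplified formula (completion time on the receiver's clock minus the time-sent value from the sender's clock) and all sample values are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def critical_time(time_sent, processing_completed):
    """Simplified CT: completion (receiver clock) minus time sent (sender clock)."""
    return processing_completed - time_sent


# Hypothetical message: 1 s in transit, 2 s of processing.
true_send = datetime(2018, 3, 8, 12, 0, 0, tzinfo=timezone.utc)
transit_time = timedelta(seconds=1)
processing_time = timedelta(seconds=2)
completed = true_send + transit_time + processing_time  # receiver clock, assumed correct

# Sender clock in sync: CT covers transit plus processing.
in_sync = critical_time(true_send, completed)

# Sender clock 2 s ahead: CT drops below the Processing Time.
slightly_ahead = critical_time(true_send + timedelta(seconds=2), completed)

# Sender clock 10 s ahead: CT becomes negative.
far_ahead = critical_time(true_send + timedelta(seconds=10), completed)

print(in_sync.total_seconds(), slightly_ahead.total_seconds(), far_ahead.total_seconds())
```

With these sample values, a sender clock that runs only slightly ahead already pushes CT below the Processing Time, and a larger offset makes it negative.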

Possible solution

Detect when the time value in the header of a received message is later than the current system time on the processing endpoint and log a WARN.
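
A rough sketch of such a check in Python, assuming the anomaly of interest is a time-sent value that lies ahead of the receiver's clock (the `time sent > current time` state). The function name `check_time_sent`, the logger name and the `tolerance` parameter are hypothetical and not part of any existing API.

```python
import logging
from datetime import datetime, timedelta, timezone

logger = logging.getLogger("metrics.clock_skew")  # hypothetical logger name


def check_time_sent(time_sent, now, tolerance=timedelta(0)):
    """Log a WARN and return True when time_sent is ahead of the local clock."""
    skew = time_sent - now
    if skew > tolerance:
        logger.warning(
            "Time-sent header is %.3f s ahead of the local clock; "
            "Critical Time values may be wrong. Check clock synchronization.",
            skew.total_seconds(),
        )
        return True
    return False


now = datetime(2018, 3, 8, 12, 0, 0, tzinfo=timezone.utc)
check_time_sent(now + timedelta(seconds=5), now)  # sender clock ahead: warns
check_time_sent(now - timedelta(seconds=5), now)  # normal case: silent
```

A non-zero `tolerance` would avoid warnings for offsets of a few milliseconds, at the cost of missing small skews.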

/cc: @ramonsmits

@Scooletz
Contributor

Scooletz commented Mar 8, 2018

Would a low-effort approach like the following:

criticalTime = max(criticalTime,processingTime)

be sufficient, @Particular/metrics-maintainers?
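
Spelled out as a small Python sketch (the function name `floored_critical_time` and the sample durations are made up for illustration), the floor would behave like this:

```python
from datetime import timedelta


def floored_critical_time(critical_time, processing_time):
    """Never report a Critical Time below the Processing Time."""
    return max(critical_time, processing_time)


processing_time = timedelta(seconds=2)

# Skewed measurement (negative CT) is raised to the Processing Time.
skewed = floored_critical_time(timedelta(seconds=-7), processing_time)

# Plausible measurement passes through unchanged.
plausible = floored_critical_time(timedelta(seconds=3), processing_time)

print(skewed.total_seconds(), plausible.total_seconds())
```

The floor only removes impossible values; the reported number still says nothing about how large the clock offset is.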

@dvdstelt
Member

dvdstelt commented Mar 8, 2018

Definitely

@dvdstelt
Member

dvdstelt commented Mar 8, 2018

The issue, though, is that if the clock offset is large enough, not a single metric will have a criticalTime that is higher than processingTime.
Should we log a message (maybe every x-th measurement) that this is occurring? Or at the very least document it, because people will start asking questions about this.
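
Logging on every x-th measurement could look roughly like the Python sketch below. The class name `SuspiciousMeasurementReporter`, the logger name and the default interval are all hypothetical; the sketch warns on the first suspicious measurement and then on every `every`-th one after that.

```python
import logging
from datetime import timedelta

logger = logging.getLogger("metrics.critical_time")  # hypothetical logger name


class SuspiciousMeasurementReporter:
    """Warns on the first measurement with CT < PT, then on every `every`-th one."""

    def __init__(self, every=100):
        self.every = every
        self.count = 0  # number of suspicious measurements seen so far

    def record(self, critical_time, processing_time):
        """Return True when a warning was logged for this measurement."""
        if critical_time >= processing_time:
            return False  # plausible measurement, nothing to report
        self.count += 1
        if (self.count - 1) % self.every == 0:
            logger.warning(
                "Critical Time is below Processing Time (%d occurrences so far); "
                "sender and receiver clocks may be out of sync.",
                self.count,
            )
            return True
        return False


reporter = SuspiciousMeasurementReporter(every=3)
logged = [reporter.record(timedelta(seconds=-1), timedelta(seconds=2)) for _ in range(5)]
print(logged)
```

Counting occurrences rather than using a timer keeps the sketch deterministic; a time-based throttle would be an equally reasonable choice.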

@ramonsmits
Member

@dvdstelt That is what I suggested on Slack: if that happens, we log an event. I'm all for that, so that we don't hide it and can give guidance on how the customer can resolve it. I also raised https://github.com/Particular/PlatformDevelopment/issues/1839 to make system clock differences appear as errors in ServicePulse.

@mikeminutillo
Member

I definitely prefer https://github.com/Particular/PlatformDevelopment/issues/1839 to putting a floor on Critical Time. If we adjust Critical Time that way we are potentially hiding a problem from the user. That said, if the clocks drift the other way we don't have an easy way to detect it. Is this something we should be solving?

@ramonsmits
Member

> Is this something we should be solving?

@mikeminutillo Maybe not solve, but we should alert on or indicate this state (time sent > current time).


5 participants