-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Record job start time obtained via clock_gettime to fix time pain points #1617
Conversation
I'll readily admit I wasn't 100% confident on the client/server portions of the changes - I just imitated what I saw other things doing as best I could. Please just let me know if anything looks off and I'll spend some more time to better understand it and fix it. I also did some option renaming (preserving the old names by using aliases) and chose to use the same clock_id for both the new recorded job start time and the previous I think using the same clock_id makes sense - I don't see a need to record the start time against one clock_id but then also choose to use a different clock_id in log files. The fact that Happy to take any feedback on option/variable names as well. Thanks for your time. |
Add a new key in the json per-job output, clock_gettime_job_start, that records the job start time obtained via a call to clock_gettime using the clock_id specified by the clock_id option. This allows times of fio jobs and log entries to be compared/ordered against each other and against other system events recorded against the same clock_id. Rename the alternate_epoch_clock_id option to clock_id, as now this clock_id serves two purposes. The primary purpose is to be the clock_id for recording clock_gettime_job_start. The secondary purpose is to be the clock_id used if log_alternate_epoch is specified, in which case each log file timestamp is based on the epoch specified by clock_id. (Each such timestamp is obtained by taking the traditional zero-based timestamps and adding clock_gettime_job_start to them.) We also make log_unix_epoch an official alias of log_alternate_epoch, instead of maintaining both redundant options. Fixes axboe#1544 Signed-off-by: Nick Neumann [email protected]
9b0e060
to
99dcf32
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was accidentally included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, will remove
effect. Default value is 0, or CLOCK_REALTIME. | ||
Backwards compatible alias for clock_id. | ||
|
||
.. option:: clock_id=int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name clock_id
is not specific enough and risks being confused with the option clocksource
. How about something like log_clock_id
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. It used to be log_alternate_epoch_clock_id
. The only problem with log_clock_id
is that it now affects the job start time recorded in the json, which is independent of the log_alternate_epoch
setting which controls if the log files use times based on this (needs better name) clock_id
.
How about alternate_epoch_clock_id
, and then I rename the clock_gettime_job_start
variable and json entry to alternate_epoch_job_start
? I think that may work pretty well if it sounds ok to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be cleaner to just create a brand new option and struct thread_data
member for job start time? It's just an accident that job start time is related to the issues with the clock used for logging. I bring this up because it seems difficult to find an option name that covers both purposes. How about just calling the new JSON key something like job_start
and then adding a new job_start_clock_id
option? The description for this option should be in the "Time related parameters" section instead of in the "Measurements and reporting" section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to do that (completely separate option and struct thread_data
member). I started out planning to do that but then ended up the way I submitted because, well, the options are more related than they seem from the documentation side. (You may already be aware of this, but just in case.) Here's the background.
Before I started using fio, there was a log_unix_epoch
option. The docs say it logs Unix timestamps, and this is kindof true. What it actually does is remember the Unix timestamp when the job started (obtained via clock_gettime
) and then add the elapsed job time (obtained based on the configured clocksource
) to obtain a Unix-format timestamp for each entry. This is actually different than what you'd get by calling clock_gettime
to get the Unix timestamp for each I/O operation, since Unix time is not monotonic and is affected by NTP.
Then I added log_alternate_epoch
, which is just like log_unix_epoch
but lets you use any clock_id
instead of CLOCK_REALTIME
(used for log_unix_epoch
). (Interesting aside - in the case of using CLOCK_MONOTONIC_RAW
, it turns out that the job start time plus the elapsed job time can actually be the same as what you'd get calling clock_gettime
for each I/O,)
Anyways, that's why job start time ends up so "tied" to the options for log file timestamps. But there is no reason the log file timestamps and the recorded job start time have to share the same clock_id.
If I had a time machine, I'd stop any of the log
options from being created, and the log files would always record time elapsed since job started, and a single job_start_time
would be the only mechanism for recording time using another clock_id
, since that is what is really happening in the implementation. The "alternate" timestamps in the log files are really an illusion, if that makes sense.
But I know that can't happen. Maybe all of this is really just another good argument for what you suggested, as it keeps the (imho) log timestamps and recording job start time completely separate in terms of implementation, so either one could be revamped in the future without having to worry about the other.
Just let me know what you think is best (just make it two separate things or try to come up with an option name for both purposes); I'm happy to do it either way.
Thanks for listening to my rambling thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the recap. I think it would be cleaner to just have an entirely separate start time feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I'll do that then. Thanks for the feedback!
Backwards compatible alias for clock_id. | ||
|
||
.. option:: clock_id=int | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a redundant mention of clock_gettime in the first sentence below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, thanks, I'll fix that.
|
||
s = get_sample(iolog, cur_log, cur_log->nr_samples); | ||
|
||
s->data = data; | ||
s->time = t + (iolog->td ? iolog->td->alternate_epoch : 0); | ||
add_alternate_epoch = (iolog->td != NULL) && iolog->td->o.log_alternate_epoch; | ||
s->time = t + (add_alternate_epoch ? iolog->td->clock_gettime_job_start : 0); | ||
io_sample_set_ddir(iolog, s, ddir); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the following would be easier for a human to read:
s->time = t;
if (iolog->td && iolog->td->o.log_alternate_epoch)
s->time += iolog->td->clock_gettime_job_start;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, agreed. I had pulled the boolean out when the test had gotten more complicated but it ended up pretty short. I'll update.
I don't see any significant problems here. You can manually test the client/server code changes by checking to see if things work as expected in client/server mode. What happens to |
I believe it will actually just record the first start time in the group. It's definitely not ideal. There's currently code in there that looks to do the same thing for I agree it would be a nice enhancement to clean this up. A list of each set of values for each group entry probably makes sense, but I'm guessing the old values would need to be kept at the top-level too for backwards-compatibility purposes. I could just not output the job start time if group reporting is on, if you think that is better. |
Thanks, I'll do this before I post an updated PR with the suggested corrections/changes. |
When I did some testing yesterday I thought I saw the job start time disappear with group reporting on but I just tried it again and you are right, the key is still there. It would be nice to do something sane when group reporting is enabled but for now a warning about this situation in the documentation should be enough since this is an issue that fio doesn't handle well in general. |
Sounds good. I'll add a doc warning with the next PR. |
There were enough changes/differences that I went ahead and just started new with dev and a new PR: #1621 |
Add a new key in the json per-job output, clock_gettime_job_start, that records the job start time obtained via a call to clock_gettime using the clock_id specified by the clock_id option. This allows times of fio jobs and log entries to be compared/ordered against each other and against other system events recorded against the same clock_id.
Rename the alternate_epoch_clock_id option to clock_id, as now this clock_id serves two purposes. The primary purpose is to be the clock_id for recording clock_gettime_job_start. The secondary purpose is to be the clock_id used if log_alternate_epoch is specified, in which case each log file timestamp is based on the epoch specified by clock_id. (Each such timestamp is obtained by taking the traditional zero-based timestamps and adding clock_gettime_job_start to them.)
We also make log_unix_epoch an official alias of log_alternate_epoch, instead of maintaining both redundant options.
Fixes #1544
Signed-off-by: Nick Neumann [email protected]