-
Notifications
You must be signed in to change notification settings - Fork 573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
i#7157 dyn inject: Improve ord tracking with synthetic records #7299
base: master
Are you sure you want to change the base?
Conversation
Fixes handling of some cases related to ordinal tracking when there are queued synthetic records. get_input_record_ordinal, which has existing logic to discount records that are read-ahead by the scheduler, was missing logic to correctly account for synthetic records in the queue. This caused a higher count to be subtracted from the record ordinal returned by the input stream, particularly when a new input is switched to (in which case the guarding condition for the read-ahead adjustment was satisfied). Added tracking for whether each queued record is real, that is, read from the input reader or synthetic. Fixes an issue with instrs_pre_read tracking which should be decremented only when a real record is passed on to the caller. Adds a test that verifies the bookkeeping for use_input_ordinals case, which was affected by the above issue. Issue: #7157
Adds the count of instances where the context switch sequence was injected to the schedule_stats tool output. Modifies the switch_insertion test to use schedule_stats instead of basic_counts. This helps reveal a bug in context switch trace template insertion that causes the injection to happen many more times than the context switch events. This diff is split out from #7299 which fixes the above bug. Issue: #7157
Fixes a bug in the dynamic injection logic in the drmemtrace scheduler that caused the context switch sequence to be injected more times than actual context switches. It was erroneously injected also when set_cur_input() was invoked without any change to the input. update_switch_stats() is a better control point for switch sequence injection, which is renamed to on_context_switch() here. Now, in the switch_insertion offline test, we have 11 context switch injections (== input-to-input + idle-to-input switches), instead of the prior 363. We deliberately do not inject the context switch sequence on input-to-idle transitions. Note that we already inject on idle-to-input transitions, and it is not clear yet whether we want to repeat it at input-to-idle also or maybe inject some different sequence. The above move also removes the use of get_instruction_ordinal() to determine whether switch records should be inserted. Existing logic checked for get_instruction_ordinal() > 0 to avoid inserting switch records before the very first input on an output. This does not work as required when USE_INPUT_ORDINALS is set, as then the input-local instruction ordinal instead (which is non-zero as it is not adjusted for the read-ahead instructions) is returned, which ends up inserting the switch sequence before the very first input also. A scheduler unit_test that uses USE_INPUT_ORDINALS is coming up in #7299 with more fixes. Issue: #7157
bool cur_from_queue; | ||
bool is_cur_record_real; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add an initial value. Ditto for line 210 (I think it's set in the code before read so no current bug I don't think, but safest to have an initial value).
input.queue.push_front(pid); | ||
input.queue.push_front(tid); | ||
input.queue.emplace_front(pid, IS_SYNTHETIC); | ||
input.queue.emplace_front(tid, IS_SYNTHETIC); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm so this doesn't need to update real_records_in_queue
b/c these are synthetic.
Maybe a helper is needed so adding goes through central places so no addition overlooks updating real_records_in_queue
? I'm afraid someone will see this code and add a real record to the queue here and not update the counter b/c the existing code doesn't update it. Or just adding a comment here saying we don't need to increment is enough?
@@ -1821,10 +1832,14 @@ scheduler_impl_tmpl_t<RecordType, ReaderType>::get_input_record_ordinal( | |||
return 0; | |||
uint64_t ord = inputs_[index].reader->get_record_ordinal(); | |||
if (get_instr_ordinal(inputs_[index]) == 0) { | |||
uint64_t adjust = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this should go after the comment on lines 1839-1841.
@@ -1932,8 +1950,9 @@ scheduler_impl_tmpl_t<RecordType, ReaderType>::advance_region_of_interest( | |||
if (input.cur_region > 0) { | |||
VPRINT(this, 3, "skip_instructions input=%d: inserting separator marker\n", | |||
input.index); | |||
input.queue.push_back(record); | |||
input.queue.emplace_back(record, is_record_real); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this need to update the count if is_record_real? That helper is looking better now.
} | ||
} | ||
VPRINT(this, 5, | ||
"next_record[%d]: candidate record from %d (@%" PRId64 "): ", output, | ||
input->index, get_instr_ordinal(*input)); | ||
if (input->instrs_pre_read > 0 && record_type_is_instr(record)) | ||
// FIXME: This is likely too premature; we should either move it to later, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an issue #?
@@ -2727,6 +2763,9 @@ scheduler_impl_tmpl_t<RecordType, ReaderType>::update_next_record(output_ordinal | |||
VPRINT(this, 2, "output %d base timestamp = %zu\n", output, | |||
outputs_[output].base_timestamp); | |||
} | |||
// FIXME: When USE_INPUT_ORDINALS is enabled, this returns the input-local |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my head USE_INPUT_ORDINALS is never set for a dynamic schedule: but I actually don't see the docs or code disallowing that. I think we should disallow it. Please list that as the preferred solution; maybe that downgrades this to XXX as no known use case would get here.
@@ -5770,9 +5770,11 @@ test_unscheduled() | |||
} | |||
|
|||
static void | |||
test_kernel_switch_sequences() | |||
test_kernel_switch_sequences(bool use_input_ordinals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my other comment: I don't think we should bother trying to support USE_INPUT_ORDINALS with a dynamic schedule: no known use case wants that. I think we should just disallow it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dynamic schedule and USE_INPUT_ORDINALS seems orthogonal to each other. USE_INPUT_ORDINALS is supposed to affect only what the stream APIs return, not how the scheduler behaves. I don't see why we should disallow that combination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no known use case wants that
Even if that is the case, the restriction is not intuitive. Are there other concerns in handling this combination?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This combination is adding complexity and causing extra work with no benefit: better to scope-limit than complicate the code for no benefit. USE_INPUT_ORDINALS was added solely for analyzer use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the restriction is not intuitive.
I disagree: USE_INPUT_ORDINALS is all about ignoring the output streams and only looking at the inputs; which implies you don't care where the inputs are located -- you don't care about the schedule or whether they are interleaved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
I guess I need to first find some other real case that would hit this need of tracking real/synthetic record count in the queue. I can think of some possible cases but they're harder to write a test for to confirm:
- missing correction for record/instr ordinal when some real record was put back in the queue
- premature decrement of instr_pre_read causing get_instr_record to return a smaller value
I'll go ahead with #7277 now (which is where I started looking into this issue), because it doesn't look like it is affected: syscall sequences are never injected without seeing an instr on an input, which means the record/instr ordinal over-adjustment gating logic (get_instr_ordinal() == 0) would never be invoked when syscall injected instrs are in the queue. They may be some tricky interaction with instr_pre_read (since syscall injected instrs would also cause it to be decremented) but maybe that's not an issue either because the instr after the syscall would not even be read from the input reader until all syscall injected instrs are done.
Fixes handling of some cases related to ordinal tracking when there are queued synthetic records.
get_input_record_ordinal(), which has existing logic to discount records that are read-ahead by the scheduler, was missing logic to correctly account for synthetic records in the queue. This caused a higher count to be subtracted from the
record ordinal returned by the input stream, particularly when the queued records include the (possibly many) injected switch records and a new input is switched to (in which case the guarding condition for the read-ahead adjustment was satisfied).
Adds tracking for whether each queued record is real, that is, read from the input reader (it could alternatively be synthetic). Also adds an assert to ensure we don't underflow in get_input_record_ordinal() which is triggered without this fix.
Fixes an issue with instrs_pre_read tracking which should be decremented only when a real record is passed on to the caller. There is likely an additional issue with premature decrement of instrs_pre_read; added a FIXME for now as it is beyond the scope of this PR.
Adds a scheduler unit_test that verifies the bookkeeping for the USE_INPUT_ORDINALS case, which was affected by the above issues.
The added scheduler unit_test also tests the on_context_switch() change made by #7302 (which removed the use of get_instruction_ordinal() to detect whether to insert the context switch sequence) for the USE_INPUT_ORDINALS case. Adds a FIXME by a use of get_instruction_ordinal() that should be fixed as the intended value there is the global one, which may be more complicated.
Issue: #7157