Syscall / context switch performance improvements #1949
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -820,47 +820,55 @@ pub fn check_task_id_against_table( | |
/// Selects a new task to run after `previous`. Tries to be fair, kind of. | ||
/// | ||
/// If no tasks are runnable, the kernel panics. | ||
pub fn select(previous: usize, tasks: &[Task]) -> usize { | ||
priority_scan(previous, tasks, |t| t.is_runnable()) | ||
.expect("no tasks runnable") | ||
pub fn select(previous: usize, tasks: &[Task]) -> &Task { | ||
match priority_scan(previous, tasks, |t| t.is_runnable()) { | ||
Some((_index, task)) => task, | ||
None => panic!(), | ||
} | ||
} | ||
|
||
/// Scans the task table to find a prioritized candidate. | ||
/// | ||
/// Scans `tasks` for the next task, after `previous`, that satisfies `pred`. If | ||
/// more than one task satisfies `pred`, returns the most important one. If | ||
/// multiple tasks with the same priority satisfy `pred`, prefers the first one | ||
/// in order after `previous`, mod `tasks.len()`. | ||
/// in order after `previous`, mod `tasks.len()`. Finally, if no tasks satisfy | ||
/// `pred`, returns `None` | ||
/// | ||
/// Whew. | ||
/// | ||
/// This is generally the right way to search a task table, and is used to | ||
/// implement (among other bits) the scheduler. | ||
/// | ||
/// # Panics | ||
/// | ||
/// If `previous` is not a valid index in `tasks`. | ||
/// On success, the return value is the task's index in the task table, and a | ||
/// direct reference to the task. | ||
pub fn priority_scan( | ||
previous: usize, | ||
tasks: &[Task], | ||
pred: impl Fn(&Task) -> bool, | ||
) -> Option<usize> { | ||
uassert!(previous < tasks.len()); | ||
let search_order = (previous + 1..tasks.len()).chain(0..previous + 1); | ||
let mut choice = None; | ||
for i in search_order { | ||
if !pred(&tasks[i]) { | ||
) -> Option<(usize, &Task)> { | ||
thought: It might be useful to make
From a scan of the use sites, it looks like most want a scheduling hint ( The intent here would be to eliminate possible bounds checks in the consumption of the
I was indeed thinking of eliminating bounds checks. I'll see about checking this option out (at a lower priority), and if it proves to be an easy change then I'll check the asm output to see if there is any potential benefit there.
This probably also makes no difference, or is negative. The |
||
let mut pos = previous; | ||
let mut choice: Option<(usize, &Task)> = None; | ||
for _step_no in 0..tasks.len() { | ||
pos = pos.wrapping_add(1); | ||
if pos >= tasks.len() { | ||
pos = 0; | ||
} | ||
let t = &tasks[pos]; | ||
similarly, there's probably a few instructions and a panic site that could be shaved off here by using
I checked with Godbolt and the panics are indeed optimized out by the compiler (with the correct target in use, thumbv6m-none-eabi, and opt-level=2/s/z).
Yeah, this (and basically all other cases where an access is dominated by a naive
Ah, in that case, let's leave it as is! Good job, LLVM! |
||
if !pred(t) { | ||
continue; | ||
} | ||
|
||
if let Some((_, prio)) = choice { | ||
if !tasks[i].priority.is_more_important_than(prio) { | ||
if let Some((_, best_task)) = choice { | ||
if !t.priority.is_more_important_than(best_task.priority) { | ||
continue; | ||
} | ||
} | ||
|
||
choice = Some((i, tasks[i].priority)); | ||
choice = Some((pos, t)); | ||
} | ||
|
||
choice.map(|(idx, _)| idx) | ||
choice | ||
} | ||
|
||
/// Puts a task into a forced fault condition. | ||
|
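For readers following the `priority_scan` change outside the kernel tree, here is a minimal standalone sketch of the same search order. The `Task` struct, its `priority` and `runnable` fields, and the "lower number is more important" convention are assumptions made for illustration; the real kernel types differ.

```rust
/// Hypothetical stand-in for the kernel's task type.
/// Assumption: a lower `priority` number means a more important task.
#[derive(Debug)]
struct Task {
    priority: u8,
    runnable: bool,
}

/// Starts just after `previous`, wraps at the end of the table, and keeps
/// the first task found at the best priority that satisfies `pred`.
fn priority_scan(
    previous: usize,
    tasks: &[Task],
    pred: impl Fn(&Task) -> bool,
) -> Option<(usize, &Task)> {
    let mut pos = previous;
    let mut choice: Option<(usize, &Task)> = None;
    for _ in 0..tasks.len() {
        // Advance with wraparound instead of chaining two ranges.
        pos = pos.wrapping_add(1);
        if pos >= tasks.len() {
            pos = 0;
        }
        let t = &tasks[pos];
        if !pred(t) {
            continue;
        }
        if let Some((_, best)) = choice {
            // Only a strictly more important task replaces the current
            // choice, so ties go to the candidate seen first.
            if t.priority >= best.priority {
                continue;
            }
        }
        choice = Some((pos, t));
    }
    choice
}
```

Because the candidate is replaced only by a strictly more important task, two runnable tasks at the same priority take turns as `previous` advances, which is where the "fair, kind of" behaviour comes from.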
Original file line number | Diff line number | Diff line change | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -9,6 +9,8 @@ | |||||||||||||||||
#![cfg_attr(not(test), no_std)] | ||||||||||||||||||
#![forbid(clippy::wildcard_imports)] | ||||||||||||||||||
|
||||||||||||||||||
use core::cmp::Ordering; | ||||||||||||||||||
|
||||||||||||||||||
/// Describes types that act as "slices" (in the very abstract sense) referenced | ||||||||||||||||||
/// by tasks in syscalls. | ||||||||||||||||||
/// | ||||||||||||||||||
|
@@ -84,6 +86,23 @@ pub trait MemoryRegion { | |||||||||||||||||
fn end_addr(&self) -> usize; | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
/// Compares a memory region to an address for use in binary-searching a region | ||||||||||||||||||
/// table. | ||||||||||||||||||
/// | ||||||||||||||||||
/// This will return `Equal` if the address falls within the region, `Greater` | ||||||||||||||||||
/// if the address is lower, `Less` if the address is higher. i.e. it returns | ||||||||||||||||||
/// the status of the region relative to the address, not vice versa. | ||||||||||||||||||
#[inline(always)] | ||||||||||||||||||
fn region_compare(region: &impl MemoryRegion, addr: usize) -> Ordering { | ||||||||||||||||||
if addr < region.base_addr() { | ||||||||||||||||||
Ordering::Greater | ||||||||||||||||||
} else if addr >= region.end_addr() { | ||||||||||||||||||
Ordering::Less | ||||||||||||||||||
} else { | ||||||||||||||||||
Ordering::Equal | ||||||||||||||||||
} | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
impl<T: MemoryRegion> MemoryRegion for &T { | ||||||||||||||||||
#[inline(always)] | ||||||||||||||||||
fn contains(&self, addr: usize) -> bool { | ||||||||||||||||||
|
@@ -159,35 +178,53 @@ where | |||||||||||||||||
|
||||||||||||||||||
// Per the function's preconditions, the region table is sorted in ascending | ||||||||||||||||||
// order of base address, and the regions within it do not overlap. This | ||||||||||||||||||
// lets us use a one-pass algorithm. | ||||||||||||||||||
// lets us use a binary search followed by a short scan | ||||||||||||||||||
cbiffle marked this conversation as resolved.
|
||||||||||||||||||
let mut scan_addr = slice.base_addr(); | ||||||||||||||||||
let end_addr = slice.end_addr(); | ||||||||||||||||||
|
||||||||||||||||||
for region in table { | ||||||||||||||||||
if region.contains(scan_addr) { | ||||||||||||||||||
// Make sure it's permissible! | ||||||||||||||||||
if !region_ok(region) { | ||||||||||||||||||
// bail to the fail handling code at the end. | ||||||||||||||||||
break; | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
if end_addr <= region.end_addr() { | ||||||||||||||||||
// We've exhausted the slice in this region, we don't have | ||||||||||||||||||
// to continue processing. | ||||||||||||||||||
return true; | ||||||||||||||||||
} | ||||||||||||||||||
let Ok(index) = | ||||||||||||||||||
table.binary_search_by(|reg| region_compare(reg, scan_addr)) | ||||||||||||||||||
else { | ||||||||||||||||||
// No region contained the start address. | ||||||||||||||||||
return false; | ||||||||||||||||||
}; | ||||||||||||||||||
|
||||||||||||||||||
// Perform fast checks on the initial region. In practical testing this | ||||||||||||||||||
// provides a ~1% performance improvement over only using the loop below. | ||||||||||||||||||
let first_region = &table[index]; | ||||||||||||||||||
if !region_ok(first_region) { | ||||||||||||||||||
return false; | ||||||||||||||||||
} | ||||||||||||||||||
// Advance to the end of the first region | ||||||||||||||||||
scan_addr = first_region.end_addr(); | ||||||||||||||||||
if scan_addr >= end_addr { | ||||||||||||||||||
// That was easy | ||||||||||||||||||
Comment on lines +198 to +201
a super tiny, unimportant copyedit nit that i feel quite silly for commenting on: all the other comments in this function are terminated by periods, but these aren't (exclamation point optional):
Suggested change
¡but what if I want it at the front though |
||||||||||||||||||
return true; | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
// Continue scanning at the end of this region. | ||||||||||||||||||
scan_addr = region.end_addr(); | ||||||||||||||||||
} else if region.base_addr() > scan_addr { | ||||||||||||||||||
// We've passed our target address without finding regions that | ||||||||||||||||||
// work! | ||||||||||||||||||
// Scan adjacent regions. | ||||||||||||||||||
for region in &table[index + 1..] { | ||||||||||||||||||
if !region.contains(scan_addr) { | ||||||||||||||||||
// We've hit a hole without finishing our scan. | ||||||||||||||||||
break; | ||||||||||||||||||
} | ||||||||||||||||||
// Make sure the region is permissible! | ||||||||||||||||||
if !region_ok(region) { | ||||||||||||||||||
// bail to the fail handling code at the end. | ||||||||||||||||||
break; | ||||||||||||||||||
Comment on lines +209 to 214
hum, the "fail handling code at the end" is just
I actually can't remember why I factored it this way. I could go either way.
🤷♀️ well, it's not terribly important to me --- i just went hunting for what the fail handling code at the end might be, and then, when I saw that it was just |
||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
if end_addr <= region.end_addr() { | ||||||||||||||||||
// This region contains the end of our slice! We made it! | ||||||||||||||||||
return true; | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
// Continue scanning at the end of this region. | ||||||||||||||||||
scan_addr = region.end_addr(); | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
// We reach this point by exhausting the region table, or finding a | ||||||||||||||||||
// region at a higher address than the slice. | ||||||||||||||||||
// We reach this point by exhausting the region table without reaching the | ||||||||||||||||||
// end of the slice, or hitting a hole. | ||||||||||||||||||
false | ||||||||||||||||||
} | ||||||||||||||||||
|
||||||||||||||||||
|
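The binary-search-then-scan approach in the hunk above can be sketched in isolation as follows. This is not the kernel's implementation: the `Region` struct, its fields, and the `covered` function name are hypothetical. The sketch folds the first region into the loop instead of giving it a separate fast path, and it uses early `return false` where the diff uses `break`.

```rust
use std::cmp::Ordering;

/// Hypothetical half-open region `[base, end)` with a permission flag.
struct Region {
    base: usize,
    end: usize,
    ok: bool,
}

/// Orders a region relative to an address: `Equal` when the address falls
/// inside the region, `Greater` when the region lies above the address,
/// `Less` when it lies below.
fn region_compare(region: &Region, addr: usize) -> Ordering {
    if addr < region.base {
        Ordering::Greater
    } else if addr >= region.end {
        Ordering::Less
    } else {
        Ordering::Equal
    }
}

/// Returns true if `[base, end)` is fully covered by permitted regions.
/// Assumes `table` is sorted by base address, regions do not overlap,
/// and the range is non-empty (`base < end`).
fn covered(table: &[Region], base: usize, end: usize) -> bool {
    let Ok(index) = table.binary_search_by(|r| region_compare(r, base)) else {
        // No region contains the start address.
        return false;
    };
    let mut scan_addr = base;
    for region in &table[index..] {
        // A hole between regions means part of the range is unmapped.
        if scan_addr < region.base || scan_addr >= region.end {
            return false;
        }
        if !region.ok {
            return false;
        }
        if end <= region.end {
            // This region contains the end of the range.
            return true;
        }
        // Continue scanning at the end of this region.
        scan_addr = region.end;
    }
    // Ran out of regions before reaching the end of the range.
    false
}
```

The comparator passed to `binary_search_by` reports where the region sits relative to the address, not the other way around, which is the orientation the standard library expects.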
It occurs to me that, because the length of the task table is always `HUBRIS_TASK_COUNT` and task indices are always in bounds, we could probably shave off a few instructions (and an unreachable panic site) by using `get_unchecked` when indexing the task table here and elsewhere in the scheduler. I'm not sure whether this optimization is worth the unsafe code, but it's worth thinking about...
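To make the trade-off in this comment concrete, here is a small sketch contrasting checked indexing with `get_unchecked`. The `Task` struct and both function names are hypothetical. Whether the unchecked form saves any instructions depends on what the optimizer already proves, which is the question the thread goes on to examine.

```rust
/// Hypothetical stand-in for the kernel's task type.
struct Task {
    priority: u8,
}

/// Checked indexing: panics if `index` is out of bounds.
fn priority_checked(tasks: &[Task], index: usize) -> u8 {
    tasks[index].priority
}

/// Unchecked indexing: no bounds check and no panic site.
///
/// # Safety
///
/// The caller must guarantee `index < tasks.len()`; anything else is
/// undefined behaviour.
unsafe fn priority_unchecked(tasks: &[Task], index: usize) -> u8 {
    // SAFETY: the caller upholds `index < tasks.len()`.
    unsafe { tasks.get_unchecked(index).priority }
}
```

The cost of the second form is that the in-bounds invariant moves out of the type system and into a comment that every caller has to honour.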
Huh, I hadn't realized that the number of tasks is actually known at the kernel build time as well.
Is there a reason for why that knowledge isn't more widely used in the kernel, actually? Specifically, why don't all these methods work with `&(mut) [Task; HUBRIS_TASK_COUNT]` references? I just tested this locally, and the changes needed to do that are quite minimal indeed. Would that cause eg. problems with debugging, compatibility, or code size?
That's probably true but we'd want to be very careful about where we do it.
There's a historical reason why we don't use fixed-length arrays for the task table, but it's not really valid anymore. (The number of tasks was not originally available to the kernel at compile time.)
In practice, however, all of these functions get specialized by the compiler to fixed-length arrays. When disassembling the kernel you see a lot of `cmp r0, #6` and stuff for validation. Making the types explicit might be an interesting simplification -- relying on the compiler for stuff like this can be tricky.
I'd prefer not to pile either of those changes on this PR if possible.
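As a rough illustration of "making the types explicit", the sketch below takes the task table as a fixed-length array reference. The `TASK_COUNT` constant stands in for `HUBRIS_TASK_COUNT`, and the `Task` struct and `next_runnable` function are hypothetical; none of this is taken from the kernel source.

```rust
/// Hypothetical task count, standing in for `HUBRIS_TASK_COUNT`, which the
/// discussion says is fixed at kernel build time.
const TASK_COUNT: usize = 6;

/// Hypothetical stand-in for the kernel's task type.
struct Task {
    runnable: bool,
}

/// Finds the next runnable task after `previous`, wrapping around.
/// The table length is part of the parameter type, so it is a compile-time
/// constant here rather than a value read from a slice.
fn next_runnable(previous: usize, tasks: &[Task; TASK_COUNT]) -> Option<usize> {
    let mut pos = previous;
    for _ in 0..TASK_COUNT {
        pos = pos.wrapping_add(1);
        if pos >= TASK_COUNT {
            pos = 0;
        }
        if tasks[pos].runnable {
            return Some(pos);
        }
    }
    None
}
```

With this signature the length no longer depends on the compiler specializing a slice-taking function, though the thread leaves open whether that changes the generated code in practice.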
Entirely reasonable on both counts, we can think more about this later.
Agreed: Since I already have the changes locally, I'll try my hand at figuring out the asm output at play and see if this helps LLVM or just duplicates its work.
Alright, I couldn't find any difference in output (except my mind or different checked out commit from this PR playing tricks on me).
I can still open a PR, if you'd like: The full patch file is only 434 lines long, with 44 actual lines changed I think. Very small, guarantees the optimisation and it might have some effect somewhere, even though I was unable to find that place (I managed to only really analyse `select` and `safe_copy`.)