Add support for Hyperlight KVM guest debugging using gdb #111
dblnz
commented
Dec 13, 2024
- The current implementation supports only 4 hardware breakpoints.
- There might be some bugs; I am still testing.
- I plan to make some further modifications, but they should not have a big impact on this solution.
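For context on the four-breakpoint limit: x86-64 provides four debug address registers (DR0 to DR3), and DR7 holds a local-enable bit for each at bit positions 0, 2, 4 and 6. The following is a minimal sketch of a slot table with that limit. `HwBreakpoints` and its methods are hypothetical names for illustration and are not taken from this PR.

```rust
/// x86-64 has four debug address registers (DR0..DR3).
pub const MAX_HW_BREAKPOINTS: usize = 4;

/// Hypothetical bookkeeping for hardware breakpoints (not the PR's type).
#[derive(Debug, Default)]
pub struct HwBreakpoints {
    slots: [Option<u64>; MAX_HW_BREAKPOINTS],
}

impl HwBreakpoints {
    /// Stores `addr` in a free slot and returns its index.
    /// Returns the existing index if `addr` is already set,
    /// or `None` when all four slots are in use.
    pub fn add(&mut self, addr: u64) -> Option<usize> {
        if let Some(i) = self.slots.iter().position(|s| *s == Some(addr)) {
            return Some(i);
        }
        let free = self.slots.iter().position(|s| s.is_none())?;
        self.slots[free] = Some(addr);
        Some(free)
    }

    /// Clears the slot holding `addr`; returns whether one was found.
    pub fn remove(&mut self, addr: u64) -> bool {
        match self.slots.iter().position(|s| *s == Some(addr)) {
            Some(i) => {
                self.slots[i] = None;
                true
            }
            None => false,
        }
    }

    /// Builds a DR7 value with the local-enable bit (bit 2*i) set
    /// for every occupied slot i.
    pub fn dr7(&self) -> u64 {
        self.slots
            .iter()
            .enumerate()
            .filter(|(_, s)| s.is_some())
            .fold(0u64, |acc, (i, _)| acc | (1u64 << (2 * i)))
    }
}
```

A fifth `add` returns `None`, which a gdb target could report back to the client as "cannot insert breakpoint".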
be21b85 to ca61def
Super excited to see this landing. Nice work! My feedback mostly consists of nits and questions.
<T as Target>::Error:
    std::fmt::Debug + Send + From<io::Error> + From<DebugMessage> + From<TryRecvError>,
{
    // TODO: Address multiple sandboxes scenario
Rather than have a code comment, please create a new issue in the repo to track the feature.
I'll do that after I gather all the feedback, as this might be addressed before merging.
/// Translates the guest address to physical address
fn translate_gva(&self, gva: u64) -> Result<u64, GdbTargetError> {
    // TODO: Properly handle errors
What would you like to do based on this TODO?
A related nit: it would be nice to preserve the data in the underlying error so that, as the error bubbles up, we can determine the root cause.
This was a leftover comment from the development phase; I added some error handling in the meantime.
I would like to get an idea of how to redo the error handling so that it is easier to track where errors originated.
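One common way to keep the root cause visible is to store the underlying error inside the higher-level error and expose it through `std::error::Error::source`. The sketch below assumes a `TranslateGva` variant and a `kvm_translate` helper; both are hypothetical stand-ins, and the real `GdbTargetError` in this PR may look different.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical shape for the error type; the variant is illustrative only.
#[derive(Debug)]
pub enum GdbTargetError {
    /// Address translation failed; the low-level error is kept as `source`.
    TranslateGva { gva: u64, source: io::Error },
}

impl fmt::Display for GdbTargetError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GdbTargetError::TranslateGva { gva, .. } => {
                write!(f, "cannot translate guest virtual address {gva:#x}")
            }
        }
    }
}

impl Error for GdbTargetError {
    /// Exposes the wrapped error so callers can walk the chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            GdbTargetError::TranslateGva { source, .. } => Some(source),
        }
    }
}

/// Hypothetical stand-in for the hypervisor call: fails for address 0,
/// otherwise pretends the mapping is the identity.
fn kvm_translate(gva: u64) -> Result<u64, io::Error> {
    if gva == 0 {
        Err(io::Error::new(io::ErrorKind::InvalidInput, "address not mapped"))
    } else {
        Ok(gva)
    }
}

/// Wraps the low-level failure instead of discarding it.
pub fn translate_gva(gva: u64) -> Result<u64, GdbTargetError> {
    kvm_translate(gva).map_err(|source| GdbTargetError::TranslateGva { gva, source })
}

/// Renders an error followed by every cause beneath it on one line.
pub fn error_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut cause = err.source();
    while let Some(e) = cause {
        out.push_str(": ");
        out.push_str(&e.to_string());
        cause = e.source();
    }
    out
}
```

Logging `error_chain(&err)` at the top level then shows both where the failure surfaced and what the hypervisor reported.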
@@ -88,7 +88,8 @@ impl HypervisorHandler {
 #[derive(Clone)]
 struct HvHandlerExecVars {
     join_handle: Arc<Mutex<Option<JoinHandle<Result<()>>>>>,
-    shm: Arc<Mutex<Option<SandboxMemoryManager<GuestSharedMemory>>>>,
+    #[allow(clippy::type_complexity)] // TODO: Change this type
+    shm: Arc<Mutex<Option<Arc<Mutex<SandboxMemoryManager<GuestSharedMemory>>>>>>,
Do you intend on changing this prior to PR merging?
I would like to change it. Cloning `SandboxMemoryManager` directly might be a better idea, but the `GuestSharedMemory` type does not implement `Clone`; I need to check what implementing that would mean.
Does `KVMDriver` need a clone of `SandboxMemoryManager`, or would a mutable borrow suffice? `GuestSharedMemory` was purposefully not `Clone` because it was intended to reflect unique ownership, at any time, of the guest-side access to the shared memory.
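To illustrate the borrow-based alternative raised here, the sketch below uses simplified, hypothetical stand-ins for `GuestSharedMemory` and `SandboxMemoryManager` (the real types are generic and considerably richer). The debug helper takes `&mut` for the duration of one call, so the non-`Clone` memory keeps a single owner.

```rust
/// Simplified stand-in for `GuestSharedMemory`; deliberately not `Clone`.
pub struct GuestSharedMemory {
    bytes: Vec<u8>,
}

/// Simplified stand-in for `SandboxMemoryManager`.
pub struct SandboxMemoryManager {
    shared_mem: GuestSharedMemory,
}

impl SandboxMemoryManager {
    /// Creates a manager over `size` zeroed bytes.
    pub fn new(size: usize) -> Self {
        Self {
            shared_mem: GuestSharedMemory { bytes: vec![0; size] },
        }
    }

    /// Reads `len` bytes at `offset`, or `None` if out of range.
    pub fn read(&self, offset: usize, len: usize) -> Option<&[u8]> {
        let end = offset.checked_add(len)?;
        self.shared_mem.bytes.get(offset..end)
    }
}

/// Hypothetical debugger helper: borrows the manager mutably for one call,
/// writes a byte, and returns the byte that was there before.
/// Returns `None` if `offset` is out of range.
pub fn poke(mgr: &mut SandboxMemoryManager, offset: usize, value: u8) -> Option<u8> {
    let slot = mgr.shared_mem.bytes.get_mut(offset)?;
    Some(std::mem::replace(slot, value))
}
```

A plain borrow like this only works when the debug code runs on the thread that owns the manager. If the gdb thread must touch guest memory directly, the borrow would have to go through something like `std::thread::scope`, or through the `Arc<Mutex<...>>` wrapper that the diff currently uses.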
let mut target = HyperlightKvmSandboxTarget::new(mgr, vcpu_fd, entrypoint, hyp_conn);
let _ = target
    .set_entrypoint_bp()
    .map_err(|_| new_error!("Cannot set entrypoint breakpoint"))?;
Is there useful information in the underlying error?
It is a KVM-generated error; I need to think about how to propagate the errors.
- it adds a function to spawn the GDB thread
- adds an empty implementation for a gdb target that is to be used with the gdbstub crate
- adds `gdb` feature that can be enabled at compile time
Signed-off-by: Doru Blânzeanu <[email protected]>
- the `execution_variables` now cover the scenario the locks were meant for
Signed-off-by: Doru Blânzeanu <[email protected]>

- VcpuFd is needed for register read/write and setting debug settings
- MemoryManager is needed for read/write memory access
Signed-off-by: Doru Blânzeanu <[email protected]>

- this avoids the guest being terminated for timeout
Signed-off-by: Doru Blânzeanu <[email protected]>

- this is needed to be able to be notified when the vcpu is stopped and when to resume
Signed-off-by: Doru Blânzeanu <[email protected]>

- the hypervisor signals the gdb thread with a message that the vcpu stopped and it waits for a signal to resume
Signed-off-by: Doru Blânzeanu <[email protected]>
- add `KvmDebug` struct that abstracts the details of kvm guest debug and offers a simple API for setting breakpoints
- add breakpoint at entrypoint
Signed-off-by: Doru Blânzeanu <[email protected]>
- gdb debugger now stops at entrypoint and is able to read addresses, read registers and add or remove breakpoints
Signed-off-by: Doru Blânzeanu <[email protected]>
- this will make it simpler to add support for other hypervisors
Signed-off-by: Doru Blânzeanu <[email protected]>
- add CI check using `gdb` feature
Signed-off-by: Doru Blânzeanu <[email protected]>
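The stop/resume handshake described in the commit messages above (the hypervisor tells the gdb thread that the vCPU stopped, then waits for a signal to resume) can be sketched with two channels. This is an illustration under stated assumptions, not the PR's implementation: it uses `std::sync::mpsc`, the `DebugMessage` variants are hypothetical, and the vCPU is simulated by a list of stop addresses.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical message set; the PR's `DebugMessage` may differ.
#[derive(Debug, PartialEq)]
pub enum DebugMessage {
    VcpuStopped { rip: u64 },
    Resume,
}

/// Simulated vCPU loop: reports each stop, then blocks until told to resume.
/// Returns how many times it was resumed.
pub fn run_vcpu(
    stops: Vec<u64>,
    to_gdb: Sender<DebugMessage>,
    from_gdb: Receiver<DebugMessage>,
) -> usize {
    let mut resumed = 0;
    for rip in stops {
        if to_gdb.send(DebugMessage::VcpuStopped { rip }).is_err() {
            break; // gdb side went away
        }
        match from_gdb.recv() {
            Ok(DebugMessage::Resume) => resumed += 1,
            _ => break,
        }
    }
    resumed
}

/// Simulated gdb side: records every stop and immediately resumes the vCPU.
/// Returns the stop addresses seen and the vCPU's resume count.
pub fn debug_session(stops: Vec<u64>) -> (Vec<u64>, usize) {
    let (to_gdb, gdb_rx) = channel();
    let (to_vcpu, vcpu_rx) = channel();
    let vcpu = thread::spawn(move || run_vcpu(stops, to_gdb, vcpu_rx));

    let mut seen = Vec::new();
    // The loop ends when the vCPU thread finishes and drops its sender.
    while let Ok(DebugMessage::VcpuStopped { rip }) = gdb_rx.recv() {
        seen.push(rip);
        if to_vcpu.send(DebugMessage::Resume).is_err() {
            break;
        }
    }
    let resumed = vcpu.join().expect("vcpu thread panicked");
    (seen, resumed)
}
```

In a real target the gdb side would service register and memory requests between receiving the stop message and sending the resume.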
4a44b2e to 16492de
#[cfg(gdb)]
match self.communication_channels.from_handler_rx.recv() {
    Ok(msg) => match msg {
        HandlerMsg::Error(e) => Err(e),
        HandlerMsg::FinishedHypervisorHandlerAction => Ok(()),
    },
    Err(_) => Err(HyperlightError::HypervisorHandlerMessageReceiveTimedout()),
}

#[cfg(not(gdb))]
Disabling the timeout might be helpful for scenarios other than debugging too. Might want to put this under a different feature flag or even as a runtime option. Though, that's not too related to this PR, so maybe leaving it like this is OK for now 👍
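A runtime option could look roughly like the sketch below, where `None` means "wait indefinitely" and `Some(limit)` applies a timeout. It uses `std::sync::mpsc` as a stand-in for whatever channel the handler actually uses; `WaitError` and `wait_for_handler` are hypothetical names, and `HandlerMsg` is reduced to the two variants visible in the snippet.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Reduced stand-in for the handler message type shown in the diff.
#[derive(Debug, PartialEq)]
pub enum HandlerMsg {
    FinishedHypervisorHandlerAction,
    Error(String),
}

/// Hypothetical error type that keeps timeout and disconnect distinct.
#[derive(Debug, PartialEq)]
pub enum WaitError {
    TimedOut,
    Disconnected,
    Handler(String),
}

/// Waits for the handler to finish. `timeout == None` blocks until a
/// message arrives or the sender is dropped.
pub fn wait_for_handler(
    rx: &Receiver<HandlerMsg>,
    timeout: Option<Duration>,
) -> Result<(), WaitError> {
    let msg = match timeout {
        Some(limit) => rx.recv_timeout(limit).map_err(|e| match e {
            RecvTimeoutError::Timeout => WaitError::TimedOut,
            RecvTimeoutError::Disconnected => WaitError::Disconnected,
        })?,
        // A blocking `recv` can only fail because the sender is gone.
        None => rx.recv().map_err(|_| WaitError::Disconnected)?,
    };
    match msg {
        HandlerMsg::FinishedHypervisorHandlerAction => Ok(()),
        HandlerMsg::Error(e) => Err(WaitError::Handler(e)),
    }
}
```

With the standard library channel, a blocking `recv` fails only when the sender has been dropped, so reporting that case separately from a timeout may make the `#[cfg(gdb)]` branch easier to diagnose.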