Add support for Hyperlight KVM guest debugging using gdb #111

Open
dblnz wants to merge 23 commits into base: main

@dblnz dblnz commented Dec 13, 2024

  • The current implementation supports only 4 hardware breakpoints.
  • There might be some bugs; I am still testing.
  • I plan to make some further modifications, but they shouldn't have a big impact on this solution.
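
For context on the limit of four: x86 provides four debug address registers (DR0 to DR3), with the enable bits held in DR7. The following is a minimal sketch of slot bookkeeping under that assumption. `HwBreakpoints` and its methods are hypothetical names for illustration, not the types used in this PR, and only the DR7 global-enable bits are modelled.

```rust
/// Hypothetical tracker for the four x86 hardware breakpoint slots (DR0-DR3).
#[derive(Debug, Default)]
pub struct HwBreakpoints {
    slots: [Option<u64>; 4],
}

impl HwBreakpoints {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns the slot index holding `addr`, or `None` when all four slots are taken.
    pub fn add(&mut self, addr: u64) -> Option<usize> {
        // Setting the same address twice reuses the existing slot.
        if let Some(i) = self.slots.iter().position(|s| *s == Some(addr)) {
            return Some(i);
        }
        let free = self.slots.iter().position(|s| s.is_none())?;
        self.slots[free] = Some(addr);
        Some(free)
    }

    /// Frees the slot holding `addr`; returns false if no slot holds it.
    pub fn remove(&mut self, addr: u64) -> bool {
        match self.slots.iter().position(|s| *s == Some(addr)) {
            Some(i) => {
                self.slots[i] = None;
                true
            }
            None => false,
        }
    }

    /// DR7-style enable mask: bit 2*i + 1 is the global-enable bit for slot i.
    pub fn dr7_enable_mask(&self) -> u64 {
        self.slots
            .iter()
            .enumerate()
            .filter(|(_, s)| s.is_some())
            .fold(0, |mask, (i, _)| mask | (1u64 << (2 * i + 1)))
    }
}
```

A fifth `add` returns `None`, which a gdb target could report back to the debugger as "breakpoint not set".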

@dblnz dblnz added the enhancement New feature or request label Dec 13, 2024
@dblnz dblnz self-assigned this Dec 13, 2024
@dblnz dblnz marked this pull request as draft December 13, 2024 17:54
@dblnz dblnz force-pushed the gdb-support-latest branch 2 times, most recently from be21b85 to ca61def Compare December 19, 2024 11:41

@devigned devigned left a comment

Super excited to see this landing. Nice work! My feedback mostly consists of nits and questions.

docs/how-to-debug-a-hyperlight-guest.md (3 review threads, outdated and resolved)
src/hyperlight_host/src/hypervisor/gdb/mod.rs (outdated, resolved)
<T as Target>::Error:
std::fmt::Debug + Send + From<io::Error> + From<DebugMessage> + From<TryRecvError>,
{
// TODO: Address multiple sandboxes scenario

Contributor

Rather than have a code comment, please create a new issue in the repo to track the feature.

Contributor Author

I'll do that after I gather all the feedback, as this might be addressed before merging.


/// Translates the guest address to physical address
fn translate_gva(&self, gva: u64) -> Result<u64, GdbTargetError> {
// TODO: Properly handle errors

Contributor

What would you like to do based on this TODO?

A related nit: it would be nice to preserve the data in the underlying error so that, as the error bubbles up, we can determine the root cause.

Contributor Author

@dblnz dblnz Jan 22, 2025

This was a leftover comment from the development phase; I have added some error handling in the meantime.
I would like some ideas on how to redo the error handling so that it is easier to track where errors originate.

@@ -88,7 +88,8 @@ impl HypervisorHandler {
 #[derive(Clone)]
 struct HvHandlerExecVars {
     join_handle: Arc<Mutex<Option<JoinHandle<Result<()>>>>>,
-    shm: Arc<Mutex<Option<SandboxMemoryManager<GuestSharedMemory>>>>,
+    #[allow(clippy::type_complexity)] // TODO: Change this type
+    shm: Arc<Mutex<Option<Arc<Mutex<SandboxMemoryManager<GuestSharedMemory>>>>>>,

Contributor

Do you intend to change this before the PR is merged?

Contributor Author

I would like to change it.
Cloning SandboxMemoryManager directly might be a better idea, but the GuestSharedMemory type does not implement Clone; I need to check what implementing it would entail.

Contributor

Does KVMDriver need a clone of SandboxMemoryManager, or would a mutable borrow suffice? GuestSharedMemory is purposefully not Clone because it is intended to reflect unique ownership, at any given time, of the guest-side access to the shared memory.
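
To make the borrow option concrete, here is a minimal sketch in which a debug target holds `&mut` access to a memory type that deliberately does not implement `Clone`. `GuestMemory` and `DebugTarget` are hypothetical stand-ins, not the Hyperlight types. The approach only works when the borrower cannot outlive the owner, so it would not apply directly if the target is moved into a detached thread.

```rust
/// Hypothetical guest memory; intentionally not `Clone` to model unique ownership.
pub struct GuestMemory {
    bytes: Vec<u8>,
}

impl GuestMemory {
    pub fn new(size: usize) -> Self {
        Self { bytes: vec![0; size] }
    }

    pub fn len(&self) -> usize {
        self.bytes.len()
    }
}

/// Hypothetical debug target that borrows the memory instead of owning a copy.
pub struct DebugTarget<'a> {
    mem: &'a mut GuestMemory,
}

impl<'a> DebugTarget<'a> {
    pub fn new(mem: &'a mut GuestMemory) -> Self {
        Self { mem }
    }

    /// Reads `len` bytes at `offset`; `None` if the range is out of bounds.
    pub fn read(&self, offset: usize, len: usize) -> Option<Vec<u8>> {
        let end = offset.checked_add(len)?;
        self.mem.bytes.get(offset..end).map(|s| s.to_vec())
    }

    /// Writes `data` at `offset`; returns false if the range is out of bounds.
    pub fn write(&mut self, offset: usize, data: &[u8]) -> bool {
        let range = offset
            .checked_add(data.len())
            .and_then(|end| self.mem.bytes.get_mut(offset..end));
        match range {
            Some(dst) => {
                dst.copy_from_slice(data);
                true
            }
            None => false,
        }
    }
}
```

Once the `DebugTarget` is dropped, the owner regains full access, and the compiler enforces that no second accessor exists in the meantime.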

let mut target = HyperlightKvmSandboxTarget::new(mgr, vcpu_fd, entrypoint, hyp_conn);
let _ = target
.set_entrypoint_bp()
.map_err(|_| new_error!("Cannot set entrypoint breakpoint"))?;

Contributor

Is there useful information in the underlying error?

Contributor Author

It is a KVM-generated error; I need to think about how to propagate such errors.
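
As a stopgap until a typed error chain exists, the closure passed to `map_err` could keep the underlying error's debug output instead of discarding it with `|_|`. A small sketch of that idea follows. `with_context` is a hypothetical helper and returns a plain `String` error only to stay self-contained; it does not use the `new_error!` macro from this codebase.

```rust
use std::fmt::Debug;

/// Hypothetical helper: adds a context message while keeping the
/// underlying error's `Debug` output in the resulting message.
pub fn with_context<T, E: Debug>(res: Result<T, E>, context: &str) -> Result<T, String> {
    res.map_err(|e| format!("{context}: {e:?}"))
}
```

For example, wrapping a failing call with the context "Cannot set entrypoint breakpoint" yields a message that ends with whatever the lower layer reported.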

dblnz added 22 commits January 22, 2025 17:16
- it adds a function to spawn the GDB thread
- adds an empty implementation for a gdb target that is to be used
with the gdbstub crate
- adds `gdb` feature that can be enabled at compile time

Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
- the `execution_variables` now cover the scenario the locks were meant
  for

Signed-off-by: Doru Blânzeanu <[email protected]>
- VcpuFd is needed for register read/write and setting debug settings
- MemoryManager is needed for read/write memory access

Signed-off-by: Doru Blânzeanu <[email protected]>
- this avoids the guest being terminated for timeout

Signed-off-by: Doru Blânzeanu <[email protected]>
- this is needed to be able to be notified when the vcpu is stopped
and when to resume

Signed-off-by: Doru Blânzeanu <[email protected]>
- the hypervisor signals the gdb thread with a message that the vcpu
  stopped and it waits for a signal to resume

Signed-off-by: Doru Blânzeanu <[email protected]>
- add `KvmDebug` struct that abstracts the details of kvm guest debug
  and offers a simple API for setting breakpoints
- add breakpoint at entrypoint

Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
- gdb debugger now stops at entrypoint and is able to read addresses,
  read registers and add or remove breakpoints

Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
- this will make it simpler to add support for other hypervisors

Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
- add CI check using `gdb` feature

Signed-off-by: Doru Blânzeanu <[email protected]>
Signed-off-by: Doru Blânzeanu <[email protected]>
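
The stop/resume signalling described in the commit messages above can be pictured as a two-channel handshake between the hypervisor side and the gdb thread. The sketch below uses `std::sync::mpsc` and hypothetical message types (`VcpuEvent`, `DebugCommand`); the actual channel implementation and message enum in this PR may differ.

```rust
use std::sync::mpsc::channel;
use std::thread;

/// Hypothetical message sent by the hypervisor side when the vCPU stops.
#[derive(Debug, PartialEq)]
pub enum VcpuEvent {
    Stopped { rip: u64 },
}

/// Hypothetical message sent by the debugger thread to let the vCPU continue.
#[derive(Debug, PartialEq)]
pub enum DebugCommand {
    Resume,
}

/// Simulates a vCPU stopping at each address in `stops` and returns the
/// addresses the debugger thread observed, in order.
pub fn run_handshake(stops: Vec<u64>) -> Vec<u64> {
    let (event_tx, event_rx) = channel::<VcpuEvent>();
    let (cmd_tx, cmd_rx) = channel::<DebugCommand>();

    // Debugger thread: wait for a stop, record it, then allow a resume.
    let debugger = thread::spawn(move || {
        let mut seen = Vec::new();
        // The loop ends once the hypervisor side drops its sender.
        while let Ok(VcpuEvent::Stopped { rip }) = event_rx.recv() {
            seen.push(rip);
            if cmd_tx.send(DebugCommand::Resume).is_err() {
                break;
            }
        }
        seen
    });

    // Hypervisor side: report each stop, then block until told to resume.
    for rip in stops {
        event_tx
            .send(VcpuEvent::Stopped { rip })
            .expect("debugger thread exited");
        let cmd = cmd_rx.recv().expect("debugger thread exited");
        assert_eq!(cmd, DebugCommand::Resume);
    }
    drop(event_tx);

    debugger.join().expect("debugger thread panicked")
}
```

Because the hypervisor side blocks on `recv` until the debugger answers, the vCPU never runs while the debugger is inspecting state.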
@dblnz dblnz force-pushed the gdb-support-latest branch from 4a44b2e to 16492de Compare January 22, 2025 15:40
@dblnz dblnz marked this pull request as ready for review January 22, 2025 16:07
@dblnz dblnz changed the title Initial support for gdb debugging in Hyperlight KVM guest Add support for Hyperlight KVM guest debugging using gdb Jan 22, 2025

Comment on lines +548 to +557
#[cfg(gdb)]
match self.communication_channels.from_handler_rx.recv() {
Ok(msg) => match msg {
HandlerMsg::Error(e) => Err(e),
HandlerMsg::FinishedHypervisorHandlerAction => Ok(()),
},
Err(_) => Err(HyperlightError::HypervisorHandlerMessageReceiveTimedout()),
}

#[cfg(not(gdb))]

Contributor

Disabling the timeout might be helpful for scenarios other than debugging too. Might want to put this under a different feature flag or even as a runtime option. Though, that's not too related to this PR, so maybe leaving it like this is OK for now 👍
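
A runtime option could look roughly like the following sketch, where an `Option<Duration>` selects between blocking indefinitely and timing out. It assumes a `std::sync::mpsc` receiver; `wait_for` and `WaitError` are hypothetical names, and the real handler may use a different channel type and error enum.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical error type for the wait helper.
#[derive(Debug, PartialEq)]
pub enum WaitError {
    TimedOut,
    Disconnected,
}

/// Waits for the next message. `None` disables the timeout (useful while a
/// debugger is attached); `Some(limit)` gives up after `limit`.
pub fn wait_for<T>(rx: &Receiver<T>, timeout: Option<Duration>) -> Result<T, WaitError> {
    match timeout {
        // Block until a message arrives or every sender is dropped.
        None => rx.recv().map_err(|_| WaitError::Disconnected),
        Some(limit) => rx.recv_timeout(limit).map_err(|e| match e {
            RecvTimeoutError::Timeout => WaitError::TimedOut,
            RecvTimeoutError::Disconnected => WaitError::Disconnected,
        }),
    }
}
```

Compared with a compile-time `cfg`, this keeps a single code path and lets the caller decide per sandbox whether a timeout applies.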
