Enforce stricter validation for data read from binary file #682

Merged on Oct 30, 2024 (5 commits)

Contributor

@jiangliu jiangliu commented Oct 8, 2024

Enforce stricter validation of data read from binary files by:

  • limiting the maximum size of a binary file
  • validating slice indices
  • using checked_add to avoid integer overflow
  • skipping imported symbols
  • skipping symbols whose values are out of range
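
As a rough illustration of the validation steps listed above, here is a minimal sketch in Rust. It is not the code from this PR: `checked_slice`, `usable_symbols`, and the `Symbol` struct are hypothetical names invented for this example, and the real symbol records come from the binary parser rather than a hand-written struct.

```rust
/// Return `data[offset..offset + len]`, or `None` if the end index overflows
/// or the range extends past the end of `data`. (Hypothetical helper.)
fn checked_slice(data: &[u8], offset: usize, len: usize) -> Option<&[u8]> {
    // checked_add returns None on overflow instead of wrapping or panicking.
    let end = offset.checked_add(len)?;
    // `get` returns None instead of panicking when the range is out of bounds.
    data.get(offset..end)
}

/// Simplified, hypothetical symbol record used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Symbol {
    name: String,
    value: u64,
    imported: bool,
}

/// Keep only symbols that are defined in this binary (not imported) and whose
/// value falls inside `[image_start, image_start + image_size)`.
fn usable_symbols(symbols: &[Symbol], image_start: u64, image_size: u64) -> Vec<Symbol> {
    // Reject an image range that would overflow u64.
    let image_end = match image_start.checked_add(image_size) {
        Some(end) => end,
        None => return Vec::new(),
    };
    symbols
        .iter()
        .filter(|s| !s.imported)
        .filter(|s| s.value >= image_start && s.value < image_end)
        .cloned()
        .collect()
}
```

The point of the sketch is that untrusted offsets and sizes are never added or used as indices directly; every failure path yields `None` or an empty result rather than a panic or an out-of-bounds read.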

@jiangliu jiangliu force-pushed the binary-parser-validation branch 3 times, most recently from 107eb97 to 4a4b714 Compare October 12, 2024 05:33
Owner

@benfred benfred left a comment

Thanks for the PR!

Can you say a little bit about the problem you're having that this code is fixing?

src/utils.rs Outdated
Comment on lines 24 to 33
pub fn is_subrange_usize(start: usize, size: usize, sub_start: usize, sub_size: usize) -> bool {
    size != 0
        && sub_size != 0
        && start.checked_add(size).is_some()
        && sub_start.checked_add(sub_size).is_some()
        && sub_start >= start
        && sub_start + sub_size <= start + size
}

pub fn is_subrange_u64(start: u64, size: u64, sub_start: u64, sub_size: u64) -> bool {
Owner

It would be nice to have a generic version of these functions rather than two versions that differ only by datatype. At first glance, though, it doesn't seem like 'checked_add' has a trait in the Rust stdlib that we can use =( (though there are options like the num_traits crate that could potentially work: https://docs.rs/num-traits/0.2.14/num_traits/ops/checked/trait.CheckedAdd.html).

Contributor Author

Yeah, I tried to implement a generic function, but found that it would require an extra dependency on num_traits. If it's OK to depend on num_traits, I will give it a try :)
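
For reference, one way the generic version discussed in this thread could look without adding a dependency is a small local trait implemented for the integer types that are needed. This is only a sketch under that assumption, not what the PR merged: the trait `CheckedAdd`, its method `add_checked`, the constant `ZERO`, and the function `is_subrange` are hypothetical names, and the num_traits crate offers a ready-made `CheckedAdd` trait if the dependency is acceptable.

```rust
/// Local stand-in for a checked-addition trait (hypothetical, std-only).
trait CheckedAdd: Copy + PartialOrd {
    const ZERO: Self;
    /// Add two values, returning None on overflow.
    fn add_checked(self, rhs: Self) -> Option<Self>;
}

// Implement the trait for each integer type by delegating to the
// inherent `checked_add` method of that type.
macro_rules! impl_checked_add {
    ($($t:ty),*) => {$(
        impl CheckedAdd for $t {
            const ZERO: Self = 0;
            fn add_checked(self, rhs: Self) -> Option<Self> {
                <$t>::checked_add(self, rhs)
            }
        }
    )*};
}
impl_checked_add!(usize, u64);

/// True if `[sub_start, sub_start + sub_size)` is a non-empty range fully
/// contained in the non-empty range `[start, start + size)`.
fn is_subrange<T: CheckedAdd>(start: T, size: T, sub_start: T, sub_size: T) -> bool {
    if size == T::ZERO || sub_size == T::ZERO {
        return false;
    }
    // Compute both end points once, rejecting any overflow.
    match (start.add_checked(size), sub_start.add_checked(sub_size)) {
        (Some(end), Some(sub_end)) => sub_start >= start && sub_end <= end,
        _ => false,
    }
}
```

With this shape, `is_subrange(0usize, 10, 2, 3)` and `is_subrange(0u64, 10, 0, 10)` both go through the same function, and the end points are computed once instead of being re-added after the overflow check.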

@@ -6,6 +6,10 @@ use anyhow::Error;
use goblin::Object;
use memmap::Mmap;

use crate::utils::{is_subrange_u64, is_subrange_usize};

const MAX_BINARY_FILE_SIZE: u64 = 0x80000000;
Owner

Did you hit a case where it was trying to load info from a binary that's bigger than 2GB and wasn't valid?

We currently support profiling embedded Python, which could in theory mean the main executable is quite large (for instance, an executable that embeds a Python interpreter along with large amounts of debug symbols, CUDA code, etc.). While 2GB is quite a bit, I could see some valid cases where we are over this.

Contributor Author

Not really, it's just defensive :)
How about raising the limit, or just removing it?

Owner

If this isn't solving a problem you've hit before, let's just remove the file size limit.

The rest of the changes look good btw!

Contributor Author

Removed the file size check.

@jiangliu
Contributor Author

> thanks for the PR!
>
> Can you say a little bit about the problem you're having that this code is fixing?

We are trying to use py-spy to dump the stack trace of the current process/thread instead of a remote process, so we want to harden py-spy against invalid memory accesses. It would also help protect py-spy in coredump mode when the coredump files come from untrusted sources :)

Enforce stricter validation for data read from binary file by:
- limit maximum size of binary file
- validate slice indices
- use checked_add to avoid overflow
- skip imported symbol
- skip symbols with values out of range

Signed-off-by: Jiang Liu <[email protected]>
@benfred benfred merged commit 6eef487 into benfred:master Oct 30, 2024
49 checks passed
@benfred benfred added the enhancement New feature or request label Nov 1, 2024