Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Access out-of-bounds data #208

Open
henrygab opened this issue Feb 24, 2025 · 6 comments
Open

Access out-of-bounds data #208

henrygab opened this issue Feb 24, 2025 · 6 comments

Comments

@henrygab
Copy link
Contributor

henrygab commented Feb 24, 2025

This code effectively accesses the data at i and at i+1:

picotool/bintool/metadata.h

Lines 257 to 261 in de8ae5a

if (new_p.flags & PICOBIN_PARTITION_FLAGS_HAS_ID_BITS) {
uint32_t low = data[i++];
uint32_t high = data[i++];
new_p.id = (uint64_t)low | ((uint64_t)high << 32);
}

However, the code does not ensure that data[1] will be valid:

picotool/bintool/metadata.h

Lines 237 to 242 in de8ae5a

std::vector<uint32_t> data;
for (unsigned int i=2; i < size; i++) {
data.push_back(*it++);
}
size_t i=0;
while (i < data.size()) {

From https://en.cppreference.com/w/cpp/container/vector/operator_at:

No bounds checking is performed.

Therefore, no guarantee that this will throw an exception (which might be expected), and as a result, clearly getting into undefined behavior.....

@henrygab
Copy link
Contributor Author

henrygab commented Feb 25, 2025

See next post....

old thoughts

To fix, verify that data.size() is sufficient before setting high.

For example:

if (new_p.flags & PICOBIN_PARTITION_FLAGS_HAS_ID_BITS) { 
    if ((data.size() - 1u) < i) {
        // handle this error condition properly ... don't access memory that's not ensured to be valid!
    }
    uint32_t low = data[i++]; 
    uint32_t high = data[i++]; 
    new_p.id = (uint64_t)low | ((uint64_t)high << 32); 
 } 

@henrygab
Copy link
Contributor Author

Actually, the code is much, MUCH worse that first anticipated. It's constantly incrementing i throughout, without then validating that data.size() < i before later access of data. This code is not following best practices, and seems worth having a senior developer fix many data-access bugs.

@ndabas
Copy link

ndabas commented Feb 25, 2025

(Disclaimer: I am not a maintainer or author of this repo)

The code is checking the appropriate flags to ensure that the data accesses are valid.

I think it would be constructive for you to:

  • Demonstrate a failure case (perhaps with invalid data input) where this code would actually do the wrong thing;
  • Provide a PR fixing those issues.

Note that even if the code fails on invalid data -- a program crash would be completely expected, and also completely harmless.

@henrygab
Copy link
Contributor Author

Note that even if the code fails on invalid data -- a program crash would be completely expected, and also completely harmless.

I must respectfully disagree with you on multiple levels.

I see you're a seasoned developer. At the same time, you appear to presume UB means the program will crash. Therefore, it's possible that you've been spared the agony of debugging something caused by compilers optimizing around UB....

Unfortunately, C and C++ are much more dangerous, and explicitly allow that, when UB occurs, the compiler can literally choose to do anything it wants. If you're not familiar with the perils of UB, ... welcome to the deep rabbit hole of debugging issues that appear to make no sense! Take some time to enjoy the horror stories, and learn without paying the terrible costs that UB optimizations have inflicted upon many....

As to this template: It does not place restrictions upon its template arguments, not perform any validation of those arguments (and thus ensure at least a clean exception is thrown). Let's see what happens with the following template arguments:

// Iterator points to following uint32_t values:
{ 0xXXXXXXXXu, 0xFFFFFFFFu, 0xFFFFFFFFu } // first value doesn't even matter for these examples....
uint32_t header = 0x00000300u; // aka 0x3u << 8

I'll walk through the code. Since we're both seasoned developers, I'm hoping it's enough to show a few areas of UB. Feel free to point out errors, as this is currently via code review:

// lines 229-235
uint32_t size = 0x00000003u; // result of decode_size(0x00000300u)
uint8_t singleton_count = 0x00u; // (header >> 24)
bool singleton = false;  // header & 0x80000000u;
uint8_t partition_count = 0x00u; // (header >> 24) & 0xFu;
uint32_t unpartitioned_flags = 0xXXXXXXXXu;
// Iterator now points to second element, value 0xFFFFFFFFu
pt = shared_pointer(0xXXXXXXXXu, singleton)

// lines 237-240 
// * here, if header was 0x00007700u, 0x75u values (access of most of which are UB) into the vector
// * here, since size was 3u, vector will initialize with the single next value from iterator parameter
std::vector<uint32_t> data = { 0xFFFFFFFFu }; 

// then, because data.size() == 1, and i is initialized to zero, the while() loop at 242 is entered.
i = 0
// line 244
//  ... = data[i++]; 
uint32_t permissions_locations = 0xFFFFFFFFu;
i = 1;
// line 249 results in UNDEFINED BEHAVIOR
//          per the C++ standard, no bounds checking on std::vector<>
// ... = data[i++]; // OOPS!  Vector doesn't have enough data....
permissions_flags = UNDEFINED BEHAVIOR

The UB potential continues from there. If you come to the understanding that UB is never acceptable, then at best the code here is extremely fragile with many unwritten and untested assumptions. If the code used a container that did bounds checking, then at least there would be the assurance of a proper exception being thrown. As it is, there is no such guarantee.

How long do you think it would take an average developer, when first presented with this function, to understand and explain what this code is actually doing in the normal case? How long until that developer could describe the function's unwritten assumptions? Can you enumerate those assumptions?

Fundamentally, C++ is designed to use exceptions. UB is not an exception. Code that allows for UB is arguably still OK in one-off, unsupported code that is discarded. Where that code is shared and forked and potentially used in other places ... then UB is strictly persona non grata, yes?

@ndabas
Copy link

ndabas commented Feb 27, 2025

Thanks for the detailed write-up and explanation.

I understand what UB is. All I was trying to say was that this isn't a case of UB:

  • The bounds check and access using data[..] are function calls, not direct array accesses.
  • So it would be UB if the actual index access is out of range, which I'm asserting is not actually the case.

In any case -- it's trivial to add the required bounds checks -- so why not create a PR and do that? (Again, I'm not an author or maintainer on this repo, so I cannot say whether they would consider merging it.)

@henrygab
Copy link
Contributor Author

henrygab commented Feb 27, 2025

  • So it would be UB if the actual index access is out of range, which I'm asserting is not actually the case.

Can you help me understand your assertion?

  • Above, the std::vector<> has a single element.
  • A single element means that the only valid index is zero.
  • Thus, data[0] is valid, while data[N] is invalid unless N is zero.
  • The variable i is used to access data multiple times as data[i++].
  • Specifically, i is shown to be non-zero while executing data[i++].
  • Therefore, those are index accesses that are out-of-range.
  • [EDIT] and ... From https://en.cppreference.com/w/cpp/container/vector/operator_at, std:vector<>'s [] operator: "Accessing a nonexistent element through this operator is undefined behavior."

Therefore, can you help me understand how you assert that the index access is NOT out of range?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants