How to encode branches while letting the encoder choose the branch type and width? #345

jonomango · 2022-06-05T06:37:22Z

I'm currently using Zydis for binary rewriting (awesome library btw!) and I'm not sure how to go about dealing with branches. Specifically, I've realized just how annoying it is to calculate the offset for relative instructions without knowing the final size of the instruction beforehand. I think that this comment sums up the issue pretty well and I was wondering what the common approach was when dealing with this. I really want to avoid having to manually decide the branch type and width, but it just doesn't seem possible if there is no way to accurately predict the size of the encoded instruction.

It would be pretty nice to have a flag of some sort that lets relative operands be relative to the start of the current instruction, rather than the end, while letting the encoder deal with all the nasty calculations.
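
To make the arithmetic concrete: x86 relative branches store a displacement measured from the end of the instruction, so a delta measured from the start has to be reduced by the instruction length. A minimal sketch of that conversion follows. The helper names `end_relative_disp` and `fits_rel8` are hypothetical and not part of Zydis.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of Zydis: convert a delta measured from the
// START of a branch instruction into the displacement that gets encoded,
// which is measured from the END of the instruction.
inline std::int64_t end_relative_disp(std::int64_t delta_from_start,
                                      std::int64_t instruction_length) {
  // x86 instructions are between 1 and 15 bytes long.
  assert(instruction_length > 0 && instruction_length <= 15);
  return delta_from_start - instruction_length;
}

// True if `disp` fits in a signed 8-bit immediate (rel8).
inline bool fits_rel8(std::int64_t disp) {
  return disp >= INT8_MIN && disp <= INT8_MAX;
}
```

For example, a two-byte `jmp` that targets itself has a start-relative delta of 0 and an encoded displacement of -2 (`EB FE`). The circular part is that the length depends on whether the displacement fits in rel8, and the displacement depends on the length.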

mappzor · 2022-06-05T22:18:21Z

> I was wondering what the common approach was when dealing with this.

There are two:

- Go with rel32 branches everywhere to reduce complexity.
- Generate optimal branches (two-pass assembly; you would need to optimize branches during the final pass).
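
A rough sketch of the first approach, assuming a made-up input format (`Item`) in which each entry is either raw bytes or an unconditional jump to another entry by index. None of these names come from Zydis or zasm. It also assumes a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical input format: either raw bytes copied as-is (target < 0),
// or an unconditional jump to the item with index `target`.
struct Item {
  std::vector<std::uint8_t> raw;
  int target;
};

// Pass 1 emits every jump as `E9 rel32` (5 bytes) with a placeholder and
// records where each item starts. Pass 2 patches the placeholders.
inline std::vector<std::uint8_t> assemble_rel32(const std::vector<Item>& items) {
  std::vector<std::uint8_t> out;
  std::vector<std::size_t> start(items.size());
  for (std::size_t i = 0; i < items.size(); ++i) {
    start[i] = out.size();
    if (items[i].target < 0) {
      out.insert(out.end(), items[i].raw.begin(), items[i].raw.end());
    } else {
      out.push_back(0xE9);  // jmp rel32
      out.insert(out.end(), std::size_t{4}, std::uint8_t{0});  // placeholder
    }
  }
  for (std::size_t i = 0; i < items.size(); ++i) {
    if (items[i].target < 0) continue;
    assert(static_cast<std::size_t>(items[i].target) < items.size());
    // rel32 is relative to the end of the 5-byte jump.
    auto const disp = static_cast<std::int32_t>(
        static_cast<std::int64_t>(start[items[i].target]) -
        static_cast<std::int64_t>(start[i] + 5));
    std::memcpy(&out[start[i] + 1], &disp, sizeof(disp));  // little-endian host assumed
  }
  return out;
}
```

Because every jump has a fixed size, all start offsets are known after the first pass and the fix-up is a plain subtraction. The cost is three extra bytes for every jump that would have fit in rel8.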

I don't know what you are doing exactly, but the "binary rewriting" part already tells me that you are likely to work on higher-level primitives such as basic blocks and symbolic destinations (labels) that will be resolved in a second pass anyway.

Those things are out of scope for Zydis, as it's a low-level library: only decoding and encoding are supported. However, you should take a look at zasm. It supports labels. I'm not sure where it stands right now with size-optimal branches, but you should probably check it out as it will make your life easier.

0 replies

jonomango · 2022-06-06T00:14:01Z

Author

Thank you for the quick reply! I really appreciate it.

> Generate optimal branches (two-pass assembly; you would need to optimize branches during the final pass).

This is my first time writing something like this, but this is basically what I'm doing. For the initial pass, I'm copying every instruction from the binary into memory. Branches and other relative instructions need to be disassembled and handled specially. Relative instructions that reference code that has already been written can be encoded immediately with the optimal branch size (i.e. backward jmps/calls). For instructions that reference code that has yet to be written, however, I essentially just calculate the worst-case offset and temporarily encode the instruction with that. Once the "real" offset becomes known, I go back and fix these instructions with the correct one.

So essentially, I've already managed to calculate the relative offset for every relative instruction and I'm trying to get Zydis to choose the optimal branch type and width for the provided offset. I may just end up writing a crude check for whether the offset can be encoded with rel8 vs rel32 and calculating the instruction length from that. My only concern is that there could be other things that affect the length that I'm not aware of (e.g. instruction prefixes). Since I'm decoding -> encoding, I would like to preserve the semantics of the original instruction while modifying the relative offset, so it isn't possible for me to just ignore instruction prefixes and whatnot (although it would make things a lot easier!).

> I don't know what you are doing exactly, but the "binary rewriting" part already tells me that you are likely to work on higher-level primitives such as basic blocks and symbolic destinations (labels) that will be resolved in a second pass anyway.

> Those things are out of scope for Zydis, as it's a low-level library: only decoding and encoding are supported. However, you should take a look at zasm. It supports labels. I'm not sure where it stands right now with size-optimal branches, but you should probably check it out as it will make your life easier.

I may have given the wrong impression about my project, but I'm really not at that high of a level. I'm essentially just dealing with a stream of instructions that I'm decoding and writing to memory. I have no concept of labels (or any other symbols for that matter). The only tricky part is that where these instructions get written will be completely different from where they were originally meant to be, and I need to account for these differences. This should give a general outline of what I'm trying to achieve, although my implementation will support inserting/modifying instructions, as well as using optimal relative instructions whenever possible.

It is my first time hearing about zasm and it looks promising, but I don't think it supports my use-case (and it's a bit too high-level for what I'm doing). Zydis is perfect for my goals so far, especially with how easy it is to create and modify an encoder request from a decoded instruction. I'll take a look at how zasm deals with encoding branches and see if I can get any inspiration from that.

2 replies

jonomango Jun 6, 2022
Author

It seems that this is how zasm deals with relative branches, which fits my needs very well and is probably what I'll end up doing. I'm still not sure how they're calculating encodeSizeRel8 and encodeSizeRel32 though, since I can't seem to find where they're being set inside of zasm.

ZehMatt Jun 11, 2022

Sorry, I didn't notice this discussion. The info is obtained here: https://github.com/zyantific/zasm/blob/82041570988a9200d452b66cd9aea72284d90fcf/src/zasm/src/encoder/encoder.cpp#L44

jonomango · 2022-06-10T11:17:46Z

Author

For anyone else in a similar situation, this is how I ended up resolving the issue:

// Try to re-encode a relative branch instruction with a new delta value.
// This new value is relative to the start of the instruction, rather
// than the end. The function returns false if it is unable to fit the
// new delta value in a relative instruction.
static bool reencode_relative_branch(
    ZydisDecodedInstruction const&   decoded_instruction,
    ZydisDecodedOperand const* const decoded_operands,
    std::int64_t               const delta,
    std::uint8_t*              const buffer,
    std::size_t&                     length) {
  // make sure we're dealing with a relative branch...
  assert(decoded_instruction.attributes & ZYDIS_ATTRIB_IS_RELATIVE);
  assert(decoded_instruction.meta.branch_type != ZYDIS_BRANCH_TYPE_NONE);

  // also make sure we're not accessing any memory (i.e. call [rax])
  assert(!(decoded_instruction.attributes & ZYDIS_ATTRIB_HAS_MODRM));

  ZydisEncoderRequest encoder_request;

  // create an encoder request from the decoded instruction
  auto status = ZydisEncoderDecodedInstructionToEncoderRequest(
    &decoded_instruction, decoded_operands,
    decoded_instruction.operand_count_visible, &encoder_request);
  assert(ZYAN_SUCCESS(status));

  auto const is_jmp  = decoded_instruction.meta.category == ZYDIS_CATEGORY_UNCOND_BR;
  auto const is_jcc  = decoded_instruction.meta.category == ZYDIS_CATEGORY_COND_BR;
  auto const is_call = !is_jmp && !is_jcc;

  // this is the number of prefixes that the new instruction will have
  // (which may be different from the original decoded instruction!).
  // note: __popcnt64 is an MSVC intrinsic (std::popcount is the C++20 equivalent)
  std::int64_t const prefix_count = __popcnt64(encoder_request.prefixes);

  std::int64_t predicted_instruction_length = 0;

  // only JMPs/JCCs may be encoded as rel8
  if (!is_call && std::abs(delta - (prefix_count + 2)) <= 0x7FLL) {
    encoder_request.branch_type = ZYDIS_BRANCH_TYPE_SHORT;
    predicted_instruction_length = prefix_count + 2;
  }
  // both JMPs and CALLs may be encoded as rel32
  else if (!is_jcc && std::abs(delta - (prefix_count + 5)) <= 0x7FFF'FFFFLL) {
    encoder_request.branch_type = ZYDIS_BRANCH_TYPE_NEAR;
    predicted_instruction_length = prefix_count + 5;
  }
  // JCCs may also be encoded as rel32, however, they use an additional byte
  else if (is_jcc && std::abs(delta - (prefix_count + 6)) <= 0x7FFF'FFFFLL) {
    encoder_request.branch_type = ZYDIS_BRANCH_TYPE_NEAR;
    predicted_instruction_length = prefix_count + 6;
  }
  // if we reach here, it means the delta was too
  // large to encode as a relative instruction.
  else
    return false;

  assert(encoder_request.operand_count == 1);
  assert(encoder_request.operands[0].type == ZYDIS_OPERAND_TYPE_IMMEDIATE);

  // adjust and apply the new delta value, now that we know the instruction length
  encoder_request.operands[0].imm.s = delta - predicted_instruction_length;

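  // I don't really know what this field is for, so let the encoder choose
  encoder_request.branch_width = ZYDIS_BRANCH_WIDTH_NONE;

  // note: `length` is in/out: the caller must set it to the capacity of
  // `buffer` before the call; on success it receives the encoded size
  status = ZydisEncoderEncodeInstruction(&encoder_request, buffer, &length);

  assert(ZYAN_SUCCESS(status));
  assert(static_cast<std::size_t>(predicted_instruction_length) == length);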

  return true;
}

I'm aware that this isn't a very elegant solution, but this seems to handle everything that I've encountered so far. I believe that, in its current state, encoding relative instructions with Zydis is practically impossible. This stems from the fact that there is no way to predict the resulting length of an encoded instruction, which means that it is impossible to calculate RIP-relative offsets. Even if you know which branch type you'll be using ahead of time, you still need to account for many other factors that have an effect on the instruction length (prefixes, near jmp vs near jcc, etc). And, if it turns out that you know exactly which instruction you are dealing with, including prefixes and branch type, then you might as well just emit the raw bytes yourself since you have enough information and using an encoder would just be slower. I just can't find a situation where it is possible to use the Zydis encoder to encode a branch instruction (or really any relative instruction for that matter).

Sorry if this seems like a rant, I'm just a bit annoyed that I couldn't find a nice solution to my problem :P.

3 replies

athre0z Jun 12, 2022
Maintainer

So basically we had the two use-cases in mind that @mappzor already mentioned previously when deciding not to provide start-of-instruction/absolute-address branches in the initial implementation of the encoder: either you just emit rel32 everywhere and do a simple offset fix-up pass in the end, or you roll an optimization algorithm that produces optimal short branches. For the latter, you'd have to hand-roll the instruction length calculation anyway since the optimization algo will need information for answering "which delta between BBs will result in which instruction length" efficiently, to guide the optimization process.

Your approach is somewhere in between the two, which is something that we didn't really consider in the original design. We'll discuss this further internally and will hopefully come up with some utils / helper functions to make this easier in the future.

flobernd Jun 12, 2022
Maintainer

@jonomango I agree that it's not intuitive for the user to do calculations based on the end of the instruction rather than the start offset. As @athre0z mentioned, we will discuss internally and come up with a solution.

That being said, the whole topic is not trivial and sadly cannot be solved in a generic way by Zydis. In our hook library, I implemented a basic code relocation algorithm (https://github.com/zyantific/zyan-hook-engine/blob/master/src/Relocation.c#L497) which also takes care of rewriting relative branch instructions if they cannot reach their original target from the new position. The code was written at a time when the encoder was not available yet, but my point is the actual rewriting that needs to happen:

- Enlarge 8-bit jumps to their 32-bit counterparts
- Completely rewrite LOOP{E|NE} instructions (as there are no 32-bit versions for this)
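
As an illustration of the first item (this is not the hook engine's actual code), a short `jmp` (`EB rel8`) becomes `E9 rel32`, and a short `jcc` (`7x rel8`) becomes `0F 8x rel32` with the same condition nibble. The function name `widen_short_branch` is hypothetical, prefixes are ignored, and a little-endian host is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Widen a prefix-less short branch (`EB rel8` or `7x rel8`) into its rel32
// form. `new_delta` is measured from the START of the new instruction.
// Returns an empty vector for anything else (e.g. LOOP, which has no rel32
// form and would have to be rewritten as a sequence of instructions).
inline std::vector<std::uint8_t> widen_short_branch(const std::uint8_t* insn,
                                                    std::int64_t new_delta) {
  std::vector<std::uint8_t> out;
  std::uint8_t const opcode = insn[0];
  if (opcode == 0xEB) {
    out.push_back(0xE9);  // jmp rel8 -> jmp rel32
  } else if (opcode >= 0x70 && opcode <= 0x7F) {
    out.push_back(0x0F);  // jcc rel8 -> 0F 8x rel32, same condition code
    out.push_back(static_cast<std::uint8_t>(0x80 | (opcode & 0x0F)));
  } else {
    return out;
  }
  // The displacement is relative to the end of the widened instruction.
  std::int64_t const length = static_cast<std::int64_t>(out.size()) + 4;
  std::int64_t const disp = new_delta - length;
  assert(disp >= INT32_MIN && disp <= INT32_MAX);
  auto const disp32 = static_cast<std::int32_t>(disp);
  std::uint8_t bytes[4];
  std::memcpy(bytes, &disp32, sizeof(bytes));  // little-endian host assumed
  out.insert(out.end(), bytes, bytes + 4);
  return out;
}
```

Note that the widened `jcc` is 6 bytes while the widened `jmp` is 5, which is why the block size changes and a second pass is needed.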

It's mandatory to have a second pass (https://github.com/zyantific/zyan-hook-engine/blob/master/src/Relocation.c#L791) in these cases to compensate for the changed instruction/block size (it will fix up instructions with targets inside of the relocated block).

I'm "lucky" that the hook library only relocates a tiny bit of code so that any short branch with a destination inside of the relocated block can always reach the target. Otherwise I would have chosen the "always emit near branches" approach. Generating optimal size branches in such scenarios requires heavy backtracking (or recursion), because for each branch you encounter, you have to first optimize every other branch between start and destination:

00 jmp 12 ; we can not optimize this, before first optimizing all branches between 00 and 12
05 nop
06 jmp 11 ; we can not optimize this, before first optimizing all branches between 06 and 11 (if there are any)
11 nop
12 nop
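
For comparison, here is a sketch of a different technique from the backtracking described above: grow-only relaxation, where every jump starts short and is widened only when it turns out not to fit, repeating until nothing changes. The `Insn` model is hypothetical and only handles unconditional jumps (2 bytes as rel8, 5 bytes as rel32).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical instruction model: `target` is the index of the destination
// instruction for an unconditional jmp, or -1 for a non-branch.
struct Insn {
  int target;
  int size;  // size in bytes; for jumps this is overwritten by relax()
};

// Grow-only relaxation: every jmp starts as rel8 (2 bytes) and is widened to
// rel32 (5 bytes) when its displacement does not fit. Sizes only ever grow,
// so each jump is widened at most once and the loop terminates.
inline void relax(std::vector<Insn>& code) {
  for (auto& insn : code)
    if (insn.target >= 0) insn.size = 2;
  bool changed = true;
  while (changed) {
    changed = false;
    // Recompute start offsets from the current sizes (last entry = end).
    std::vector<std::int64_t> start(code.size() + 1, 0);
    for (std::size_t i = 0; i < code.size(); ++i)
      start[i + 1] = start[i] + code[i].size;
    for (std::size_t i = 0; i < code.size(); ++i) {
      if (code[i].target < 0 || code[i].size != 2) continue;
      // Displacement is relative to the end of the 2-byte jump.
      std::int64_t const disp = start[code[i].target] - start[i + 1];
      if (disp < INT8_MIN || disp > INT8_MAX) {
        code[i].size = 5;
        changed = true;
      }
    }
  }
}
```

Widening one jump can push another one out of rel8 range, which is what the outer loop picks up on the next iteration. I would not claim this is globally optimal for every instruction mix, but it avoids backtracking.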

Btw.: Interesting project you are working on 🙂

jonomango Jun 13, 2022
Author

Yeah, I've realized that this is a lot more complicated than it might initially seem. In fact, I didn't even realize that I had to account for the loop instruction, although I'm not sure how common it actually is in real-world binaries. I would definitely appreciate it if there was some more support for this in Zydis itself, although I also understand completely if you feel that it might be out of scope for what Zydis is advertised to be.

I also can't really encode everything as rel32 and then fix them up in a second pass since it's possible (and actually very likely in my project!) that some offsets are bigger than 32-bit. These cases need to be replaced with a special sequence of instructions to emulate an absolute call/jmp and it takes A LOT more bytes than something like a rel8 jmp. This is pretty much the algorithm that I'm using to produce close-to-optimal branches with just 2 passes, and it works surprisingly well.
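
To give a sense of the size difference, here is one common register-preserving way to emulate an absolute jmp on x86-64: `jmp qword ptr [rip+0]` (`FF 25 00 00 00 00`) followed by the 8-byte target address, 14 bytes in total. This is an illustrative sketch and not necessarily the sequence my project uses. The helper names are made up and a little-endian host is assumed.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Emit `jmp qword ptr [rip+0]` followed by the 8-byte absolute target.
// This takes 14 bytes (vs 2 for `jmp rel8`) and clobbers no registers.
inline std::array<std::uint8_t, 14> make_absolute_jmp(std::uint64_t target) {
  std::array<std::uint8_t, 14> bytes = {0xFF, 0x25, 0x00, 0x00, 0x00, 0x00};
  std::memcpy(bytes.data() + 6, &target, sizeof(target));  // little-endian host assumed
  return bytes;
}

// True if a start-relative delta can be reached by a rel32 branch of the
// given length; otherwise something like the sequence above is needed.
inline bool reachable_rel32(std::int64_t delta, std::int64_t length) {
  std::int64_t const disp = delta - length;
  return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

An absolute call needs more than this, since the return address also has to end up on the stack.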
