Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PrevLocation incorrect #6

Open
vanhauser-thc opened this issue Mar 10, 2020 · 3 comments
Open

PrevLocation incorrect #6

vanhauser-thc opened this issue Mar 10, 2020 · 3 comments

Comments

@vanhauser-thc
Copy link
Contributor

vanhauser-thc commented Mar 10, 2020

The basic block which writes to the coverage map also has to set the previous location.

There are two issues with how this is implemented in InsTrim:

  1. the algorithm for this in afl is:
     index = curr_location ^ (prev_location >> 1);
     map[index]++;

In the InsTrim code the right shift of the prev_location is never performed.

  1. the prev_location written is always a specific one and not the one that was actually the path.
    e.g.
    EntryBlock
    / | \
    A B C
    \ | /
    ExitBlock
    The writing to the map will happen in the ExitBlock, and the prev_location written will always be the ID of block A, and not depend on which actual path was taken.

In the code this is visible in the following line:

IRB.CreateStore(ConstantInt::get(Int32Ty, genLabel()), OldPrev);

Two fix both issues, in afl++ we removed that CreateStore() and added after IRB.CreateStore(Incr, MapPtrIdx);:

Value *Shr = IRB.CreateLShr(L, One32);
 IRB.CreateStore(Shr, OldPrev)->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(C, None));

Note that this also needs a ConstantInt *One32 = ConstantInt::get(Int32Ty, 1); definition after IntegerType *Int32Ty ...

@pzread
Copy link
Contributor

pzread commented Mar 10, 2020

(I guess we might have different understandings on the instrumentation code) This is how we design the implementation:

Our instrumentation tracks the previous marked block ID ^ the incoming edge ID. So this line:

IRB.CreateStore(ConstantInt::get(Int32Ty, genLabel()), OldPrev);`

will (only) store an unique marked block ID at the end of each marked block. And this code block:

          Value *L = NULL;
          if (PI == PE) {
            L = ConstantInt::get(Int32Ty, genLabel());
          } else {
            auto *PN = PHINode::Create(Int32Ty, 0, "", &*BB.begin());
            DenseMap<BasicBlock *, unsigned> PredMap;
            for (auto PI = pred_begin(&BB), PE = pred_end(&BB);
                 PI != PE; ++PI
            ) {
              BasicBlock *PBB = *PI;
              auto It = PredMap.insert({PBB, genLabel()});
              unsigned Label = It.first->second;
              PN->addIncoming(ConstantInt::get(Int32Ty, Label), PBB);
            }
            L = PN;
          }

will generate an unique edge ID for each incoming edge of the marked block. Then we utilize LLVM PHINode to get the edge ID of incoming edge during execution, and xor the incoming edge ID with previous marked block ID, as below:

          Value *MapPtrIdx = IRB.CreateGEP(MapPtr,
                                           IRB.CreateXor(PrevLocCasted, L));

So for you question 2, the instrumentation is like the following figure:

Untitled Diagram (1)

We don't set the PrevLoc in this case (it remains as its initial value), but the instrument gadget on the exit block can use the incoming edge ID to distinguish the 3 different paths.

The reason why we still xor the the previous marked block ID (PrevLoc) is because in the following case, we need the PrevLoc to distinguish the green and purple paths. (red blocks are marked blocks)

Untitled Diagram (2)

For you question 1, we believe the reason why AFL needs to shift PrevLoc is because it uses PrevLoc ^ CurrentLoc to form the edge ID (like our incoming edge ID). And in the loop case, if there is a block pointing to itself, we need to shift PrevLoc; otherwise, it will be CurrentLoc^CurrentLoc=0, since PrevLoc==CurrentLoc in this case.

Notice that we don't initialize the PrevLoc at the beginning of each function, so it will carry the previous marked block ID from callee. This is intended, so that different call edges will also be tracked.

@vanhauser-thc
Copy link
Contributor Author

AFL does basic block instrumentation and the edge ID comes from xor'ing the values of two blocks.

In your instrumentation you are doing a path instrumentation (so even more than edge instrumentation). IMHO in your case the xor'ing is unnecessary, as your ID is the value that afl would calculate by loc ^ prev_loc >> 1.
So the result of what you get is not an entry of a path, but a path in relation to a previous visited basic block. Or am I wrong?

And in cases where you need prev_loc to calculcate an edge ID I don't understand why you use a prev_loc that isnt really a prev_loc if there are other instrumented paths in between. This has implications I am too tired to see ;)

So in my understanding you instrument in two different ways the same thing (and edge/path). this is what confuses me :)

(and yes, it is my understanding too that the >> 1 comes from handling loop blocks, however it has more implications when it comes to pc-guard or qemu coverage)

@pzread
Copy link
Contributor

pzread commented Mar 16, 2020

Yes, we need the previous visited *marked* block ^ edge ID, not only the edge ID (which corresponds to AFL's loc ^ prev_loc >> 1). This is just because in this example:

image

Only taking the edge ID is not enough to distinguish the paths from green arrow and from purple arrow. That's why we xor the previous visited marked block to get an unique ID for each of 4 paths in this example

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants