PrevLocation incorrect #6

vanhauser-thc · 2020-03-10T06:32:15Z

The basic block which writes to the coverage map also has to set the previous location.

There are two issues with how this is implemented in InsTrim:

the algorithm for this in afl is:

     index = curr_location ^ (prev_location >> 1);
     map[index]++;

In the InsTrim code the right shift of the prev_location is never performed.

the prev_location written is always a specific one and not the one that was actually the path.
e.g.
EntryBlock
/ | \
A B C
\ | /
ExitBlock
The writing to the map will happen in the ExitBlock, and the prev_location written will always be the ID of block A, and not depend on which actual path was taken.

In the code this is visible in the following line:

IRB.CreateStore(ConstantInt::get(Int32Ty, genLabel()), OldPrev);

Two fix both issues, in afl++ we removed that CreateStore() and added after IRB.CreateStore(Incr, MapPtrIdx);:

Value *Shr = IRB.CreateLShr(L, One32);
 IRB.CreateStore(Shr, OldPrev)->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(C, None));

Note that this also needs a ConstantInt *One32 = ConstantInt::get(Int32Ty, 1); definition after IntegerType *Int32Ty ...

The text was updated successfully, but these errors were encountered:

pzread · 2020-03-10T22:39:32Z

(I guess we might have different understandings on the instrumentation code) This is how we design the implementation:

Our instrumentation tracks the previous marked block ID ^ the incoming edge ID. So this line:

IRB.CreateStore(ConstantInt::get(Int32Ty, genLabel()), OldPrev);`

will (only) store an unique marked block ID at the end of each marked block. And this code block:

          Value *L = NULL;
          if (PI == PE) {
            L = ConstantInt::get(Int32Ty, genLabel());
          } else {
            auto *PN = PHINode::Create(Int32Ty, 0, "", &*BB.begin());
            DenseMap<BasicBlock *, unsigned> PredMap;
            for (auto PI = pred_begin(&BB), PE = pred_end(&BB);
                 PI != PE; ++PI
            ) {
              BasicBlock *PBB = *PI;
              auto It = PredMap.insert({PBB, genLabel()});
              unsigned Label = It.first->second;
              PN->addIncoming(ConstantInt::get(Int32Ty, Label), PBB);
            }
            L = PN;
          }

will generate an unique edge ID for each incoming edge of the marked block. Then we utilize LLVM PHINode to get the edge ID of incoming edge during execution, and xor the incoming edge ID with previous marked block ID, as below:

          Value *MapPtrIdx = IRB.CreateGEP(MapPtr,
                                           IRB.CreateXor(PrevLocCasted, L));

So for you question 2, the instrumentation is like the following figure:

We don't set the PrevLoc in this case (it remains as its initial value), but the instrument gadget on the exit block can use the incoming edge ID to distinguish the 3 different paths.

The reason why we still xor the the previous marked block ID (PrevLoc) is because in the following case, we need the PrevLoc to distinguish the green and purple paths. (red blocks are marked blocks)

For you question 1, we believe the reason why AFL needs to shift PrevLoc is because it uses PrevLoc ^ CurrentLoc to form the edge ID (like our incoming edge ID). And in the loop case, if there is a block pointing to itself, we need to shift PrevLoc; otherwise, it will be CurrentLoc^CurrentLoc=0, since PrevLoc==CurrentLoc in this case.

Notice that we don't initialize the PrevLoc at the beginning of each function, so it will carry the previous marked block ID from callee. This is intended, so that different call edges will also be tracked.

vanhauser-thc · 2020-03-11T08:08:11Z

AFL does basic block instrumentation and the edge ID comes from xor'ing the values of two blocks.

In your instrumentation you are doing a path instrumentation (so even more than edge instrumentation). IMHO in your case the xor'ing is unnecessary, as your ID is the value that afl would calculate by loc ^ prev_loc >> 1.
So the result of what you get is not an entry of a path, but a path in relation to a previous visited basic block. Or am I wrong?

And in cases where you need prev_loc to calculcate an edge ID I don't understand why you use a prev_loc that isnt really a prev_loc if there are other instrumented paths in between. This has implications I am too tired to see ;)

So in my understanding you instrument in two different ways the same thing (and edge/path). this is what confuses me :)

(and yes, it is my understanding too that the >> 1 comes from handling loop blocks, however it has more implications when it comes to pc-guard or qemu coverage)

pzread · 2020-03-16T02:03:03Z

Yes, we need the previous visited *marked* block ^ edge ID, not only the edge ID (which corresponds to AFL's loc ^ prev_loc >> 1). This is just because in this example:

Only taking the edge ID is not enough to distinguish the paths from green arrow and from purple arrow. That's why we xor the previous visited marked block to get an unique ID for each of 4 paths in this example

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PrevLocation incorrect #6

PrevLocation incorrect #6

vanhauser-thc commented Mar 10, 2020 •

edited

Loading

pzread commented Mar 10, 2020 •

edited

Loading

vanhauser-thc commented Mar 11, 2020

pzread commented Mar 16, 2020 •

edited

Loading

PrevLocation incorrect #6

PrevLocation incorrect #6

Comments

vanhauser-thc commented Mar 10, 2020 • edited Loading

pzread commented Mar 10, 2020 • edited Loading

vanhauser-thc commented Mar 11, 2020

pzread commented Mar 16, 2020 • edited Loading

vanhauser-thc commented Mar 10, 2020 •

edited

Loading

pzread commented Mar 10, 2020 •

edited

Loading

pzread commented Mar 16, 2020 •

edited

Loading