
\chapter{Example Algorithm} \label{Algorithm}

An example algorithm for compressed branch trace is given in Figure~\ref{fig:algo}.
In the diagram, the following terms are used:

\begin{itemize}
  \item \textit{Qualified?}  An instruction that meets the filtering criteria is qualified, and will be traced;
  \item \textit{Branch?} Is the instruction a branch or not (\textbf{itype} values 4 or 5, or a non-zero \textbf{ntkn});
  \item \textit{branch map.}  A vector where each bit represents the outcome of a branch.  A 0 indicates the
    branch was taken, a 1 indicates that it was not;
  \item \textit{inst.}  Abbreviation for `instruction';
  \item \textit{resync count.} A counter used to keep track of when it is necessary to send 
    a synchronization packet (see Section~\ref{synchronization}, final bullet). The exact mechanism for 
    incrementing this counter is not specified, but options might be to count the number of \textit{te\_inst} packets emitted, 
    or the number of clock cycles elapsed since the last synchronization message was sent;
  \item \textit{max\_resync.}  The resync counter value that schedules a synchronization packet;
  \item \textit{updiscon.}  Uninferable PC discontinuity.  This identifies an instruction that
    causes the program counter to be changed by an amount that cannot be predicted from the
    source code alone (\textbf{itype} values 8, 10, 12 or 14);
   \item \textit{te\_inst.} The name of the packet type emitted by the encoder (see Chapter~\ref{packets});
   \item \textit{e\_ccd.} An exception has been signalled, or context has changed and
     should be treated as an uninferable PC discontinuity (see Table~\ref{tab:context-type});
   \item \textit{ppch.} Privilege has changed, or context has changed and needs to be 
     reported precisely (see Table~\ref{tab:context-type});
   \item \textit{ppch\_br.} As above, but branch map not empty;
   \item \textit{resync\_br.} The resync counter has reached the maximum value and there are
     entries in the branch map that have not yet been output.  These must be output before
     the subsequent synchronization packet, which does not report branch map history;
   \item \textit{er\_ccdn.}  Instruction retirement and exception signalled on the same cycle, 
     or context has changed and should be treated as an uninferable PC discontinuity, or
     context notify (see Table~\ref{tab:context-type});
   \item \textit{exc\_only.}  Exception signalled without simultaneous retirement;
   \item \textit{cci.}  Context change that can be reported imprecisely (see Table~\ref{tab:context-type}).
\end{itemize}
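
As an illustration of the \textit{branch map} encoding described above, consider three
consecutive branches that are taken, not taken and not taken respectively.  The table below is
a sketch only: it lists the bit recorded for each branch, oldest first, and makes no
assumption about where each bit sits within the packet (the bit ordering is defined by the
packet formats in Chapter~\ref{packets}).

\[
  \begin{array}{l|ccc}
    \mbox{branch (oldest first)} & 1 & 2 & 3 \\ \hline
    \mbox{outcome} & \mbox{taken} & \mbox{not taken} & \mbox{not taken} \\
    \mbox{branch map bit} & 0 & 1 & 1
  \end{array}
\]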

\begin{figure}[htbp]
\begin{center}
  \includegraphics[height=23cm, width=15cm]{algo.png}
  \caption{Delta Mode 1 instruction trace algorithm}
  \label{fig:algo}
\end{center}
\end{figure}

Figure~\ref{fig:algo} shows instruction by instruction behavior, as would be
seen in a single-retirement system only.  Whilst the ingress port allows the RISC-V core to
provide information on multiple retiring instructions simultaneously, the resultant packet
sequence generated by the encoder must be the same as if retiring one instruction at a time.

A 3-stage pipeline is assumed, such that the encoder has 
visibility of the current, previous and next instructions.  All packets are generated using 
information relating to the current instruction.  The orange diamonds indicate decisions 
based on the previous (or last) instruction, the green diamond indicates a decision based on the
next instruction, and all other diamonds are based on the current instruction.

Additionally, the encoder can generate one further packet type, not shown on the diagram for 
clarity.  The \textit{support} packet (format 3, subformat 3; see Chapter~\ref{packets}) is 
sent when:

\begin{itemize}
  \item The encoder is enabled or disabled, or its configuration is changed, 
    to inform the decoder of the operating mode of the encoder;
  \item After the last qualified instruction has been traced, to inform the decoder that 
    tracing has stopped;
  \item If trace packets are lost (for example, if the buffer into which packets are being 
    written fills up).  In this situation, the first packet 
    loaded into the buffer when space next becomes available should be a \textit{support} 
    packet.  Following this, tracing will resume with a sync packet.
\end{itemize}

Note: if the \textbf{halted} or \textbf{reset} sideband signals are asserted (see Table~\ref{tab:ingress-side-band}),
the encoder will behave as if it has received an unqualified instruction (output \textit{te\_inst}
reporting the address of the last instruction, followed by \textit{te\_support}).


\section{Full vs Differential Addresses} \label{addresses}
Addresses can be output in one of two ways: \textit{full} or \textit{differential}.

\begin{itemize}
  \item The \textit{full} address is the actual address of the current instruction;
  \item The \textit{differential} address is the difference between the actual address of 
    the current instruction and the actual address of the instruction reported in the 
    previous packet that contained an address.
\end{itemize}

Packet formats 1 and 2 include a differential address, whilst format 3 includes the full address.
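
The relationship between the two forms can be sketched as follows.  The symbols
$A_{\mathrm{cur}}$, $A_{\mathrm{prev}}$ and $D$ are introduced here for illustration only and
are not names used elsewhere in this document: $A_{\mathrm{cur}}$ is the address of the
current instruction, $A_{\mathrm{prev}}$ is the address reported in the previous packet that
contained an address, and $D$ is the differential address.

\[
  D = A_{\mathrm{cur}} - A_{\mathrm{prev}}
  \qquad \mbox{(encoder)}
  \qquad\qquad
  A_{\mathrm{cur}} = A_{\mathrm{prev}} + D
  \qquad \mbox{(decoder)}
\]

For example, if the previous packet that contained an address reported 0x80001000 and the
current instruction is at 0x80001024, then $D$ is 0x24, and a decoder would recover the
current address as 0x80001000 + 0x24 = 0x80001024.  How $D$ is represented in the packet
(width, sign handling) is not assumed here; see Chapter~\ref{packets}.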

\section{Format selection} \label{format-selection}

In all cases but one, the packet format (3) is determined only by a `yes' outcome from the 
associated decision.  The choice between formats 1 and 2 for the case in the middle of the 
diagram needs further explanation.  

If there are no branches that need to be reported, packet format 2 is used.  

If there are branches to report, format 1 is used.
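
The choice between these two formats can be summarized as below.  This is only a restatement
of the two rules above, and says nothing about which sub-format of format 1 is used.

\[
  \mathit{format} = \left\{
  \begin{array}{ll}
    2 & \mbox{if there are no branches to report} \\
    1 & \mbox{if there are branches to report}
  \end{array} \right.
\]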

If there is no address to report, then there are two sub-formats of format 1.  If branch prediction
is supported and is enabled, then there is a choice of whether to output a full branch map, or a 
count of correctly predicted branches.  In order to choose the count, the number of correctly
predicted branches must be at least 31.  If there are 31 unreported branches (i.e. the branch
map is full), but not all of them were predicted correctly, then the branch map will be output.
If all 31 unreported branches were correctly predicted, then the encoder starts counting
subsequent correct predictions, and will output a count under the following conditions:

\begin{itemize}
  \item A branch is mis-predicted.  The count value will be the number of correctly predicted branches, 
    minus 31.  \textbf{no\_mispred} will be 0, indicating that the next branch failed its prediction;
  \item An updiscon, interrupt or exception requires the encoder to output an address.  In this case 
    the encoder will output the branch count (number of correctly predicted branches, minus 31) with 
    \textbf{no\_mispred} set to 1, followed by a format 2 packet reporting the address 
    (not yet shown in Figure~\ref{fig:algo}). 
    \textbf{DISCUSSION POINT:} This is the only scenario so far where the encoder is required to 
    output 2 packets as a result of a single instruction.  One way to avoid this would
    be to use format 0 vs 1 to distinguish between branch map and branch count (eliminating the need for
    the \textbf{branch\_fmt} bit).  However, this uses up the currently free format.  The other far less
    attractive alternative is to add a \textbf{branch\_fmt} bit to all format 1 packets, which has the 
    major disadvantage of impacting the efficiency of all format 1 packets;
  \item The branch count reaches its maximum value (0xffff).  \textbf{no\_mispred} will be set to 1 to
    indicate that the outcome of the next branch cannot be inferred (it will be explicitly recorded and
    output later).   
\end{itemize}
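
As a worked illustration of the count arithmetic in the list above, let $N$ be the number of
correctly predicted branches that have not yet been reported, and let $C$ be the count value
placed in the packet.  Both symbols are hypothetical labels used only for this example.

\[
  C = N - 31, \qquad N \ge 31
\]

For example, if 100 branches are predicted correctly and the following branch is mis-predicted,
the count output would be $C = 100 - 31 = 69$ with \textbf{no\_mispred} set to 0.  Taking the
same relationship literally, the maximum count of 0xffff (65535) would correspond to
$N = 65535 + 31 = 65566$ correctly predicted branches, at which point the count is output with
\textbf{no\_mispred} set to 1.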

Packet formats 1 and 2 are organized so that the address is the final field.  Minimizing the 
number of bits required to represent the address reduces the total packet size and significantly
improves efficiency.  See Chapter~\ref{packets}.