\chapter{Example Algorithm} \label{Algorithm}
An example algorithm for compressed branch trace is given in Figure~\ref{fig:algo}.
In the diagram, the following terms are used:
\begin{itemize}
\item \textit{Qualified?} An instruction that meets the filtering criteria is qualified, and will be traced;
\item \textit{Branch?} Is the instruction a branch or not (\textbf{itype} values 4 or 5, or a non-zero \textbf{ntkn});
\item \textit{branch map.} A vector where each bit represents the outcome of a branch. A 0 indicates the
branch was taken, a 1 indicates that it was not;
\item \textit{inst.} Abbreviation for `instruction';
\item \textit{resync count.} A counter used to keep track of when it is necessary to send
a synchronization packet (see Section~\ref{synchronization}, final bullet). The exact mechanism for
incrementing this counter is not specified, but options might be to count the number of \textit{te\_inst} packets emitted,
or the number of clock cycles elapsed since the last synchronization message was sent;
\item \textit{max\_resync.} The resync counter value that schedules a synchronization packet;
\item \textit{updiscon.} Uninferable PC discontinuity. This identifies an instruction that
causes the program counter to be changed by an amount that cannot be predicted from the
source code alone (\textbf{itype} values 8, 10, 12 or 14);
\item \textit{te\_inst.} The name of the packet type emitted by the encoder (see Chapter~\ref{packets});
\item \textit{e\_ccd.} An exception has been signalled, or context has changed and
should be treated as an uninferable PC discontinuity (see Table~\ref{tab:context-type});
\item \textit{ppch.} Privilege has changed, or context has changed and needs to be
reported precisely (see Table~\ref{tab:context-type});
\item \textit{ppch\_br.} As above, but branch map not empty;
\item \textit{resync\_br.} The resync counter has reached the maximum value and there are
entries in the branch map that have not yet been output. These must be output before
the subsequent synchronization packet, which does not report branch map history;
\item \textit{er\_ccdn.} Instruction retirement and exception signalled on the same cycle,
or context has changed and should be treated as an uninferable PC discontinuity, or
context notify (see Table~\ref{tab:context-type});
\item \textit{exc\_only.} Exception signalled without simultaneous retirement;
\item \textit{cci.} Context change that can be reported imprecisely (see Table~\ref{tab:context-type}).
\end{itemize}
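The branch map convention above can be illustrated with a minimal Python sketch. It is not part of the specified algorithm: the \texttt{BranchMap} class, its method names, and the choice to place the oldest branch in bit 0 are assumptions made for illustration. The 31-entry capacity follows the description of a full branch map in Section~\ref{format-selection}.
\begin{verbatim}
```python
MAX_BRANCHES = 31  # a full branch map holds 31 unreported branches


class BranchMap:
    """Hypothetical helper that accumulates branch outcomes.

    Assumption: the oldest branch occupies bit 0.
    """

    def __init__(self):
        self.bits = 0
        self.count = 0

    def record(self, taken):
        # Encoding from the text: 0 = branch taken, 1 = branch not taken.
        if self.count >= MAX_BRANCHES:
            raise OverflowError("branch map is full and must be output first")
        if not taken:
            self.bits |= 1 << self.count
        self.count += 1

    def is_full(self):
        return self.count == MAX_BRANCHES

    def flush(self):
        # Return the accumulated map and its length, then start afresh.
        result = (self.bits, self.count)
        self.bits = 0
        self.count = 0
        return result


bm = BranchMap()
for taken in (True, False, False, True):
    bm.record(taken)
```
\end{verbatim}
After the four outcomes recorded here (taken, not taken, not taken, taken), the map holds the value 6 with a count of 4.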
\begin{figure}[htbp]
\begin{center}
\includegraphics[height=23cm, width=15cm]{algo.png}
\caption{Delta Mode 1 instruction trace algorithm}
\label{fig:algo}
\end{center}
\end{figure}
Figure~\ref{fig:algo} shows instruction-by-instruction behavior, as would be
seen in a single-retirement system only. Whilst the ingress port allows the RISC-V core to
provide information on multiple retiring instructions simultaneously, the resultant packet
sequence generated by the encoder must be the same as if retiring one instruction at a time.
A 3-stage pipeline is assumed, such that the encoder has
visibility of the current, previous and next instructions. All packets are generated using
information relating to the current instruction. The orange diamonds indicate decisions
based on the previous (or last) instruction, the green diamond indicates a decision based on the
next instruction, and all other diamonds are based on the current instruction.
Additionally, the encoder can generate one further packet type, not shown on the diagram for
clarity. The \textit{support} packet (format 3, subformat 3; see Chapter~\ref{packets}) is
sent when:
\begin{itemize}
\item The encoder is enabled or disabled, or its configuration is changed,
to inform the decoder of the operating mode of the encoder;
\item After the last qualified instruction has been traced, to inform the decoder that
tracing has stopped;
\item If trace packets are lost (for example, if the buffer into which packets are being
written fills up). In this situation, the first packet
loaded into the buffer when space next becomes available should be a \textit{support}
packet. Following this, tracing will resume with a sync packet.
\end{itemize}
Note: if the \textbf{halted} or \textbf{reset} sideband signals are asserted (see Table~\ref{tab:ingress-side-band})
the encoder will behave as if it has received an unqualified instruction (output \textit{te\_inst}
reporting the address of the last instruction, followed by \textit{te\_support}).
\section{Full vs Differential Addresses} \label{addresses}
Addresses can be output in one of two ways: \textit{full} or \textit{differential}.
\begin{itemize}
\item The \textit{full} address is the actual address of the current instruction;
\item The \textit{differential} address is the difference between the actual address of
the current instruction and the actual address of the instruction reported in the
previous packet that contained an address.
\end{itemize}
Packet formats 1 and 2 include a differential address, whilst format 3 includes the full address.
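As a worked illustration, with hypothetical addresses chosen only for the arithmetic and assuming the difference is taken as current minus previous: if the previous packet that contained an address reported \texttt{0x80001000} and the current instruction is at \texttt{0x80001048}, then
\[
\mathit{differential} = \mathtt{0x80001048} - \mathtt{0x80001000} = \mathtt{0x48},
\]
and a decoder recovers the full address as $\mathtt{0x80001000} + \mathtt{0x48} = \mathtt{0x80001048}$. The differential value has far fewer significant bits than the full address.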
\section{Format selection} \label{format-selection}
In all cases but one, the packet format (3) is determined only by a `yes' outcome from the
associated decision. The choice between formats 1 or 2 for the case in the middle of the
diagram needs further explanation.
If there are no branches that need to be reported, packet format 2 is used.
If there are branches to report, format 1 is used.
If there is no address to report, then there are two sub-formats of format 1. If branch prediction
is supported and is enabled, then there is a choice of whether to output a full branch map, or a
count of correctly predicted branches. In order to choose the count, the number of correctly
predicted branches must be at least 31. If there are 31 unreported branches (i.e. the branch
map is full), but not all of them were predicted correctly, then the branch map will be output.
If all 31 unreported branches were correctly predicted, then the encoder starts counting
subsequent correct predictions, and will output a count under the following conditions:
\begin{itemize}
\item A branch is mis-predicted. The count value will be the number of correctly predicted branches,
minus 31. \textbf{no\_mispred} will be 0, indicating that the next branch failed its prediction;
\item An updiscon, interrupt or exception requires the encoder to output an address. In this case
the encoder will output the branch count (number of correctly predicted branches, minus 31) with
\textbf{no\_mispred} set to 1, followed by a format 2 packet reporting the address
(not yet shown in Figure~\ref{fig:algo}).
\textbf{DISCUSSION POINT:} This is the only scenario so far where the encoder is required to
output 2 packets as a result of a single instruction. One way to avoid this would
be to use format 0 vs 1 to distinguish between branch map and branch count (eliminating the need for
the \textbf{branch\_fmt} bit). However, this uses up the currently free format. The other far less
attractive alternative is to add a \textbf{branch\_fmt} bit to all format 1 packets, which has the
major disadvantage of impacting the efficiency of all format 1 packets;
\item The branch count reaches its maximum value (0xffff). \textbf{no\_mispred} will be set to 1 to
indicate that the outcome of the next branch cannot be inferred (it will be explicitly recorded and
output later).
\end{itemize}
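The selection rules in this section can be summarized in a short Python sketch. It is illustrative only and is not taken from the specification: the function names, the trigger strings and the dictionary layout are hypothetical. It also assumes that the same offset of 31 applies when the count reaches its maximum, which the text above does not settle.
\begin{verbatim}
```python
BRANCH_MAP_CAPACITY = 31  # a full branch map holds 31 unreported branches


def select_format(branches_to_report):
    """Format 1 if there are branches to report, otherwise format 2."""
    return 1 if branches_to_report > 0 else 2


def full_map_action(predicted_ok):
    """Decide what to do with the unreported branches.

    predicted_ok holds one bool per unreported branch
    (True = the branch was predicted correctly).
    """
    if len(predicted_ok) < BRANCH_MAP_CAPACITY:
        return "accumulate"         # map not yet full
    if all(predicted_ok):
        return "start_count"        # count subsequent correct predictions
    return "output_branch_map"      # full map with at least one misprediction


# no_mispred value for each event that ends a run of correct predictions.
NO_MISPRED = {"mispredict": 0, "address": 1, "count_max": 1}


def count_packet_fields(correct_predictions, trigger):
    """Fields of a hypothetical branch count packet."""
    if correct_predictions < BRANCH_MAP_CAPACITY:
        raise ValueError("a count needs at least 31 correct predictions")
    return {
        # Reported count is the number of correct predictions minus 31.
        "branch_count": correct_predictions - BRANCH_MAP_CAPACITY,
        "no_mispred": NO_MISPRED[trigger],
    }
```
\end{verbatim}
For instance, under these assumptions a run of 40 correct predictions ended by a misprediction yields a branch count of 9 with \textbf{no\_mispred} equal to 0, whereas a run of exactly 31 ended by an address report yields a count of 0 with \textbf{no\_mispred} equal to 1.
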
Packet formats 1 and 2 are organized so that the address is the final field. Minimizing the
number of bits required to represent the address reduces the total packet size and significantly
improves efficiency. See Chapter~\ref{packets}.