This is the Silice main documentation.
When designing with Silice your code describes circuits. If not done already, it is highly recommended to watch the introductory video (youtube) to get more familiar with this notion and what it entails.
- A first example
- Terminology
- Basic language constructs
- Types
- Constants
- Variables
- Tables
- Block RAMs / ROMs
- Operators
- Swizzling
- Arithmetic, comparison, bit-wise, reduction, shift
- Concatenation
- Bindings
- Always assign
- Bound expressions
- Groups
- Interfaces
- Bitfields
- Intrinsics
- Algorithms
- Execution flow and cycle utilization rules
- The step operator
- Control flow
- Cycle costs of calls to algorithms and subroutines
- Pipelining
- Lua preprocessor
- Interoperability with Verilog modules
- Host hardware frameworks
GitHub repository: https://github.com/sylefeb/Silice/
This first example assumes two signals: a button
input (high when
pressed) and a led
output, each one bit.
The main algorithm – the one expected as top level – simply asks the led to turn on when the button is pressed. We will consider several versions – all working – to demonstrate basic features of Silice.
Perhaps the most natural version for a programmer not used to FPGAs would be:
algorithm main(input uint1 button,output uint1 led) {
while (1) {
led = button;
}
}
This sets led
to the value of button
every clock cycle. However, we
could instead specify a always assignment between led and button:
algorithm main(input uint1 button,output uint1 led) {
led := button;
}
This makes led
constantly track the value of button
. A very
convenient feature is that the always assignment can be overridden
at some clock steps, for instance:
algorithm main(input uint1 button,output uint1 led) {
led := 0;
while (1) {
if (button == 1) {
led = 1;
}
}
}
Of course this last version is needlessly complicated (the previous one
was minimal), but it shows the principle: led
is always assigned
0, but this will be overridden whenever the button is pressed. This is
useful to maintain an output to a value that must change only on some
specific event (e.g. producing a pulse).
This first example has an interesting subtlety. The button
input, if it comes directly from an onboard button, is likely asynchronous. This means it can change at anytime, even during a clock cycle. That could lead to trouble with the logic that depends on it. To be safe, a better approach is to register the input button, this can be achieved as follows:
algorithm main(input uint1 button,output uint1 led) {
led ::= button;
}
Using the ::=
syntax introduces a one cycle latency and inserts a flip-flop between the asynchronous input and whatever comes after. This won't change our results in this toy example but is generally important.
Some terminology we use next:
- VIO: a Variable, Input or Output.
- One-cycle block: A block of operations that require a single cycle (no loops, breaks, etc.).
- Host hardware framework: The Verilog glue to the hardware meant to run the design. This can also be a glue to Icarus[1] or Verilator[2], both frameworks are provided.
- Combinational loop: A circuit where a cyclic dependency exists. These lead to unstable hardware synthesis and have to be avoided.
Silice supports signed and unsigned integers with a specified bit width:
-
int
N with N the bit-width, e.g.int8
: signed integer. -
uint
N with N the bit-width, e.g.uint8
: unsigned integer.
Constants may be given directly as decimal based numbers (eg. 1234
),
or can be given with a specified bit width and base:
-
3b0101
, 3 bits wide value 5. -
32hffff
, 32 bits wide value 65535. -
4d10
, 4 bits wide value 10.
Supported base identifiers are: b
for binary, h
for hexadecimal, d
for decimal. If the value does not fit the bit width, it is clamped.
Variables are declared with the following pattern:
-
TYPE ID = VALUE;
initializes the variable with value on algorithm start / reset. -
TYPE ID(VALUE);
initializes the variable with value on power-up (when the FPGA is configured).
where TYPE
is a type definition
(Section 3.1),
ID
a variable identifier (starting with a letter followed by
alphanumeric or underscores) and VALUE
a constant
(Section 3.2).
The initializer is mandatory, and is always a simple constant (no
expressions), or the special value uninitialized
. The later indicates
that the initialization can be skipped, reducing design size. This is
particularly interesting on brams/broms and register arrays.
intN tbl[M] = {...}
Example: int6 tbl[4] = {0,0,0,0};
Table sizes have to be constant at compile time. The initializer is mandatory and can be a string, in which case each letter becomes its ASCII value, and the string is null terminated.
The table size M
is optional (eg. int4 tbl[]={1,2,3};
) in which
case the size is automatically derived from the initializer (strings
have one additional implicit character: the null terminator).
If the table size is specified and the initializer is a shorter string, the table is automatically padded with zeros. A longer string results in an error.
If the table size is specified and the initializer – not a string – has
a different number of elements, an error occurs. However, if a smaller
number of elements is given the last element can be pad(value)
in
which case the remainder of the table is filled with value
.
Example: int6 tbl[256] = {0,0,0,0,pad(255)};
Example: int6 tbl[256] = {pad(0)};
The keyword uninitialized
may be used to explicitly skip table
initialization, in which case the initial table state is unknown:
int6 tbl[4] = uninitialized;
.
Similarly, the padding can be uninitialized:
int6 tbl[256] = {0,0,0,0,pad(uninitialized)};
. In this case, only the
first part of the table will be initialized, the state of the other
values will be unknown.
Tables can also be initialized from file content, but only for specific bit-widths (currently 8 bits and 32 bits).
Example: int32 tbl[256] = {file("data.img"), pad(0)};
This loads as many as possible 32 bits values from file data.img
and pads the rest with zeros.
bram intN tbl[M] = {...}
Block RAMs are declared in a way similar to tables. Block RAMs map to special FPGA blocks and avoid using FPGA LUTs to store data. However, accessing a block RAM typically requires a one-cycle latency.
Important: Block RAMs are only initialized at power-up (when the FPGA is configured).
A block RAM variable has four members accessible with the ’dot’ syntax:
addr
the address being accessed,wenable
set to 1 if writing, set to 0 if reading,rdata
result of read,wdata
data to be written.
Here is an example of using a block RAM in Silice:
bram int8 table[4] = {42,43,44,45};
int8 a = 0;
table.wenable = 0; // read
table.addr = 2; // third entry
++: // wait on clock
a = table.rdata; // now a == 44
Finally, an advanced option lets you indicate that the BRAM inputs need not be latched:
bram int8 table<input!>[4] = uninitialized;
This is used to minimize a design size ; however the caller has to ensure values are all properly present at the right cycle when reading/writing.
The rules for initializers of BRAMs are the same as for tables.
ROMs use a similar syntax, using brom
instead of bram
.
A dual-port block RAM has eight members accessible with the ’dot’ syntax:
addr0
the address being accessed on port 0,wenable0
write enable on port 0,rdata0
result of read on port 0,wdata0
data to be written on port 0,addr1
the address being accessed on port 1wenable1
write enable on port 1,rdata1
result of read on port 1,wdata1
data to be written on port1.
The dual-port BRAM also has optional clock parameters for port0 and port1.
Here is an example of using a block RAM in Silice with different clocks:
dualport_bram int8 table<@clock0,@clock1>[4] = {42,43,44,45};
int6 a = 0;
int1 b = a[1,1]; // second bit
int2 c = a[1,2]; // int2 with second and third bits
int3 d = a[2,3]; // int3 with third, fourth and fifth bits
The first entry may be an expression, the second has to be a constant.
All standard Verilog operators are supported, binary and unary.
Concatenation allows to combine expressions to form expressions having larger bit-width. The syntax is to form a comma separated list enclosed by braces. Example:
int6 i = 6b111000;
int2 j = 2b10;
int10 k = 0;
k := {j,i,2b11};
Here c is obtained by concatenating a,b and the constant 2b11 to form a ten bits wide integer.
Bits can also be replicated with a syntax similar to Verilog:
uint2 a = 2b10;
uint9 c = 0;
c := { 1b0, {8{a[1,1]}} };
Here the last eight bits of variable c are set to the second bit of a.
Silice defines operators for bindings the input/output of algorithms and modules:
<:
binds right to left,:>
binds left to right,<:>
binds both ways.
The bound sides have to be VIO identifiers. To bind expressions you can use expression tracking (see Section 3.6.6).
Bound VIOs are connected and immediately track each others values. A good way to think of this is as a physical wire between IC pins, where each VIO is a pin. Bindings are specified when instantiating algorithms and modules.
The bidirectional binding is reserved for two use cases:
- binding
inout
variables, - binding groups (see Section 3.7) and interfaces (see Section 3.8).
There are two other versions of the binding operators:
-
<::
binds right to left, using value at latest clock rising edge, -
<::>
binds an IO group, using value at latest clock rising edge for inputs.
Normally, the bound inputs are tracking the value of the bound VIO as changed during the current clock cycle. Therefore, the tracking algorithm/module immediately gets new values (in other words, there is a direct connection). This, however, produces deeper circuits and can reduce the max frequency of a design. These operators allow to bind the value as it was at the cycle start (positive clock edge). This introduces a one cycle latency before the algorithm/module sees the change, but makes the resulting circuit less deep. See also the notes on algorithms calls, bindings and timings.
Note: when a VIO is bound both to an algorithm input and and algorithm output (making a direct connection between two instantiated algorithms), then using
<::
or<:
will result in the same behavior, which is controlled by the use ofoutput
oroutput!
on the algorithm driving the output. This will be clarified in the future, see issue #49.
Silice defines operators for always assignment of VIOs. An always assignment is performed regardless of the execution state. It happens before anything else and can be overridden from the algorithm. Always assignments are order dependent between them. The left side of the assignment has to be a VIO identifier, while the right side may be an expression.
-
:=
assign right to left at each rising clock, -
::=
assign right to left with a one clock cycle delay (two stages flip-flop for e.g. clock domain crossing).
Always assignments are specified just after variable declarations and algorithms/modules instantiations. A typical use case is to make something pulse high, for instance:
algorithm writer(
output uint1 write, // pulse high to write
output uint8 data, // byte to write
// ...
) {
write := 0; // maintain low with always assign
// ...
if (do_write) {
data = ...;
write = 1; // pulses high on next rising clock
}
}
Note: the assignment is performed before anything else. If the value of the expression in the right hand side is later changing (during the clock cycle), this will not change the value of the left hand side. To achieve this use bound expressions (see next).
Note: If the right hand side of a
:=
contains an asynchronous input, the always assignement will not register it. The always assignment tracks the input value immediately as it changes. To register an input use::=
instead.
Variables can be defined to constantly track the value of an expression, hence binding the variable and the expression. This is done during the variable declaration, using either the <: or <:: binding operators. Example:
algorithm adder(
output uint9 o,
// ...
) {
uint8 a = 1;
uint8 b = 2;
uint9 a_plus_b <: a + b;
a = 15;
b = 3;
o = a_plus_b; // line 11
}
In this case o gets assigned 15+3 on line 11, as it tracks immediate changes to the expression a+b. Note that the variable a_plus_b becomes read only (in Verilog terms, this is now a wire). We call o an expression tracker.
The second operator, <:: tracks the expression using the values of the variables as they where at the previous clock rising edge. If used in this example, o would be assigned 1+2.
Bound expression can refer to other bound expressions, however <: and <:: cannot be mixed. (They could be in technical terms, but this was found to quickly lead to confusion).
Often, we want to pass around variables that form conceptual groups, like the interface to a controller or a 2D point. Silice has a specific mechanism to help with that. A group is declared as:
// SDRAM interface
group sdram_32b_io
{
uint24 addr = 0,
uint1 rw = 0,
uint32 data_in = 0,
uint32 data_out = 0,
uint1 busy = 1,
uint1 in_valid = 0,
uint1 out_valid = 0
}
Note that group declarations are made outside of algorithms.
A group variable can be declared directly:
sdram_32b_io sd; // init values are defined in the group declaration
To pass a group variable to an algorithm, we need to define an interface (see also Section 3.8). This will further specify which group members are input and outputs:
algorithm sdramctrl(
// ..
// anonymous interface
sdram_provider sd {
input addr, // address to read/write
input rw, // 1 = write, 0 = read
input data_in, // data from a read
output data_out, // data for a write
output busy, // controller is busy when high
input in_valid, // pulse high to initiate a read/write
output out_valid // pulses high when data from read is
}
) {
// ..
Binding to the algorithm is then a single line ; in the parent:
sdramctrl memory(
sd <:> sd,
// ..
);
The <::>
operator can also be used in the binding of an interface (see Section 3.6.4).
A group can be passed in a call if the algorithm describes an anonymous interface. If the interface has both inputs and outputs, the group can appear both as an input and an output in the call.
Note: A group cannot contain tables nor other groups.
Interfaces can be declared for a group during algorithm definition.
group point {
int16 x = 0,
int16 y = 0
}
algorithm foo(
point p0 {
input x,
input y,
},
point p1 {
input x,
input y,
}
) {
// ..
To simplify the above, and when group members are all bound either as inputs or outputs, as simpler syntax exists:
algorithm foo(
input point p0,
input point p1
) {
// ..
Anonymous interfaces allow groups to be given as parameters during calls.
Named interfaces are ways to describe what inputs and outputs an algorithm expects, without knowing in advance the exact specification of these fields (e.g. their widths). Besides making algorithm IO description more compact, this provides genericity.
Named interfaces have to be bound to a group (Section 3.7) or an interface upon algorithm instantiation. They cannot be used to pass a group in a call. The binding does not have to be complete: it is possible to bind to an interface a group having more (not less) members than the interface.
Reusing the example of Section 3.7 we can define a named interface ahead of time:
interface sdram_provider {
input addr, // address to read/write
input rw, // 1 = write, 0 = read
input data_in, // data from a read
output data_out, // data for a write
output busy, // controller is busy when high
input in_valid, // pulse high to initiate a read/write
output out_valid // pulses high when data from read is
}
And then the sdram controller algorithm declaration simply becomes:
algorithm sdramctrl(
// interface
sdram_provider sd,
// ..
) {
// ..
Note than the algorithm is now using a named interface, it does not know in advance the width of the signals in the interface. This is determined at binding time, when a group is bound to the interface. Interfaces can be bound to other interfaces, the information is propagated when a group is bound at the top level.
For instance, we may define another group as follows, which uses 8 bits instead of 32 bits:
// SDRAM interface
group sdram_8b_io
{
uint24 addr = 0,
uint1 rw = 0,
uint8 data_in = 0,
uint8 data_out = 0,
uint1 busy = 1,
uint1 in_valid = 0,
uint1 out_valid = 0
}
This group can be bound to the sdramctrl
controller, which now will
receive 8 bits signals for data_in
/data_out
.
The sdramctrl
controller can adjust to this change, using sameas
and
widthof
. The first, sameas
, allows to declare a variable the same as
another:
sameas(sd.data_in) tmp;
The second, widthof
, returns the width of a signal:
while (i < (widthof(sd.data_in) >> 3)) {
// ...
}
Note: In the future Silice will provide compile time checks, for instance verifying the width of a signal is a specific value.
There are many cases where we pack multiple data in a larger variable, for instance:
uint32 data = 32hffff1234;
// ..
left = data[ 0,16];
right = data[16,16];
The hard coded values make this type of code quite hard to read and difficult to maintain. To cope with that, Silice defines bitfields. A bitfield is first declared:
bitfield Node {
uint16 right,
uint16 left
}
Note that the first entry in the list is mapped to the higher bits of the field.
This can then be used during access, to unpack the data:
left = Node(data).left;
right = Node(data).right;
There is no overhead to the mechanism, and different bitfield can be used on a same variable depending on the context (e.g. instruction decoding).
The bitfield can also be used to initialize the wider variable:
uint32 data = Node(left=16hffff,right=16h1234);
Silice has convenient intrinsics:
-
__signed(exp)
indicates the expression is signed (Verilog$signed
) -
__unsigned(exp)
indicates the expression is unsigned (Verilog$unsigned
) -
__display(format_string,value0,value1,...)
maps to Verilog$display
allowing text output during simulation.
Algorithms are the main elements of a Silice design. An algorithm is a specification of a circuit implementation. Thus, like modules in Verilog, algorithms have to be instanced before being used. Each instance becomes an actual physical layout on the final design. An algorithm can instance other algorithms and Verilog modules.
Instanced algorithms always run in parallel. However they can be called synchronously (implying a wait state in the caller). They may run forever and start automatically. Each may be driven from a specific clock and reset signal.
The top level algorithm is called main and has to be defined. It is automatically instanced by the host hardware framework, see Section 8.
An algorithm is declared as follows:
algorithm ID (
input TYPE ID
...
output TYPE ID
...
output! TYPE ID
...
) <MODS> {
DECLARATIONS
SUBROUTINES
ALWAYS_ASSIGNMENTS
ALWAYS_BLOCK
INSTRUCTIONS
}
Most elements are optional: the number of inputs and outputs can vary,
the modifiers may be empty (in which case the ’<>’
is not necessary)
and declarations, bindings and instructions may all be empty.
Here is a simple example:
algorithm adder(intput uint8 a,intput uint8 b,output uint8 v)
{
v = a + b;
}
Let us now discuss each element of the declaration.
Inputs and outputs may be declared in any order, however the order
matters when calling the algorithms (parameters are given in the order
of inputs, results are read back in the order of outputs). Input and
outputs can be single variables or tables. A third type inout
exists
for compatibility with Verilog modules, however these can only be passed
and bound to imported modules (i.e. they cannot be used in expressions
and instructions for now).
Note that there are two types of output: output
and output!
. These
distinguish between a standard and an immediate output (exclamation
mark). A standard output sees its value updated at the next clock cycle.
An immediate output ensures that callers sees the results within the
same cycle. This can be for instance important when implementing a video
driver; e.g. a HDMI module generates a pixel coordinate on its output
and expects to see the corresponding color on its input within the same
clock cycle (even though a better design would allow a one-cycle
latency). In such a scenario the algorithm outputting the color to the
HDMI driver will input the coordinate and use output!
for the color.
However, using only output!
will in some cases result in combinational
loops: a cycle in the described circuitry (that is generally bad).
Silice detects such cases and will issue an error (see also
Section 4.6).
Note: Silice combinational loop detection is not perfect yet. Such a problem would typically result in simulation hanging (unable to stabilize the circuit) with Verilator issuing a
UNOPTFLAT
warning.
Variables, instanced algorithms and instanced modules have to be declared first (in any order). A simple example:
algorithm main(output uint8 led)
{
uint8 r = 0;
adder ad1;
// ... btw this is a comment
}
Algorithms and modules can be instanced from within a parent algorithm. Optionally parameters can be bound to the parent algorithm variables. In terms of hardware, these are the wires connecting instanced and parent algorithms. Instantiation uses the following syntax:
MOD_ALG ID<@CLOCK,!RESET> (
BINDINGS
(<:auto:>)
);
where MOD_ALG
is the name of the module/algorithm, ID
an identifier
for the instance, and BINDINGS
a comma separated list of bindings
between the instance inputs/outputs and the parent algorithm variables.
The bindings are optional and may be partial. @CLOCK
optionally
specifies a clock signal, !RESET
a reset signal, where CLOCK
and
RESET
have to be uint1
variables in the parent algorithm. Both are
optional, and if none is specified the brackets <>
can be skipped.
When none are specified the parent clock
and reset
are used for the
instance.
Each binding is defined as: ID_left OP ID_right
where ID_left
is the
identifier of an instance input or output, OP
a binding operator
(Section 3.6.4)
and ID_right
a variable identifier.
Note that only identifiers are allowed in bindings: access to tables and swizzling are not allowed. This can be alleviated using bound expressions, see Section 3.6.6.
Combined with autorun
such bindings allow to instantiate and
immediately run an algorithm to drive some of the parent algorithm
variables. Here is an example:
algorithm blink(output int1 b_fast,output int1 b_slow) <autorun>
{
uint20 counter = 0;
while (1) {
counter = counter + 1;
if (counter[0,8] == 0) {
b_fast = !b_fast;
}
if (counter == 0) {
b_slow = !b_slow;
}
}
}
algorithm main(output int8 led)
{
int1 a = 0;
int1 b = 0;
blink b0(
b_slow :> a,
b_fast :> b
);
led := 0; // turn off all eight LEDs
led[0,1] := a; // first LED always assigned first bit (slow blink)
led[1,1] := b; // second LED always assigned second bit (faster blink)
while (1) { } // inifinite loop, blink runs in parallel
}
This example reveals some interesting possibilities, and a constraint.
As algorithm blink
is instanced as b0
, it starts running immediately
(due to the autorun
modifier in the declaration of blink
). Hence,
the variables a
and b
are immediately reflecting the b_slow
and
b_fast
outputs of b0
. The always assignment to led
then use a
and b
to output to the led
8-bit variable, which we assume is
physically connected to LEDs. The first assignment sets led
to zero,
and then its two first bits to a
and b
. As the always assignments
are order dependent, this behaves properly with the two first bits
always assigned at each clock cycle. Since all updates in main
are
done through bindings and always assignments, and because algorithms
run in parallel, there is nothing else to do, and main
enters an
infinite loop (runs as long as there is power).
The constraint, however, is that a
and b
are necessary. This is due
to the fact that we cannot directly bind led[0,1]
and led[1,1]
to
the instance of blink
. Only identifiers can be specified during
bindings. This can be alleviated by using bound expressions (see
Section 3.6.6).
The optional <:auto:>
tag allows to automatically bind matching
identifiers: the compiler finds all valid left/right identifier pairs
having the same name and binds them directly. While this is convenient
when many bindings are repeated and passed around, it is recommended to
use groups for brevity and make all bindings explicit.
Note automatic binding can be convenient, but is discouraged as it relies a name matching and changes can introduce silent errors. When a large number of inputs/outputs have to be bound, consider using groups and interfaces.
The inputs and outputs of instanced algorithms, when not bound, can be directly accessed using a ’dot’ notation. Outputs can be read and inputs written to. Example:
algorithm algo(input uint8 i,output uint o)
{
o = i;
}
algorithm main(output uint8 led)
{
algo a;
a.i = 131;
() <- a <- (); // this is a call without explicit inputs/outputs
led = a.o;
}
Algorithms may be called synchronously or asynchronously, with or without parameters. Only instanced algorithms may be called.
See also the detailed notes on algorithms calls, bindings and timings.
The asynchronous call syntax is as follows:
ID <- (P_1,...,P_N); // with parameters
ID <- (); // without parameters
Where ID
is the instanced algorithm name. When the call is made with
parameters, all have to be specified. The parameters are given in the
same order they are declared in the algorithm being called. The call
without parameters is typically used when inputs are bound or specified
directly. Note that when parameters are bound, only the call without
parameters is possible.
When an asynchronous call is made, the instanced algorithm will start processing at the next clock cycle. The algorithm runs in parallel to the caller and any other instanced algorithm. To wait and obtain the result, the join syntax is used:
(V_1,...,V_N) <- ID; // with receiving variables
() <- ID; // without receiving variables
Note: Calling algorithms across clock domains is not yet supported. For such cases, autorun with bindings is recommended.
Algorithms can contain subroutines. These are local routines that can be called by the algorithm multiple times. A subroutine takes parameters, and has access to the variables, instanced algorithms and instanced modules of the parent algorithm – however access permissions have to be explicitly given. Subroutines offer a simple mechanism to allow for the equivalent of local functions, without having to wire all the parent algorithm context into another module/algorithm. Subroutines can also declare local variables, visible only to their scope. Subroutines avoid duplicating a same functionality over and over, as their code is synthesized a single time in the design, regardless of the number of times they are called.
A subroutine is declared as:
subroutine ID(
input TYPE ID
...
output TYPE ID
...
reads ID
...
writes ID
...
readwrites ID
...
calls ID
) {
DECLARATIONS
INSTRUCTIONS
return;
}
(a final return is optional)
Here are simple examples:
algorithm main(output uint8 led)
{
uint8 a = 1;
subroutine shift_led(readwrites a) {
a = a << 1;
if (a == 0) {
a = 1;
}
}
subroutine wait() {
uint20 counter = 0;
while (counter != 0) {
counter = counter + 1;
}
}
led := a;
while(1) {
() <- wait <- ();
() <- shift_led <- ();
}
}
Subroutines permissions. Subroutine permissions ensure only those
variables given read/write permission can be manipulated. This mitigates
the fact that a subroutine may directly manipulate variables in the
parent algorithm. The format for the permissions is a comma separated
list using keywords reads
, writes
, readwrites
Why subroutines? There is a fundamental difference between a
subroutine and an algorithm called from a host: the subroutine never
executes in parallel, while a child algorithm could. However, if this
parallelism is not necessary, subroutines offer a simple mechanism to
repeat internal tasks.
Global subroutines Subroutines may be declared outside of an algorithm. Such subroutines are called global and may be called from any algorithm. This is convenient to share subroutines across algorithms. If a global subroutine requests access to parent algorithm variables (read/write permissions), an algorithm calling the subroutine has to have a matching set of variables in its scope. Note that global subroutines are not technically shared, but rather copied in each calling algorithm’s scope at compile time.
Nested calls
Subroutines can call other subroutines, and have to declare this is the
case using the keyword calls
.
subroutine wait(input uint24 delay)
{
// ...
}
subroutine init_device(calls wait)
{
// ...
() <- wait <- (1023);
}
Note that re-entrant calls (subroutine calling itself, even through other subroutines) are not possible.
Sometimes, it is useful to write a generic piece of code that can be instantiated within a design. Such an example is a piece of circuitry to write into an SDRAM, which bit width may not ne known in advance.
A circuitry offers exactly this mechanism in Silice. It is declared as:
circuitry ID(
input ID
output ID
inout ID
) {
INSTRUCTIONS
}
Note that there is no type specification on inputs/outputs as these are resolved during instantiation. Here is an example of circuitry:
circuitry writeData(inout sd,input addr,input data) {
// wait for sdram to not be busy
while (sd.busy) { /*waiting*/ }
// write!
sd.data_in = rdata;
sd.addr = addr;
sd.in_valid = 1;
}
Note the use of inout for sd (which is a group, see Section 3.7). A circuitry is not called, it is instantiated. This means that every instantiation is indeed a duplication of the circuitry.
Here is the syntax for instantiation:
(output_0,...,output_N) = ID(input_0,...input_N)
As for algorithms and subroutines, inputs and outputs have to appear in the order of declaration in the lists. An inout appears twice, both as output and input. Following the previous example here is the instantiation from an algorithm:
(sd) = writeData(sd,myaddr,abyte);
Note: currently circuitry instantiation can only be made on VIO identifiers (no expressions, no bit-select or part-select). This restriction will be removed at some point. In the meantime bound expressions provide a work around, first defining a bound expression with an identifier then giving it to the circuitry (these can be defined in a block around the circuit instantiation).
When writing code, you may encounter a case where the Silice compiler raises an error due to an expression leading to a combinational loop. This indicates that the sequence of operations is leading to a cyclic dependency in the circuit being described. These are in most cases undesirable as they lead to unpredictable hardware behaviors.
A trivial solution is such a situation is to split the circuit loop using the step operator (see Section 5.1). However, this will change you sequence of operations and the designer may choose a different approach. Therefore Silice reports and error and invites to either manually add a step, or to revise the code. In many cases a slight rewrite avoids the issue entirely.
Example of a combinational loop:
algorithm main()
{
uint8 a = 0;
uint8 b = 0;
// ...
a = b + 1;
a = a + 1; // triggers a combinational loop error
}
So what happens? It might seem that a = a + 1
is the problem here, as
it writes as a cyclic dependency. In fact, that is not the sole cause.
On its own, this expression is perfectly fine: for each variable Silice
tracks to versions, the value at the cycle start and the value being modified (corresponding to a hardware flip-flip DQ pair). So a = a + 1
in fact means
acurrent = aprevious + 1. In fact, the code describes the circuit to update aprevious from acurrent.
The problem here comes from the prior assignment to a, a = b + 1
.
This already sets a new value for acurrent, and thus the next expression
a = a + 1
would have to use this new value. This now leads to
acurrent = acurrent + 1
which this time is a combinational loop: a circuit that forms a closed loop.
A possible solution is to insert a cycle in between:
a = b + 1;
++: // wait one cycle
a = a + 1; // now this is fine
Another is to rewrite as (!):
a = b + 2;
Note: Silice does not attempt to fix loops on its own, even in such simple cases. This may change in the future. However Silice has a gold rule to never introduce cycles or flip-flops that could not be predicted by the designer.
It would be difficult to manually keep track of all potential chains, especially as they can occur through bindings, which is why Silice does it for you! In practice, such loops are rarely encountered, and in most cases easily resolved with slight changes to arithmetic and logic.
Note: Silice combinational loop detection is not perfect yet. Such a problem would typically result in simulation hanging (unable to stabilize the circuit) with Verilator issuing a
UNOPTFLAT
warning.
After declarations, optional always assignments can be specified. These use the operators defined in Section 3.6.5. Always assignments allow to follow the value of an expression in a variable, output, or instanced algorithm input or output. Contrary to bound expressions (Section 3.6.6), the assigned values can be changed during a cycle.
Example:
algorithm main(output uint8 led)
{
uint8 r = 0;
adder ad1;
led := r;
ad1.a := 1;
}
These assignments are always performed, before anything else in the algorithm. They are order dependent. For instance:
b := a;
c := b;
is not the same as:
c := b;
b := a;
In the first case, c will immediately take the value of a, while in the second case c will take the value of a after one clock cycle. This is useful, for instance, when crossing a clock domain as it allows to implement a two stages flip-flop. In fact, Silice provides a shortcut for exactly this purpose, making the temporary variable b unnecessary:
c ::= a;
Algorithms can have two always blocks: always_before
(or always
) and
always_after
. An always block has to be single cycle (no goto, while
or step operator inside). It is always running, regardless of the state
of the rest of the algorithm. The always_before
block is executed
immediately after always assignments, and before anything else. The
always_after
block is executed after everything else.
The syntax simply is:
always_before {
// instructions
// ...
}
Assigning a variable in an always before block or using an always assignment is in fact equivalent, that is:
always_before {
a = 0;
}
is functionally the same as
a := 0;
Note: variables changed in always_after
blocks have to use power-up
initialization only.
All algorithms receive a clock
and reset
signal from their parent
algorithm. These are intrinsic variables, are always defined within the
scope of an algorithm and have type int1
. The clock and reset can be
explicitly specified when an algorithm is instanced
(Section 4.2).
Upon declaration, modifiers can be specified (see <MODS>
) in the
declaration). This is a comma separated list of any of the following:
-
Autorun. Adding the
autorun
keyword will ask the compiler to run the algorithm upon instantiation, without waiting for an explicit call. -
Internal clock. Adding a
@ID
specifies the use of an internally generated clock signal. It is then expected that the algorithm contains aint1 ID = 0;
variable declaration, which is bound to the output of a module producing a clock signal. This is meant to be used together with Verilog PLLs, to produce new clock signals. The module producing the new clock will typically takeclock
as input andID
as output. -
Internal reset. Adding a
!ID
specifies the use of an internally generated reset signal. It is then expected that the algorithm contains aint1 ID = 0;
variable declaration, which is bound to the output of a module producing a reset signal. This may be used, for instance, to filter the signal from a physical reset button. -
One-hot. Adding the
onehot
modifier will use a ’onehot’ state numbering for the algorithm state machine. On small algorithms with few states this can result in smaller, faster designs.
Here is an example:
algorithm main(output int1 b) <autorun,@new_clock>
{
int1 new_clock = 0;
my_pll pll(
base_clock <: clock,
gen_clock :> new_clock
);
// the main algorithm is sequenced by new_clock
// ...
}
Which signals are allowed as clocks and resets depends on the FPGA architecture and vendor toolchain.
Upon compilation, Silice breaks down the code into a finite state machine (FSM). Each state corresponds to a circuit that updates the variables within a single cycle, and selects the next state. We next call these circuits combinational chains.
Silice attempts to form the longest combinational chains, or equivalently
to minimize the number of states in the FSM. That is because going from one
state to the next requires one clock cycle, delaying further computations.
Of course, longer combinational chains also lead to reduced clock frequency,
so there is a tradeoff. This is why Silice lets you explicitly specify
where to cut a combinational chain using the step operator ++:
Note that if a state contains several independent combinational chain, they will execute as parallel circuits (e.g. two computations regarding different sets of variables). Grouping such computations in a same state increases parallelism, without deepening the combinational chains.
The use of control flow structures such as while
(or goto
), as well
as synchronization with other algorithm also require introducing states.
This results in additional cycles being introduced. Silice follows a set
of precise rules to do this, so you always know what to expect.
Placing a ++:
in the sequence of operations of an algorithm explicitly
asks Silice to wait for exactly one cycle at this precise location in
the execution flow. Each next ++:
waits one additional cycle.
This has several important applications such as waiting for a memory read/write, or breaking down long combinational chains that would violate timing.
The most basic control flow operation is goto
. While it may be used
directly, it is recommended to prefer higher level primitives such as
while
and subroutines, because goto
often leads to harder to read
and maintain code[3]. Yet, they are a fundamental low-level operation
and Silice is happy to expose them for your enjoyment!
goto
always requires one cycle to ’jump’: this is a change of state in
the FSM. Entering a while
takes one cycle and then, if the block inside is
a single combinational chain, it takes exactly one cycle per iteration.
Exiting a while
takes one cycle ; however when chaining
loops only one cycle is required to enter the next loop. So the first
loop takes one cycle to enter, any additional loop afterwards adds a single cycle
if there is nothing in between them, the last loop takes one cycle to exit.
Now, if-then-else
is slightly more subtle. When applied to sequences
of operations not requiring any control flow, an if-then-else
synthesizes to a combinational if-then-else
. This means that both the
’if’ and ’else’ code are evaluated in parallel as combinational chains,
and a multiplexer selects which one is the result based on the
condition. This applies recursively, so combinational if-then-else
may
be nested.
When the if-then-else
contains additional control flow (e.g. a
subroutine call, a while
, a ++:
, a goto
, etc.) it is automatically
split into several states. It then takes one cycle to exit the ’if’ or
’else’ part and resume operations. If only the ’if’ or the ’else’
requires additional states while the other is a one-cycle block (possibly empty),
an additional state is still required to ’jump’
to what comes after the if-then-else
. So in general this will cost one
cycle. However, in cases where this next state already exists, for
instance when the if-then-else
is at the end of a while
loop, this
existing state is reused resulting in no overhead.
This has interesting implications. Consider the following code:
while(...) {
if (condition) {
...
++:
...
}
a = b + 1; // line 7
}
When the if
is not taken, we still have to pay one cycle to reach line
7. That is because it has been placed into a separate state to jump to
it since the if
is not a one-cycle block (due to ++:
). Hence, the
while loop will always take at least two cycles.
However, you may choose to duplicate some code (and hence some circuitry!) to avoid the extra cycle, rewriting as:
while(...) {
if (condition) {
...
++:
...
a = b + 1;
} else {
a = b + 1;
}
}
This works because when the if
is taken the execution simply goes back
to the start of the while
, a state that already exists. The else
being a one-cycle block followed by the start of the while
no extra
cycle is necessary! We just tradeoff a bit of circuit size for extra
performance.
Silice also supports the switch-case
syntax, as follows:
switch( <EXPR> ) {
case <CONST>: { /* code for this case */ }
case <CONST>: { /* code for this case */ }
default: { /* code for default case */ }
}
where <EXPR>
is an expression and <CONST>
are constants.
There is also a onehot version:
switch( <IDENTFIER> ) {
case 0: { /* code for this case */ }
case 1: { /* code for this case */ }
...
case <W-1>: { /* code for this case */ }
default: { /* code for default case */ }
}
where <IDENTFIER>
is a variable of width W
and each case is activated for
the corresponding bit of <IDENTFIER>
being set to 1
, with all other bits set to 0
.
The default
is only mandatory if not all bits are tested, and otherwise
only necessary if <IDENTFIER>
may be zero (not having a default while <IDENTFIER>
may be zero leads to undefined behaviors).
A synchronized call to an algorithm takes at best two cycles: one cycle for the algorithm to start processing, and one cycle to register that the algorithm is done. An asynchronous call does not require additional cycles (the called algorithm will start running on the next clock cycle), while a synchronization inserts a state in the control flow and always requires at least one additional cycle, even if the algorithm is already finished.
A synchronized call to a subroutine takes at best two cycles: one cycle to jump to the subroutine initial state, and one cycle to jump back.
Note: Keep in mind that algorithms can also autorun and have their inputs/outputs bound to variables. Algorithms may also contain an
always_before
oralways_after
blocks that are always running.
TODO: describe syntax
A preprocessor allows to change the way the source code is seen by the compiler. It sits between the input source code and the compiler, and basically rewrites the original source code in a different way, before it is input to the compiler. This allows compile-time behaviors, such as adapting the code to a target platform.
Having a strong preprocessor is important for hardware design. For instance, designing a sort network of some size N is best done by generating the code automatically from a preprocessor, so that one can easily reuse the same code for different values of N. The same is true of a division algorithm for some bit width, or for the pre-computations of lookup tables.
The Silice preprocessor is built above the Lua language https://lua.org. Lua is an amazing powerful, lightweight scripting language that originated in the Computer Graphics community. The preprocessor is thus not just a macro system, but a fully fledged programming language.
Preprocessor code is directly interleaved with Silice code. At any point, the source code can be escaped to the preprocessor by starting a line with $$. The full line is then considered as preprocessor Lua code.
The pre-processor sees Silice source lines as strings to be output. Thus, it outputs to the compiler any line that is met during the execution of the preprocessor. This means it does not output those that are not reached, and it outputs multiple times those that appear in a loop.
Let’s make a quick example:
algorithm main()
{
$$if true then
uint8 a=0;
uint8 b=0;
$$end
$$if false then
uint8 c=0;
$$end
$$for i=0,1 do
b = a + 1;
$$end
}
The code output by the preprocessor is:
algorithm main()
{
uint8 a=0;
uint8 b=0;
b = a + 1;
b = a + 1;
}
Note that source code lines are either fully Silice, of fully
preprocessor (starting with $$). A second syntax allows to use
preprocessor variables directly in Silice source code, simply doing
varname
the variable from the preprocessor context.
In fact, a full Lua expression can be used in between the $ signs; this
expression is simply concatenated to the surrounding Silice code.
Here is an example:
algorithm main()
{
$$for i=0,3 do
uint8 a_$i$ = $100+i$;
$$end
}
The code output by the preprocessor is:
algorithm main()
{
uint8 a_0 = 100;
uint8 a_1 = 101;
uint8 a_2 = 102;
uint8 a_3 = 103;
}
If a large chunk of Lua code has to be written, you can simply write it
in a separate .lua
file and use:
$$dofile('some_lua_code.lua')
The preprocessor is also in charge of including other Silice source code
files, using the syntax $include(’source.ice’)
. This loads the entire
file (and recursively its own included files) into the preprocessor
execution context.
The pre-processor has a number of function to facilitate the inclusion
of images and palette data ; please refer to the example projects
vga_demo
, vga_wolfpga
and vga_doomchip
.
Silice can inter-operate with Verilog.
Verilog source code can be included in two different ways:
append(’code.v’)
and import(’code.v’)
where code.v
is a Verilog
source code file.
Append is very simple: the Verilog source code is directly appended to
the output of the compiler (which is a Verilog source code) without any
processing. This makes this Verilog code available for other Verilog
modules, in particular those that are imported (see next). The reason
for append
is that Silice is currently not able to fully parse Verilog
when importing.
Import is the most interesting way to inter-operate with Verilog. Once imported, all Verilog modules from the Verilog source file will be available for inclusion in algorithms.
The modules are instantiated in a very similar way as algorithms, with bindings to variables. However, modules cannot be called like algorithms, and the ’dot’ syntax to read/write outputs and inputs is not available for modules.
Due to the fact that Silice only understands a subset of Verilog, there are cases where a module cannot be imported directly. In such cases, wrapping offers a solution. The idea is to write a simpler Verilog module that can be imported by Silice and wraps the instantiation of the more complex one.
Then the complex module is included with append
and the wrapper with
import
.
The host hardware framework typically consists in a Verilog glue file. The framework is appended to the compiled Silice design (in Verilog). The framework instantiates the main algorithm. The resulting file can then be processed by the vendor or open source FPGA toolchain (often accompanied by a hardware constraint file).
Silice comes with the following host hardware frameworks:
-
mojo_basic : framework to compile for the Alchitry Mojo 3 FPGA board.
-
mojo_hdmi_sdram : framework to compile for the Alchitry Mojo 3 FPGA board equipped with the HDMI shield.
-
icestick : framework for the Lattice ice40 icestick board.
-
verilator_bare : framework for use with the verilator hardware simulator
-
verilator_sdram_vga : framework for use with the verilator hardware simulator, with VGA and SDRAM emulation
-
icarus_bare : framework for use with the icarus
-
icarus_vga : framework for use with the icarus hardware simulator, with vga emulation
For practical usage examples please refer to the Getting started guide at https://github.com/sylefeb/Silice/blob/master/GetStarted.md.
Silice comes with a tool called silicehe
for Silice hardware
emulation. This tool will read the output of the icarus_vga
simulator
(the fst file storing signals) and produce a sequence of images from the
stored VGA signals.
Simply call it with the fst file as the second argument.
[1] http://iverilog.icarus.com/
[2] https://www.veripool.org/wiki/verilator
[3] See the famous Dijkstra’s paper about this