-
Notifications
You must be signed in to change notification settings - Fork 5
How Cleavir optimizes
Cleavir provides components for an optimizing compiler. While these components are individually documented (to varying extents), the Lisp implementor using Cleavir may find useful an overall view of how these systems interact to facilitate optimization.
(This is a work in progress. Most of this is only partially implemented so far, and/or is yet to be ported/generalized to Cleavir from where I originally wrote it in Clasp. Eventually I would like this to be the nucleus of a manual or chapter of a manual.)
Cleavir processes code in several phases. First, it receives the code as CSTs, which are essentially the original s-expressions annotated with source information. The CSTs are then converted into Cleavir's AST ("Abstract Syntax Tree") IR ("intermediate representation") by the CST-to-AST subsystem. Befitting the name, ASTs are more or less a straight translation of the original code, except that minimal compilation is done - i.e. macros and possibly compiler macros are expanded, load time values are figured out, etc. This is useful for some purposes (like an editor) but not very useful for optimization: control flow information is not clear, it's nontrivial to track def-use chains, etc. Cleavir instead does the bulk of its optimization in the next IR - BIR - which is compiled from the ASTs by the AST-to-BIR subsystem. The bulk of this document concerns BIR.
As will be described below, Cleavir tries to avoid adding source-level optimization hooks for the programmer. That is, it will try to optimize code like (* (the single-float x) (the single-float y))
, rather than relying on the frontend to give it (si:single-float-* x y)
. This has a few advantages. First, it makes it easier to optimize code that is semantically identical but not syntactically identical (e.g. (let ((f #'*)) (funcall f (the single-float x) (the single-float y)))
. Second, it means clients have more control over what kind of operators they want their frontend to expose, and they don't need to expose Cleavir's. Third, it means Cleavir's frontend (CST-to-AST) does not need to be concerned with the optimizations available to the middle end and backend, improving modularity.
As mentioned, CST-to-AST can expand compiler macros. Compiler macros are source-to-source transformations, so this seems like the appropriate place to do it. Unfortunately, considering they are the most flexible optimization hook into the implementation that the standard provides for programmers, compiler macros are not especially powerful. They are not capable of optimizing code like the funcall
above because they rely on checking an actual form's actual car
. They do not have access to information like types or knowing how a call is used, except possibly through an extension that provides environment access, but even then it would only be lexically apparent information and not anything derived via control flow analysis (as doing that analysis directly on source forms is needlessly complicated). And they have to worry about macro concerns like multiple evaluation even though they primarily concern functions.
Cleavir provides a more powerful but less standard mechanism to perform optimizations tailored to particular functions, called deftransform
, described below.
Cleavir can be provided with a function's preconverted AST in situations where that function has been declared inline. The AST will contain this AST as if a lambda expression had been used. For example, given (declaim (inline id)) (defun id (x) x)
, the AST for (+ (id y) y)
would end up more or less the same as the AST for (+ ((lambda (x) x) y) y)
, though with a few differences relating to debug information. This can facilitate optimizations but increases code size, as inlining tends to.
The middle end (not my term) of the compiler operates on BIR, or "block-based IR". BIR consists of nodes called "instructions" grouped into basic blocks called "iblocks". Instructions have zero or more inputs and zero or more outputs (usually only zero or one), all of which are other BIR structures called "data". Instructions, data, iblocks, and other BIR structures are all implemented as Lisp standard objects with standard classes that can be subclassed by interested clients.
The details of BIR are beyond the scope of this document. What's important is that this IR is designed to facilitate analysis of the code as well as transformations of it. Cleavir's analysis passes will gather information, which is then utilized by the transformation passes to optimize the code. In BIR, Cleavir can easily carry out type inference, use analysis (still in testing) and other analyses. The analysis and transformation systems are designed with an emphasis on modularity, so that clients can easily write their own analyses and/or transformations particular to their execution environment.
Generally speaking, the middle end operates on code that is still reasonably close to the source. Source function calls remain function calls, even if they should/will be converted into particular machine operations later. This is to that the analyses and transformations do not need to concern themselves with low level mechanisms, helping modularity.
BIR distinguishes global calls from "local" calls. Local calls are just function calls where the callee is known and part of the code being compiled; for example foo
in (flet ((foo (x) ...)) (values (foo y) (foo z)))
. Local calls can often be implemented more efficiently by the backend. Because the caller and callee are compiled together, they can agree on some arbitrary calling convention that never needs to be made reusable, they can share stack frames, etc. But they are useful to the middle end as well, because when the callee is known, a call is just another control construct. Cleavir can run analyses like type inference through local functions, in the sense that e.g. in (flet ((bar (x) x)) (+ (bar 7) 8))
it can determine that the first argument to +
is a constant fixnum. Cleavir tries to produce local calls as much as possible. Note that it is not always possible or straightforward to mark a call local even when the callee is known; for example (funcall (if x (lambda ...) (lambda ...)) ...)
could not be made into a local call without Cleavir moving the if outside of the funcall.
Inlining, described above, operates on the CST-to-AST level, and results in a call. Interpolation is the process of replacing a call with the actual body of the function, and so may correspond more closely to what one may consider "inlining". Cleavir will interpolate local calls to a function if all calls to that function are local and share the same continuation. In the future it will hopefully use a more sophisticated mechanism that can inline more generally if this does not cause too much expansion to code size, and/or if this is okay under a client-specified policy.
About continuations: In (print (cond (x (foo)) (t (print "h") (foo))))
, the calls to foo
share a continuation - both of their results are fed to print
, neither cond block establishes a special dynamic context, etc. Cleavir can also interpolate calls to tail recursive functions as long as all the recursive calls share the continuation. The resulting interpolation is called "contification" in the literature.
For global function calls, Cleavir needs to know something about them to perform any optimizations. This is only really possible for standard functions or special client functions that are guaranteed to not be redefined incompatibly. Cleavir calls these "known functions". Known function information should be returned by a client during CST-to-AST by the Environment subsystem. Primarily it just consists of a name, or "identity". Identities serve as keys to other analyses and transforms: analyses like type inference use identities to look up specific type derivation functions to determine specific types for a call, and transforms use identities to find transforms specific to a given known function (see "Callee Transforms" below).
Known information can also include other information useful to the optimizer. At present this information consists of binary flags. Flags include flushability, meaning that calls to a function can be deleted if their results are unused; dyn-call
, meaning that any functional arguments are only called in a dynamic environment substantially identical to the call itself; and note-call
, meaning that if calls to this function are not eliminated by optimization the compiler should issue the programmer a note about it.
Known information is attached in the BIR to function data, and not to calls. This allows Cleavir to have some information in cases like (funcall (if x #'+ #'-) ...)
or just (let ((f #'+)) ... (funcall f ...))
, but does entail some extra work flowing the information around.
The fundamental form in Lisp is the function call. A critical class of optimizations done by Cleavir concern simple transformations of calls: Transforming a call to a function to a call to a different, more specialized function.
For example, take (string x)
. Unoptimized, this will call the string
function, which has to branch on the type of its argument to do different coercions. Now say that Cleavir can determine that x
is necessarily a symbol. Then we can validly rewrite (string x)
as (symbol-name x)
, eliminating the type dispatch. This would of course not be valid in general, as string
accepts string or character arguments that symbol-name
does not deal with. Lower level optimizations described later could potentially replace an actual call to symbol-value
with a direct access into the symbol.
Another example does not use type information. Take (if (member x '(a b c)) ...)
, a reasonably common idiom. Unoptimized, this will call the member
function, which will return nil
or a sublist of (a b c)
. But it is clear that this result is only used as a boolean, so we don't really need to determine the correct sublist. Furthermore the list being searched through is constant. So this could be rewritten as (if ((lambda (y _) (declare (ignore _)) (case y ((a b c) t) (otherwise nil))) x '(a b c)) ...)
, which could then be interpolated to get (if (case x ((a b c) t) (otherwise nil)) ...)
. A further optimization could reduce the case
to a jump table or hash table lookup or something.
These transformations share several important qualities. They operate on syntactically non-obvious information (for the latter case, you can imagine the (member ...)
being bound to a variable which is then checked later). They are local: They rewrite a call as another call, and furthermore, are written so that only the callee of the call needs to be replaced, not the argument forms. They are IR-independent: they are easily expressible as source-to-source transformations regardless of how they are actually implemented, so the author of the transform doesn't need any knowledge of how Cleavir's IR works. They are backend-independent: they do not require exposure of any nonstandard and/or unsafe operators to be profitable.
Cleavir groups these sorts of transformations as "callee transforms", a term I just made up that may need work. A callee transform consists of a set of identities (see Known Functions above), a set of conditions, a function, and a specification of what analysis information is required by the function. The identities are of functions which the transform applies to calls of. The conditions are specifications of properties of the call that need to be true in order for the transform to apply. Generally, this will be analyzed information about the arguments and/or return values of the call, but can also include information about the compilation policy in place at the call. The function is a Lisp function that receives as arguments the specified analysis information, and either returns a lambda expression CST, returns a global function name CST, or declines to execute by calling decline-transform
. This last is provided so that transforms can carry out more sophisticated tests on the analysis information to see if they apply (for example, a transform to convert division to modular multiplication may only be possible with some divisors).
During optimization, Cleavir will locate calls to which the transform might apply, and call the transform function. If it returns a lambda expression, this will be compiled into BIR, substituted as the callee, and if possible, interpolated. If it returns a global function name, this will replace the callee. And if the transform, declines nothing will be done. Cleavir may try callee transforms any number of times on the same call, so as with macro functions, callee transform functions should not have or rely on side effects.
Because it cannot be used to write unsafe code any more than usual code could, this mechanism could be exposed by clients to their users.
As an optimization of the compiler it might be profitable to retain the BIR for transforms that always return the same forms, as is reasonably common in practice.
BIR allows its data to contain an arbitrary number of Lisp multiple values. A form like (foo (bar))
will be compiled to have the output of bar
feeding directly into the call to foo
without any "take primary value" or similar operator interposed. This greatly simplifies optimization (e.g., if some transformation for foo
wants to know if its argument comes from a call to bar
, that is obvious) but is a bit tricky with respect to Lisp semantics. AST-to-BIR is careful to never generate code that requires multiple values "in flight" at once without special instructions being placed for that purpose (e.g. multiple-value-prog1
will become a save-values
instruction).
One important aspect of this pervasive multiple values attitude is that calls can be treated as having one input (of multiple values) and one output (of multiple values), like some more functional programming languages. For example, a transform that applies to *
calls with two double-float arguments would check that the type of the arguments is a subtype of (values double-float double-float &rest nil)
, rather than that it has two arguments and both are subtypes of double-float
. These are more or less the same statement, but the former phrasing means the transform could apply just as well to a multiple value call. It is therefore strongly recommended that clients implement apply
as multiple-value-call
(i.e. (apply a b c d)
=> (multiple-value-call a (values b) (values c) (values-list d))
) to take full advantage of optimizations. Cleavir will also try to rewrite multiple value calls as normal calls when possible (i.e., when the value counts of all argument forms can be inferred and are amenable).
Cleavir's analysis phases will generally annotate the BIR data with inferred information, which can then be used to direct transformations. For example, type inference will annotate each datum with its inferred type.
TODO: Particular analyses
Done (well, sorta) but not described: Type inference
Done but not tested or described: Abstract interpreter (e.g. use analysis)
Not done at all: control-flow dependent analyses, non-cartesian analyses, about a million other things
TODO!
Done but not described: temporary variable elimination, come-from elimination, simple unwinds, if-if elimination, constant folding, type test folding (should this be made into a callee transforms?), eq test folding (probably should be a callee transform), fixed values, type test elimination, mv-call to call reduction
Not done: LICM (which maybe should be more backendy?), nonlocal call optimizations (e.g. not-not elimination if they don't end up as ifs), about a million other things
Cleavir does not do much specification of a backend in order to allow clients to be flexible about what they produce. It does provide a few lower level mechanisms to facilitate efficient translation into whatever the target may be.
In the middle end, Cleavir basically uses Lisp's uniform reference semantics: All values are treated as references; there is no "implicit copying", etc. In the backend, this assumption may be relaxed to facilitate efficient code. The most obvious aspect of this is unboxing: representing Lisp values as some particular non-reference things that would not be suitable for general function calls or other code expecting references. While Cleavir is necessarily vague about what these non-references might entail, it provides a representation selection subsystem to aid in selecting them.
In representation selection, every BIR datum is tagged with a representation rtype, or "rtype". Cleavir provides the multiple-values
and object
rtypes, representing some unknown number of references and a single reference, respectively. Selection of what other rtypes to use is up to the client. Cleavir provides a few more rtypes that can be used, such as short-float
, single-float
, etc. for unboxed floats, and boolean
for binary flags only input to conditional expressions.
Every instruction has allowed rtypes for its inputs and outputs specified by the client (with defaults to object
and multiple-values
provided), and Cleavir uses this information to try to assign rtypes to BIR data efficiently. For example, a datum defined by only a double-float arithmetic instructions and only used by double float arithmetic instructions is probably most efficiently represented as an unboxed double float. When the definition and use rtypes are incompatible - say, a double float arithmetic instruction output is to be used by a call to an unknown function - representation selection will insert a cast
instruction representing a runtime conversion, e.g. a boxing or unboxing.
A client will probably specify instruction rtypes primarily as primops - see "Primitive Operations" below.
Cleavir's algorithm needs a lot of improvement, but can at least handle unboxed arithmetic with some efficiency.
Operations with the semantics of function calls (that is: all of their arguments are evaluated normally, and they return 0 or more normal values) but which are not translated into object code as calls are called "primitive operations", or "primops". Primops can include many different kinds of operations depending on the client, but may include for example accesses to arrays of known specialization, machine arithmetic, and accesses into conses and structures.
For flexibility and modularity, Cleavir tries not to require or expose primops before the backend level, so that the middle end only needs to know about function calls. It also does not define any primops, or require any that clients need to implement, on the theory that the selection of primops is too client-dependent. But in the backend using function calls for all operations would be ruinous for performance, and Cleavir does provide hooks to the client so that it can define primops that Cleavir can use.
Each primop has a specification of the rtypes (see above) of its inputs and outputs, and a translation specification. Each primop definition can specify zero or more translations from higher level code. A translation is a known function (see above)'s name, and specifications for the types of inputs and outputs (n.b. would other analyses be useful here? I can't think of any). For example, a single-float-*
primop may specify that it can be translated from (* t single-float single-float)
, i.e. calls to *
taking two single floats as arguments and returning a value of unspecified type. When Cleavir finds such calls, its translation phase will replace them with a use of the primop and eliminate the global function lookup.
Some Lisp implementations find it useful to optimize certain uses of &rest
arguments so that they do not need to be consed as actual lists. For example, a function like (lambda (&rest r) (dolist (a r) (print a)))
could be compiled in such a way that r
is a stack-allocated vector (which we call an 'argvec") rather than an actual heap-allocated list, and so that the dolist
simply iterates over this vector. This is very dependent on the client's calling conventions, but Cleavir does have a phase available for determining whether this optimization could apply and applying it.
Cleavir can take a &rest
argument's BIR datum representation and trace how it is used. If all uses are "argvecable" (name needs workshopping), meaning they are amenable instructions or calls to amenable functions as defined by the client, and all definitions of the variable trace back to the &rest
list, Cleavir can transform the calls and instructions into other argvec-based primops and instructions, and mark the data as having an argvec
rtype.