Appendix A: ulam Programming Language Reference Manual

Introduction

This manual describes the ulam language, following the broad outline of the proposed C language standard submitted 31 October 1988 [K&R, 1988]. Wherever possible, it addresses the same points of interest, while identifying differences to help the experienced C/C++ programmer avoid surprises.

While ulam is a powerful language for its intended purpose, it is not self-hosting. The ulam compiler is written in C++, and its output is heavily templated C++.

The initial release of ulam was announced at ECAL2015 in York, England. A summary of changes for subsequent releases shows our progress. This manual reflects ulam version 5, and notes several changes introduced in ulam-6.

Lexical Conventions

A program consists principally of one or more files with the .ulam suffix, although other suffixes (such as '.inc' for explicitly loaded files) do occasionally appear.

<LIBRARY> := <PROGRAM_OR_LOCAL_DEF>* + <EOF>

<PROGRAM_OR_LOCAL_DEF> := <PROGRAM_DEF> | <LOCAL_DEF>

<PROGRAM_DEF> := <QUARK_OR_UNION_DEF> | <ELEMENT_DEF> | <TRANSIENT_DEF>

<LOCAL_DEF> := 'local' + ( <TYPE_DEF> | <CONST_DEF> ) + ';'

A program is translated in several phases. The low-level lexical transformation reduces the program to a sequence of tokens and carries out ulam pre-parsing directives. Using the tokens, the parser implements the syntax and semantics defined by the grammar. The check-and-labeling phase identifies interdependent types, enforces type constraints, folds constant expressions, and determines the sizes of classes. The final compilation phase generates 'intermediate' C++ code, which is passed to g++ for translation to machine code.

Tokens

There are several token categories: delimiter, keyword, (internal) error, comment, identifier, type identifier, number, and operator. White space (blanks, tabs, newlines, formfeeds) is ignored except to separate tokens. The lexer uses a single token of lookahead, which may be unread; the next token is the longest string of characters that could constitute a token.

Keywords

The following identifiers are reserved for use as keywords and may not be used otherwise:

Basic Type	Derived Type	Type Qualifier	Control	Preparse / Methods	Conditional	Value
Bits	atom	constant	break	load	as	atomof
Bool	element	parameter	continue	ulam	is	instanceof
Int	quark	typedef	else	use	case	lengthof
String	transient	local	for	@Concrete	otherwise	maxof
Unary	union		goto	`__MACROS__`*		minof
Unsigned			if	native		sizeof
Void			return	operator		classidof
Self	self		which	virtual		constantof
Super	super		while	@Override		positionof

Note: goto is reserved for internal use only, and `__MACROS__`* stands for the macro keywords described under Macro Keywords. The 'Type' column headers here refer to ulam types; the MFM Atom type (bits 0-25) identifies its stored Element type for the current event.

Macro Keywords

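Like the predefined C++ preprocessor macros, ulam-5 provides the following out-of-band keywords, useful for logs and debugging. In the examples, the current class is Q(4,2u), an instance of a template class Q with parameters Int(32) b and Unsigned(4) s: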

__FILE__ : a constant String of the current ulam filename ;
__FILEPATH__ : a constant String of the current ulam file path, including the filename ;
__LINE__ : an Unsigned terminal constant of the current line number in the ulam file ;
__FUNC__ : a constant String of the current ulam function name ;
__CLASS__ : a constant String of the name of the current ulam class: Q ;
__CLASS_SIGNATURE__ : a constant String, signature of the current ulam class: Q(Int(32) b,Unsigned(4) s) ;
__CLASS_PRETTY__ : a constant String, full signature (with arg values) of the current ulam class: Q(Int(32) b=4,Unsigned(4) s=2u) ;
__CLASS_SIMPLE__ : a constant String, simple name of the current ulam class: Q(4,2u) ;
__CLASS_MANGLED__ : a constant String, mangled name of the current ulam class: Uq_10121Q12102321i1410141u12.

The first three are expanded into ulam constants by the Lexer, before parsing; __FUNC__, the function name without its full signature, is known later and is converted by the Parser; the other __CLASS*__ names are completed as they become known (before code generation), and differ for template instances, as the examples show.

Comments

Three types of comments are supported:

Traditional C style comments that begin with /* and end with */ and may span multiple lines;
C++ style comments begin with // and end a line; and,
Structured comments are traditional comments that begin with /**.

Comments do not nest. Only structured comments are tokenized; the others are dropped. A structured comment that immediately precedes a class (element, quark or union, transient), or any of its class members, is passed to the MFM Simulator for subsequent processing.

Identifiers

An identifier is a sequence of letters and digits; the first character must be a letter. After the first character, an underscore counts as a letter. Upper and lower case letters are different, and ulam extends this distinction: identifiers that begin with an upper case letter name types, while those that begin with a lower case letter name variables, named constants, model parameters, methods, and class instances. Special function identifiers may end with operator characters (ulam-3). Constant Strings are supported since ulam-3.
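The identifier rules can be sketched as a small C++ classifier. This is a hypothetical helper, not part of the ulam compiler; it assumes ASCII input and does not handle keywords or the operator-suffixed special function names.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: classifies a candidate identifier (ASCII only).
// Keywords and operator-suffixed special function names are not handled.
enum class IdentKind { Invalid, TypeName, ValueName };

IdentKind classifyIdentifier(const std::string& s) {
  if (s.empty()) return IdentKind::Invalid;
  const unsigned char first = static_cast<unsigned char>(s[0]);
  // The first character must be a letter (an underscore is not allowed here).
  if (!std::isalpha(first)) return IdentKind::Invalid;
  for (std::string::size_type i = 1; i < s.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    // After the first character, an underscore counts as a letter.
    if (!std::isalnum(c) && c != '_') return IdentKind::Invalid;
  }
  // Upper case first letter: a type name; lower case: a variable, constant, method, ...
  return std::isupper(first) ? IdentKind::TypeName : IdentKind::ValueName;
}
```

For example, MyElement classifies as a type name, count_2 as a value name, and _hidden is rejected.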

Constant Numbers

ulam supports a subset of the C constants: integer, unsigned, and boolean. Their size is the smallest number of bits needed to represent them.

A numeric constant may be suffixed by the letter u or U to indicate that it is unsigned; hexadecimal, octal, and binary numbers are unsigned by default (ulam-3). Hexadecimal numbers are preceded by 0x or 0X (digit zero); the values 10 through 15 are represented by a or A through f or F. Octal numbers are preceded by the digit zero and use the digits 0 through 7. An octal value may also be represented by a single-quoted character or by an escape sequence (e.g., bell, backslash, newline) as defined in 'C'. Binary numbers are preceded by 0b or 0B (digit zero) and use the digits 0 and 1 (ulam-3). Floats are not supported.

<NUMBER> := <HEX_UNSIGNED> | <OCT_UNSIGNED> | <BIN_UNSIGNED> | <DEC_NUMBER> + <UNSIGNED_SUFFIX>

<HEX_UNSIGNED> := /0[xX][0-9a-fA-F]+/

<OCT_UNSIGNED> := /0[0-7]*/

<BIN_UNSIGNED> := /0[bB][0-1]+/

<DEC_NUMBER> := /[1-9][0-9]*/

<UNSIGNED_SUFFIX> := 0 | 'u' | 'U'
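As a rough illustration of the <NUMBER> grammar and the smallest-bitsize rule, the following C++ sketch parses a literal and reports its value, signedness, and bitsize. The struct and function names are hypothetical; the sketch assumes values fit in 64 bits, does not handle character literals, and treats a lone 0 (which matches <OCT_UNSIGNED>) as Unsigned(1), an assumption not stated above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical model of an ulam numeric constant.
struct UlamNumber {
  std::uint64_t value;
  bool isUnsigned;
  unsigned bitsize;  // smallest bitsize that represents the value
};

// Digits in the given base; nullopt on a bad digit or 64-bit overflow.
static std::optional<std::uint64_t> parseDigits(const std::string& digits, unsigned base) {
  if (digits.empty()) return std::nullopt;
  std::uint64_t v = 0;
  for (char ch : digits) {
    unsigned d;
    if (ch >= '0' && ch <= '9') d = static_cast<unsigned>(ch - '0');
    else if (ch >= 'a' && ch <= 'f') d = static_cast<unsigned>(ch - 'a' + 10);
    else if (ch >= 'A' && ch <= 'F') d = static_cast<unsigned>(ch - 'A' + 10);
    else return std::nullopt;
    if (d >= base || v > (UINT64_MAX - d) / base) return std::nullopt;
    v = v * base + d;
  }
  return v;
}

// Number of bits needed to write v in base 2 (at least 1).
static unsigned unsignedWidth(std::uint64_t v) {
  unsigned bits = 0;
  for (; v != 0; v >>= 1) ++bits;
  return bits == 0 ? 1 : bits;
}

std::optional<UlamNumber> parseNumber(std::string text) {
  if (text.empty()) return std::nullopt;
  bool isUnsigned = true;  // hex, octal and binary are unsigned by default
  std::optional<std::uint64_t> v;
  const bool prefixed = text.size() > 1 && text[0] == '0';
  if (prefixed && (text[1] == 'x' || text[1] == 'X')) {
    v = parseDigits(text.substr(2), 16);
  } else if (prefixed && (text[1] == 'b' || text[1] == 'B')) {
    v = parseDigits(text.substr(2), 2);
  } else if (text[0] == '0') {
    v = parseDigits(text, 8);  // a lone "0" also matches <OCT_UNSIGNED>
  } else {
    // <DEC_NUMBER> with an optional <UNSIGNED_SUFFIX>
    isUnsigned = (text.back() == 'u' || text.back() == 'U');
    if (isUnsigned) text.pop_back();
    if (text.empty() || text[0] < '1' || text[0] > '9') return std::nullopt;
    v = parseDigits(text, 10);
  }
  if (!v) return std::nullopt;
  // A signed (Int) constant needs one extra bit for its sign.
  const unsigned bitsize = unsignedWidth(*v) + (isUnsigned ? 0 : 1);
  if (bitsize > 64) return std::nullopt;
  return UlamNumber{*v, isUnsigned, bitsize};
}
```

With this model, 3 comes out as Int(3) and 3u as Unsigned(2), matching the examples given under Arithmetic; 0x1F is Unsigned(5).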

A Unary value n is represented internally as n one bits, and appears as n when used.

The Boolean values true and false initialize all bits to one and zero, respectively. The value of an (odd-width) Bool is determined by the majority population count: it is true when more than half of its bits are one.

<BOOLEAN> := 'true' | 'false'
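A minimal C++ model of the population-count interpretations of Unary(k) and Bool(k) follows. The helper names are hypothetical, and placing Unary's one bits at the low (right-justified) end in unaryEncode is an assumption.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Keeps the low k bits of `bits` (1 <= k <= 64).
static std::uint64_t lowBits(std::uint64_t bits, unsigned k) {
  assert(k >= 1 && k <= 64);
  return k == 64 ? bits : bits & ((std::uint64_t{1} << k) - 1);
}

// Unary(k): the value is the number of one bits (population count).
unsigned unaryValue(std::uint64_t bits, unsigned k) {
  return static_cast<unsigned>(std::bitset<64>(lowBits(bits, k)).count());
}

// Canonical Unary(k) encoding of n: n low-order one bits, saturating at k.
std::uint64_t unaryEncode(unsigned n, unsigned k) {
  assert(k >= 1 && k <= 64);
  if (n >= k) return lowBits(~std::uint64_t{0}, k);
  return (std::uint64_t{1} << n) - 1;
}

// Bool(k), k odd: true when more than half of the k bits are one.
bool boolValue(std::uint64_t bits, unsigned k) {
  assert(k % 2 == 1);
  return 2 * unaryValue(bits, k) > k;
}
```

For instance, a Bool(3) holding 0b110 is true (two of three bits set), while 0b100 is false.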

Strings

In ulam, a limited form of compile-time constant character strings is supported (ulam-3). The primitive type String is an 18-bit consecutive user-visible index into a string pool; as of ulam-6, it is two bits smaller than in ulam-5. In its first implementation (ulam-3), a String was a two-part index into a class-specific user string pool. In its second (and current) implementation (ulam-4), there is one global user string pool for all the classes compiled into a single shared object (.so); in future work, a few SLOT-specific user string pools will allow for multiple shared objects without the runtime overhead of class-specific string pools.

A table of user strings is an array of 8-bit unsigned chars (typedef ASCII in UrSelf). Each entry (starting at position 1) holds the length of the user string followed by its null-terminated string of ASCII characters; index zero is reserved to mean undefined. A user string may not exceed the length representable by a single byte (255). The string index determines its position in the user string pool and, previously (in ulam-3), the class that defines the string. To ensure that an index accesses the start of a string entry, an indirect, consecutive user-visible index is assigned (ulam-5).

Except for a named constant String, variables of type String may be uninitialized until accessed; once defined, a String variable can be reset to index zero by assigning it another, yet-to-be-defined String variable. New to ulam-5, a user may validate a string index by casting it to a Bool. Furthermore, a specific character in a user string may be accessed with square brackets. The length of the string (not including the null terminator) is available with the .lengthof operator, described below. Arrays of variable-length Strings are supported.

<STRING_LITERAL> := <DOUBLE_QUOTE> + (<ESCAPED_BYTE> | <NOT_DOUBLE_QUOTE>)* + <DOUBLE_QUOTE>

<BYTE_LITERAL> := <SINGLE_QUOTE> + (<ESCAPED_BYTE> | <NOT_SINGLE_QUOTE>) + <SINGLE_QUOTE>

<ESCAPED_BYTE> := /\\([abtnvfr"'\\]|[0-7]{1,3}|[xX][0-9a-fA-F]{1,2})/
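The string pool layout described above can be modeled with a small C++ class. This is a hypothetical sketch, not the ulam runtime: the class and method names are invented, the pool is a std::vector, and the indirect user-visible index is assumed to be a table mapping each consecutive index to the byte position of its entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of a global user string pool.
class UserStringPool {
 public:
  // Byte position 0 and user-visible index 0 are reserved for "undefined".
  UserStringPool() : bytes_(1, 0), offsets_(1, 0) {}

  // Appends a string; returns its consecutive user-visible index (1, 2, ...).
  std::uint32_t add(const std::string& s) {
    assert(s.size() <= 255);                // the length must fit in one byte
    assert(offsets_.size() < (1u << 18));   // the index must fit in 18 bits
    offsets_.push_back(bytes_.size());      // indirect index -> entry start
    bytes_.push_back(static_cast<std::uint8_t>(s.size()));
    bytes_.insert(bytes_.end(), s.begin(), s.end());
    bytes_.push_back(0);                    // null terminator
    return static_cast<std::uint32_t>(offsets_.size() - 1);
  }

  // Models casting a String to Bool: index 0 and unknown indices are invalid.
  bool valid(std::uint32_t idx) const { return idx != 0 && idx < offsets_.size(); }

  // Models .lengthof: the stored length byte, excluding the terminator.
  unsigned length(std::uint32_t idx) const {
    assert(valid(idx));
    return bytes_[offsets_[idx]];
  }

  // Models s[i]: character i of the string (bounds-checked here).
  char at(std::uint32_t idx, unsigned i) const {
    assert(i < length(idx));
    return static_cast<char>(bytes_[offsets_[idx] + 1 + i]);
  }

 private:
  std::vector<std::uint8_t> bytes_;   // per entry: length byte, chars, '\0'
  std::vector<std::size_t> offsets_;  // user-visible index -> byte position
};
```

Because user-visible indices are consecutive and resolved through the offset table, no index can point into the middle of an entry, which is the guarantee the indirect index provides.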

Meaning of Identifiers

Identifiers, or names, refer to a variety of things: functions or methods, classes (elements), structures or unions (quarks), temporary structures (transients); members of elements, quarks, transients; parameters of functions, and template elements, quarks and transients; typedef alias names, named constants, model parameters, function call arguments, and objects. An object is also called a variable.

A name also has a scope, which is the region of the program in which it is known, and which determines whether or not the same name in another region refers to the same object or function.

Storage

For indefinite scalability, the only persistent data memory is a spatial grid of Atom storage; there is no dedicated heap or random-access main memory. There is an implicit stack for function calls during a single event, and an EventWindow library provides read/write access to the local grid neighborhood, comprising 41 Atoms within Manhattan distance 4 of the center. The underlying MFM tile structure is not directly accessible from ulam.
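The 41-site count follows from the Manhattan-distance bound: a diamond of radius r holds 2r^2 + 2r + 1 grid sites, which is 41 for r = 4. The C++ sketch below enumerates the offsets; the function name is hypothetical, and its ordering is arbitrary rather than the MFM EventWindow's own site numbering.

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>
#include <vector>

// Enumerates the (dx, dy) offsets of all grid sites whose Manhattan distance
// from the center is at most `radius`. The ordering is arbitrary.
std::vector<std::pair<int, int>> eventWindowOffsets(int radius) {
  std::vector<std::pair<int, int>> sites;
  for (int dy = -radius; dy <= radius; ++dy)
    for (int dx = -radius; dx <= radius; ++dx)
      if (std::abs(dx) + std::abs(dy) <= radius) sites.emplace_back(dx, dy);
  return sites;
}
```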

Packed storage designations for ulam objects:

Storage	'Fits Into'
PACKED	Atom state bits
PACKLOADABLE	32- or 64-bit 'C' variable
UNPACKED	Atom or Element (including Type), Big arrays, Bits, Transients

Immediate types are basic and derived types, scalars and arrays, that are packed into a bit structure or field, the size of an atom. Primitive basic types are right-justified; derived classes and their ancestor classes are left-justified. Transient classes are temporary bit structures that can exceed the size of an atom.

Basic Types

There are several fundamental, primitive types that can be declared in widths (k) ranging from 1 to 64 bits, with these exceptions: Bool uses only odd widths, Void is always zero, String is only its default size, and Bits may be up to 8K bits (ulam-5). Single letter type abbreviations appear in the generated code mangled names (Appendix C).

Ulam type	Interpretation	Numeric/Ordered	Default (k)	Abrv.
`Unary(k)`	Base 1 (population count)	Y	32	'y'
`Unsigned(k)`	Base 2 positional	Y	32	'u'
`Int(k)`	Two's complement base 2	Y	32	'i'
`Bool(k)`	Boolean (majority of pop. count)	N	1	'b'
`Bits(k)`	Uninterpreted bit values	N	32	't'
`Void`	Empty set of values	N	0	'v'
`String`	User String index	N	18	's'

Derived Types

Derived types are constructed from the basic types as follows:

array of objects of a given type;
function, returning an object of a given type; a class member;
element, a class containing a sequence of objects of various types, up to 71 bits;
quark, a struct containing a sequence of objects of various types, total size from 0 to 64 bits;
union, a quark capable of containing any one of several objects of various types from 0 to 64 bits;
transient, a temporary struct containing a sequence of objects of various types, up to 8K bits;
atom, a class instance; size is 96 bits with 25 bits reserved for the MFM type and error correction; quarks are the UNDEFINED MFM type. Default atoms are the Empty type.
self, type Self&; refers to the effective-self within virtual functions, and the override class' position within the effective-self;
Self, a quark, element, or transient's type (ulam-2);
super, type Super&; refers to the effective-self within virtual functions, and the superclass' position within the effective-self; is the first direct baseclass in breadth-first order; UrSelf by default;
Super, a quark, element, or transient's super ancestor type (ulam-2).

Reserved Method Names

The following methods for elements (E), quarks and unions (Q), and transients (T) indicate special functionality in ulam:

Special method	On	Purpose
`Void behave()`	E	Perform event
`ARGB getColor(Unsigned selector)`	EQ	Color in simulator
`Int test()`	E	Run unit tests
`Int toInt()`	Q	Custom cast to Int
`T aref(Int)`	EQT	Custom array read
`T& aref(Int)`	EQT	Custom array read/write (ulam-3)
`Unsigned alengthof()`	EQT	Custom array lengthof (ulam-3)
`Void aset(Int, T)`	EQ	Custom array write (deprecated ulam-3)
`Self(T,)`	EQT	Class Constructor (ulam-3)
`T operatorX(T')`	EQT	Class Operator Overload (ulam-3)

The primal base class, UrSelf, defines behave and getColor as virtual methods (ulam-2); a base class quark's getColor is called automatically by the simulator. Method keywords are used in function declarations. In the method signatures, T denotes an arbitrary type (unrelated to T, transient, in the On column), and T' stands for none, one, or more argument types of a Class overload function defined for operator X (ulam-3).

Type Qualifiers

In addition to packed bit fields, ulam has qualifiers that help to reduce the space requirements of an object:

A named constant, declared with the constant keyword preceding a primitive, class (ulam-4), or atom (ulam-5) type and a value, announces that its value will not be changed, and occupies no space in the element, quark, or transient to which it belongs.

A model parameter, indicated by the keyword parameter preceding a declared primitive scalar data member, is similar to the 'static' keyword in C, and costs no space in the element to which it belongs. Its default value is required at declaration, and is modifiable by the user through the MFM Simulator interface. Its value is shared among all instances of the same ulam class type. Its maximum bitsize is 32.

Class parameters, like template parameters in C++, are treated as named constants; their values are instance specific and may influence the overall size of their object; default values are optional.

A typedef, indicated by the keyword typedef preceding a type and a type identifier, is an alias for a type. A typedef specifier does not reserve storage.

A file scope local def, indicated by the keyword local preceding a typedef or a named constant, is defined outside of a class, with the scope of an ulam file (ulam-3). A local def does not reserve storage, may be shadowed in an inner scope, and may be specified explicitly with a 'local.' prefix. A separate context is generated to represent the constants and typedefs that belong to each file scope; it does not inherit from UrSelf or any other base class. Typedefs explicitly specified from another class can be used in a locals scope context.

Objects and Lvalues

An object is a named region of storage; an lvalue is an expression referring to an object that can be stored into. In ulam, lvalues are limited to object names, array references, atomof expressions, and question-colon expressions, as the grammar below shows. The name lvalue originates from the assignment expression y = x, where the left operand y must be an lvalue expression. Each operator specifies whether it expects lvalue operands and whether it yields an lvalue. The atomof operator explicitly references the storage, or reference, of an element or an inherited quark. A 'question-colon' expression, also known as the conditional (or ternary) operator, can be used in places where an if-then-else statement cannot (ulam-3).

<LVAL_EXPRESSION> := <IDENT> | <IDENT_EXPRESSION> + '[' + <EXPRESSION> + ']' | <IDENT_EXPRESSION> + '.atomof' | <QUESTION_COLON_EXPRESSION>

Conversions

Many operators cause conversions. The effect is to bring operands into a common type, such as the type of the result. In C, this pattern is called the usual arithmetic conversions. In ulam, implicit casts must be "safe", with neither loss of data nor possible saturation; otherwise, explicit casts are necessary.

Safe Casts:

To the same type that's at least as big;
Any non-class type to Bits that's at least as big;
Quark to Bits (and back), explicitly with exact bitsizes;
Transient to Bits (and back), explicitly with exact bitsizes;
Element to Bits (not back), explicitly with exact bitsizes;
Bool to Bool, any sizes;
1-bit Unary or Unsigned to Bool;
Unsigned to Int when the Int is bigger;
Unsigned to Unary with bitsize at least its maximum unsigned value;
Unary to Unsigned with bitsize at least 1 + log(base 2) of Unary bitsize;
Unary to Int with bitsize bigger than 1 + log(base 2) of Unary bitsize;
Quark to Int, if toInt method is provided;
Any type to Void (size 0);
Element to Atom;
Subclass to its superclass;
Subclass to any of its baseclasses (ulam-5);
A reference (&) and its referenced type.
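The primitive rules in this list can be collected into a single C++ predicate. This is a simplified, hypothetical model: classes, atoms, references, and Strings are omitted, and "1 + log(base 2)" of a Unary bitsize k is read as 1 + floor(log2 k), the number of bits needed to write k in base 2.

```cpp
#include <cassert>

// Simplified model of the primitive "safe cast" rules.
enum class Kind { Int, Unsigned, Unary, Bool, Bits, Void };

struct PrimType {
  Kind kind;
  unsigned bits;  // declared width k
};

// Bits needed to write n in base 2, i.e. 1 + floor(log2 n) for n >= 1.
static unsigned bitWidth(unsigned n) {
  unsigned w = 0;
  while (n) { ++w; n >>= 1; }
  return w;
}

bool isSafeCast(PrimType from, PrimType to) {
  if (to.kind == Kind::Void) return true;                  // any type to Void
  if (from.kind == Kind::Void) return false;               // Void only to Void
  if (from.kind == to.kind) {
    if (from.kind == Kind::Bool) return true;              // Bool to Bool, any sizes
    return to.bits >= from.bits;                           // same type, at least as big
  }
  if (to.kind == Kind::Bits) return to.bits >= from.bits;  // non-class to Bits
  if (to.kind == Kind::Bool)                               // 1-bit Unary or Unsigned
    return (from.kind == Kind::Unary || from.kind == Kind::Unsigned) && from.bits == 1;
  if (from.kind == Kind::Unsigned && to.kind == Kind::Int) return to.bits > from.bits;
  if (from.kind == Kind::Unsigned && to.kind == Kind::Unary) {
    // Unary(m) must hold maxof Unsigned(k) = 2^k - 1.
    unsigned long long maxU = from.bits >= 64 ? ~0ULL : (1ULL << from.bits) - 1;
    return to.bits >= maxU;
  }
  if (from.kind == Kind::Unary && to.kind == Kind::Unsigned)
    return to.bits >= bitWidth(from.bits);
  if (from.kind == Kind::Unary && to.kind == Kind::Int)
    return to.bits > bitWidth(from.bits);
  return false;  // anything else needs an explicit cast
}
```

For example, Unsigned(4) casts safely to Int(5) but not to Int(4), and Unary(3) (values 0 to 3) casts safely to Int(3).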

The rules for arithmetic, logical, comparison, shift, bitwise and unary operations in ulam are as follows:

Arithmetic

Arithmetic operations are performed on 32-bit or 64-bit signed or unsigned numbers. Unary types are treated as unsigned. Arithmetic on Bools and Bits is not supported without explicit casting. The result is either signed or unsigned: if one operand is unsigned and the other is not, signed wins, and the resulting size is adjusted by operation. For unsigned constants, append lowercase 'u' to a decimal number to avoid the need to cast. Before ulam-3, a quark combined with anything else always converted to Int(32) (the quark's toInt method, if present, is responsible for the conversion). As of ulam-3, any Class (including a quark) that is the left operand of an arithmetic operator automatically invokes a call to its operator overload function; for backward compatibility, a quark (or quark ref) as the righthand operand continues to use its toInt method, if one exists.

The resulting bitsize is operation specific as delineated in the following table, where 'l' is the bitsize of the left operand after any conversion, and 'r' represents the right operand bitsize after any conversion (e.g. Unary to Unsigned/Int), without exceeding its word size. Numbers take on the bitsize required to fit their value; for example, 3 is type Int(3), 3u is type Unsigned(2).

Op	Result Bitsize
+	max(l,r) + 1
-	max(l,r) + 1
*	l + r
/	l
%	r
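The table can be expressed as a small C++ function. It is a hypothetical sketch: the caller supplies the word size (32 or 64), since how ulam selects it is not modeled here.

```cpp
#include <algorithm>
#include <cassert>

// Result bitsize of an arithmetic operation, per the table above.
// l and r are operand bitsizes after conversion; the result never
// exceeds the word size.
unsigned arithResultBits(char op, unsigned l, unsigned r, unsigned wordSize) {
  assert(wordSize == 32 || wordSize == 64);
  unsigned bits = 0;
  switch (op) {
    case '+':
    case '-': bits = std::max(l, r) + 1; break;
    case '*': bits = l + r; break;
    case '/': bits = l; break;
    case '%': bits = r; break;
    default: assert(false && "not an arithmetic operator");
  }
  return std::min(bits, wordSize);
}
```

For example, 3 + 3 combines two Int(3) operands, giving an Int(4) result, while multiplying two 20-bit operands is capped at 32 bits under a 32-bit word size.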

As in 'C', 32-bit or 64-bit operations may overflow and produce incorrect results.

Logical

All logical operations are performed on two Bool operands as Bool in the size of the larger operand type. The result is Bool in the size performed.

Comparison

Same as arithmetic rules with the following exception: Bits and Bools may be compared for equality in their respective type. The result is Bool.

Shift

Shift operations are logical (non-arithmetic) and are performed as Bits in the size delineated in the following table, without exceeding its word size, or 8K bits. The right operand, the shift distance, must be Unsigned; negative shifts are not supported. The result is Bits in the size performed.

Op	Result Bitsize
<<	l + 2^r
>>	l

Since Bits do not saturate their bitsize, shift operations allow bits to drop. For example, the result of 3 * 4, explicitly cast as an Unsigned 3-bit type, is 7 (its maximum value); however, the left shift operation 3 << 2u cast as the same type is 4, as the high-order bit drops.

Unlike 'C', shift distances greater than or equal to the data width (32 or 64 bits) return 0. Ints do not sign extend when cast to Bits.
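The saturation-versus-truncation example above, and the rule for large shift distances, can be reproduced with a few C++ helpers. They are a hypothetical model operating on 64-bit host integers, not generated ulam code.

```cpp
#include <cassert>
#include <cstdint>

// Saturating conversion of a non-negative value to Unsigned(k).
std::uint64_t toUnsignedSaturated(std::uint64_t v, unsigned k) {
  assert(k >= 1 && k <= 64);
  std::uint64_t maxv = k == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << k) - 1;
  return v > maxv ? maxv : v;
}

// Bits keep only the low k bits; higher bits simply drop.
std::uint64_t toBitsTruncated(std::uint64_t v, unsigned k) {
  assert(k >= 1 && k <= 64);
  return k == 64 ? v : v & ((std::uint64_t{1} << k) - 1);
}

// Logical left shift in a 32- or 64-bit word. Distances >= the width yield 0,
// whereas in C/C++ such a shift is undefined behavior.
std::uint64_t shiftLeft(std::uint64_t v, unsigned distance, unsigned width) {
  assert(width == 32 || width == 64);
  if (distance >= width) return 0;
  return toBitsTruncated(v << distance, width);
}
```

Here toUnsignedSaturated(3 * 4, 3) gives 7, while truncating shiftLeft(3, 2, 32) to 3 bits gives 4, matching the example in the text.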

Bitwise

Bitwise operations are performed as Bits in the size of the larger operand. The result is Bits in the size performed.

Unary

The operand types of the unary operators are: numeric for +, signed numeric for -, and boolean for !. The results are in the type and size performed.

Expressions

The grammar in Appendix B incorporates the precedence and associativity of the operators.

Primary Expressions

Primary expressions are identifiers and constants, with types specified by their declarations, and expressions in parentheses. Constant Strings are supported as of ulam version 3.

Postfix Expressions

The operators in postfix expressions group left to right. They currently include array references, function calls, member selection (of an element, quark, or transient), and postfix increment/decrement (applied after the value is used).

Array References

An identifier expression followed by an expression in square brackets is a postfix expression denoting a subscripted array reference.

<IDENT_EXPRESSION> + '[' + <EXPRESSION> + ']'

The subscript expression must have a numeric type. Out-of-bounds array references that the compiler cannot detect cause a runtime failure. The expression to the left of the square brackets may be an ident expression, including a function call returning a custom array object, a reference to an array, or a string (ulam-3).

Function Calls

A function call is a postfix expression: a function designator (name) followed by parentheses containing a possibly empty, comma-separated list of assignment expressions, which constitute the ordered arguments to the function. The lefthand side of a function call must be modifiable.

<FUNC_CALL> := <FUNC_IDENT> + '(' + <ARGS_OR_NONE> + ')'

<ARGS_OR_NONE> := 0 | <ARGS>

<ARGS> := <ARG> | <ARG> + ',' + <ARGS>

<ARG> := <ASSIGN_EXPRESSION>

The term argument is used for an expression passed by a function call; the term parameter is used for an input object received by a function definition, or as described in a function declaration.

Argument passing in ulam is by value, unless a reference (&) is designated. A function may change the values of its parameter objects, but these changes do not affect the values of non-reference arguments. The first two "hidden" arguments are passed by reference: the ulam context and the self object. The special keyword self has type Self&, may not be shadowed, and is in scope within a function definition. The as conditional expression is the exception: there, self is a valid lefthand side identifier that modifies the position of the self object, not its effective-self, and is in scope within the as-block. Use with caution.

The arguments must match the explicitly declared parameters in number and type, unless the declaration's parameter list ends with the ellipsis notation (, ...). In that case, the number of arguments must equal or exceed the number of parameters, and the function definition is native.

The order of evaluation of arguments is unspecified, however, the arguments and the function designator are completely evaluated, including all side effects, before the function is entered. Recursive calls to any function are permitted.

In ulam, argument and return values must be PACKED or PACKLOADABLE objects, or an UNPACKED atom or class. UNPACKED arrays and transients may be used as local function variables or reference arguments. Furthermore, arguments may be implicitly (safely) cast to match parameter types. Perfect matches take precedence; ambiguous matches cause a compilation error.

Class Constructors

Unlike regular functions, Class Constructors are called by local function variable declarations during initialization: the variable name takes the place of the function designator (i.e., Self) and is followed by a non-empty list of arguments within parentheses. Class constructors may also be called inline following an instanceof operator.

Member Select Expression

A postfix expression followed by a . (dot) followed by an identifier is a postfix expression. The first operand expression must reference an element, quark, union or transient, and the identifier must name a member of the element, quark, union, transient, or one of its ancestors. The value is the named member of the element, quark, union or transient, and its type is the type of the member.

<MEMBER_SELECT_EXPRESSION> := <IDENT_EXPRESSION> + '.' + ( <IDENT_EXPRESSION> | <OF_OPERATOR> | <INSTANCEOF_CONSTRUCTOR_CALL> | <MEMBERSELECT_BY_BASECLASS> )

As of ulam-5, multiple inheritance is supported (see Class Definition). To reference a specific baseclass member, its ancestor type follows the first operand, delineated by dots. No more than one explicit ancestor is expected, since all baseclasses share common ancestors. The keyword super is shorthand only for the superclass; otherwise self or an identifier must precede the ancestor type. The expression within square brackets is a special syntax to request, at runtime, a specific baseclass virtual function table by its unique classId; it follows a known common baseclass type (ulam-5).

<MEMBERSELECT_BY_BASECLASS> := ( <ELEMENT_ANCESTOR> | <QUARK_ANCESTOR> | <TRANSIENT_ANCESTOR> ) + ( 0 | '[' + <EXPRESSION> + ']' ) + '.' + <IDENT_EXPRESSION>

To explicitly reference a local file scope constant, the first operand is the keyword local followed by a dot (ulam-3).

The Of postfix operators (minof, maxof, sizeof, lengthof, instanceof, atomof, classidof, constantof, positionof), are described separately below.

Postfix Incrementation

Postfix incrementation/decrementation (ulam-3) is an lvalue expression followed by a ++ or -- operator. The operand is incremented or decremented by 1. The value of the expression is the value of the operand before the incrementation (decrementation).

The operand must be an lvalue; the result is not an lvalue.

<LVAL_EXPRESSION> + <LVAL_UNOP>

<LVAL_UNOP> := '++' | '--'

However, when the operand is a Class, the result is the return type of its operator overload function with one integer argument of value 1 (ulam-3).

Unary Operators

Expressions with unary operators, including casts, the class relational operator, and of-operators, group right-to-left.

<UNOP> := '-' | '+' | '!'

Prefix Incrementation

Prefix incrementation/decrementation is an lvalue expression preceded by a ++ or -- operator. The operand is incremented or decremented by 1. The value of the expression is the value of the operand after the incrementation (decrementation).

The operand must be an lvalue; the result is not an lvalue.

<LVAL_UNOP> + <LVAL_EXPRESSION>

<LVAL_UNOP> := '++' | '--'

However, when the operand is a Class, the result is the return type of its operator overload function with no arguments (ulam-3).

Unary Plus Operator

The operand of the unary + operator must have a numeric type, and the result is the type of the operand. However, when the operand is a Class, the result is the return type of its operator overload function with no arguments (ulam-3).

Unary Minus Operator

The operand of the unary - operator must have a signed numeric type, and the result is the negative of its operand. The type of the result is Int in the bitsize of its operand. However, when the operand is a Class, the result is the return type of its operator overload function with no arguments (ulam-3).

One's Complement Operator

The operand of the unary bitwise ~ operator is a Bits type, or a type that is safely cast to Bits; Classes must first be explicitly cast to Bits in their same bitsize. The result is the one's complement (all bits flipped). The type of the result is Bits in the bitsize of its operand. However, when the operand is a Class, and an operator overload function exists, the result is the return type of its operator overload function with no arguments (ulam-5).

Logical Negation Operator

The operand of the ! operator is a boolean type, and the result is true if the value of its operand compares equal to false, and false otherwise. The type of the result is Bool in the bitsize of its operand. However, when the operand is a Class, the result is the return type of its operator overload function with no arguments (ulam-3).

Casts

A unary expression preceded by the parenthesized name of a type causes an explicit conversion of the value of the expression to the named type. This construction is called a cast. An expression with a cast is not an lvalue. Any type may be cast to a Void; a Void type can only be cast to a Void. Reference casts may only be used in reference initializations, including function call arguments, and return statements (ulam-6).

In ulam, all primitive numeric type casts saturate in the destination type. Bits do not saturate their bitsize. A quark (or quark ref) automatically casts to an Int(32) when the toInt method is present. Classes may be explicitly cast to Bits when the bitsizes are identical; quark or transient classes may be explicitly cast from Bits (ulam-5).
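A hypothetical C++ model of these saturating numeric casts follows, limited to widths 1 through 63 for brevity. That negative values saturate to 0 for Unsigned and Unary follows from saturation at minof, but is stated here as an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Saturating cast of a 64-bit signed value to Int(k): clamp to [-2^(k-1), 2^(k-1) - 1].
std::int64_t castToInt(std::int64_t v, unsigned k) {
  assert(k >= 1 && k <= 63);
  const std::int64_t maxv = (std::int64_t{1} << (k - 1)) - 1;
  const std::int64_t minv = -maxv - 1;
  return std::max(minv, std::min(v, maxv));
}

// Saturating cast to Unsigned(k): clamp to [0, 2^k - 1].
std::uint64_t castToUnsigned(std::int64_t v, unsigned k) {
  assert(k >= 1 && k <= 63);
  if (v < 0) return 0;
  const std::uint64_t maxv = (std::uint64_t{1} << k) - 1;
  return std::min(static_cast<std::uint64_t>(v), maxv);
}

// Saturating cast to Unary(k): the value n is clamped to [0, k].
unsigned castToUnary(std::int64_t v, unsigned k) {
  assert(k >= 1 && k <= 63);
  if (v < 0) return 0;
  return static_cast<unsigned>(std::min<std::int64_t>(v, k));
}
```

For example, 100 cast to Int(4) saturates at 7 and -100 at -8, where a C-style conversion would instead wrap.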

A String may be cast to a Bool to ascertain its validity (ulam-5).

Class Relational Operator

ulam has a class relational operator, is, whose result type is Bool.

<IDENT_EXPRESSION> + 'is' + <TYPE_IDENT>

The is operator tests whether an atom or class object referred to by the lefthand side identifier is a specific class type or inherited class type (rhs). The value is true when it is. The is operator works in conjunction with the conditional expression as, as described below.

*-Of Operators

Six of the nine of-operators, applied to objects or types, yield constant values of varying types:

<OF_OPERATOR> := <OF_CONSTANT> | 'instanceof' | 'atomof' | 'positionof'

The atomof operator yields a reference to the atom storage of certain class objects (i.e., an element or ancestor quark) or references. Unlike the other of-operators, it is not a constant, and it is store-into-able. The operand is an identifier or reference. The result is a reference to an Atom. This operator behaves like a virtual function in the case of reference operands, including self and super.

The instanceof operator yields the default value of a class, based on its data members' initial values. The operand is either an identifier, reference, or a type name. For a reference operand, the result is a new Atom whose type is known at runtime; otherwise, the result has the type of the operand. This operator behaves like a virtual function in the case of reference operands, including 'self' and 'super'. To initialize a constant class with default values, use the expression = {}; instead.

<INSTANCEOF_CONSTRUCTOR_CALL> := 'instanceof' + <CONSTRUCTOR_CALL>

The constantof operator (ulam-5) yields the default value of a class, based on its data members' initial values. Similar to 'instanceof', except the result is a Class constant.

The classidof operator (ulam-5) yields the unique Registry Number, or classId, of a class object, reference or type. The result is an Unsigned constant.

The positionof operator (ulam-6) yields the absolute position in storage of a class object, reference or type. The result is an Unsigned constant, unless a reference is used.

The lengthof operator (ulam-3) yields the number of items in an array; for Strings, the number of ASCII characters. For custom arrays, when the alengthof method exists, it is automatically called. The result is an Unsigned constant.

The sizeof operator yields the number of bits required to store an object of the type of its operand. The operand is either an identifier, reference, or a type name. When sizeof is applied to an array object, the result is the total number of bits in the array, that is, the number of array items times the size of one item. When applied to a quark or transient, the result is the number of bits in the object; the size of a union is the size of its largest data member. Atoms and atom references have a constant size. An element is atom-based and returns the size of an atom (ulam-2). As in C++, this operator behaves like a non-virtual function for reference operands: the size returned is that of the referenced type. The result is an Unsigned constant.

The maxof operator yields the greatest possible value of the type of its operand. The operand is either an identifier, or a type name. The operator may not be applied to an operand of a class, an array, or an unordered type. The result is the type of the operand.

The minof operator yields the least possible value of the type of its operand. The operand is either an identifier, or a type name. The operator may not be applied to an operand of a class, an array, or unordered type. The result is the type of the operand.
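For the ordered primitive types, the minof and maxof bounds can be computed as below. This C++ sketch uses hypothetical names, limits widths to 1 through 63 so every bound fits in an int64_t, and returns no value for unordered types, mirroring the restriction above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

enum class PrimKind { Int, Unsigned, Unary, Bool, Bits, Void, String };

// Returns {minof, maxof} for an ordered primitive type of width k,
// or std::nullopt for unordered types.
std::optional<std::pair<std::int64_t, std::int64_t>> minMaxOf(PrimKind kind, unsigned k) {
  assert(k >= 1 && k <= 63);
  switch (kind) {
    case PrimKind::Int: {
      const std::int64_t maxv = (std::int64_t{1} << (k - 1)) - 1;
      return std::make_pair(-maxv - 1, maxv);
    }
    case PrimKind::Unsigned:
      return std::make_pair(std::int64_t{0},
                            static_cast<std::int64_t>((std::uint64_t{1} << k) - 1));
    case PrimKind::Unary:
      // Unary(k) counts one bits, so its values range over 0..k.
      return std::make_pair(std::int64_t{0}, static_cast<std::int64_t>(k));
    default:
      return std::nullopt;  // Bool, Bits, Void, String: not ordered
  }
}
```

Note the contrast this shows: maxof Unsigned(4) is 15, while maxof Unary(4) is only 4.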

The of-operators appear in two grammatical expressions, one beginning with a type and the other with a named object:

<OF_EXPRESSION> := <TYPE> + '.' + ( <OF_OPERATOR> | <INSTANCEOF_CONSTRUCTOR_CALL> )

<MEMBER_SELECT_EXPRESSION> := <IDENT_EXPRESSION> + '.' + ( <IDENT_EXPRESSION> | <OF_OPERATOR> | <INSTANCEOF_CONSTRUCTOR_CALL> )

Binary Operators

In ulam, for most binary operators, when the left operand is a Class (including quarks with a toInt method), the result is the return type of its operator overload function with argument matching the right operand type (ulam-3).

Multiplicative Operators

The multiplicative operators group left-to-right and must have a numeric type. The usual arithmetic conversions are performed on the operands and determine the type of the result.

<MULOP> := '*' | '/' | '%'

Additive Operators

The additive operators group from left-to-right and must have a numeric type.

<ADDOP> := '+' | '-'

Shift Operators

The shift operators group left-to-right; the lefthand operand must be castable to Bits, and the righthand operand to Unsigned. Only logical (non-arithmetic) shifts are supported, and the result is Bits. An explicit cast is required to convert the result back to the type of the lefthand side, or to any other non-Bits type.

<SHIFTOP> := '<<' | '>>'

In ulam, the bitsize of the type influences the result of a shift operation due to dropped bit positions; arithmetic and multiplicative operations saturate their results rather than dropping bits.

Quarks cast to Int provided the toInt method exists, and then to Bits. Ints do not sign extend when cast to Bits. Atom shifts are not supported.
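The following hypothetical C++ sketch models a Bits(N) value as the low N bits of a `uint64_t`, to show the logical shift (dropped bit positions, zero fill) and the cast from Int to Bits without sign extension. The helper names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a Bits(N) value is the low N bits of a uint64_t (1 <= n <= 64).
uint64_t maskBits(uint64_t v, unsigned n) {
    return n == 64 ? v : v & ((uint64_t{1} << n) - 1);
}

// Logical shifts on Bits(N): bits moved past position N-1 (or below 0) are dropped,
// and vacated positions are filled with zeros; there is no arithmetic shift.
uint64_t shiftLeft(uint64_t bits, unsigned n, unsigned count) {
    return count >= 64 ? 0 : maskBits(bits << count, n);
}

uint64_t shiftRight(uint64_t bits, unsigned n, unsigned count) {
    return count >= 64 ? 0 : maskBits(bits, n) >> count;
}

// Casting Int(fromN) to Bits(toN): keep the low fromN bits of the two's-complement
// pattern; the sign bit is NOT copied into the extra positions of a wider Bits.
uint64_t intToBits(int64_t value, unsigned fromN, unsigned toN) {
    return maskBits(maskBits(static_cast<uint64_t>(value), fromN), toN);
}
```

So, in this model, Bits(4) 1111 shifted left by 2 gives 1100, and Int(4) -1 cast to Bits(8) gives 00001111 rather than 11111111.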

Relational Operators

The relational operators group left-to-right. The operands are ordered primitive types.

<COMPOP> := '<' | '>' | '<=' | '>='

The result is false if the specified relation is false and true if it is true. The type of the result is Bool.

However, when the left operand is a Class and the operator used has no overload method defined, ulam automatically complements the result of the inverse operation, provided that the inverse operator is defined; return types compatible with Bool are recommended (ulam-3).

Equality Operators

The equality operators follow the same rules as the relational operators except for their lower precedence, and an ordered type is not required. (Thus, except for Bits and Bools, a<b == c<d is true whenever a<b and c<d have the same truth-value.)

<EQOP> := '==' | '!='

However, when the left operand is a Class and the operator used has no overload method defined, ulam automatically complements the result of the inverse operation, provided that the inverse operator is defined; return types compatible with Bool are recommended (ulam-3).
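A hypothetical C++17 sketch of the fallback described for relational and equality overloads: when the operator used is not defined, the complement of its inverse is returned. The pairing (== with !=, < with >=, > with <=) and all names are assumptions for illustration; plain `int` values stand in for the Class operand and its argument.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <stdexcept>

// An overload is either defined (a callable) or absent (std::nullopt).
using Overload = std::optional<std::function<bool(int, int)>>;

// Call `op` when it is defined; otherwise complement its inverse (e.g. != from ==).
bool compareWithFallback(const Overload& op, const Overload& inverse, int lhs, int rhs) {
    if (op) return (*op)(lhs, rhs);
    if (inverse) return !(*inverse)(lhs, rhs);
    // This sketch simply throws when neither overload exists.
    throw std::logic_error("neither the operator nor its inverse is defined");
}
```

For instance, a class defining only operator== would still answer != through this fallback.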

Bitwise Operators

Bitwise operators apply only to Bits type operands. The result is Bits of the same bitsize.

<BITOP> := '&' | '|' | '^'

Logical Operators

Logical operators group left-to-right. They apply only to Bool type operands. The result is Bool.

<LOGICALOP> := '&&' | '||'

The logical AND operator guarantees left-to-right evaluation: the first operand is evaluated, including all of its side-effects; if it is true, the right operand is evaluated. The expression's value is true iff both operands are true. The result is Bool.

The logical OR operator guarantees left-to-right evaluation: the first operand is evaluated, including all side-effects, and if its value is false, the expression takes the truth value of the second operand; otherwise the expression takes the truth value of the first operand without evaluating the second.

Unlike C++, Class overload functions for the logical operators are not supported (ulam-3).

Conditional Operator

The conditional (or ternary) operator (?:) is supported in ulam-3. The condition expression, of type Bool, is evaluated first. When it is true, the second expression is evaluated; otherwise, the third expression is evaluated. Only the chosen expression is evaluated. Unlike the branches of an if statement, these last two expressions must be the same type or a reference to it, be safely castable (for primitive types), or be explicitly cast. Furthermore, the question-colon expression may be an lvalue; it may also be used inside a function call argument.

<QUESTION_COLON_EXPRESSION> := <EXPRESSION> + '?' + <EXPRESSION> + ':' + <EXPRESSION>

Class overload functions for the conditional operator are not supported (ulam-3).

Comma Operator

Except in contexts where comma is given a special meaning, for example, in lists of function arguments and lists of initializers, the comma operator is not supported.

Conditional Expression

The conditional expression is used in Control Statements to determine the flow of execution. The type is Bool.

<CONDITIONAL_EXPRESSION> := <SIMPLE_COND_DECL> | <ASSIGN_EXPRESSION>

In ulam, the as operator lets an arbitrary atom or element object, referred to by the lefthand side identifier, access the specific element or quark type designated on the righthand side, when the object either is an element of that type or inherits from a quark of that type. When the lefthand side object is a transient, the righthand side can be either an inherited transient or quark type. When the lefthand side object is a quark, the righthand side can only be an inherited quark type, unless the object is a reference, in which case its effective-self is checked at runtime. The self keyword is a valid lefthand side identifier (the super keyword is also adjusted for the scope of the as-block (ulam-5)); Self is a valid righthand type.

<SIMPLE_COND_DECL> := <IDENT> + 'as' + <TYPE_IDENT>

Assignment Expressions

All assignment operators group right-to-left. All require an lvalue as the left operand, and the lvalue must be modifiable: it must not be an array whose type or size differs from that of the right operand, must not have an incomplete type, and must not be a cast. It may be a function that returns a reference.

<ASSIGN_EXPRESSION> := <EXPRESSION> | <LVAL_EXPRESSION> + <ASSIGN_OP> + <ASSIGN_EXPRESSION> | <LVAL_EXPRESSION> + <LVAL_UNOP>

<ASSIGN_OP> := '=' | '+=' | '-=' | '*=' | '/=' | '%=' | '&=' | '|=' | '^=' | '<<=' | '>>='

The type of an assignment expression is that of its left operand, and the value is the value stored in the left operand after the assignment has taken place. In ulam, all primitive type assignments saturate in the destination type. The bitwise compound assignment operators (&=, |=, ^=) require the lefthand side to be a Bits type and the righthand side to be safely castable to Bits; the shift compound assignment operators (<<=, >>=) require the lefthand side to be a Bits type and the righthand side to be safely castable to Unsigned.
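Saturation can be sketched in C++17 by clamping a wide intermediate result to the destination's range instead of wrapping it. The function names are hypothetical, n is assumed to lie between 1 and 63, and clamping a negative value to 0 for Unsigned(N) is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of storing a wide result into Int(N): out-of-range values
// are clamped to the nearest bound (saturated) rather than wrapped. 1 <= n <= 63.
int64_t saturateInt(int64_t value, unsigned n) {
    const int64_t hi = (int64_t{1} << (n - 1)) - 1;
    const int64_t lo = -(int64_t{1} << (n - 1));
    return std::clamp(value, lo, hi);
}

// Same idea for Unsigned(N); negative inputs clamp to 0 in this sketch.
uint64_t saturateUnsigned(int64_t value, unsigned n) {
    const int64_t hi = (int64_t{1} << n) - 1;  // n <= 63 keeps this in range
    return static_cast<uint64_t>(std::clamp(value, int64_t{0}, hi));
}
```

In this model, adding 1 to an Int(4) holding 7 stays at 7, where a wrapping C++ `int8_t`-style store would not.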

When the left operand is a Class, the operator= overload function whose parameter matches the righthand type is called, and the result has that function's return type, including any side-effects; otherwise, the default C++ structure assignment is used (ulam-3). As with any function call, arguments may be implicitly cast to match parameter types when the cast is safe; perfect matches take precedence, and ambiguous matches cause a compilation error.

An expression of the form E1 op= E2 is equivalent to E1 = E1 op (E2), except that E1 is evaluated only once.
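C++ follows the same rule, so a small C++ sketch can show it: `pick` (a hypothetical helper) returns a reference, much like an ulam function returning a reference used as an lvalue, and counts how often it is evaluated.

```cpp
#include <array>
#include <cassert>

// Counts how many times the lvalue expression E1 is evaluated.
int evaluations = 0;

// Hypothetical helper standing in for E1: a function that returns a reference.
int& pick(std::array<int, 3>& a, int i) {
    ++evaluations;
    return a[i];
}
```

With `pick(a, 1) += 10` the counter advances by one, whereas the spelled-out `pick(a, 1) = pick(a, 1) + 10` advances it by two.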

Constant Expressions

Expressions that evaluate to a constant are required in several contexts: array bounds, bit-field lengths, the value of a named constant, the initial value of a model parameter, element/quark/transient class arguments, and class parameter and data member default values. The primitive scalar types are supported. Constant classes are also supported (ulam-4), however constant functions are not. Constant atoms, initialized to constant classes, are supported (ulam-5); and, an array of constant atoms may be initialized to different constant class types (see 'constantof' operator).

Declarations

The distinction between declaration and definition is different in ulam than C, C++, or Java.

In ulam, identifiers are declared as: data members and template parameters of elements, quarks, unions, or transients, as part of their definition; immediate local variables and parameters in a function definition; or local constants (local defs) shared by the classes in an ulam file. Declarations specify the type given to an identifier. Initialization as part of a declaration may occur with class parameters, data members, and local function variables (not function parameters), and must occur with model parameters and named constants. These required and optional initializations, excluding those of local function variables, must be constant expressions. References (&) may be used as local function variables, and must be initialized. References may also be used as function parameters, and as function return values that do not reference local function variables (ulam-3).

There are no external declarations of elements, quarks, unions, transients, or functions in ulam; their definitions immediately follow their declarations. Constructors may be used to initialize identifiers of a class type only when they are local function variables. Class type identifiers used as data members, local function variables, or named constants may be initialized with a C99 struct initialization style syntax; unions may be initialized in the (write-once) C99 style only when they are local function variables or named constants (ulam-4):

<CLASS_INIT> := '{' + <DM_INIT_LIST> + '}'

<DM_INIT_LIST> := 0 | <DM_INIT> | <DM_INIT> + ',' + <DM_INIT_LIST>

<DM_INIT> := '.' + <IDENT> + '=' + ( <CONSTANT_EXPRESSION> | '{' + ( 0 | <CONSTANT_EXPR_LIST> ) + '}' | <CLASS_INIT> | '{' + ( 0 | <CLASS_ARRAY_INIT_LIST> ) + '}' )

In the case of arrays, the last initialized value propagates when the array size exceeds the number of initialized entries provided:

<CLASS_ARRAY_INIT_LIST> := <CLASS_INIT> | <CLASS_INIT> + ',' + <CLASS_ARRAY_INIT_LIST>

An empty list {}, or successive commas in a list, initializes a primitive array to all zeros; a class to its default value.

Type Specifiers

In ulam, type specifiers begin with a capital letter (see Basic Types above). A bit-length within parentheses may be declared immediately after the type, in widths ranging from 1 to 64 bits, with the exceptions that Bool uses only odd widths, Strings always use the default bitsize, and Bits can be up to 8K bits, the size of the biggest transient. Arrays may be declared using square brackets as in C.

Types may also be qualified, to indicate special properties of the objects being declared, such as named constant, model parameter, and local file scope. The typedef specifier is an alias for a type.

Enumerations

Enumerations are not currently supported, except as named constants.

Declarators

A declaration has the form "T D" where T is a type and D is a declarator.

<DECL> := <TYPE> + <VAR_DECLS>

<VAR_DECL> := <LVAL_EXPRESSION>

Declarations with the same base type may be declared as a list:

<VAR_DECLS> := <VAR_DECL> | <VAR_DECL> + ',' + <VAR_DECLS>

Pointer Declarators

Pointers are invisible to the ulam programmer.

Array Declarators

When the constant expression is present within square brackets, it must have numeric type and a value of 0 or greater, unless the array is initialized. Multi-dimensional arrays are based on a single-dimensional array. Two method names, aref and aset, offer the ulam programmer the option of creating custom arrays. When these methods exist, they are automatically used to access the array items of these otherwise apparently scalar types. As of ulam-3, aset has been replaced by aref returning a reference, and an operator[] overload supersedes aref. Another possible implementation of a multi-dimensional array is an array of a class (quark or transient) with an array data member.

Arrays can be initialized to a list of constant values. An array of Strings may be initialized to a list of double quoted constant values of varying length (ulam-3).

Function Declaration

Several reserved keywords apply to ulam functions:

The keyword native at the end of a function declaration:

<NATIVE_FUNC_DEF> := <FUNC_DECL> + 'native' + ';'

indicates to the ulam compiler that no ulam definition exists for this function, and that the C++ code is provided by the programmer (i.e., not generated). The provided code must follow internal ulam conventions, including name mangling, and immediate types, as generated by the ulam compiler.

The keyword virtual at the beginning of a function declaration:

<VIRTUAL_FUNC_DEF> := 'virtual' + ( <ULAM_FUNC_DEF> | <NATIVE_FUNC_DEF> | <FUNC_DECL> + ';' )

indicates, as in C++, that the function may be overridden by a subclass (ulam-2). Pure virtual functions have no definition and terminate with a semicolon; a class with one or more pure virtual functions is abstract and cannot be instantiated as a data member, local function variable, or function parameter. A call to a pure virtual function produces a runtime Failure. See the @Concrete class keyword to check classes for abstract errors before runtime.

The keyword @Override preceding a virtual function definition ensures that its implementation overrides an ancestor's, similar to Java (ulam-3).

Virtual functions may also be native. The default is non-virtual.

Since every class inherits from UrSelf, there are at least two entries per virtual function table (VTable). Prior to ulam-5, a VTable was arranged by virtual function indexes (named with the VTABLE_IDX_ prefix followed by the function's mangled name with parameter types); with multiple inheritance, its layout includes each virtual function's originating class as well as its function index (renamed with the prefix VOWNED_IDX_), both known at compile-time. In ulam-5, virtual functions called on complete objects (i.e., local variables or data members that are not references) bypass the runtime look-up that happens in most cases. To direct the look-up of a virtual function to an ancestor's VTable, rather than that of the default effective-self, the programmer specifies a member select by baseclass expression. Except for virtual functions called on data members, the effective-self passed to the called virtual function does not change; the pos in self, however, is modified to the position of the virtual function's overriding baseclass within the effective-self (invariant).

The keyword constant preceding a virtual function definition is not currently supported.

Operator Overload

A function identifier that begins with the keyword operator followed by an operation (e.g., operator+) names an operator overload method, as in C++ (ulam-3).

<IDENT_OPERATOR_OVERLOAD> := 'operator' + <OVERLOAD_OP>

Operator overload methods are called automatically when the lefthand operand of a binary operator is a Class (the righthand operand becomes the argument), or when the operand of a unary operator is a Class (no argument). The Class, or one of its ancestors, must define the function with a matching argument type; its return type depends on the function definition. These methods may also be called directly. In the case of comparison and equality operators, if the operator used has no defined overload method, ulam automatically complements the result of the inverse operation, if provided. In the case of Class assignments, the default C++ structure assignment is used when no matching overload function is defined. In the case of array item access, operator[] supersedes a custom array aref method. Note that overloading the logical operators && and || is not supported.

Initialization

When an object is declared, its init-declarator may specify an initial value for the identifier being declared. The initializer is preceded by =. Arrays of a primitive type may be initialized to a list of comma-delimited constant value expressions within a set of curly braces (ulam-2); an array of unspecified size has the same size as the number of items in the list; unlike C, an array of specified size with fewer initialization items propagates the last constant value to the remaining array items.
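A hypothetical C++17 helper, `ulamStyleInit`, sketching the propagation rule for a sized array: listed values fill the leading items and the last one is copied into the rest, while an empty list gives all zeros. Extra values beyond the array size are simply ignored here, which is a simplification of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Sized array initialized from a possibly shorter list, ulam style:
// the last listed value fills the remaining items (C would zero-fill them).
template <typename T, std::size_t N>
std::array<T, N> ulamStyleInit(std::initializer_list<T> values) {
    std::array<T, N> out{};  // an empty list leaves everything zero
    T last{};
    std::size_t i = 0;
    for (const T& v : values) {
        if (i == N) break;   // extras ignored in this sketch
        out[i++] = v;
        last = v;
    }
    // propagate the last listed value into the remaining items
    for (; i < N && values.size() > 0; ++i) out[i] = last;
    return out;
}
```

Here `ulamStyleInit<int, 5>({1, 2, 3})` yields 1, 2, 3, 3, 3, where the equivalent C initializer would yield 1, 2, 3, 0, 0.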

Data members and class parameters of elements, quarks and transients may be initialized as part of their declaration. Named constants and model parameters must be initialized. Data members of unions cannot be initialized.

Only local variables defined within a function definition may be initialized with a variable expression. All other initializations must be constant expressions, including arrays initialized by a list of values (ulam-2). A constructor called on a class instance identifier may modify default class values when it is a local function variable; constructors cannot be used with data members since constructors, like any function call, are not constant expressions (ulam-3).

<CONSTRUCTOR_CALL> := '(' + <ARGS> + ')'

Type Names

Typedef

Declarations whose storage class specifier is typedef do not declare objects; instead they define identifiers that name types. These identifiers are called typedef names. A typedef in another class may be referred to by using the select operator (dot). A typedef defined as a local in the file scope may be redeclared in an inner scope.

Typedef Priority

As clarified in ulam-5, the priority of typedef use is predictable. For inheritance definitions and template class parameter types, the order of search is: class hierarchy, locals filescope, followed by global scope. There are no typedefs in global scope, only classes. Class names may be shadowed by typedef alias names. Typedefs may shadow another typedef, at every level. The first one found is used, unless specified with a baseclass type, self, or local (dot) prefix. The class hierarchy search starts in the current member class, then proceeds depth-first through the base classes, in order of declaration on the class definition line. Typedefs preceded with the local. specifier skip the class hierarchy and use the typedef, if any, in the locals filescope; otherwise, the global class is used. Locals filescope has no class hierarchy of its own. Typedefs in locals filescope may be defined before or after the class definition, and are accessible to all the classes in the file. The parsing order of classes does not affect the result. For purposes other than inheritance definitions, the search begins with the current block, where the typedef definition must precede its use, and continues through each outer block until the class hierarchy, locals filescope, and finally the global scope are reached and searched.
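The inheritance-definition search order can be modeled as follows. This is a hypothetical C++17 sketch (`ClassScope`, `searchHierarchy`, and `resolveTypedef` are illustrative names); it ignores block scopes and the baseclass-type and self prefixes, and maps each typedef alias to a type name string.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of ulam-5 typedef resolution for inheritance definitions.
struct ClassScope {
    std::map<std::string, std::string> typedefs;  // alias -> type it names
    std::vector<const ClassScope*> bases;         // in declaration order
};

// Depth-first: the class itself, then each base's whole hierarchy in turn.
std::optional<std::string> searchHierarchy(const ClassScope& c, const std::string& name) {
    auto it = c.typedefs.find(name);
    if (it != c.typedefs.end()) return it->second;
    for (const ClassScope* base : c.bases)
        if (auto found = searchHierarchy(*base, name)) return found;
    return std::nullopt;
}

// Class hierarchy, then locals filescope, then global class names.
// With the `local.` prefix the class hierarchy is skipped.
std::optional<std::string> resolveTypedef(const ClassScope& cls,
                                          const std::map<std::string, std::string>& locals,
                                          const std::set<std::string>& globalClasses,
                                          const std::string& name, bool localPrefix) {
    if (!localPrefix) {
        if (auto found = searchHierarchy(cls, name)) return found;
    }
    auto it = locals.find(name);
    if (it != locals.end()) return it->second;
    if (globalClasses.count(name)) return name;  // a global class, not a typedef
    return std::nullopt;
}
```

In this model, when two bases both define T, the one declared first on the class definition line wins, and `local.T` reaches past both to the filescope typedef.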

Type Equivalence

Two type specifier lists are equivalent if they contain the same set of type specifiers, taking into account that some specifiers can be implied by others.

In ulam, a type is uniquely defined by its key. An ulam key type signature consists of: base type, bit size, array size, class instance index, and reference type.

Elements, quarks, unions, or transients with different identifiers are distinct types. Anonymous classes are not visible to the ulam programmer. For a class, the fourth part of the key (the class instance index) identifies its instance class type. Reference types also use the fourth part of the key to identify the type they reference.
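A type key can be pictured as a five-part record compared field by field. This hypothetical C++ sketch (`TypeKey`, `RefKind`, and `sameType` are illustrative names) uses assumed encodings, -1 for "not an array" and 0 for a non-class type, that are not taken from the compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>

// Hypothetical model of an ulam type key: two types are the same type
// exactly when all five parts match.
enum class RefKind { None, Ref };

struct TypeKey {
    std::string baseType;    // e.g. "Int", or a class name id
    uint32_t bitSize;        // e.g. 4 for Int(4)
    int32_t arraySize;       // assumed: -1 means "not an array"
    uint32_t classInstance;  // assumed: 0 for non-class types
    RefKind ref;             // plain type vs. reference type
};

bool sameType(const TypeKey& a, const TypeKey& b) {
    return std::tie(a.baseType, a.bitSize, a.arraySize, a.classInstance, a.ref) ==
           std::tie(b.baseType, b.bitSize, b.arraySize, b.classInstance, b.ref);
}
```

Under this model Int(4) and Int(5) differ only in bit size, and an Int(4) reference differs from Int(4) only in its reference part, yet each pair is two distinct types.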

The base type of file scope locals is the name id of the ulam file location (ulam-3).

Statements

Except as described, statements are executed in sequence. Statements are executed for their effect, and do not have values. They fall into several groups.

Labeled Statements

ulam has only internally labeled statements, generated for control statements.

Expression Statement

Most statements are expression statements, and end in a ;.

<STATEMENT> := <SIMPLE_STATEMENT> | <CONTROL_STATEMENT> | <BLOCK>

Most expression statements are assignments or function calls. All side effects from the expression are completed before the next statement is executed. If the expression is missing, the construction is called a null statement; it is often used to supply an empty body to an iteration statement.

Block

So that several statements can be used where one is expected, the compound statement called a block is provided.

<BLOCK> := '{' + <STATEMENTS> + '}'

<STATEMENTS> := 0 | <STATEMENT> + <STATEMENTS>

The body of a function definition is a compound statement. As in C, if an identifier in the declaration-list was in scope outside the block, the outer declaration is suspended within the block, after which it resumes its force. An identifier may be declared only once in the same block.

Control Statement

Control statements choose one of several flows of control.

In both forms of the if statement, the condition expression, of type Bool, is evaluated. When it compares to true the first statement of its body is executed; otherwise, when in its second form, the optional else statement is executed.

<IF_STATEMENT> := 'if' + '(' + <CONDITIONAL_EXPRESSION> + ')' + <STATEMENT> + <OPT_ELSE_STATEMENT>

<OPT_ELSE_STATEMENT> := 0 | 'else' + <STATEMENT>

The else ambiguity is resolved by connecting an else with the last encountered else-less if at the same block nesting level.

A kind of switch statement is supported in ulam-3, using the keyword which in place of C's switch. The switch value is evaluated once before any cases are considered; an empty value is shorthand to select the first true case, and is required for the as conditional expression. Unlike C, cases are not constant expressions, and checking is performed in order with possible side-effects. Furthermore, the default case, otherwise, is considered last. There is no fall-through; however, multiple cases may share the same body, except for 'as' cases.

<SWITCH_STATEMENT> := 'which' + '(' + <SWITCH_VALUE> + ')' + '{' + <CASE_EXPRESSIONS> + '}'

It is equivalent to a parse tree shaped like: { ( 0 | <VAR_DECL> + '=' + <SWITCH_VALUE> + ';' ) + ( 0 | <IFSW_STATEMENTS> ) + ( 0 | <DEFAULT_SW_STATEMENT> ) }
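The lowering suggested by that parse tree can be sketched in C++17. `lowerWhich` is a hypothetical helper, not compiler output: the switch value is evaluated once, case expressions are evaluated in order only until one matches, and `otherwise` is used last. The empty-value form and 'as' cases are not modeled.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical C++ lowering of
//   which (value) { case c1: body1 case c2: body2 otherwise: bodyN }
// Each body is represented by the string it would produce.
std::string lowerWhich(const std::function<int()>& switchValue,
                       const std::vector<std::pair<std::function<int()>, std::string>>& cases,
                       const std::string& otherwise) {
    const int tmp = switchValue();              // evaluated once, before any case
    for (const auto& [caseExpr, body] : cases)  // cases tested in declaration order
        if (tmp == caseExpr()) return body;     // first match wins; no fall-through
    return otherwise;                           // default case considered last
}
```

Because the case expressions are ordinary calls here, a case after the matching one is never evaluated, so its side effects never happen.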

Iteration Statements

The iteration statements, while and for, specify looping. The do statement is not supported at this time.

In the while statement, the body is executed repeatedly so long as the value of its conditional expression remains true.

<WHILE_STATEMENT> := 'while' + '(' + <CONDITIONAL_EXPRESSION> + ')' + <STATEMENT>

With while, the test, including all side effects from the expression, occurs before each execution of the statement. It is equivalent to a parse tree shaped like:

while(true) { if(! <CONDITIONAL_EXPRESSION>) break; <STATEMENT> }

In the for statement, the first expression is evaluated once, and thus specifies initialization for the loop. There is no restriction on its type. The second expression is of type Bool; it is evaluated before each iteration, and if it becomes false, the 'for' is terminated. The third expression is evaluated after each iteration and thus specifies a re-initialization for the loop. There is no restriction on its type. Side effects from each expression are completed immediately after its evaluation.

<FOR_STATEMENT> := 'for' + '(' + ( 0 | <STATEMENT_DECL> | <ASSIGN_EXPRESSION>) + ';' + ( 0 | <CONDITIONAL_EXPRESSION>) + ';' + ( 0 | <ASSIGN_EXPRESSION>) + ')' + <STATEMENT>

If the body does not contain a continue statement, it is equivalent to a parse tree shaped like:

{ <STATEMENT_DECL> while ( <CONDITIONAL_EXPRESSION> ) { <STATEMENT> <ASSIGN_EXPRESSION> } }

Any of the three expressions may be dropped. A missing second expression makes the implied test equivalent to testing true.

Jump Statements

The following jump statements transfer control unconditionally.

The goto statement is used internally only by the ulam compiler, and is unavailable to the ulam programmer.

A continue statement may appear only within an iteration statement. It causes control to pass to the loop-continuation portion of the smallest enclosing such statement.

<CONTINUE_STATEMENT> := 'continue' + ';'

A break statement may appear only in an iteration statement, and terminates execution of the smallest enclosing such statement; control passes to the statement following the terminated statement.

<BREAK_STATEMENT> := 'break' + ';'

The return statement is considered a simple statement.

Simple Statement

Simple statements end in a ;. Most statements are simple statements.

Decl Statement

A declaration allows the programmer to declare a list of variables with the same type, and to assign a value to any variable in the list. This declaration list applies to local variables within a function definition, but not their function parameters where each type must be explicit and default values are not supported; and likewise, to element, quark, and transient data members but not their class parameters.

Typedef Statement

A typedef statement defines an identifier that names a type. It does not introduce new types, only an alias for a type that could be specified in another way. Typedef names may refer to typedefs in another element, quark, or transient using the dot operator, and typedef names may be redeclared in an inner scope.

<TYPE_DEF> := 'typedef' + <TYPE> + <TYPE_EXPRESSION>

<TYPE> := <TYPE_NAME> | <TYPE_NAME> + '(' + <CONSTANT_EXPRESSION> + ')'

<TYPE_NAME> := <TYPE_TOKEN> | <TYPE_IDENT> | <TYPE_IDENT> + ( '.' + <TYPE_IDENT>)*

<TYPE_EXPRESSION> := <TYPE_IDENT> | <TYPE_IDENT> + '[' + <CONSTANT_EXPRESSION> + ']'

Constant Statements

Named constants are preceded by the keyword constant, which announces that the value will not be changed; a named constant occupies no space in the element, quark, transient, or locals file scope to which it belongs. A named constant may be a scalar or an array. Named constants may also be classes (ulam-4) or atoms (ulam-5).

<CONST_DEF> := 'constant' + <TYPE> + <IDENT> + '=' + ( <CONSTANT_EXPRESSION> | '{' + ( 0 | <CONSTANT_EXPR_LIST> ) + '}' | <CLASS_INIT> | '{' + ( 0 | <CLASS_ARRAY_INIT_LIST> ) + '}' )

Named constants require an initialization value at declaration. An empty initialization of an array or a class reverts to the default value of the class, or to zero in the case of primitive type arrays. When the array size exceeds the number of initialized entries provided, the last one propagates.

Model Parameter Statements

Model parameters are preceded by the keyword parameter and include a default constant value. Unlike a named constant, its value can be changed, though only through the MFM Simulator interface. Its value is shared among all instances of the same ulam class type. It occupies no space in the element class to which it belongs. Its type must be a scalar primitive type, excluding Void (and Atom).

<PARAMETER_DEF> := 'parameter' + <TYPE> + <IDENT> + '=' + <CONSTANT_EXPRESSION>

Return Statement

A function returns to its caller by the return statement. When return is followed by an expression, the value is returned to the caller of the function. The expression is converted, as if by assignment, to the type returned by the function in which it appears. Return values may be safely cast to match the return type. In a function returning Void, the return statement with expression can be used, if the expression type is Void, or explicitly cast to Void. Flowing off the end of a function is equivalent to a return with no expression.

<RETURN_STATEMENT> := 'return' + ( 0 | <ASSIGN_EXPRESSION>)

Definitions

A program definition is a translational unit of input provided to the ulam compiler.

<PROGRAM_DEF> := <QUARK_OR_UNION_DEF> | <ELEMENT_DEF> | <TRANSIENT_DEF>

Each element, quark or union, and transient consists of a sequence of data member declarations, typedefs, named constant definitions, model parameters (elements only), or function definitions, collectively called class members.

The scope of these declarations persists to the end of the translational unit in which they are declared, just as the effect of declarations within blocks persists to the end of the block. A class block may be empty.

<CLASS_BLOCK> := '{' + <CLASS_MEMBERS> + '}'

Only at this level may the code for functions be given.

Classes optionally may be defined as templates, with one or more class parameters within parentheses that are akin to named constants that take no space. Template parameter types are limited to 32-bit primitive scalars and arrays with no more than 16 items, as well as scalar Strings and constant Classes (ulam-4). A class parameter may be given a default value; however, subsequent parameters must then also have defaults to avoid ambiguity during instantiation. Default values, in the order searched, can be: sibling parameters, class hierarchy member constants, or locals filescope constants. Template parameter values are encoded in instance class names and thus file names, so the 256-byte Linux filename limit can really ruin your day; use constant arrays and strings in moderation.

<CLASS_PARAM_DECLS> := <CLASS_PARAM> | <CLASS_PARAM> + ',' + <CLASS_PARAM_DECLS>

<CLASS_PARAM> := <TYPE> + <VAR_DECL> + ( 0 | '=' + <CONSTANT_EXPRESSION> )

Each instance of a template class with unique (constant expression) values for its class arguments defines a unique type; arguments with default values may be omitted.

<CLASS_ARGS> := 0 | <CLASS_ARG> | <CLASS_ARG> + ',' + <CLASS_ARGS>

<CLASS_ARG> := <CONSTANT_EXPRESSION>

Furthermore, elements, quarks, and unions optionally may be defined as a subclass of a quark or union instance; transients as a subclass of a transient, quark, or union instance (ulam-2). The superclass is preceded by :. Optional sibling baseclasses are preceded by + in ulam-5, where the declaration order determines the order in which members are found. Inherited functions may be overridden by a subclass, or by related baseclasses (ulam-5); data members cannot be shadowed by a subclass, but may be shadowed across sibling baseclasses (ulam-5). By default, member searches are depth-first, through each complete baseclass hierarchy in turn; the exception is virtual function overrides, which are matched in breadth-first order across the declared baseclasses, followed by their ancestors, such that the first most-specific class wins (a class that is a subclass of another class is more specific).

Prior to ulam-5, the space of the ancestor was included at the beginning of the subclass' allotted space; with multiple baseclasses, however, a subclass' data members come first, followed by each direct ancestor in order of declaration (breadth-first), and lastly by the shared bases of the direct ancestors. The relative position of a baseclass in a subclass, though known at compile-time, sometimes must be queried at runtime (e.g., virtual functions, class references), using its Registry Number (also known as classId (ulam-5)), assigned at compile-time (ulam-4). For an at-a-glance reference, a class' components are charted with start position, bitsize, name, and type at the close of its generated header (.h) file.
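A hypothetical C++ sketch of the ulam-5 layout order: the subclass's own data members at position 0, then its ancestors breadth-first in declaration order, with a shared base placed only once. `ClassDef` and `layout` are illustrative names; treating deeper ancestors in plain breadth-first order is an assumption, and the zero-sized UrSelf is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a class for layout purposes.
struct ClassDef {
    std::string name;
    uint32_t ownBits;                    // size of this class's own data members
    std::vector<const ClassDef*> bases;  // superclass first, then '+' bases, in order
};

// Start position of every class within the layout of `sub`.
std::map<std::string, uint32_t> layout(const ClassDef& sub) {
    std::map<std::string, uint32_t> pos;
    std::set<const ClassDef*> seen{&sub};
    std::queue<const ClassDef*> work;
    work.push(&sub);
    uint32_t next = 0;
    while (!work.empty()) {
        const ClassDef* c = work.front();
        work.pop();
        pos[c->name] = next;  // this class's members start here
        next += c->ownBits;
        for (const ClassDef* b : c->bases)
            if (seen.insert(b).second) work.push(b);  // shared bases enqueued once
    }
    return pos;
}
```

For a diamond where element E (5 bits) has bases QA (3 bits) and QB (4 bits), both derived from Q0 (2 bits), this model places E at 0, QA at 5, QB at 8, and the single shared Q0 at 12.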

Class Definitions

In ulam, there are four class types: elements, quarks, unions, and transients. Classes that do not have a specified superclass inherit from the zero-sized UrSelf quark (ulam-2). As of ulam-5, a class may inherit from multiple bases, and all of them are shared, thus avoiding the diamond inheritance problem. Direct bases are those that appear in the class definition: the superclass (preceded by :) and any others (preceded by +). Only one class is the superclass; the other bases may be referred to specifically by ancestor type, or by baseclass and classId when calling virtual functions (see Member Select Expression).

An element is an object consisting of a sequence of named members of various types, including quarks and unions, excluding elements and transients. An element must fit into the storage allotted to an atom less any space reserved for type and error correction. An element may inherit from one, or more (ulam-5), quark or union instances; ancestor sizes contribute to the overall size of the element.

<ELEMENT_DEF> := 'element' + <ELEMENT_IDENT> + ( 0 | <ELEMENT_SUPERCLASS> ) + ( 0 | <ELEMENT_BASECLASSES> ) + <CLASS_BLOCK>

<ELEMENT_SUPERCLASS> := ':' + <ELEMENT_ANCESTOR>

<ELEMENT_BASECLASSES> := <ELEMENT_BASECLASS> | ( <ELEMENT_BASECLASS> + <ELEMENT_BASECLASSES> )

<ELEMENT_BASECLASS> := ( 0 | '+' + <ELEMENT_ANCESTOR> )

<ELEMENT_ANCESTOR> := <QUARK_OR_UNION_TYPE>

<ELEMENT_IDENT> := <TYPE_IDENT> + ( 0 | '(' + <CLASS_PARAM_DECLS> + ')' )

<ELEMENT_TYPE> := <TYPE_IDENT> + ( 0 | '(' + <CLASS_ARGS> + ')' )

A quark is an object consisting of a sequence of named members of various types, including other quarks (not an instance of itself) and unions, but not an element nor a transient, and may occupy no more than 64 bits and as little as zero. A quark may inherit from one, or more (ulam-5), quark instances; ancestor sizes contribute to the overall size of the quark.

<QUARK_DEF> := 'quark' + <QUARK_OR_UNION_IDENT> + ( 0 | <QUARK_OR_UNION_SUPERCLASS> ) + ( 0 | <QUARK_OR_UNION_BASECLASSES> ) + <CLASS_BLOCK>

<QUARK_OR_UNION_SUPERCLASS> := ':' + <QUARK_OR_UNION_ANCESTOR>

<QUARK_OR_UNION_BASECLASSES> := <QUARK_OR_UNION_BASECLASS> | ( <QUARK_OR_UNION_BASECLASS> + <QUARK_OR_UNION_BASECLASSES> )

<QUARK_OR_UNION_BASECLASS> := ( 0 | '+' + <QUARK_OR_UNION_ANCESTOR> )

<QUARK_OR_UNION_ANCESTOR> := <QUARK_OR_UNION_TYPE>

<QUARK_OR_UNION_TYPE> := <TYPE_IDENT_DESC> + ( 0 | '(' + <CLASS_ARGS> + ')' )

<QUARK_OR_UNION_IDENT> := <TYPE_IDENT> + ( 0 | '(' + <CLASS_PARAM_DECLS> + ')' )
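A sketch of a quark with a class parameter, illustrating <QUARK_OR_UNION_IDENT> with <CLASS_PARAM_DECLS>, plus a quark that inherits from a quark instance. Counter, FlaggedCounter, and cBITS are hypothetical names. Because a quark may not exceed 64 bits, the argument given to an instance such as Counter(6) must keep the total size within that limit.

```ulam
ulam 5;

// Hypothetical parameterized quark: its data occupies cBITS bits.
quark Counter(Unsigned cBITS) {
  typedef Unsigned(cBITS) Count;
  Count m_count;

  Count get() { return m_count; }
  Void set(Count c) { m_count = c; }
}

// A quark inheriting from a quark instance: 6 bits from Counter(6),
// plus 1 bit for its own Bool member.
quark FlaggedCounter : Counter(6) {
  Bool m_flag;
}
```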

A union is a quark that contains, at different times, any one of several members of various types. Its size, which was required to be non-zero until ulam-5, is that of its largest data member, excluding any ancestors; all members begin at the same offset, 0. Unlike quark data members, union data members cannot be initialized; this includes Strings. Like a quark, a union may be a baseclass and may inherit from other quarks or unions (ulam-5). Unions can be useful to "cast" between arrays that have the same total bitsize but different ulam types (scalar type and arraysize).

<UNION_DEF> := 'union' + <QUARK_OR_UNION_IDENT> + ( 0 | <QUARK_OR_UNION_SUPERCLASS> ) + ( 0 | <QUARK_OR_UNION_BASECLASSES> ) + <CLASS_BLOCK>

<QUARK_OR_UNION_DEF> := <QUARK_DEF> | <UNION_DEF>
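The array "cast" use mentioned above might look like the following sketch. The hypothetical union Octet overlays a two-element array of 4-bit values and one 8-bit value; both members start at offset 0 and occupy 8 bits, so the union is 8 bits wide. Neither member has an initializer, since union data members cannot be initialized. ByteView and its functions are also hypothetical.

```ulam
ulam 5;

// Hypothetical union: both members begin at offset 0 and share the same 8 bits.
union Octet {
  Unsigned(4) m_nibbles[2];  // 2 x 4 bits = 8 bits
  Unsigned(8) m_byte;        // 8 bits
}

// Writing m_bits.m_byte and then reading m_bits.m_nibbles views the same
// 8 bits as two 4-bit values; which half lands in which array slot depends
// on ulam's bit layout, which this sketch does not assume.
quark ByteView {
  Octet m_bits;

  Void setByte(Unsigned(8) b) { m_bits.m_byte = b; }
  Unsigned(4) nibble(Unsigned(1) i) { return m_bits.m_nibbles[i]; }
}
```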

A transient is a temporary local class object consisting of a sequence of named members of various types, including other transients (not an instance of itself), as well as elements, quarks, unions, and atoms, and may occupy no more than 8K (8192) bits (including arrays) and as little as zero (ulam-2). A transient may inherit from one, or more (ulam-5), transient, quark or union instances; ancestor sizes contribute to the overall size of the transient.

<TRANSIENT_DEF> := 'transient' + <TRANSIENT_IDENT> + ( 0 | <TRANSIENT_SUPERCLASS> ) + ( 0 | <TRANSIENT_BASECLASSES> ) + <CLASS_BLOCK>

<TRANSIENT_SUPERCLASS> := ':' + <TRANSIENT_ANCESTOR>

<TRANSIENT_BASECLASSES> := <TRANSIENT_BASECLASS> | ( <TRANSIENT_BASECLASS> + <TRANSIENT_BASECLASSES> )

<TRANSIENT_BASECLASS> := ( 0 | '+' + <TRANSIENT_ANCESTOR> )

<TRANSIENT_IDENT> := <TYPE_IDENT> + ( 0 | '(' + <CLASS_PARAM_DECLS> + ')' )

<TRANSIENT_TYPE> := <TYPE_IDENT_DESC> + ( 0 | '(' + <CLASS_ARGS> + ')' )

<TRANSIENT_ANCESTOR> := ( <TRANSIENT_TYPE> | <QUARK_OR_UNION_TYPE> )
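A sketch of a transient that uses member types a quark may not hold (an atom array and an element) and inherits from a quark. Tally, Marker, and Survey are hypothetical names; the total size, including the arrays, must stay within 8192 bits.

```ulam
ulam 5;

// Hypothetical helper classes.
quark Tally {
  Unsigned(8) m_total;
}

element Marker {
  Bool m_flag;
}

// A transient may contain atoms and elements, and may inherit from a quark.
transient Survey : Tally {
  Atom m_seen[4];
  Marker m_marker;
  Unsigned(6) m_count;
}
```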

The keyword @Concrete preceding a class definition ensures, at compile time, that its implementation has no pure virtual functions (ulam-6). Without @Concrete, only regular elements are checked for abstract (pure virtual) errors before runtime. For a regular class, no use is required, and the programmer expects all virtual functions to be defined without further subclassing. A template class passes a @Concrete expectation along to its specific instances; so, although a use (e.g., a variable, baseclass, or typedef) is necessary, the benefit of a compile-time check that avoids a runtime failure is available for any class type (ulam-6).
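A sketch of the intended use, assuming @Concrete is written immediately before the class keyword (the placement is inferred from the description above) and that a pure virtual function is a virtual declaration ending in ';'. Shape and Square are hypothetical. Marking Square as @Concrete asks the compiler to report, at compile time, any virtual function that Square leaves undefined.

```ulam
ulam 6;

quark Shape {
  virtual Unsigned area();   // pure virtual: no body
}

// If Square failed to define area(), @Concrete should turn that into a
// compile-time error rather than a runtime failure.
@Concrete quark Square : Shape {
  Unsigned(4) m_side;

  virtual Unsigned area() { return m_side * m_side; }
}
```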

Function Definitions

Function definitions have the form:

<FUNC_DEF> := <ULAM_FUNC_DEF> | <NATIVE_FUNC_DEF> | <VIRTUAL_FUNC_DEF> | <CLASS_CONSTRUCTOR_DEF>

<ULAM_FUNC_DEF> := <FUNC_DECL> + <BLOCK>

<NATIVE_FUNC_DEF> := <FUNC_DECL> + 'native' + ';'

<VIRTUAL_FUNC_DEF> := 'virtual' + ( <ULAM_FUNC_DEF> | <NATIVE_FUNC_DEF> | <FUNC_DECL> + ';' )

<CLASS_CONSTRUCTOR_DEF> := 'Self' + '(' + <FUNC_PARAMS> + ')' + ( <BLOCK> | 'native' + ';' )

where,

<FUNC_DECL> := <TYPE> + <FUNC_IDENT> + '(' + <FUNC_PARAM_DECLS> + ')'

specifies the return type, its name, and optional function parameters:

<FUNC_IDENT> := <IDENT> | <IDENT_OPERATOR_OVERLOAD>

<FUNC_PARAM_DECLS> := 0 | '...' | <FUNC_PARAMS> | <FUNC_PARAMS> + ',' + '...'

<FUNC_PARAMS> := <FUNC_PARAM> | <FUNC_PARAM> + ',' + <FUNC_PARAMS>

<FUNC_PARAM> := ( 0 | 'constant') + <TYPE> + <VAR_DECL>

A function may return an atom, element, quark, union, transient, or primitive type (including Void), or a reference (ulam-3), but not a function. A constant modifier preceding a function parameter announces that the function will not modify its value (ulam-4). Constant functions are not currently supported: the left-hand side of a function call must be modifiable.
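The following sketch shows several <FUNC_DEF> forms together: a pure virtual declaration (assumed to be a virtual declaration ending in ';'), its ulam definition in a subclass, a native function whose body would have to be supplied in C++, and a constant parameter (ulam-4). Scorer, Pebble, trace, and boosted are hypothetical names.

```ulam
ulam 5;

quark Scorer {
  virtual Unsigned score();             // pure virtual declaration
}

element Pebble : Scorer {
  Unsigned(4) m_weight;

  // Ulam function definition overriding the inherited virtual function.
  virtual Unsigned score() { return m_weight; }

  // Native function: declared in ulam; its implementation is provided in C++.
  Void trace(Unsigned value) native;

  // The constant modifier announces that 'bonus' is not modified here.
  Unsigned boosted(constant Unsigned bonus) { return score() + bonus; }

  Void behave() {
    trace(boosted(2u));
  }
}
```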

Parameters, essentially variables, are understood to be declared just after the beginning of the compound statement constituting the function's body, and thus the same identifiers must not be redeclared there (although they may, like other identifiers, be redeclared in inner blocks). Unlike in C, if the parameter is declared to have type "array of type", the declaration is not adjusted to read "pointer to type"; it is passed by value as with C-style structures. During the call to a function, the arguments are assigned, in order, to the parameters, and safely converted as necessary.

Functions with the same name may be overloaded to take different parameter types, and optionally return a different type. Like in C, native function definitions can specify a variable number of parameters using the ellipsis (...) notation. The types of these unspecified parameters are discussed further elsewhere in the ulam documentation.

Class constructors all share the name Self, require at least one parameter, and may be overloaded. They have no explicit return type, and return statements within them are optional (ulam-3).
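A sketch of overloaded constructors and overloaded member functions in one hypothetical element, Tile. Each constructor is named Self, takes at least one parameter, and declares no return type; the two area functions differ in their parameters and in their return types.

```ulam
ulam 5;

element Tile {
  Unsigned(4) m_side;

  // Overloaded class constructors (ulam-3).
  Self(Unsigned(4) side) { m_side = side; }
  Self(Bool full) {
    if (full) m_side = 15u;   // largest value an Unsigned(4) can hold
    else m_side = 0u;
  }

  // Overloaded functions: same name, different parameter types,
  // and here also different return types.
  Unsigned area() { return m_side * m_side; }
  Bool area(Unsigned threshold) { return area() > threshold; }
}
```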

Scope and Linkage

Unlike C, an ulam library must be compiled all at one time. ulam encourages keeping source text in a separate .ulam file for each translation unit, with the file named after an element, quark, or transient defined within it. Helper classes that reside in the same file as the named class have access to 'local' file scope constants (ulam-3).

Communication among the functions of a program may be carried out both through calls and through manipulation of referenced data.

Therefore, there are two kinds of scope to consider: first, the lexical scope of an identifier, which is the region of the program text within which the identifier's characteristics are understood; and second, the scope associated with objects and functions with external linkage, which determines the connections between identifiers in separately compiled translation units.

Lexical Scope

Unlike C, the same identifier may not be used for different purposes within the same scope. Type names and object names are lexically distinct in ulam because type names begin with a capital letter. Like C, the lexical scope of an object or function identifier begins at the end of its declarator and persists to the end of the translation unit in which it appears. The scope of a parameter of a function definition begins at the start of the block defining the function, and persists through the function; the scope of a parameter in a function declaration ends at the end of the declarator. The scope of an identifier declared at the head of a block begins at the end of its declarator, and persists to the end of the block. Labels are not used in ulam. The scope of a quark, union, or named constant begins at its appearance in a type specifier, and persists to the end of the translation unit, or to the end of the block (for declarations within a function).

If an identifier is explicitly declared at the head of a block, including the block constituting a function, any declaration of the identifier outside the block is suspended until the end of the block. This is known as shadowing.
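A minimal shadowing sketch in a hypothetical element, Walker: the inner declaration of count suspends the outer one until the inner block ends.

```ulam
ulam 5;

element Walker {
  Unsigned(4) m_steps;

  Void behave() {
    Unsigned(4) count = 1u;
    {
      Unsigned(4) count = 2u;   // shadows the outer 'count' until the end of this block
      m_steps = count;          // assigns 2
    }
    m_steps = count;            // the outer 'count' is visible again: assigns 1
  }
}
```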

File scope constants and typedefs, declared outside of a class with the keyword local, are available to all the classes defined in the same ulam file, and may be redeclared in an inner scope. A local def may be referred to explicitly by prefixing its name with 'local.'.
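A sketch of file scope local defs, with hypothetical names cMAX_AGE, Age, and Ager. The class redeclares cMAX_AGE in its own scope, and the 'local.' prefix still reaches the file scope definition.

```ulam
ulam 5;

// File scope defs, visible to every class defined in this .ulam file.
local constant Unsigned cMAX_AGE = 12u;
local typedef Unsigned(4) Age;

element Ager {
  // Redeclares the name in an inner (class) scope, shadowing the local def.
  constant Unsigned cMAX_AGE = 10u;

  Age m_age;

  Bool isOld()     { return m_age >= cMAX_AGE; }        // member constant: 10
  Bool isAncient() { return m_age >= local.cMAX_AGE; }  // file scope local: 12
}
```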

Data members of elements and quarks and transients, and file scope locals, may be declared after they are used.

Class parameters of elements, quarks, and transients are essentially named constants, and may be used by the class's other parameters, including its ancestors, in any order. File scope and member constants and typedefs may be used by class parameters in their types and default values.
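A sketch of class parameters in use, with hypothetical names throughout: a file scope constant appears in a class parameter's type, and an element passes its own class parameter to its ancestor and also uses it to size a data member.

```ulam
ulam 5;

// A file scope constant used in the type of a class parameter.
local constant Unsigned cWIDTH = 5u;

// Hypothetical parameterized quark: two fields of cBITS bits each.
quark Pair(Unsigned(cWIDTH) cBITS) {
  Unsigned(cBITS) m_first;
  Unsigned(cBITS) m_second;
}

// The element's class parameter is passed on to its ancestor and
// also sizes one of its own data members.
element Holder(Unsigned(cWIDTH) cBITS) : Pair(cBITS) {
  Unsigned(cBITS) m_extra;
}
```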

Linkage

Within a translation unit, all declarations of the same object or function identifier with internal linkage refer to the same thing, and the object or function is unique to that translation unit. All declarations of the same object or function identifier with external linkage refer to the same thing, and the object or function is shared by the entire library.

Ulam generates the static specifier for element and quark and transient non-virtual function definitions. The ulam programmer does not have access to the extern or static specifiers.

Preprocessing

ulam supports three preprocessing directives: use, load, and ulam.

The ulam directive specifies the version of the language. The source ulam version may not exceed the current compiler version.

File Inclusion

A control line of the form

use Filename;

causes the file Filename.ulam to be queued for parsing (ulam-6); previously, use behaved like load. The named file is searched for in a sequence of implementation-dependent places. This may be useful when multiple classes are defined in a single ulam file, since only one of those class names can match the file name.

Similarly, a control line of the form

load Filename;

causes the replacement of that line by the entire contents of the file Filename.ulam, regardless of whether it has been seen before. Note that Filename.ulam will also be compiled separately.

load "NonUlamFilename";

Alternatively, a double-quoted filename may be used when the file's suffix is not .ulam, to avoid a separate compilation (ulam-3). The NonUlamFilename string may include its path and suffix as necessary to fully identify the file. Local defs in the loaded file take the file where the load appears as their file scope (ulam-6), so the file may be safely loaded into multiple ulam files.
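A sketch of a hypothetical file, Meter.ulam, that uses all three directives. Counter.ulam is assumed to define a quark Counter taking one class parameter, and CommonDefs.inc is a hypothetical non-ulam file holding shared local defs.

```ulam
ulam 6;                 // source version: must not exceed the compiler's version

use Counter;            // queue Counter.ulam for parsing (ulam-6)

load "CommonDefs.inc";  // replaced by the file's contents and not compiled separately;
                        // its local defs take this file as their file scope (ulam-6)

element Meter {
  Counter(4) m_ticks;
}
```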

Line Splicing

Macro Definition and Expansion

Conditional Compilation

Line Control

Error Generation

Pragmas

Null directive

Predefined Names

Currently not supported in ulam; however, they may apply to native ulam functions.
