tiluser edited this page Oct 21, 2012 · 4 revisions

Creole Forth : A scripting language in Borland Delphi

Abstract

Creole Forth is a simple Forth-like programming language developed as a component in Borland Delphi. It is similar to Norman Smith's UNTIL in that it was designed as a scripting language that can be tailored to a specific application. It has several unusual features, which include string-based processing, the ability to place and process lists on the data stack (similar to Joy), and the ability to filter and transform data based on rules defined by the programmer. This presentation will discuss its internal structure and give some simple examples of how it can be used as a scripting language embedded in an application.

Introduction

Delphi and its cousin C++ Builder offer an Integrated Development Environment (IDE) with a palette of object-oriented components. Applications can be developed by dropping components onto a form and writing a minimal amount of code to set their properties and event handlers. The programmer can also define new components in the Object Pascal language using inheritance, thereby extending the development environment and making it more powerful.

All components in Delphi are descendants of the TComponent class in its class hierarchy and share the following characteristics :

  • The ability to appear on the Component palette and be manipulated in the form designer.
  • The ability to own and manage other components.
  • Enhanced streaming and filing capabilities.
  • The ability to be converted into an ActiveX control or other COM object by wizards on the ActiveX page of the New Objects dialog.

Even though Delphi's assemblage of components is called the VCL (Visual Component Library), components need not be visual in nature. Non-visual components appear on a form only at design time and are invisible at runtime. TCreole is a direct descendant of TComponent and is thus a non-visual component. It contains the code to compile and execute the Creole scripting language: the words defined in a dictionary, an outer and inner interpreter, and a colon compiler. It also exposes a set of interfaces implemented as properties, which include a set of 5 stacks, a field to submit code, and a field to gather output. Code is executed by placing it in the input field and calling a submit method.

Theory of operation

Data structures

Creole Forth has the following globally accessible data structures :
  1. Data or Parameter Stack
  2. Return Stack
  3. Vocabulary Stack
  4. Prefilter Stack
  5. Postfilter Stack
  6. Dictionary
  7. Input
  8. Output
  9. PAD

These all appear as properties on its property sheet.

All of the above structures are TStrings/TStringList objects, although the return stack values are converted internally to integer values for speed. TStrings and TStringList are container classes in which each string entry can carry an associated object, so they can hold any form of object, not just strings.

Words

Individual words are defined as objects of type TCreoleWord with the following fields :

  1. NameField : string type. The given name of a word.
  2. IndexField : integer type. This is a word's index or address in the dictionary.
  3. CodeField : PrimProc type. PrimProc is a procedural type (essentially a pointer to a procedure) for a Creole primitive.
  4. ParameterArray : A simple array of integers.
  5. ParameterField : String-based version of the ParameterArray.
  6. DataField : string type. Used to store data values; either string or numeric.
  7. TypeField : string type. Stores type of word. Currently the types are primitive, high-level (colon definition), and defining.
  8. RedefinedField : integer type. Stores address of previously defined word with same name.
  9. LinkField : string type. One less than the IndexField.
  10. HelpField : string type. Help entry for word.
  11. Vocabulary : string type. The vocabulary the word is defined in. Currently the existing vocabularies are ONLY, FORTH, IMMEDIATE, LIST, PREFILTER, and POSTFILTER.
Other fields may easily be defined as needed.
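
As a rough illustration of the record described above, the following sketch models a word in Python rather than Object Pascal. The class name, the snake_case attribute names, the default values, and the sample word are assumptions made for the example. Only the list of fields comes from the text. ParameterField and LinkField are shown as derived values because the text describes them as the string form of the ParameterArray and as one less than the IndexField.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class CreoleWord:
    """Illustrative stand-in for TCreoleWord (not the original class)."""
    name_field: str
    index_field: int                          # position in the dictionary
    code_field: Optional[Callable] = None     # plays the role of PrimProc
    parameter_array: List[int] = field(default_factory=list)
    data_field: str = ""
    type_field: str = "primitive"             # primitive, high-level, defining
    redefined_field: int = -1                 # -1 is a hypothetical "none" marker
    help_field: str = ""
    vocabulary: str = "FORTH"

    @property
    def parameter_field(self) -> str:
        # String-based version of the parameter array
        return " ".join(str(i) for i in self.parameter_array)

    @property
    def link_field(self) -> str:
        # One less than the index, stored as a string
        return str(self.index_field - 1)


# Hypothetical high-level word whose body refers to words 3 and 7
square = CreoleWord("SQUARE", index_field=12,
                    parameter_array=[3, 7], type_field="high-level")
```

Here `square.parameter_field` is the string form of the index list and `square.link_field` is the index of the preceding dictionary entry.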

Outer interpreter

Whenever code is submitted to the input field via the submit method, the following steps occur :

  1. The code is passed to the DoOuter procedure.

  2. A prefiltering procedure is invoked, which preprocesses the input. Prefilters are words in the PREFILTER vocabulary. They can be as simple as a routine that strips comments out of the code before interpretation, or be as complex as an entire programming language that outputs Forth code to be passed on to the outer interpreter for execution.

  3. DoOuter then processes the prefiltered input one word at a time, looking up each word in the dictionary. It searches every vocabulary on the vocabulary stack.

  4. If the word is found, it is passed to DoInner, which executes it. A primitive is simply executed; a colon definition has the contents of its parameter field iterated through and executed.

  5. If it isn't found, it is passed to the designated post-filtering routine (the routine or routines on the POSTFILTER stack). For a 'classic' Forth this routine would attempt to convert the data to an integer and place it on the stack. In Creole the post-filter rule can be changed to accept string values onto the stack without any attempt at conversion. It could also be changed to convert the data to other types before placing them on the stack.
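
The five steps above can be sketched as a small Python model. This is not the DoOuter procedure itself. The function names, the dictionary keyed by (vocabulary, name) pairs, and the comment-stripping prefilter are assumptions chosen to keep the example short, and calling the word directly stands in for DoInner.

```python
import re


def strip_comments(source):
    """Hypothetical prefilter: remove ( ... ) comments before interpretation."""
    return re.sub(r"\([^)]*\)", " ", source)


def accept_string(token, data_stack):
    """Hypothetical postfilter: push unknown tokens as strings, unconverted."""
    data_stack.append(token)


def plus(data_stack):
    """Primitive +. Operands are strings, so convert, add, convert back."""
    b = int(data_stack.pop())
    a = int(data_stack.pop())
    data_stack.append(str(a + b))


def do_outer(source, dictionary, vocab_stack, prefilters, postfilters, data_stack):
    for prefilter in prefilters:              # step 2: preprocess the input
        source = prefilter(source)
    for token in source.split():              # step 3: one word at a time
        word = None
        for vocab in reversed(vocab_stack):   # top of the vocabulary stack first
            word = dictionary.get((vocab, token))
            if word is not None:
                break
        if word is not None:
            word(data_stack)                  # step 4: stands in for DoInner
        else:
            for postfilter in postfilters:    # step 5: unknown word
                postfilter(token, data_stack)


dictionary = {("FORTH", "+"): plus}
stack = []
do_outer("2 3 ( add them ) +", dictionary, ["ONLY", "FORTH"],
         [strip_comments], [accept_string], stack)
```

After the call, `stack` holds the single string `"5"`. The numbers reached the stack through the postfilter because neither `2` nor `3` is a dictionary word.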

Inner interpreter and the dictionary

When execution is passed on to the inner interpreter, primitive definitions are executed immediately. These are procedures defined in Object Pascal that all have the following parameters passed to them :

  1. Owner : This is the component owner. Since Delphi components are designed to be dropped onto a 'parent' form, this parameter is needed in order for primitives to create and invoke them.

  2. External Interface. Contains the interfaces to the stacks, the outer interpreter pointer, and the non-dictionary data structures of Creole.

  3. Dictionary Interface. Contains the interface to the dictionary and an exact copy of the dictionary itself.

  4. Creole Word interface. Allows the primitives to access the data and methods of any Creole word.

  5. Rules interface. Allows the setup of startup rules as a Windows ini file. As of this writing, this interface has not been coded.

High-level definitions are definitions created by the colon compiler. These are definitions with their parameter fields filled with the array addresses of words previously defined in the dictionary. When the colon compiler creates a definition, it attaches the DoColon procedure to each word object so that it will know how to handle the contents of its parameter field.
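
The relationship between primitives, high-level definitions, and the procedure attached to each word can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions: primitives here receive only the word, the word list, and the data stack instead of the five parameters listed above, Python's own call stack stands in for the return stack, and the word FOURTH is a hypothetical example.

```python
class Word:
    def __init__(self, name, code, params=None):
        self.name = name
        self.code = code             # procedure attached to the word
        self.params = params or []   # indexes of previously defined words


def do_inner(index, words, stack):
    """Execute the word stored at the given dictionary index."""
    word = words[index]
    word.code(word, words, stack)


def do_colon(word, words, stack):
    """Attached to high-level words: walk the parameter field in order."""
    for index in word.params:
        do_inner(index, words, stack)


def prim_dup(word, words, stack):
    stack.append(stack[-1])


def prim_mul(word, words, stack):
    b = int(stack.pop())
    a = int(stack.pop())
    stack.append(str(a * b))


words = [
    Word("DUP", prim_dup),               # index 0
    Word("*", prim_mul),                 # index 1
    Word("SQUARE", do_colon, [0, 1]),    # : SQUARE DUP * ;
    Word("FOURTH", do_colon, [2, 2]),    # : FOURTH SQUARE SQUARE ;
]
stack = ["3"]
do_inner(3, words, stack)
```

Running FOURTH on `"3"` squares the value twice and leaves `"81"` on the stack. The point of the sketch is that a high-level word differs from a primitive only in the procedure attached to it.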

    Dictionary layout

    The dictionary is a TStringList of TCreoleWord objects with an associated encrypted name which serves as a key. The outer interpreter uses the encrypted name to find the associated TCreoleWord object. Each word is encrypted differently based on what vocabulary it's in. Because of this setup, there's no need for a smudge flag to hide dictionary definitions during compilation.

    During compilation, a state flag is not used. Instead the colon compiler places the IMMEDIATE vocabulary on top of the vocabulary stack. It then searches this vocabulary first and, if a word is found there, executes it. If the word is outside the IMMEDIATE vocabulary, it is simply compiled. This setup is similar to Chuck Moore's cmForth. Compilation ends when the ; word (SEMI) executes and pops the IMMEDIATE vocabulary off the vocabulary stack.
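
A minimal Python sketch of compilation without a state flag follows. It assumes a dictionary keyed by (vocabulary, name) pairs in which IMMEDIATE entries are callables and all other entries are word indexes. That layout and the function names are illustrative, not taken from the Creole source.

```python
def find(token, dictionary, vocab_stack):
    """Search vocabularies from the top of the vocabulary stack down."""
    for vocab in reversed(vocab_stack):
        if (vocab, token) in dictionary:
            return vocab, dictionary[(vocab, token)]
    raise KeyError(token)


def compile_colon(tokens, dictionary, vocab_stack):
    """Compile 'NAME word ... ;' into (name, list of word indexes)."""
    name, body = tokens[0], tokens[1:]
    params = []
    vocab_stack.append("IMMEDIATE")          # compiling starts, no state flag
    pos = 0
    while pos < len(body) and vocab_stack[-1] == "IMMEDIATE":
        vocab, entry = find(body[pos], dictionary, vocab_stack)
        pos += 1
        if vocab == "IMMEDIATE":
            entry(params, vocab_stack)       # immediate words execute now
        else:
            params.append(entry)             # other words are compiled
    return name, params


def semi(params, vocab_stack):
    """The ; word: popping IMMEDIATE is what ends compilation."""
    vocab_stack.pop()


dictionary = {
    ("FORTH", "DUP"): 0,
    ("FORTH", "*"): 1,
    ("IMMEDIATE", ";"): semi,
}
vocab_stack = ["ONLY", "FORTH"]
name, params = compile_colon("SQUARE DUP * ;".split(), dictionary, vocab_stack)
```

The compiler knows it is still compiling only because IMMEDIATE is on top of the vocabulary stack. When ; runs, the stack returns to its earlier contents and the loop ends.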

    Because of the way the dictionary is set up, a smudge flag is not needed to hide a word being defined. Instead, the encryption process can be used to take care of that.

    Some Examples

    Example 1. A simple Web Server. In this application, a TIDHttpServer component is used to submit commands to Creole. A home page is shown with three examples :

    • A button that generates the current date and time.
    • An RPN calculator.
    • A button that generates a listing of all the words in the dictionary and their associated information.

    All of the web pages are generated by commands that are submitted to the Creole web server.

    Example 2. Interfacing with an external scripting language. In this example, a dialog box is used to select a Perl file and create an alias to it that is invoked in a manner identical to a Creole or Forth definition. Parameters are passed to and returned from the data stack, just as in ordinary definitions.

    Example 3. Windows DLL Interfacing. For this application, the DLL needs to be imported into an application and referenced in a wrapper primitive. Two commands have been set up in this case : SM and LISTWIN. SM calls the Sendkeys.dll to write some text out to Notepad. LISTWIN enumerates all the active windows and their associated handles.

    Other features of Creole Forth

    • String-based

      Unlike most Forths, which are byte- and integer-based, Creole Forth is string-based. Everything by default is a string, with the exception of the return stack and the parameter array in a high-level definition, which holds indexes. This makes it possible to place strings and lists on the data stack and inside of variables, not just integers. The advantage of flexibility comes at some price in speed, since conversion must be done for mathematical operations. If necessary, speed can be enhanced for cycle-intensive operations by (A) writing the code as a primitive and (B) doing more internal conversion of datatypes within the language where applicable.


    • Five plus stacks

      In most Forths, by default there are two stacks : a data stack and a return stack, which are both integer-based. In some Forths a vocabulary stack is defined which defines the dictionary search order. In addition to these, Creole Forth has two more stacks by default : a prefilter stack and a postfilter stack. Before the outer interpreter processes the input, it passes it to the routines listed on the prefilter stack, which can modify it to any degree desired. Prefilter routines are simply primitives in the dictionary in the 'PREFILTER' vocabulary. If the outer interpreter cannot find a word in the dictionary, it passes it on to the routines listed in the postfilter stack, which are words in the dictionary in the 'POSTFILTER' vocabulary. The usual action of a postfilter word is to check whether the data matches specifications designated by the programmer, transform it as needed, and submit it to the data stack (it can also send the data to another data structure). For example, the INTEGER filter only allows integers on the stack, and empties the stack if non-integer data is submitted to it.
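
The behaviour described for the INTEGER filter can be sketched in Python as follows. The function name, the regular expression used to recognise integers, and the boolean return value are assumptions made for the example. Only the accept-or-empty behaviour comes from the text, and values are kept as strings because the data stack is string-based.

```python
import re

INTEGER_PATTERN = re.compile(r"[+-]?\d+")   # assumed definition of "integer"


def integer_filter(token, data_stack):
    """Push integer-looking tokens, and empty the stack on anything else."""
    if INTEGER_PATTERN.fullmatch(token):
        data_stack.append(token)            # stored as a string
        return True
    data_stack.clear()                      # non-integer data empties the stack
    return False


stack = []
integer_filter("12", stack)
integer_filter("-7", stack)
accepted = list(stack)                      # snapshot before the bad token
integer_filter("abc", stack)
```

After the first two calls the stack holds `"12"` and `"-7"`. The third call, with a non-integer token, leaves it empty.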


    • Writing new primitives

      This is done by writing a new procedure and referencing it by name using the BuildPrimitive method. All primitives must have the same parameter list.


    • Defining words

      Creole Forth allows the programmer to create new data types with the conventional CREATE-DOES> combination. DOES> works by copying the code following it into the parameter field of the child word when that word is defined. When that child word is invoked, the inner interpreter handles it like a colon definition with one exception : it also pushes its unique ID value on the data stack. This allows it to interface with data storage words such as , (comma) transparently. Data is stored in a separate field from code.
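
A loose Python model of this mechanism is given below. It is a guess at the shape of the idea, not a description of the actual implementation: the helper make_constant, the behaviour of the fetch word, and the use of a list for the data field are all hypothetical. It shows a child word that pushes its own ID before running the code copied in by DOES>, with its data held apart from that code.

```python
class Word:
    def __init__(self, name, code=None):
        self.name = name
        self.code = code     # Python callable for a primitive, else None
        self.params = []     # word indexes copied in by DOES>
        self.data = []       # data is stored separately from code


def execute(index, words, stack):
    word = words[index]
    if word.code is not None:
        word.code(words, stack)          # primitive
    else:
        stack.append(str(index))         # child pushes its own ID first
        for i in word.params:            # then runs like a colon definition
            execute(i, words, stack)


def fetch(words, stack):
    """Hypothetical @ : replace a word ID with the first item in its data."""
    word = words[int(stack.pop())]
    stack.append(word.data[0])


def make_constant(name, value, words, does_code):
    """Stands in for a defining word such as : CONSTANT CREATE , DOES> @ ;"""
    child = Word(name)
    child.data.append(value)             # the part CREATE and , would do
    child.params = list(does_code)       # the part DOES> would do
    words.append(child)
    return len(words) - 1                # the child's ID


words = [Word("@", fetch)]
ten = make_constant("TEN", "10", words, does_code=[0])
stack = []
execute(ten, words, stack)
```

Executing TEN pushes its ID, and the copied code then uses that ID to find the word's data field, leaving `"10"` on the stack.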


    • Compiling words

      These are mostly branching primitives such as IF, THEN, ELSE, DO, LOOP, BEGIN, and UNTIL, plus words that compile numeric and string literals into the dictionary. They are handled by creating one procedure for compile time and one for run-time. The compile time procedure is flagged as immediate to execute inside a colon definition so it can compile the token for the run-time code into it and perform any necessary additional processing (such as pulling a branch address off the stack).

      The programmer can use the POSTPONE primitive to create new compiling words. As in more traditional Forths, POSTPONE either defers compilation of a non-immediate word or forces a compile of an immediate word.
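
The compile-time and run-time split described above for branching words can be illustrated with IF and THEN. The Python sketch below makes several assumptions: compiled code is a list of [operation, target] pairs rather than Creole's integer parameter array, the run-time token is named 0BRANCH, and a separate control stack holds the pending branch position. None of these details is taken from the Creole source.

```python
def compile_if(code, control_stack):
    """Compile-time IF: emit the run-time token, target not yet known."""
    code.append(["0BRANCH", None])
    control_stack.append(len(code) - 1)      # remember where to patch


def compile_then(code, control_stack):
    """Compile-time THEN: pull the branch position and resolve it."""
    origin = control_stack.pop()
    code[origin][1] = len(code)


IMMEDIATE = {"IF": compile_if, "THEN": compile_then}


def compile_body(tokens):
    code, control_stack = [], []
    for token in tokens:
        if token in IMMEDIATE:
            IMMEDIATE[token](code, control_stack)   # executes while compiling
        else:
            code.append([token, None])              # ordinary word: compile it
    return code


def run(code, stack, primitives):
    pc = 0
    while pc < len(code):
        op, target = code[pc]
        if op == "0BRANCH":
            if int(stack.pop()) == 0:        # run-time IF: skip when flag is 0
                pc = target
                continue
        else:
            primitives[op](stack)
        pc += 1


primitives = {"NEGATE": lambda s: s.append(str(-int(s.pop())))}
code = compile_body("IF NEGATE THEN".split())

taken = ["5", "1"]       # flag 1: the body runs
run(code, taken, primitives)
skipped = ["5", "0"]     # flag 0: the body is skipped
run(code, skipped, primitives)
```

With a nonzero flag the value is negated, and with a zero flag execution jumps past the body. THEN leaves no run-time token of its own, since its only job is to patch the branch emitted by IF.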


    Summary, discussion, and possible future directions

    Creole Forth was designed as a simple but extensible scripting language that can be dropped into a Delphi or C++ Builder application and tailored to it. The language can be extended either by writing high-level definitions or defining new words as primitives. The availability of pre- and post-filtering mechanisms adds an extra dimension of flexibility.

    Despite the relatively small size of the current implementation (about 2700 lines of Object Pascal code and fewer than 100 primitives), it was surprisingly easy to develop powerful constructs generally associated with more complete Forths. The separation of code and data into different spaces made the implementation of defining words almost trivial. The relative simplicity of Creole Forth and the ease of access to internal data structures may make it useful for teaching the fundamental concepts of Forth.

    It is a string-based language, which affords a great deal of flexibility at some cost in speed, since integers are an 'underprivileged' data type (the opposite of most Forths). However, resource intensive routines can be coded in Object Pascal and referenced as primitives. They could also be written in another language such as C, C++, Perl or another Forth. DLL or COM objects could be used for the interfacing, but aren't absolutely necessary.

    A more complete ANSI-style Forth with 200-300 primitives would probably be doable. A large number of words were left out in this implementation simply because they weren't needed. Examples of these are >BODY and C, because there are no in-memory addresses to worry about. Editor and assembler vocabularies were also not included.

    Another possibility would be a stripped-down version of the Creole system. Norman Smith takes this approach with his UNTIL applications. He has a small version of it called Calc and a much larger implementation called S-ENGINE, which is an almost 100% complete version of Forth. Since Calc is less than 1/4 the size of S-ENGINE, it is much simpler for the average programmer to understand, modify, and implement.

    The relatively small size of the source code makes it feasible to re-work it in other languages. Java would be an obvious candidate since it has a component-oriented architecture similar to Delphi.

    Creole Forth is currently still under development, but the Object Pascal source code is available under a BSD-style public license. Please email [email protected] or [email protected] for more information.

    References

    • Anderson, Anita and Tracy, Martin; Mastering Forth; Brady Books, 1989.
    • Brodie, Leo; Starting Forth; Prentice-Hall, 1984.
    • Brodie, Leo; Thinking Forth; Prentice-Hall, 1984.
    • Engo, Frank; How to Program Delphi 3; Ziff-Davis Press, 1997.
    • Konopka, Ray; Developing Custom Data Components; Coriolis Group Books, 1996.
    • Mueller, John and Norton, Peter; Guide to Delphi 2; SAMS Publishing, 1996.
    • Ousterhout, John; Scripting: Higher Level Programming for the 21st Century; IEEE Computer Magazine, March 1998.
    • Smith, Norman; Write Your Own Programming Language Using C++ (2nd edition); Wordware Publishing, 1996.
    • Von Thun, Manfred; http://www.latrobe.edu.au/philosophy/phimvt/joy.html