Defining Words

The Pearl of Forth!

virginia.edu

Copied from: http://galileo.phys.virginia.edu/classes/551.jvn.fall01/primer.htm#create

Michael Ham has called the word pair CREATE…DOES> the “pearl of Forth”.

  • CREATE is a component of the compiler, whose function is to make a new dictionary entry with a given name.

  • DOES> specifies a run-time action for the “child” words of a defining word.

Note

The point of a CREATE…DOES> word is that executing it produces a child word, and executing that child word runs the code that follows DOES>.

Defining “Defining” Words

CREATE finds its most important use in extending the powerful class of Forth words called “defining” words. The colon compiler “:” is such a word, as are VARIABLE and CONSTANT.

The definition of VARIABLE in high-level Forth is simple:

: VARIABLE  CREATE   1 CELLS  ALLOT ;

We have already seen how VARIABLE is used in a program. An alternative definition, found in some Forths, initializes the variable to 0:

: VARIABLE  CREATE   0  ,  ;

Forth lets us define words initialized to contain specific values: for example, we might want to define the number 17 to be a word. CREATE and “,” (“comma”) can do this:

17 CREATE SEVENTEEN  ,  <cr>  ok

Now test it via

SEVENTEEN @ .  <cr>  17 ok

Remarks:

  • The word , (“comma”) puts TOS into the next cell of the dictionary and advances the dictionary pointer by one cell, that is, by the number of bytes in a cell.

  • A word “C,” (“see-comma”) exists also: it puts a character into the next character-length slot of the dictionary and increments the pointer by 1 such slot. (In the ASCII character representation the slots are 1 byte long; systems that use 16-bit characters need 2 bytes per slot.)
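The two remarks above can be checked at the keyboard. The following is a minimal sketch, assuming a standard (ANS/Forth 2012) system; TABLE3 and BYTES3 are hypothetical names invented for this illustration.

```forth
\ TABLE3 and BYTES3 are hypothetical names used only for this sketch.
CREATE TABLE3   10 ,  20 ,  30 ,      \ three cells, one after another
TABLE3 @ .                            \ prints 10
TABLE3 CELL+ @ .                      \ prints 20
TABLE3 2 CELLS + @ .                  \ prints 30

HERE  0 ,   HERE SWAP -  .            \ bytes by which "," advanced the pointer: one cell
HERE  0 C,  HERE SWAP -  .            \ bytes by which "C," advanced it: one character

CREATE BYTES3   65 C,  66 C,  67 C,   \ three characters: A B C
BYTES3 C@ EMIT                        \ prints A
BYTES3 CHAR+ C@ EMIT                  \ prints B
```

The size printed for one cell depends on the system (commonly 4 or 8 bytes), which is why the sketch measures it rather than assuming it.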

Run-time vs. compile-time actions

In the preceding example, we were able to initialize the variable SEVENTEEN to 17 when we CREATEd it, but we still have to fetch it to the stack via SEVENTEEN @ whenever we want it. This is not quite what we had in mind. We would like to find 17 in TOS when SEVENTEEN is named. The word DOES> gives us the tool to do this.

The function of DOES> is to specify a run-time action for the “child” words of a defining word. Consider the defining word CONSTANT, defined in high-level Forth (in practice CONSTANT is usually defined in machine code for speed) by

: CONSTANT  CREATE  ,  DOES>  @  ;

and used as

53 CONSTANT PRIME  <cr> ok

Now test it:

PRIME . <cr>  53  ok

What is happening here?

  • CREATE (hidden in CONSTANT) makes an entry named PRIME (the first word in the input stream following CONSTANT). Then “,” places the TOS (the number 53) in the next cell of the dictionary.

  • Then DOES> (inside CONSTANT) appends the actions of all words between it and “;” (the end of the definition) —in this case, “@”— to the child word(s) defined by CONSTANT.
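The difference between a child with and without a DOES> action can be seen side by side. This is a sketch only; MY-CONSTANT, MY-VARIABLE, PRIME2 and COUNTER are hypothetical names, chosen so that the system's own CONSTANT and VARIABLE are left alone.

```forth
\ Hypothetical names: MY-CONSTANT, MY-VARIABLE, PRIME2, COUNTER.
: MY-CONSTANT  ( n "name" -- )  CREATE  ,   DOES> ( -- n )  @ ;
: MY-VARIABLE  ( "name" -- )    CREATE  0 , ;   \ no DOES>: the child just leaves its address

53 MY-CONSTANT PRIME2
MY-VARIABLE COUNTER

PRIME2 .          \ child fetches for us: prints 53
7 COUNTER !       \ child leaves an address, so we store through it
COUNTER @ .       \ and fetch explicitly: prints 7
```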

Dimensioned data (intrinsic units)

Here is an example of the power of defining words and of the distinction between compile-time and run-time behaviors.

Physical problems generally involve quantities that have dimensions, usually expressed as mass (M), length (L) and time (T) or products of powers of these. Sometimes there is more than one system of units in common use to describe the same phenomena.

For example, U.S. or English police reporting accidents might use inches, feet and yards; while Continental police would use centimeters and meters. Rather than write different versions of an accident analysis program it is simpler to write one program and make unit conversions part of the grammar. This is easy in Forth.

The simplest method is to keep all internal lengths in millimeters, say, and convert as follows:

: INCHES  254   10  */ ;
: FEET   [ 254 12 * ] LITERAL  10  */ ;
: YARDS  [ 254 36 * ] LITERAL  10  */ ;
: CENTIMETERS   10  * ;
: METERS   1000  * ;

Note

This example is based on integer arithmetic. The word */ means “multiply the third number on the stack by NOS, keeping a double-precision intermediate product, and divide by TOS”. That is, the stack comment for */ is ( a b c -- a*b/c).

The usage would be

10 FEET  .  <cr>  3048 ok
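Why scale with */ instead of dividing first and multiplying afterwards? A small sketch, using 7 inches as an arbitrary test value, shows the difference that the order of operations makes in integer arithmetic:

```forth
7 254 10 */ .      \ 7*254 = 1778, then 1778/10: prints 177
7 254 10 / * .     \ 254/10 = 25 first, then 7*25: prints 175
```

Multiplying before dividing keeps the digits that an early integer division would throw away, and the double-precision intermediate product keeps that multiplication from overflowing.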

The word “[” switches from compile mode to interpret mode while compiling. (If the system is interpreting it changes nothing.) The word “]” switches from interpret to compile mode.

Barring some error-checking, the “definition” of the colon compiler “:” is just

:  :   CREATE  ]  DOES>  doLIST  ;

and that of “;” is just

:  ;   next  [  ;  IMMEDIATE

Another use for these switches is to perform arithmetic at compile time rather than at run-time, both for program clarity and for easy modification, as we did in the first try at dimensioned data. There, phrases such as

[ 254 12 * ] LITERAL

and

[ 254 36 * ] LITERAL

allowed us to incorporate in a clear manner the number of tenths of millimeters in a foot or a yard.
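As a sketch of what the bracketed phrase buys us, compare two versions of the same conversion. FEET-A and FEET-B are hypothetical names used only here, so that neither clashes with FEET.

```forth
\ FEET-A and FEET-B are hypothetical names for this comparison.
: FEET-A  ( n -- mm )  [ 254 12 * ] LITERAL  10 */ ;   \ product computed once, while compiling
: FEET-B  ( n -- mm )  254 12 *  10 */ ;               \ product recomputed on every call

10 FEET-A .    \ prints 3048
10 FEET-B .    \ prints 3048
```

Both give the same answer; the first simply compiles the single number 3048 into the definition while still showing in the source where that number came from.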

The preceding method of dealing with units required unnecessarily many definitions and generated unnecessary code. A more compact approach uses a defining word, UNITS :

: D,  ( hi lo --)   SWAP  , ,  ;
: D@  ( adr -- hi lo)   DUP  @   SWAP   CELL+  @   ;
: UNITS  CREATE  D,   DOES> D@  */ ;

Then we could make the table

254 10        UNITS INCHES
254 12 *  10  UNITS FEET
254 36 *  10  UNITS YARDS
10  1         UNITS CENTIMETERS
1000  1       UNITS METERS

\ Usage:
10 FEET  . <cr>  3048  ok
3 METERS . <cr>  3000  ok
\ .......................
\ etc.

This is an improvement, but Forth permits a simple extension that allows conversion back to the input units, for use in output:

VARIABLE  <AS>    0 <AS> !
: AS     TRUE  <AS> ! ;
: ~AS    FALSE <AS> ! ;
: UNITS  CREATE  D,  DOES>  D@  <AS> @
         IF  SWAP  THEN
         */    ~AS  ;

\ UNIT DEFINITIONS REMAIN THE SAME.
\ Usage:
10 FEET  .   <cr>  3048  ok
3048 AS FEET  .  <cr>  10  ok
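The pieces can be combined to convert from one unit to another by way of millimeters. The sketch below repeats the definitions from the text so that it can be loaded on its own; the test values are arbitrary.

```forth
: D,  ( hi lo -- )      SWAP  , ,  ;
: D@  ( adr -- hi lo )  DUP @  SWAP CELL+ @ ;
VARIABLE <AS>   0 <AS> !
: AS    TRUE  <AS> ! ;
: ~AS   FALSE <AS> ! ;
: UNITS  CREATE  D,  DOES>  D@  <AS> @  IF SWAP THEN  */  ~AS ;

254 10        UNITS INCHES
254 12 *  10  UNITS FEET
254 36 *  10  UNITS YARDS

10 FEET  AS INCHES .    \ 3048 mm expressed in inches: prints 120
5 YARDS  AS FEET   .    \ 4572 mm expressed in feet: prints 15
1 FEET   AS INCHES .    \ 1 foot truncates to 304 mm, so this prints 11, not 12
```

The last line is a reminder that the scheme is integer arithmetic: with internal lengths held in whole millimeters, a round trip can lose a unit to truncation. Keeping internal lengths in a finer unit would reduce that loss.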

Advanced uses of the compiler

Suppose we have a series of push-buttons numbered 0-3, and a word WHAT to read them. That is, WHAT waits for input from a keypad: when button #3 is pushed, for example, WHAT leaves 3 on the stack.

We would like to define a word BUTTON to perform the action of pushing the n’th button, so we could just say:

WHAT BUTTON

In a conventional language BUTTON would look something like

: BUTTON  DUP  0 =  IF  DROP  RING   EXIT  THEN
          DUP  1 =  IF  DROP  OPEN   EXIT  THEN
          DUP  2 =  IF  DROP  LAUGH  EXIT  THEN
          DUP  3 =  IF  DROP  CRY    EXIT  THEN
          ABORT" WRONG BUTTON!"   ;

That is, we would have to go through two or three decisions on average (2.5 if all four buttons are equally likely).

Forth makes possible a much neater algorithm, involving a “jump table”. The mechanism by which Forth executes a subroutine is to feed its “execution token” (often an address, but not necessarily) to the word EXECUTE. If we have a table of execution tokens, we need only look up the one corresponding to an index (offset into the table), fetch it to the stack, and say EXECUTE.

One way to code this is

CREATE  BUTTONS  ' RING ,  ' OPEN ,  ' LAUGH ,  ' CRY ,

: BUTTON   ( nth --)    0 MAX  3 MIN
        CELLS  BUTTONS  +  @  EXECUTE  ;

Note how the phrase “0 MAX 3 MIN” protects against an out-of-range index. Although the Forth philosophy is not to slow the code with unnecessary error checking (because words are checked as they are defined), when programming a user interface some form of error handling is vital. It is usually easier to prevent errors as we just did, than to provide for recovery after they are made.

How does the action-table method work?

  • CREATE BUTTONS makes a dictionary entry BUTTONS.

  • The word ' (“tick”) finds the execution token (xt) of the following word, and the word , (“comma”) stores it in the data field of the new word BUTTONS. This is repeated until all the subroutines we want to select among have their xt’s stored in the table.

  • The table BUTTONS now contains xt’s of the various actions of BUTTON.

  • CELLS then multiplies the index by the appropriate number of bytes per cell, to get the offset into the table BUTTONS of the desired xt.

  • BUTTONS + then adds the base address of BUTTONS to get the absolute address where the xt is stored.

  • @ fetches the xt for EXECUTE to execute.

  • EXECUTE then executes the word corresponding to the button pushed.

Simple!
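To try the action table described in the steps above, the four actions need to exist first. In this sketch they are stand-in stubs that only print a word; the real RING, OPEN, LAUGH and CRY are not given in the text, so their bodies here are assumptions.

```forth
\ Stub actions: placeholders only, the real ones are not shown in the text.
: RING   ." ring "  ;
: OPEN   ." open "  ;
: LAUGH  ." laugh " ;
: CRY    ." cry "   ;

CREATE  BUTTONS  ' RING ,  ' OPEN ,  ' LAUGH ,  ' CRY ,

: BUTTON   ( nth -- )   0 MAX  3 MIN
        CELLS  BUTTONS  +  @  EXECUTE  ;

2 BUTTON     \ prints laugh
9 BUTTON     \ index clipped to 3: prints cry
-5 BUTTON    \ index clipped to 0: prints ring
```

Clipping maps every bad index onto the first or last action. If a wrong button should be reported instead, an explicit test before the lookup would be needed.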

If a program needs but one action table the preceding method suffices. However, more complex programs may require many such. In that case it may pay to set up a system for defining action tables, including both error-preventing code and the code that executes the proper choice. One way to code this is

: ;CASE   ;                     \ do-nothing word

: CASE:
    CREATE  HERE  -1  >R   0  ,   \ place for length
    BEGIN   BL  WORD  FIND        \ get next subroutine
       0=  IF   CR  COUNT  TYPE  ."  not found"  ABORT  THEN
       R>  1+  >R
       DUP  ,    ['] ;CASE  =
    UNTIL   R>   SWAP  !          \ store largest valid index (selects ;CASE)
    DOES>   DUP  @   ROT          ( -- base_adr len n)
            MIN  0  MAX           \ truncate index
            CELLS  +  CELL+  @  EXECUTE  ;

Note the two forms of error checking. At compile-time, CASE: aborts compilation of the new word if we ask it to point to an undefined subroutine:

case: test1   DUP  SWAP  X  ;case
X not found

and, for use at run time, we count how many subroutines are in the table (including the do-nothing one, ;case) so that we can force the index to lie in the range [0,n], where index n selects ;case:

CASE:  TEST  *  /  +  -  ;CASE  ok
15 3 0 TEST . 45  ok
15 3 1 TEST . 5  ok
15 3 2 TEST . 18  ok
15 3 3 TEST . 12  ok
15 3 4 TEST . . 3 15  ok

Just for a change of pace, here is another way to do it:

 : jtab:  ( Nmax --)      \ starts compilation
      CREATE              \ make a new dictionary entry
      1-  ,               \ store Nmax-1 in its body
 ;                        \ for bounds clipping

 : get_xt    ( n base_adr -- xt_addr)
      DUP  @      ( -- n base_adr Nmax-1)
      ROT         ( -- base_adr Nmax-1 n)
      MIN  0  MAX    \ bounds-clip for safety
      1+  CELLS  +  ( -- xt_addr = base + 1_cell + offset)
 ;

 : |   '  ,   ;     \ get an xt and store it in next cell

 : ;jtab   DOES>  ( n base_adr --)   \ ends compilation
           get_xt  @  EXECUTE        \ get token and execute it
 ;    \ appends table lookup & execute code

 \ Example:
 : Snickers   ." It's a Snickers Bar!"   ;   \ stub for test

 \ more stubs

 5 jtab:  CandyMachine
          | Snickers
          | Payday
          | M&Ms
          | Hershey
          | AlmondJoy
 ;jtab

 3 CandyMachine  It's a Hershey Bar!   ok
 1 CandyMachine  It's a Payday!   ok
 7 CandyMachine  It's an Almond Joy!   ok
 0 CandyMachine  It's a Snickers Bar!   ok
-1 CandyMachine  It's a Snickers Bar!   ok

forth.org

Copied from http://forth.org/svfig/Len/definwds.htm

It has been said that one does not write a program in Forth. Rather, one extends Forth to make a new language specifically designed for the application at hand.

An important part of this process is the DEFINING WORD, by which it is possible to combine a data structure with an action to create multiple instances that differ only in detail.

The basics of create … does>

Defining words are based on the Forth construct create … does>, which beginners can apply mechanically. The steps are:

  • Start a colon definition

  • Write create

  • Follow by words that lay down data or allot RAM, thus creating the body

  • Write does>

  • Follow by words that act on the body.

These steps are fairly simple, but understanding them is complex because there are three stages in the action of a defining word.

An example

Our example will be indexed-array, which allots an area of RAM. At run time, it takes an index, i, and returns the address of the ith cell. If i=0 the address of the first cell is returned because Forth conventionally starts numbering at 0.

Stage 1

: indexed-array ( n -- ) ( i -- a )
   create cells allot
   does> swap cells +
;

Stage 2

20 indexed-array foo  \ Make a 1-dimensional array with 20 cells

Stage 3

3 foo                \ Put addr of fourth element on the stack

Stage 1: Compiling the defining word

The first phase is in effect during the compilation of indexed-array, that is, between the colon and semicolon. The colon sets up a header. Then, execution tokens of ordinary Forth words are laid down, while immediate words are executed at once. The process is terminated by the semicolon.

The only immediate word in indexed-array is “does>”. It lays down code that will act later in stage 2.

Stage 2: Creating a “child”

The second phase is in effect when “indexed-array” is used to create “foo”.

  • create sets up a header

  • cells allot reserves n cells, forming the “data field” (formerly called “body”) of foo.

  • The code that was laid down by “does>” now comes into action. It changes the execution of “foo” so that it will:
    • Put the address of its data field on the stack, and then

    • Execute the Forth words between “does>” and the semicolon.

Stage 3: Executing the child

In the third phase, we execute “foo”.

  • i is already on the stack, and the origin of the data field is put on top of that

  • “swap” rearranges the stack

  • “cells” multiplies i by the cell length

  • “+” adds the result to the origin of the data field.
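The three stages can be followed at the keyboard. This sketch repeats the definition of indexed-array and then stores to and fetches from one element; the value 42 is arbitrary.

```forth
: indexed-array ( n -- ) ( i -- a )
   create cells allot
   does> swap cells +
;

20 indexed-array foo            \ stage 2: make the child
42 3 foo !                      \ stage 3: store 42 in the fourth cell
3 foo @ .                       \ prints 42
3 foo  0 foo  -  1 cells /  .   \ distance from the first cell, in cells: prints 3
```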

Warning

Important issues such as range checking and multi-dimensional arrays are not discussed here.
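As a hint of what range checking could look like, here is one possible sketch. It is not part of the original example: checked-array and bar are hypothetical names, and storing the length in the first cell of the data field is just one design choice among several.

```forth
\ checked-array and bar are hypothetical; this is one possible approach.
: checked-array ( n -- ) ( i -- a )
   create  dup ,  cells allot          \ first cell holds the length n, then n data cells
   does> ( i body -- a )
      2dup @ u< 0=                     \ unsigned compare also rejects negative i
      abort" index out of range"
      cell+  swap cells +              \ skip the length cell, then index
;

5 checked-array bar
7 4 bar !   4 bar @ .                  \ prints 7
\ 5 bar   would abort with the message "index out of range"
```

The check costs a fetch and a comparison on every access, which is the kind of overhead the unchecked version avoids.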

Note

Why is there a right angle-bracket in “does>”? It originated in early Forths in which create was followed by “<builds … does>”. Later, the action of “<builds” was incorporated in “create”, but the spelling of “does>” was not changed.

Simple Example with LOOP

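The words between CREATE and DOES> run when the child is created, so the loop below prints 1 five times while myc is being defined:

: gen-shc CREATE 5 0 DO  1 . LOOP DOES> @ ; ok

gen-shc myc 1 1 1 1 1 ok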
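Note that gen-shc reserves no data for its children, so the @ after DOES> has nothing meaningful to fetch. A variation that uses the loop to lay down data might look like the sketch below; gen-squares and sq are hypothetical names, and no bounds checking is done.

```forth
\ gen-squares and sq are hypothetical names for this sketch.
: gen-squares ( n "name" -- ) ( i -- i*i )
   CREATE  0 ?DO  I I * ,  LOOP     \ runs once, when the child is created
   DOES>   SWAP CELLS + @           \ runs every time the child executes
;

5 gen-squares sq     \ lays down 0 1 4 9 16
3 sq .               \ prints 9
0 sq .               \ prints 0
```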