Coloring Forth

Because we must deal with the unknown, whose nature is by definition speculative and outside the flowing chain of language, whatever we make out of it will be no more than probability and no less than error.

         EDWARD SAID

Beginning, Intention and Method

 

Motivation

Core Words

Specials

bye

Quit ColorForth.

 

Stack

No effect

 

ColorForth Source

 

bye

 

 

Assembler

Bye:    
    push   0      
    call   ExitProcess

 

 

 Macros

macro2 dd offset semi
       dd offset cdup
       dd offset qdup
       dd offset cdrop
       dd offset then
       dd offset begin

 

 

; semi

Overview

Semicolon – terminates current definition.

Implementation

Implementation is non-trivial – it provides both tail-recursion support and some optimization. If the last compiled item was a call to a word, it’s being replaced with a jmp. Otherwise, ret is compiled.

 

List variable contains address of last compiled word.

 

H stands for HERE

 

H      dd  0                                                                         
list   dd  0, 0                                                                             
 
semi: 
    mov    edx, [H]
    sub    edx, 5
    cmp    [list], edx
    jnz    @f
    cmp    byte ptr [edx], 0e8h
    jnz    @f
    inc    byte ptr [edx] ; jmp
    ret
@@: mov    byte ptr [5+edx], 0c3h ; ret
    inc    [H]
    ret

 

Sample

Forth
x 1 2 + ;
y x x ;

Code

                          x:
008A07B5 8D 76 FC             lea         esi,[esi-4]
008A07B8 89 06                mov         dword ptr [esi],eax
008A07BA B8 01 00 00 00       mov         eax,1
008A07BF 05 02 00 00 00       add         eax,2
008A07C4 C3                   ret
                          y:
008A07C5 E8 EB FF FF FF       call        x
008A07CA E9 E6 FF FF FF       jmp         x

 

 

This sample deserves some explanation.

 

EAX contains the topmost element of data stack. ESI points to the second element. In order to put the value 1 onto stack, we have to store current topmost element in memory and decrement stack pointer, and only then load the constant into EAX. This operation takes 2 Pentium commands, 5 bytes, XXX ticks . Not so bad.

 

As we can see, 2 + is compiled into something shorter and better, we’ll look at literal optimization later.

 

Now, for the word X the semicolon ; has compiled ret, while for Y the second call to X has been replaced with a jump.

dup cdup

Implementation

As we can guess from the code below, cdup stands for “compile dup”. It compiles 5 bytes onto the top of the dictionary, and we have seen these 5 bytes above. So, dup is implemented as a macro, which compiles code to push the topmost element from EAX onto in-memory stack.

cdup: 
    mov    edx, [H]
    mov    dword ptr [edx], 89fc768dh
    mov    byte ptr [4+edx], 06
    add    [H], 5
    ret

 

 

Traditional Forth implementation could look like this.

hex

: dup 89fc768d , 06 c, ; immediate

?dup qdup

Implementation

qdup:
    mov    edx, [H]
    dec    edx
    cmp    [list], edx
    jnz    cdup
    cmp    byte ptr [edx], 0adh
    jnz    cdup
    mov    [H], edx
    ret
   

This code looks whether the last compiled instruction was 0adh, which stands for

    lods   dword ptr [esi]

 

And this is nothing else than a drop – move second element into the top of stack register, and increment stack pointer. So, ?dup works as dup, though if last compiled instruction was drop, it shifts HERE one byte back – actually, “uncompiles” the drop.

 

This trick is used for optimizing macros – immediate words – that compile code with stack notation ( -- n), for example, 0, A, and pop, following words, which take one item off stack ( n -- ) – for example, push (traditional >R), ! or A!.

0 ?dup c031 2, ;
a ?dup c28b 2, ;

pop ?dup 58 1, ;

! ?lit if ?lit if 5c7 2, swap a, , ; then 589 2, a, drop ; then a! 950489 3, 0 , drop ;

push 50 1, drop ;

a! ?lit if ba 1, , ; then d08b 2, drop ;

 

The macro 0 puts zero onto stack. It compiles into

     xor    eax, eax

 

A puts the value of the address register onto stack. Corresponding Pentium code is

     mov    eax, edx

 

Let’s consider an example

x1 here 2/ 2/ dup push a! 0 a ! pop @ ;

 

This code moves values 1 and 0 into the cell on the top of the dictionary.

here

2/

2/

push

 

 

drop 

a!

0

dup

 

a

!

 

 

pop

@

;

 

008A07CF E8 8B 12 B6 FF       call        here (00401a5f)
008A07D4 D1 F8                sar         eax,1
008A07D6 D1 F8                sar         eax,1
008A07D8 8D 76 FC             lea         esi,[esi-4]
008A07DB 89 06                mov         dword ptr [esi],eax
008A07DD 50                   push        eax
008A07DE AD                   lods        dword ptr [esi]
008A07DF 8B D0                mov         edx,eax
008A07E1 B8 00 00 00 00       mov         eax,0
008A07E6 8D 76 FC             lea         esi,[esi-4]
008A07E9 89 06                mov         dword ptr [esi],eax
008A07EB 8B C2                mov         eax,edx
008A07ED 8B D0                mov         edx,eax
008A07EF AD                   lods        dword ptr [esi]
008A07F0 89 04 95 00 00 00 00 mov         dword ptr [edx*4],eax
008A07F7 58                   pop         eax
008A07F8 8B 04 85 00 00 00 00 mov         eax,dword ptr [eax*4]
008A07FF C3                   ret

 

 

Implicit dups and drops are highlighted with light blue. In future I will be replacing Pentium instructions with dup and drop macros. It’s interesting to notice that dup is 5 bytes, white drop – only one.

 

This example, though a bit artificial, illustrates how address register and store-fetch pair works, as well as introduces “address as offset from 0 in cells” and “address as offset from 0 in bytes”. I’ll be calling these cell address and byte address.

 

.

 

Hosted by www.Geocities.ws

1