Because we must deal with the unknown, whose nature is by definition speculative and outside the flowing chain of language, whatever we make out of it will be no more than probability and no less than error.
– EDWARD SAID
Beginning, Intention and Method
bye
Quit ColorForth.
|
Stack |
No effect |
ColorForth Source
|
bye |
Assembler
Bye: push 0 call ExitProcess |
macro2 dd offset semi dd offset cdup dd offset qdup dd offset cdrop dd offset then dd offset begin |
; semi
Overview
Semicolon – terminates current definition.
Implementation
Implementation is non-trivial – it provides both tail-recursion support and some optimization. If the last compiled item was a call to a word, it’s being replaced with a jmp. Otherwise, ret is compiled.
List variable contains address of last compiled word.
H stands for HERE
H dd 0 list dd 0, 0 semi: mov edx, [H] sub edx, 5 cmp [list], edx jnz @f cmp byte ptr [edx], 0e8h jnz @f inc byte ptr [edx] ; jmp ret @@: mov byte ptr [5+edx], 0c3h ; ret inc [H] ret |
Sample
|
|
Code
x: 008A07B5 8D 76 FC lea esi,[esi-4] 008A07B8 89 06 mov dword ptr [esi],eax 008A07BA B8 01 00 00 00 mov eax,1 008A07BF 05 02 00 00 00 add eax,2 008A07C4 C3 ret y: 008A07C5 E8 EB FF FF FF call x 008A07CA E9 E6 FF FF FF jmp x |
This sample deserves some explanation.
EAX contains the topmost element of data stack. ESI points to the second element. In order to put the value 1 onto stack, we have to store current topmost element in memory and decrement stack pointer, and only then load the constant into EAX. This operation takes 2 Pentium commands, 5 bytes, XXX ticks . Not so bad.
As we can see, 2 + is compiled into something shorter and better, we’ll look at literal optimization later.
Now, for the word X the semicolon ; has compiled ret, while for Y the second call to X has been replaced with a jump.
dup cdup
Implementation
As we can guess from the code below, cdup stands for “compile dup”. It compiles 5 bytes onto the top of the dictionary, and we have seen these 5 bytes above. So, dup is implemented as a macro, which compiles code to push the topmost element from EAX onto in-memory stack.
cdup: mov edx, [H] mov dword ptr [edx], 89fc768dh mov byte ptr [4+edx], 06 add [H], 5 ret |
Traditional
|
hex : dup 89fc768d
, 06 c, ; immediate |
?dup qdup
Implementation
qdup: mov edx, [H] dec edx cmp [list], edx jnz cdup cmp byte ptr [edx], 0adh jnz cdup mov [H], edx ret |
This code looks whether the last compiled instruction was 0adh, which stands for
|
lods dword ptr [esi] |
And this is nothing else than a drop – move second element into the top of stack register, and increment stack pointer. So, ?dup works as dup, though if last compiled instruction was drop, it shifts HERE one byte back – actually, “uncompiles” the drop.
This trick is used for optimizing macros – immediate words – that compile code with stack notation ( -- n), for example, 0, A, and pop, following words, which take one item off stack ( n -- ) – for example, push (traditional >R), ! or A!.
|
|
The macro 0 puts zero onto stack. It compiles into
|
xor eax, eax |
A puts the value of the address register onto stack. Corresponding Pentium code is
|
mov eax, edx |
Let’s consider an example
|
|
This code moves values 1 and 0 into the cell on the top of the dictionary.
|
|
Implicit dups and drops are highlighted with light blue. In future I will be replacing Pentium instructions with dup and drop macros. It’s interesting to notice that dup is 5 bytes, white drop – only one.
This example, though a bit artificial, illustrates how address register and store-fetch pair works, as well as introduces “address as offset from 0 in cells” and “address as offset from 0 in bytes”. I’ll be calling these cell address and byte address.
.