tokparse.txt
-Documentation for tokenization routines
(C)1999 Keinall "Granitor" Caddle

--]
History

99/04/21 : iteration 0 is formed, replacing the void.
99/04/22 : iteration 1 adds minor tweaks.

--]
Scope

This document describes the m6 tokenization API (tokparse.h),
the implementation (tokparse.c), and the Specially Adapted C-NASIC
commands used to describe the primary and alias tokens. It also covers
conversion from Mainstream C-NASIC to Tokenized NASIC (toklist.cas;
the function tknParser(), internal to tokparse.c).

---]
The wonderful API

The API is still being developed, as is the implementation.
However, no radical changes are foreseen, only additions and
minor changes (like passing by reference vs. value).

void tknParserInit(MEMSCOPE nasic);

This is fairly self-descriptive: given a MEMSCOPE which contains a
processed .CAS file, pass it to tokparse via tknParserInit(victim);

This routine will modify the MEMSCOPE by building four tables:

---:
Before tknParserInit but after processing:

MEMSCOPE nasic

Compressed code         - F.Block 0
---
FREE SPACE
---
Label table             - B.Block 0


---:
After tknParserInit:

MEMSCOPE nasic

Compressed code         - F.Block 0
Detokenization table    - F.Block 1
m1 conversion table     - F.Block 2
---
FREE SPACE
---
Alias tokens    -} TTT  - B.Block 2
Primary tokens  -} TTT  - B.Block 1
Label table             - B.Block 0
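The before/after pictures read like a single buffer with F.Blocks stacked
up from the start, B.Blocks stacked down from the end, and FREE SPACE in
the middle. A minimal sketch of that idea, assuming a plain byte buffer;
Scope and the scope_* functions are made-up names for illustration, not
the real MEMSCOPE interface:

```c
/* Hypothetical sketch of a two-ended memory scope, not tokparse.c. */
#include <assert.h>
#include <stddef.h>

typedef unsigned char BYTE;

typedef struct {
    BYTE  *mem;
    size_t size;
    size_t front;  /* first free byte after the last F.Block */
    size_t back;   /* first byte of the most recently added B.Block */
} Scope;

static void scope_init(Scope *s, BYTE *mem, size_t size)
{
    s->mem = mem;
    s->size = size;
    s->front = 0;     /* F.Blocks start at the beginning */
    s->back = size;   /* B.Blocks start at the end */
}

/* Bytes of FREE SPACE left between the two sides. */
static size_t scope_free(const Scope *s)
{
    return s->back - s->front;
}

/* Claim n bytes as the next F.Block; NULL if it won't fit. */
static BYTE *scope_push_front(Scope *s, size_t n)
{
    if (n > scope_free(s)) return NULL;
    BYTE *p = s->mem + s->front;
    s->front += n;
    return p;
}

/* Claim n bytes as the next B.Block, growing toward the front. */
static BYTE *scope_push_back(Scope *s, size_t n)
{
    if (n > scope_free(s)) return NULL;
    s->back -= n;
    return s->mem + s->back;
}
```

Pushing the label table first and then the primary and alias tables puts
B.Block 0 at the very end with B.Blocks 1 and 2 working inward, which
matches the "after" picture. Neither side needs to know the other's final
size up front; they only collide when FREE SPACE runs out.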

---:
The detokenization table has the following format:

BYTE value;
BYTE len;
BYTE str[];

value - the value of the token. On the C64, tokens were always single
bytes with values of 128 and up. The C128 used two-byte tokens for some
things, and some extensions to the C64's stock BASIC did this as well,
but for now, we'll stick with one-byte tokens, as they'll be needed for
C64 compatibility mode in any case. I'll cross the multibyte token bridge
later; right now what matters is not how C99 works but how the C64 works.

len - the length of the token's text string. On a C64 this was always
short (RESTORE, at 7 characters, is among the longest keywords); the
limit for M is around 16, I think. In any case, a limit of 255 is quite
generous...

str[] - the string of the token name, stored in IBM ASCII, in the same
case as it was found in toklist.cas. It would be more efficient to store
it in lower case, I suppose, but I don't, so, deal with it, punk!

So far, the detokenization table is unused.
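Since the records aren't fixed-size, walking the detokenization table
means hopping from record to record using len. A minimal sketch of a
value-to-name lookup, assuming the table's total byte length is known
(how the real table is terminated isn't stated here); detok_lookup is a
hypothetical name:

```c
/* Hypothetical sketch, not tokparse.c. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char BYTE;

/* Find the name of token `value` in a table of repeated
   {BYTE value; BYTE len; BYTE str[len];} records.
   Copies the name, NUL-terminated, into out and returns its length,
   or returns -1 if not found or out is too small. */
static int detok_lookup(const BYTE *table, size_t table_len, BYTE value,
                        char *out, size_t out_size)
{
    size_t i = 0;
    while (i + 2 <= table_len) {
        BYTE v   = table[i];
        BYTE len = table[i + 1];
        if (i + 2 + len > table_len) break;      /* truncated record */
        if (v == value) {
            if ((size_t)len + 1 > out_size) return -1;
            memcpy(out, table + i + 2, len);
            out[len] = '\0';
            return len;
        }
        i += 2 + (size_t)len;                    /* skip to next record */
    }
    return -1;
}
```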

---:
The m1 conversion table:

This is a simple one-to-one mapping between CAS "tokens" and
NASIC tokens, as described in the src file by the 'c' command (more
on that later).

Format:

char    CAStoken;
BYTE    NASICtoken;

So far, this is also unused.
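Each m1 conversion entry is a fixed two bytes, so a lookup is just a
linear scan. A sketch, assuming a packed array of {char, BYTE} pairs and
using 0 to mean "no mapping" (safe, since real tokens are 128 and up);
M1Entry and m1_convert are hypothetical names:

```c
/* Hypothetical sketch, not tokparse.c. */
#include <assert.h>
#include <stddef.h>

typedef unsigned char BYTE;

/* One m1 conversion record: {char CAStoken; BYTE NASICtoken;} */
typedef struct {
    char CAStoken;
    BYTE NASICtoken;
} M1Entry;

/* Map one CAS "token" character to its NASIC token; 0 if unmapped. */
static BYTE m1_convert(const M1Entry *tab, size_t n, char cas)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i].CAStoken == cas)
            return tab[i].NASICtoken;
    return 0;
}
```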

---:
The TTT

The primary and alias tables together form
*drumroll please*

                The Tokenization Table

This wonderful creation was the product of many man-weeks of research
and development, and is guaranteed(*) to bring you many hours of enjoyment.

Format of an entry:

BYTE    len;
BYTE    str[];
BYTE    val;

len     - the length of the string
str     - the actual string, stored as given in the CAS src file
val     - the value of the token.

(*)     : No warranty is given, implicit or explicit. The above statement
was a clever marketing ploy which is not legally binding.
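Note that val sits after str, so you can't reach val without reading len
first. A minimal sketch of matching input text against a TTT, assuming
the table's byte length is known, a case-sensitive compare, and
first-match-wins (the real tknParser() may well match differently);
ttt_match is a hypothetical name, and 0 doubles as "no match" since
tokens start at 128:

```c
/* Hypothetical sketch, not tokparse.c. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char BYTE;

/* Scan repeated {BYTE len; BYTE str[len]; BYTE val;} entries and return
   the val of the first entry whose string is a prefix of `input`.
   *matched gets the matched length (0 when nothing matches). */
static BYTE ttt_match(const BYTE *ttt, size_t ttt_len,
                      const char *input, size_t *matched)
{
    size_t in_len = strlen(input);
    size_t i = 0;
    while (i < ttt_len) {
        size_t len = ttt[i];
        if (i + 1 + len + 1 > ttt_len) break;    /* truncated entry */
        const BYTE *str = ttt + i + 1;
        BYTE val = ttt[i + 1 + len];             /* val follows str */
        if (len <= in_len && memcmp(str, input, len) == 0) {
            *matched = len;
            return val;
        }
        i += len + 2;                            /* len byte + str + val */
    }
    *matched = 0;
    return 0;
}
```

Because primary strings carry their trailing ' ', '@' or '#', "PRINT "
and "PRINT#" are distinct entries, so input like PRINT#1,A only matches
the "PRINT#" one.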

---:
Primary and alias tokens

Primary tokens are two-way: when you LIST a program, you get back the
same token name which you entered. Alias tokens map to primary tokens,
which is really handy for conversion between my Chip's NASIC .CBM files,
C64's .PRG files, and my up-and-coming NASIC files.
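One way to picture the primary/alias split: tokenizing checks both
tables, but LIST only ever consults the primary table, so an alias comes
back out under its primary name. A sketch under those assumptions, using
plain C strings instead of the packed TTT and leaving off the trailing
' '/'@'/'#' characters; Tok, tok_lookup and tok_name are hypothetical:

```c
/* Hypothetical sketch, not tokparse.c. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char BYTE;

typedef struct {
    const char *name;
    BYTE        val;
} Tok;

/* Tokenize: try primary names first, then aliases. 0 means no match. */
static BYTE tok_lookup(const Tok *prim, size_t np,
                       const Tok *alias, size_t na, const char *word)
{
    for (size_t i = 0; i < np; i++)
        if (strcmp(prim[i].name, word) == 0) return prim[i].val;
    for (size_t i = 0; i < na; i++)
        if (strcmp(alias[i].name, word) == 0) return alias[i].val;
    return 0;
}

/* LIST: only primary names are used, so an alias lists as its primary.
   Returns NULL if the value has no primary name. */
static const char *tok_name(const Tok *prim, size_t np, BYTE val)
{
    for (size_t i = 0; i < np; i++)
        if (prim[i].val == val) return prim[i].name;
    return NULL;
}
```

For example, with an alias "?" that maps to the same value as PRINT,
typing ? tokenizes to PRINT's value, and LISTing it shows PRINT.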

Now, let's delve into the SACAS format, which is similar to MCAS format
(see the mness docs, nessuse.txt, for more info on MCAS), but different
because all of the commands are different, and labels aren't used.

---]
SACAS : Specially Adapted CAS
(pronounced sack-ass)

The SACAS pre-parser is identical to that used by mness,
so all the rules apply, and you can use all of those forgotten
comments (e.g. // ). Do not use remembered comments, nor labels.

However, there are only 3 recognized commands:

p[value],[string]
- primary token declaration. The final character of the string (which
should be enclosed in quotes) should be one of ' ', '@', or '#';
these are the normal space, the at sign (as in "PRINT@"), and the
infamous funny looking # thingy, as in "PRINT#".

a[value],[string]
- alias token declaration. No trailing character needed.

c[CAStoken],[value]
- conversion entry for m1 files; CAStoken should be enclosed in quotes.

and, of course, '-' on a line by itself terminates the file neatly.
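A minimal line parser for the three commands plus the '-' terminator,
under several assumptions the text doesn't settle: values are read with
strtol base 0 (so 153 or 0x99 both work), both p and a strings are
quoted, there are no escapes inside quotes, and only blank lines and //
comments (whole-line or trailing) are skipped, since the real pre-parser
is mness's and handles more. SacasCmd and sacas_parse_line are
hypothetical names, not part of tokparse.h:

```c
/* Hypothetical sketch, not tokparse.c. */
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char kind;      /* 'p', 'a', 'c'; '-' = end of file; 0 = nothing */
    long value;     /* token value */
    char text[32];  /* token string (p/a) or the CAS token (c) */
} SacasCmd;

/* Only whitespace or a // comment may follow a command. */
static int rest_ok(const char *s)
{
    while (isspace((unsigned char)*s)) s++;
    return *s == '\0' || (s[0] == '/' && s[1] == '/');
}

/* Copy a "quoted" string at s into out; return the char after the
   closing quote, or NULL if malformed. Escapes are not handled. */
static const char *read_quoted(const char *s, char *out, size_t out_size)
{
    if (*s != '"') return NULL;
    const char *end = strchr(s + 1, '"');
    if (end == NULL) return NULL;
    size_t n = (size_t)(end - (s + 1));
    if (n >= out_size) return NULL;
    memcpy(out, s + 1, n);
    out[n] = '\0';
    return end + 1;
}

/* Parse one SACAS line. Returns 1 on success, 0 on a syntax error. */
static int sacas_parse_line(const char *line, SacasCmd *cmd)
{
    char *endp;
    const char *p;

    memset(cmd, 0, sizeof *cmd);
    while (isspace((unsigned char)*line)) line++;
    if (rest_ok(line)) return 1;                  /* blank or comment */

    switch (line[0]) {
    case '-':                                     /* '-' alone ends file */
        if (!rest_ok(line + 1)) return 0;
        cmd->kind = '-';
        return 1;
    case 'p':
    case 'a':                                     /* p/a[value],"string" */
        cmd->value = strtol(line + 1, &endp, 0);
        if (endp == line + 1 || *endp != ',') return 0;
        p = read_quoted(endp + 1, cmd->text, sizeof cmd->text);
        if (p == NULL || !rest_ok(p)) return 0;
        if (line[0] == 'p') {
            size_t n = strlen(cmd->text);
            /* primary strings must end in ' ', '@' or '#' */
            if (n == 0 || strchr(" @#", cmd->text[n - 1]) == NULL) return 0;
        }
        cmd->kind = line[0];
        return 1;
    case 'c':                                     /* c"X",[value] */
        p = read_quoted(line + 1, cmd->text, sizeof cmd->text);
        if (p == NULL || strlen(cmd->text) != 1 || *p != ',') return 0;
        cmd->value = strtol(p + 1, &endp, 0);
        if (endp == p + 1 || !rest_ok(endp)) return 0;
        cmd->kind = 'c';
        return 1;
    default:
        return 0;
    }
}
```

With these assumptions, made-up lines like p153,"PRINT ", a153,"?" and
c"?",153 are accepted, while p153,"PRINT" is rejected for lacking a
trailing ' ', '@' or '#'.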

---]
Implementation

Currently, the implementation is too trusting (i.e. it doesn't catch
many errors in the input), and is a bit inefficient, as it uses 4 separate
passes over the compressed CAS (a.k.a. CAS bytecode, or Cbc) to set up
all 4 tables. It could do it in 2 passes, or even break the TTT standard
and do it in one pass. But I don't feel like doing that right now; if I
clean up any code that already works, it'll most probably be rendlib.

---]
Druggle

splog, splok, sploo splink splang spin,
poink, narf, bing pong ping pang-

beyond the shadows,
lies the void,
beyond the void,
lies the substance,

within all,
there is the Tao,

to Qoat I go,
to Qoat, to Qoat.

---]
Hails:

Master Chip,
        [place comment here]
        There, a void * comment... or something.

        Well, now, off to eat breakfast am I,
        hope you enjoy tokparse...

---]
Wait, there's more!

Use ee as CLAen in the invocation of ident.exe, and look in idedrv.log;
this lists the token values (or 0 if unmatched) of the commands
you type in!
Right now the whole line is treated as a command;
I'll address this l8r.

Also, tokparse.log is produced, and includes a memory dump of the
NASIC memscope! It's not perfect, I need to recode memDumpf in
foolib, but it wacks okay, I guess.

---]
Killing Flying Pigs with Hand Grenades:

This is a personal fantasy of mine, ever since I played Pegasus,
and again when I played Battle Toads. Battle Toads and Double
Dragon is supposed to be really kool, I've seen it a bit and
of course, my cosmic omniscienceishness gives me an idea of how it
is, but I don't own the game or an image of it and have never played it.

Nonetheless, someday I'll do something even kooler than it...

Gunstar Heroes by Treasure on Sega Genesis is very kool,
I think it was/is on Neo Geo or related hardware as well,
I'll have to look into that, but, meanwhile, keep in mind that
my eventual goal is to make stuff like that, not for C64 but
for C99.

---]
Warriors and Explorers

Ever heard of this riddle? I think it's a Mensa thingy...
In any case, I'll relate it another time.

---]
Url:    http://www.geocities.com/SiliconValley/Lakes/2658
email:  granitor@geocities.com

[[MooFoxy99
