TextFile

 

A String is coded as one or more bytes.

The first byte codes two values:

 

            FirstStringByte = {asciiQbit, CharSetID}

 

If asciiQbit is True, the remaining 7 bits (the CharSetID field) hold a 7-bit ASCII code, one of 128 possible characters.

If asciiQbit is False, the remaining 7 bits (the CharSetID field) identify a CharacterSet, which determines how subsequent bytes in the String are interpreted.
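
The two cases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which bit of the byte carries asciiQbit or which bit value means True, so the sketch assumes the most significant bit, with 1 meaning True. The names ASCII_FLAG and split_first_byte are hypothetical and are not part of Grok32`.

```python
ASCII_FLAG = 0x80  # assumption: the most significant bit is asciiQbit, 1 = True


def split_first_byte(byte: int) -> tuple[bool, int]:
    """Split one byte into (asciiQbit, 7-bit value).

    When asciiQbit is True the 7-bit value is an ASCII code;
    when it is False the 7-bit value is a CharSetID.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte value in the range 0..255")
    return bool(byte & ASCII_FLAG), byte & 0x7F


# Flag set: the low 7 bits are the ASCII code for "A".
ascii_q, value = split_first_byte(0xC1)
print(ascii_q, chr(value))

# Flag clear: the low 7 bits name a CharacterSet (here, the one numbered 5).
ascii_q, value = split_first_byte(0x05)
print(ascii_q, value)
```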

 

The identity of a CharacterSet may be specified with 7 bits because the String object keeps track of the CharacterSets which have been read into memory.  The name of a CharacterSet is assigned an identifying 7-bit Cardinal in the first byte of a String's structured ByteString sequence.

 

When a complex String (one using many CharacterSets) is Compiled, the functional equivalent of a With[...] construct is written.  This is the subject of TextFile.

 

The first byte of a String specifies its CharacterSet.  One bit of each byte specifies whether the byte represents a 7-bit ASCII character or not.  This bit is called the ASCIIbit.  If it is True, then the byte is interpreted as an ASCII character.  An ASCII String lasts as long as each successive byte in the sequence has a True ASCIIbit.  The moment the ASCIIbit is False, the ASCII String ends, and the remaining 7 bits either represent the EndCharacter or specify the identity of another CharacterSet.  In the latter case, the ASCII String is followed immediately by a non-ASCII subString.
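
As a rough sketch of how an ASCII String could be read off a byte sequence, the Python below collects bytes for as long as the ASCIIbit is True and stops at the first byte where it is False. It assumes, as the text does not specify it, that the ASCIIbit is the most significant bit and that 1 means True. The names ASCII_FLAG and read_ascii_run are hypothetical.

```python
ASCII_FLAG = 0x80  # assumption: the most significant bit is the ASCIIbit, 1 = True


def read_ascii_run(data: bytes, start: int = 0) -> tuple[str, int]:
    """Return (text, next_index) for the ASCII String beginning at start.

    next_index points at the first byte whose ASCIIbit is False, or at
    len(data) if the run reaches the end of the input.
    """
    end = start
    while end < len(data) and data[end] & ASCII_FLAG:
        end += 1
    # Strip the flag bit to recover the 7-bit ASCII codes.
    text = bytes(b & 0x7F for b in data[start:end]).decode("ascii")
    return text, end


# "Hi" with the flag set on each byte, followed by a byte with the flag clear.
stream = bytes([0xC8, 0xE9, 0x05, 0x11])
text, next_index = read_ascii_run(stream)
print(text, next_index)
```

The byte at next_index is the one the paragraph above describes: its 7 bits are either the EndCharacter or the identity of the next CharacterSet.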

 

The Grok32` Kernel keeps track of each CharacterSet it encounters.  Each CharacterSet employs its own RuleList.  TextFile presumes that a relatively small number (fewer than 128) of CharacterSets will be used in a String.

 

The identity of a CharacterSet may be specified with 7 bits because the code supporting String keeps track of the CharacterSets which have been read into memory.  The name of a CharacterSet is assigned a Cardinal value stored in the 7 bits used to specify a CharacterSet.
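
One way to picture this bookkeeping is a small table that hands out a 7-bit Cardinal the first time a CharacterSet name is seen. The Python sketch below is an assumption about how such tracking might work, not a description of the Grok32` code. The class CharacterSetRegistry, its methods, and the CharacterSet names used are all hypothetical, and the text does not say whether any of the 128 values is reserved (for example for the EndCharacter), so none is reserved here.

```python
class CharacterSetRegistry:
    """Assign each CharacterSet name a Cardinal that fits in 7 bits."""

    MAX_SETS = 128  # 7 bits can distinguish at most 2**7 CharacterSets

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}
        self._names: list[str] = []

    def intern(self, name: str) -> int:
        """Return the Cardinal for name, assigning the next free one if new."""
        if name in self._ids:
            return self._ids[name]
        if len(self._names) >= self.MAX_SETS:
            raise OverflowError("no 7-bit Cardinal left for another CharacterSet")
        cardinal = len(self._names)
        self._ids[name] = cardinal
        self._names.append(name)
        return cardinal

    def name_of(self, cardinal: int) -> str:
        """Look up the CharacterSet name for a previously assigned Cardinal."""
        return self._names[cardinal]


registry = CharacterSetRegistry()
greek = registry.intern("Greek")        # illustrative name only
cyrillic = registry.intern("Cyrillic")  # illustrative name only
print(greek, cyrillic, registry.name_of(cyrillic))
```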

 

A non-ASCII String continues until its EndCharacter is reached.  Over that span of characters, the CharacterStream is interpreted according to the CharacterCode of that non-ASCII String.  The ASCIIbit in an ASCII String is a concise means of specifying concatenated Strings from different CharacterSets.  In Grok32`, 7-bit ASCII (128 characters) is the default CharacterCode.
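
The span of a non-ASCII String can be sketched as follows. Several details are assumptions, since the text does not fix them: the ASCIIbit is taken to be the most significant bit with 1 meaning True, the EndCharacter is taken to be the single byte value 0x00, and the payload is treated as opaque bytes that never contain that value. The names END_CHARACTER and read_non_ascii_span are hypothetical.

```python
ASCII_FLAG = 0x80     # assumption: the most significant bit is the ASCIIbit, 1 = True
END_CHARACTER = 0x00  # assumption: the text does not give the EndCharacter's value


def read_non_ascii_span(data: bytes, start: int) -> tuple[int, bytes, int]:
    """Return (charset_id, payload, next_index) for the span beginning at start.

    data[start] must be a byte whose ASCIIbit is False; its low 7 bits name
    the CharacterSet. The payload runs up to, but not including, the
    EndCharacter, and is left undecoded because its interpretation depends
    on that CharacterSet.
    """
    header = data[start]
    if header & ASCII_FLAG:
        raise ValueError("byte has a True ASCIIbit, so it is not a CharacterSet header")
    charset_id = header & 0x7F
    end = data.find(bytes([END_CHARACTER]), start + 1)
    if end == -1:
        raise ValueError("non-ASCII String is missing its EndCharacter")
    return charset_id, data[start + 1:end], end + 1


# Two ASCII bytes, then CharacterSet 5 with a two-byte payload, then one ASCII byte.
stream = bytes([0xC8, 0xE9, 0x05, 0x11, 0x22, 0x00, 0xC1])
charset_id, payload, next_index = read_non_ascii_span(stream, 2)
print(charset_id, payload.hex(), next_index)
```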

(*********************************************************************)

 

The TextFile has the following structure:

 

struct   TextFile

            char     CharacterSetNames[]

            string  TextString

 

Where CharacterSetNames[] is an array of ASCII Strings identifying a CharacterSet.

TextString is then

 

What this means is that the TextString

 

CharacterSets is an array of CharacterSet specs.

A CharacterSet spec, charSpec, is a type (Type[charSpec] is True).

Each Handle refers to a


Grok32`

© 2004, 2005

by John Van Wie Bergamini.

All rights reserved.
