CharacterSet

 

String[Name]["whatever"] CharacterSetName

 

The rules for Reckoning a String as an Expression are set by...

String[Name["charSetNam"]][RuleList]

...where...

RuleList = {Noops, Digits, Letters, ParaPuncs, Operators}.

This is called the CharacterSetAssignment.

 

Noops = {EscapeSequence, StartComment, EndComment, UnrecognizedCharacter, EmptySpace, ...}

 

Digits = {CC0, CC1, ...}

 

Letters = {CCa, CCb, ...}

 

ParaPunc={StartString, Separator, EndString, FunctionName}

ParaPuncs = {{"[", ",", "]"}, {"{", ",", "}", List}, {"�", ", ", "�", PatternFormSequence}, ParaPunc...}

 

Operators = {ContextMark, StartGroup, EndGroup, copula...}
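
The five sublists above can be pictured as one record per CharacterSet. The following is a minimal Python sketch, not Grok32` code: the class RuleList, its field names, and the method category are hypothetical, and the sample ASCII assignments are assumptions apart from the bracket and brace ParaPuncs, the equal sign, and the ContextMark, which appear in this document. The sketch also stores characters directly, whereas the lists above hold CharacterCodes.

```python
from dataclasses import dataclass, field


@dataclass
class RuleList:
    """Hypothetical Python stand-in for a Grok32` RuleList (not the real API)."""
    noops: dict = field(default_factory=dict)       # character -> Noop role
    digits: list = field(default_factory=list)      # digit characters, in order
    letters: list = field(default_factory=list)     # letter characters, in order
    para_puncs: list = field(default_factory=list)  # (start, separator, end, function name)
    operators: dict = field(default_factory=dict)   # character -> operator role

    def category(self, ch: str) -> str:
        """Name the sublist that gives ch its meaning."""
        if ch in self.digits:
            return "Digits"
        if ch in self.letters:
            return "Letters"
        if any(ch in group[:3] for group in self.para_puncs):
            return "ParaPuncs"
        if ch in self.operators:
            return "Operators"
        # Explicit Noops and anything unlisted (UnrecognizedCharacter) land here.
        return "Noops"


# Assumed sample assignments for an ASCII-like CharacterSet.
ascii_rules = RuleList(
    noops={" ": "EmptySpace"},
    digits=list("0123456789"),
    letters=list("abcdefghijklmnopqrstuvwxyz"),
    para_puncs=[("[", ",", "]", None), ("{", ",", "}", "List")],
    operators={"`": "ContextMark", "=": "Name"},
)
```

Calling ascii_rules.category on each Character of a String gives the coarse classification a parser would need before it applies the finer roles within each sublist.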

 

 

The Grok32` String is designed to facilitate CharacterSets of arbitrary complexity. The object is to create a programming abstraction capable of modeling (the ling) any language's CharacterSets without prejudice. In any case, the similarities among CharacterSets are sufficient that a standard RuleList can be constructed for a CharacterSet.

 

It is believed that the String embodies fundamental aspects of

 

All linguistic communication, written or spoken, is received and sent as a temporal sequence of information "chunks".

 

Furthermore, it is believed that all linguistic Strings employ similar mechanisms to parse and elicit meaning. For example, the String could probably be adapted to help parse phonetic linguistics or any character-expression sequence, human or otherwise. Frequently, communication is multi-channel, and a correct "parsing" requires parallel String processing of completely different kinds of "character strings". A "character string" could be a sequence of human gestures. What complex of CharacterString channels is sufficient to characterize a whale species' language?

 

 

CharacterSetName

 

(1)                    String[Name]["whatever"]

 

...returns the CharacterSetName(s) of the String "whatever".

 

If "whatever" is a concatenation of several different Strings, then (1) will return a Sequence of CharacterSetNames.  (See String Implementation Notes.)

 

CharacterSet definitions are stored in Contexts bearing the CharacterSetName.  These Contexts are subContexts of Construct`String`Name`.  The CharacterSetName is identical to the subContext name without the ContextMarks.

 

For example, the Character groupings (see below) for the ASCII CharacterSet are kept in Construct`String`Name`ASCII`. If "whatever" in (1) is an ASCII String, the name returned by (1) will be "ASCII".
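
The relationship between a CharacterSetName and its Context can be sketched in Python as plain string handling. This is only an illustration of the naming rule stated above; the function names context_for and charset_name_from are hypothetical and are not part of Grok32`.

```python
CONTEXT_MARK = "`"
PARENT_CONTEXT = "Construct`String`Name`"


def context_for(charset_name: str) -> str:
    """Build the subContext that would hold a CharacterSet's definitions."""
    return PARENT_CONTEXT + charset_name + CONTEXT_MARK


def charset_name_from(context: str) -> str:
    """Recover the CharacterSetName: the subContext name without ContextMarks."""
    if not context.startswith(PARENT_CONTEXT):
        raise ValueError("not a subContext of " + PARENT_CONTEXT)
    return context[len(PARENT_CONTEXT):].strip(CONTEXT_MARK)
```

Under these assumptions, context_for("ASCII") yields Construct`String`Name`ASCII` and charset_name_from reverses the mapping.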

 

Grok32` does not contain any Character glyph definitions or rendering software.  The host machine's text rendering software is better suited to this task.  The String object presumes that the CharacterSet and CharacterCode are the only relevant facts about a Character.
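
As a rough illustration of that presumption, a Character can be modeled as nothing more than a (CharacterSet, CharacterCode) pair. The Python sketch below is an assumption throughout: the class Character and the helpers ascii_string and charset_names are hypothetical, "Greek" is an invented CharacterSetName, and collapsing adjacent Characters of the same CharacterSet into one name is only one possible reading of how (1) reports a concatenated String.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    charset_name: str  # e.g. "ASCII"
    code: int          # the CharacterCode; no glyph information is stored


def ascii_string(text: str) -> list:
    """Model an ASCII String as a list of Characters (assumes code = ASCII value)."""
    if any(ord(ch) > 127 for ch in text):
        raise ValueError("text contains non-ASCII characters")
    return [Character("ASCII", ord(ch)) for ch in text]


def charset_names(characters: list) -> list:
    """List the CharacterSetNames met along the String, one per consecutive run."""
    names = []
    for character in characters:
        if not names or names[-1] != character.charset_name:
            names.append(character.charset_name)
    return names


# A concatenation of an ASCII String and a String from a hypothetical "Greek" set.
mixed = ascii_string("ab") + [Character("Greek", 1), Character("Greek", 2)]
```

With these definitions, charset_names(ascii_string("ab")) gives a single name, while charset_names(mixed) gives one name per constituent String.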

 

 

 

CharacterSetAssignment

 

If "charSetNam" is the string-name given to a CharacterSet, then the character semantics whereby a String is interpreted as an Expression can be assigned with the following declaration:

 

(2)                    String[Name["charSetNam"]][RuleList]

            where

                        RuleList = {Noops, Digits, Letters, ParaPuncs, Operators}

           

Each CharacterSet has its own RuleList.  The five lists in RuleList assign meanings to individual Characters and thereby specify how to parse a String as an Expression.  The five sublists reflect a categorical subdivision of the Characters.  See RuleList.

 

By combining a customized CharacterSet with appropriately designed Named Functions and procedures, it is possible to mimic (model) most languages with Grok32`.

 

Example 1

In ASCII and other CharacterSets, the equal sign is a token for Name[_, _].

This operator is assigned in the Standard ASCII RuleList.

Thus, "a = 3" is interpreted as Name[a, 3].
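
The effect of such an operator assignment can be imitated in a few lines of Python. This is a toy sketch under stated assumptions, not the Grok32` parser: the table OPERATOR_HEADS and the function reckon_binary are hypothetical, only a single two-argument operator is handled, and EmptySpace around the operands is simply discarded.

```python
# Assumption: an operator Character maps to the head of a two-argument Expression.
OPERATOR_HEADS = {"=": "Name"}


def reckon_binary(text: str) -> str:
    """Rewrite 'left <op> right' as 'Head[left, right]' for the first operator found."""
    for position, ch in enumerate(text):
        if ch in OPERATOR_HEADS:
            left = text[:position].strip()        # drop surrounding EmptySpace
            right = text[position + 1:].strip()
            return f"{OPERATOR_HEADS[ch]}[{left}, {right}]"
    return text.strip()  # no operator present: leave the text as it is


print(reckon_binary("a = 3"))  # prints Name[a, 3]
```

A different CharacterSet could assign another Character to the same head by changing only the table, which is the point of keeping the assignment in the RuleList.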

 

Example 2

If �els... are all Strings, then...

                        Sequence[{�els...}]

sorts the elements in {�els...} using the letter and digit order established by the Digits and Letters lists.
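
A sort driven by the Digits and Letters lists can be sketched in Python as follows. The sketch rests on assumptions the document does not state: that digits rank before letters, that Characters absent from both lists sort last, and that the sample DIGITS and LETTERS match the ASCII RuleList. The names ORDER, sort_key, and sort_strings are hypothetical.

```python
# Assumed contents of the Digits and Letters lists for an ASCII-like CharacterSet.
DIGITS = list("0123456789")
LETTERS = list("abcdefghijklmnopqrstuvwxyz")

# Rank of each Character: its position in Digits followed by Letters.
ORDER = {ch: rank for rank, ch in enumerate(DIGITS + LETTERS)}


def sort_key(text: str) -> list:
    """Translate a String into ranks; unlisted Characters rank after all others."""
    return [ORDER.get(ch, len(ORDER)) for ch in text]


def sort_strings(els: list) -> list:
    """Sort Strings by the order the Digits and Letters lists establish."""
    return sorted(els, key=sort_key)


print(sort_strings(["b2", "a10", "a9", "3"]))  # prints ['3', 'a10', 'a9', 'b2']
```

Reordering the entries of LETTERS or DIGITS changes the result without touching the sort itself, which mirrors how a CharacterSet's RuleList would control collation.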

 

 

 

Glossary

 

A ParaPunc, sometimes called a List punctuator, is a group of three Characters named {StartString, Separator, EndString}.

 

 

CharacterCode (often abbreviated to "CC") is an integer that identifies a specific Character for Expression parsing purposes.  Grok32` conforms to the Character-Glyph Model, which separates a Character's glyph (display value) from its semantic value.

 

 


Grok32`

© 2004, 2005

by John Van Wie Bergamini. All rights reserved.
